When I started tracking how often AI engines mentioned Yellow Pencil, my renovation company in Markham, I ran into a problem. ChatGPT would cite us for one query and completely ignore us on the next. Perplexity would name three competitors and skip us entirely. Gemini would hallucinate our phone number.
I needed a scoreboard. Not a vanity metric, but something that told me which lever to pull on any given Tuesday morning. That is why we built the GEO Score around four dimensions. Each one isolates a different failure mode. Each one has a number I can move this week.
Here is what each dimension measures, what a good score looks like, and the concrete checks we run to get there.
Dimension 1: Entity Readiness
Entity Readiness answers one question. Does the AI know you exist as a specific, disambiguated business?
When I type "Yellow Pencil Markham" into Perplexity, does it return a paragraph that correctly identifies us as a kitchen renovation company, or does it confuse us with a pencil manufacturer in Ohio? Entity Readiness is the difference.
We check this by running 8 to 12 entity probes against each engine. Probes are direct name queries, brand-plus-service queries ("Yellow Pencil kitchen remodel"), and brand-plus-location queries. We score each response for three things: did the engine respond at all, did it classify the business correctly, and did it get the core facts right (service, city, approximate size).
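The rubric above reduces to three booleans per probe. Here is a minimal sketch of how that scoring could work; the weights (40/30/30) and the `ProbeResult` shape are illustrative, not our production schema.

```python
from dataclasses import dataclass

@dataclass
class ProbeResult:
    responded: bool         # did the engine return anything specific at all?
    correct_category: bool  # "kitchen renovation company," not a pencil maker
    facts_right: bool       # service, city, approximate size all accurate

def probe_score(r: ProbeResult) -> float:
    """Score one entity probe on a 0-100 scale (illustrative weights)."""
    if not r.responded:
        return 0.0
    score = 40.0              # credit for any specific response
    if r.correct_category:
        score += 30.0
    if r.facts_right:
        score += 30.0
    return score

def entity_readiness(results: list[ProbeResult]) -> float:
    """Average across all probes run against all engines."""
    return sum(probe_score(r) for r in results) / len(results)
```

The useful property of a rubric like this is that a 45 decomposes: you can see whether the problem is silence, misclassification, or wrong facts, and each has a different fix.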
A score of 85 on Entity Readiness means every major engine correctly describes what you do and where you do it. A score of 45 usually means one engine nailed it, one got confused, and two gave you a generic answer that could apply to any contractor.
Under 40 on Entity Readiness is where most service businesses live when they first sign up. The fix is almost never "more content." It is schema, a proper About page, consistent NAP across 5 to 7 structured directories, and an entity summary written so an LLM can parse it in one pass.
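On the schema piece: the minimum viable version is a LocalBusiness JSON-LD block that states name, category, and location in one machine-readable pass. A sketch of what that looks like, with placeholder contact details (the phone number and URL here are not Yellow Pencil's real ones); the emitted JSON goes in a `<script type="application/ld+json">` tag on the homepage.

```python
import json

# Minimal entity summary. HomeAndConstructionBusiness is a schema.org
# subtype of LocalBusiness; all specific values below are placeholders.
schema = {
    "@context": "https://schema.org",
    "@type": "HomeAndConstructionBusiness",
    "name": "Yellow Pencil",
    "description": "Kitchen renovation company serving Markham, Ontario.",
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Markham",
        "addressRegion": "ON",
        "addressCountry": "CA",
    },
    "telephone": "+1-905-555-0100",   # placeholder
    "url": "https://example.com",     # placeholder
    "areaServed": ["Markham", "Richmond Hill", "Thornhill"],
}

print(json.dumps(schema, indent=2))
```

Whatever you put here should match your NAP directory listings character for character. The disambiguation in the description ("kitchen renovation company," a city, a region) is what keeps you from being confused with the pencil manufacturer in Ohio.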
Dimension 2: Answer Coverage
Entity Readiness asks "do they know you?" Answer Coverage asks "do they pick you?"
Here is the concrete version. When a homeowner asks ChatGPT "best kitchen contractor in Markham" or "who does custom cabinets near Thornhill," does any engine name your business in the response?
We build a query set for every account. For Yellow Pencil that is about 40 queries: "kitchen renovation Markham," "basement finishing Richmond Hill," "Markham contractor with design services," and so on. We run each query against 4 engines every week and record three outcomes.
- Named in the answer with a link.
- Named in the answer without a link.
- Not mentioned.
Answer Coverage is the percentage of queries where you are named at all, weighted by how often real people search for that phrase. A 70+ score means you appear in most non-brand queries in your service area. A 40 to 70 score means you show up for the obvious queries ("kitchen renovation [your city]") but get skipped on the comparative ones ("best", "top rated", "affordable").
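The metric itself is just a volume-weighted hit rate. A sketch, with made-up search volumes; the real weighting inputs would come from your keyword data, not these numbers.

```python
def answer_coverage(results: dict[str, tuple[bool, int]]) -> float:
    """
    results maps query -> (named_in_answer, monthly_search_volume).
    Named with or without a link both count as a hit; only "not
    mentioned" counts against you. Returns a 0-100 percentage.
    """
    total = sum(vol for _, vol in results.values())
    named = sum(vol for hit, vol in results.values() if hit)
    return 100.0 * named / total if total else 0.0

# Hypothetical week of results for three queries
week = {
    "kitchen renovation Markham": (True, 400),
    "best kitchen contractor in Markham": (False, 300),
    "basement finishing Richmond Hill": (True, 100),
}
print(answer_coverage(week))  # 62.5
```

Note what the weighting does in this example: missing the one high-volume comparative query drags the score down far more than missing a long-tail query would.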
The query where you are not mentioned is more useful than the one where you are. It tells you exactly which content gap to fill next.
Under 40 usually means you have 2 or 3 pages that rank for the easy branded terms and nothing else. The path up is boring and effective: one FAQ page per service, one comparison page per city, one cost guide with real numbers.
Dimension 3: Evidence Strength
AI engines do not just read your site. They read what other people say about you.
Evidence Strength measures third-party signal. Reviews on Google and Houzz. Press mentions in local papers. Forum posts where someone actually answered "who should I call for a kitchen in Markham" and named you. Citations in industry directories that are not pay-to-play.
We count four things. Recency of reviews (anything older than 18 months gets weighted down). Volume relative to your market (20 reviews is strong for a solo contractor, weak for a 12-person shop). Diversity of sources (10 Google reviews and nothing else is a fragile position). And sentiment signal that an LLM can parse, which usually means reviews that describe the actual work, not "five stars, great job."
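Those four signals can be blended into a single number. This is only a sketch of the shape of that blend; the weights, the "10 reviews per employee" volume heuristic, and the 15-word threshold for a "descriptive" review are all hypothetical stand-ins for the real model.

```python
from datetime import date

def evidence_strength(reviews: list[tuple[date, str]],
                      sources: set[str],
                      team_size: int,
                      today: date) -> float:
    """
    Illustrative 0-100 blend of recency, volume relative to market,
    source diversity, and parseable sentiment. Not the production model.
    """
    if not reviews:
        return 0.0
    recent = [d for d, _ in reviews if (today - d).days <= 548]  # ~18 months
    recency = len(recent) / len(reviews)
    volume = min(1.0, len(reviews) / (team_size * 10))   # hypothetical heuristic
    diversity = min(1.0, len(sources) / 3)               # 3+ platforms = full credit
    descriptive = [t for _, t in reviews if len(t.split()) >= 15]
    sentiment = len(descriptive) / len(reviews)
    return 100.0 * (0.3 * recency + 0.3 * volume
                    + 0.2 * diversity + 0.2 * sentiment)
```

The point of penalizing "five stars, great job" reviews is that an LLM summarizing you has nothing to quote from them; a review that describes the cabinet work gives the engine a sentence it can actually use.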
Yellow Pencil sits around 78 on Evidence Strength. We have 140 Google reviews averaging 4.9, a Houzz profile with 34 project photos and 22 reviews, two features in local magazines, and one podcast appearance. That took three years to accumulate. There is no shortcut here, but there is a tracking advantage: most of our competitors score between 30 and 55 because they stopped asking for reviews in 2022.
A 70+ Evidence Strength score is the single best predictor of whether an engine will cite you in a comparison query. Under 40, you are invisible to the half of the ranking signal that you do not control directly.
Dimension 4: Crawlability
This is the least glamorous dimension and the one that quietly breaks the other three.
Crawlability measures whether AI crawlers can actually read your site. We run 18 checks, including:
- Is GPTBot blocked in robots.txt? We see this weekly, usually from a Wordfence default.
- Does your key service page return a 200 on the first request, or does it lazy-load content behind JavaScript that headless crawlers skip?
- Is your schema valid, or does it throw errors?
- Does your sitemap include the pages you actually want cited?
- Are your Core Web Vitals slow enough that crawlers time out on mobile?
I audited one client last month who had a 92 on Evidence Strength, solid reviews, legitimate press, real authority. Crawlability score was 28. Their hero section was a React component that rendered client-side, and every AI crawler was reading a blank page. Fix took 40 minutes. Their Answer Coverage doubled in 9 days.
Crawlability is the floor. You can have a perfect score on every other dimension and still get zero citations if bots cannot parse your pages.
What the composite score actually tells you
The overall GEO Score is a weighted blend, but the individual dimensions are where the work lives. When I open a new account dashboard, I look at the lowest dimension first. That is almost always the one holding the other three back.
70 and up on any dimension is strong. You are defending your position, not building it. 40 to 70 is mixed, which means you have some fixes that will move the number noticeably in 2 to 4 weeks. Under 40 is invisible, and that is not a judgment; it is a starting point. Most of my own sites started there.
If you want to see your own four numbers before you do anything else, run the free check at /free-tools/ai-visibility/. It takes a domain and gives you a dimension-by-dimension breakdown in about 90 seconds. If you want the weekly tracking, the query coverage reports, and the Flare advisor walking you through fixes in priority order, /pricing/ has the plans.
Questions on any of this, or want me to look at a specific score? hello@rankinglocal.ai is read by me directly.