Robots Check: The 30-Second Test That Saves AI Visibility

The robots.txt file you set up in 2019 is probably hiding your site from every AI search engine on earth. Here's how to check in under a minute.

Last month a client called me sounding annoyed. She'd paid another agency $8,400 for SEO work over 18 months. Her Google rankings were fine. But when she asked ChatGPT, Claude, and Perplexity for "best commercial roofers in Hamilton," she never came up. Not once. Her three direct competitors did.

I pulled up her robots.txt in 12 seconds. Right there at the top: User-agent: * followed by Disallow: /wp-admin/ and a handful of other rules. Looked fine. Then I scrolled down. Some well-meaning developer had added User-agent: GPTBot with Disallow: / six months earlier, probably after reading a 2023 article about "protecting your content from AI scraping."

She'd been invisible to ChatGPT for half a year and nobody noticed. That's the whole problem. robots.txt is a file almost nobody looks at after it's written, and AI crawlers arrived faster than most agencies updated their playbooks. So I built a free tool to check yours in 30 seconds. Let me walk you through what it does and what it finds.

What the Robots Check tool actually does

Paste your domain into the checker at /free-tools/robots-check/ and it fetches your robots.txt file directly from your server. No login, no account, no email gate. It then parses the file and runs 6 specific tests, one for each of the major AI crawlers that matter for search visibility in 2026.

The tool reports a simple allowed/blocked status for each bot. It also shows you the exact rule that's causing a block so you can fix it. Average response time is around 2 seconds on a normal connection, and the whole test takes about 30 seconds start to finish, including the time it takes to read the results.

If your robots.txt doesn't exist, the tool tells you that too. A missing robots.txt is actually fine for AI visibility, because crawlers treat a missing file as permission to crawl everything, though it's not ideal for other reasons. The catastrophic scenario is having one that accidentally blocks everything.
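If you'd rather script the same check yourself, here's a minimal sketch using Python's standard-library robotparser. The six tokens mirror the list the tool tests; swap in your own domain. This is a rough equivalent, not the tool's actual code:

from urllib.robotparser import RobotFileParser

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot",
           "Google-Extended", "CCBot", "Bytespider"]

def check_ai_access(domain):
    parser = RobotFileParser(f"https://{domain}/robots.txt")
    parser.read()  # a missing file (404) is treated as allow-everything
    for bot in AI_BOTS:
        verdict = "allowed" if parser.can_fetch(bot, f"https://{domain}/") else "BLOCKED"
        print(f"{bot:17} {verdict}")

check_ai_access("yoursite.com")

The parser applies the standard matching rules, so it agrees with the tool on simple files but may differ on unusual ones.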

The 6 AI user-agents that matter

These are the six crawlers the tool checks for, listed by the user-agent tokens your robots.txt needs to address:

  1. GPTBot (OpenAI, feeds ChatGPT)
  2. ClaudeBot (Anthropic, feeds Claude)
  3. PerplexityBot (Perplexity)
  4. Google-Extended (Google's token for AI training, including Gemini)
  5. CCBot (Common Crawl)
  6. Bytespider (ByteDance)

If even one of these is blocked, you're losing visibility on a surface where your competitors are probably already showing up. I checked 40 small business sites in Hamilton last quarter and 14 of them were blocking at least one AI crawler by accident. That's 35%.

Why CCBot is the quiet killer

People always ask me which bot matters most. The honest answer is CCBot, and most site owners have never heard of it. Common Crawl is a non-profit that crawls the web and publishes the dataset for anyone to use. Both GPT and Claude were trained partially on Common Crawl data, and many smaller AI tools still use it as their primary source.

If you block CCBot, you're not just blocking Common Crawl. You're potentially removing yourself from the training data of dozens of AI systems you've never heard of. When those systems eventually get asked about your industry, your competitors get named and you don't.

Blocking CCBot won't hurt you in ChatGPT's live search results, because GPTBot handles that. But it absolutely affects what the model "knows" about you when someone asks a general question without triggering a live search.

The "Disallow: /" disaster

Here's the single most common mistake I see. A developer, often years ago, writes this:

User-agent: *
Disallow: /

That one rule tells every crawler on earth to stay out. Every bot. Google reads this too, by the way, so if you ever see this on a live site and wonder why you have zero organic traffic, now you know. A close cousin is blocking specific AI bots "just in case":

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

I see this one constantly. Usually it was added in 2023 when "block the AI scrapers" was trending advice. In 2026, with AI search driving an estimated 15-20% of research queries, it's self-sabotage.
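If you want to spot this pattern programmatically, here's a rough Python sketch that walks a robots.txt file group by group and reports every user-agent sitting under a bare Disallow: /. It handles the simple cases shown above, not the full spec:

import urllib.request

def find_fully_blocked(domain):
    """Return user-agents that sit in a group containing 'Disallow: /'."""
    url = f"https://{domain}/robots.txt"
    text = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#")[0].strip()  # drop comments and whitespace
        if not line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if rules:  # a rule line ended the previous group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value)
        elif field in ("allow", "disallow"):
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return [a for agents, rules in groups
            if ("disallow", "/") in rules for a in agents]

print(find_fully_blocked("yoursite.com"))

Run against the disaster file above, this prints GPTBot and ClaudeBot; run against a healthy file, it prints an empty list.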

Note: If you sell anything to anyone who uses ChatGPT, Claude, or Perplexity to research options, blocking those bots is the digital equivalent of putting a "Closed" sign on the door while staying open. Your building is there. Nobody can see in.

What a correct robots.txt looks like

Here's the file I drop into most client sites. It blocks the junk you genuinely don't want crawled, allows the 6 AI crawlers that matter, and keeps everything else standard:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /cgi-bin/
Allow: /wp-admin/admin-ajax.php

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: Bytespider
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

Replace yoursite.com with your actual domain. Drop this file at /robots.txt on your server. If you're on WordPress, a plugin like Yoast or Rank Math can manage it for you, but check what it actually outputs because defaults change.
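One quick sanity check, assuming you have Python handy: fetch the live file and confirm the server is serving what you think it is, because plugins sometimes rewrite it on the fly. The domain below is a placeholder.

from urllib.request import urlopen

# Print exactly what crawlers see at /robots.txt
print(urlopen("https://yoursite.com/robots.txt").read().decode())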

How to fix yours in the next 10 minutes

  1. Run your domain through /free-tools/robots-check/ and screenshot the result. That's your before.
  2. If anything is blocked, pull up your current robots.txt. It lives at yoursite.com/robots.txt.
  3. Edit the file through your hosting panel, FTP, or your SEO plugin. Remove any Disallow: / lines under the AI bot user-agents.
  4. Upload it, then re-run the check. You should see all 6 crawlers showing allowed.
  5. Give it 48-72 hours for crawlers to re-fetch, then test your brand name in ChatGPT and Perplexity.

That's the whole playbook. Five steps, about 10 minutes of actual work, and it's the cheapest AI visibility win I know of. Most sites I check need zero fixes. Some need one line removed. A few need a full rewrite.

Try it right now

Go to /free-tools/robots-check/ and paste your domain. No signup, no payment, no catch. If anything is blocked and you want a second opinion, or if you run the check and something looks weird, email me at hello@rankinglocal.ai. I read every message directly.