Robots Check: Find the Crawler Block Hiding You From AI

The robots.txt you set up years ago may be quietly blocking the exact crawlers that decide whether AI engines can cite you. Here's how to check in under a minute.

Most AI visibility problems are subtle. This one isn't. If the wrong line is sitting in your robots.txt, an AI engine literally cannot read your pages — and no amount of schema, reviews, or great content will get you cited. The free Robots Check finds that line in about 30 seconds. Here's what it looks at and, more importantly, how to read the result correctly, because the popular advice on this is mostly wrong.

What the tool does

Paste your domain into the Robots Check and it fetches your robots.txt directly from your server — no login, no account, no email. It parses the file and reports a simple allowed/blocked status for each AI crawler that matters, and it shows you the exact rule causing any block so you can fix it.
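If you'd rather see the idea in code, here is a minimal sketch of that allowed/blocked report using Python's standard urllib.robotparser. It is not the tool's actual implementation; the crawler list is copied from the FAQ at the end of this post, and the function name crawler_status is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from this article's FAQ.
CITATION_CRAWLERS = [
    "OAI-SearchBot", "ChatGPT-User", "Claude-SearchBot",
    "PerplexityBot", "Googlebot", "Bingbot",
]

def crawler_status(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Map each crawler name to True (allowed) or False (blocked) for `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {name: parser.can_fetch(name, path) for name in CITATION_CRAWLERS}

if __name__ == "__main__":
    # Hypothetical file that blocks only OpenAI's search crawler.
    sample = "User-agent: OAI-SearchBot\nDisallow: /\n"
    for name, allowed in crawler_status(sample).items():
        print(f"{name:18} {'allowed' if allowed else 'BLOCKED'}")
```

To run it against a live site instead of a pasted string, RobotFileParser also offers set_url() and read(), which fetch the file for you. Python's parser is a reasonable approximation, but individual crawlers may interpret edge cases differently.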

If you don't have a robots.txt at all, the tool tells you that too. A missing file counts as "allow all," which is fine for AI visibility. The dangerous case is a file that blocks the wrong things.
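A small sketch of the missing-file case, assuming a 404 response is treated as an empty rule set. The helper name parser_for is hypothetical, and other status codes are deliberately left out of scope.

```python
from urllib.robotparser import RobotFileParser

def parser_for(status: int, body: str = "") -> RobotFileParser:
    """Build a parser from an HTTP status and the response body."""
    parser = RobotFileParser()
    # A 404 means there is no robots.txt, which is read as "no rules at all".
    lines = [] if status == 404 else body.splitlines()
    parser.parse(lines)
    return parser

# No file: nothing is disallowed, so any crawler may fetch any path.
print(parser_for(404).can_fetch("OAI-SearchBot", "/pricing/"))
```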

The distinction almost every guide gets wrong

Here's the part worth slowing down for, because getting it wrong wastes effort. Most AI providers run two different kinds of crawler, and only one of them controls whether you can be cited.

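Concretely:

Training crawlers, such as OpenAI's GPTBot, collect pages for model training. Blocking them opts you out of training and has no effect on citations.

Citation-grade crawlers, such as OAI-SearchBot and ChatGPT-User (OpenAI), Claude-SearchBot (Anthropic), and PerplexityBot (Perplexity), fetch pages so the engine can surface and cite them. Googlebot and Bingbot belong in this group too, because they feed Google AI Overviews, Copilot, and ChatGPT's web search.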

So if someone "protected their content from AI" a couple of years ago by blocking GPTBot, they may have given up nothing on the citation side — or, if they blocked OAI-SearchBot or Googlebot, they may have made themselves invisible without realising it. The tool checks the citation-grade crawlers specifically, which is the set that actually changes whether you get recommended.

The "Disallow: /" disaster

The single most damaging pattern is also the simplest. Somewhere, sometime, a developer wrote this:

User-agent: *
Disallow: /

That tells every crawler on earth to stay out — Google included. If you ever find this on a live site and wonder why organic traffic is zero, now you know. The AI-era cousin is blocking specific bots "just in case," usually added in 2023 when "block the AI scrapers" was trending advice:

User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Disallow: /

That second block is the one that quietly costs you ChatGPT citations. The first only costs you training inclusion.
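Here is a rough sketch of how you might scan a file for these full-site blocks and label what each one costs you. The training/citation split follows this article; the function names are made up for illustration, and the scan is deliberately simple (it only looks for a literal "Disallow: /" and does not account for Allow lines that might soften it).

```python
# Crawler roles as described in this article; extend the sets as needed.
TRAINING = {"gptbot"}
CITATION = {"oai-searchbot", "chatgpt-user", "claude-searchbot",
            "perplexitybot", "googlebot", "bingbot"}

def impact(agent: str) -> str:
    """Describe what a full block on this user-agent affects."""
    if agent == "*":
        return "every crawler"
    if agent in TRAINING:
        return "training only"
    if agent in CITATION:
        return "citations"
    return "other"

def full_blocks(robots_txt: str) -> list[tuple[int, str, str]]:
    """Return (line_number, agent, impact) for every 'Disallow: /' rule."""
    findings = []
    agents = []        # user-agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for number, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # previous group ended
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                findings.extend((number, a, impact(a)) for a in agents)
    return findings

example = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: OAI-SearchBot\nDisallow: /\n"
for line_no, agent, cost in full_blocks(example):
    print(f"line {line_no}: {agent} is fully blocked ({cost})")
```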

A robots.txt that allows the right crawlers

Here's a clean starting point that blocks genuine junk while allowing the citation-grade AI crawlers through:

User-agent: *
Disallow: /wp-admin/
Disallow: /cgi-bin/
Allow: /wp-admin/admin-ajax.php

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://yoursite.com/sitemap.xml

A note on precedence, since people get tangled here: a crawler obeys the single most specific user-agent group that matches it, not the union of the * group plus its own. So if you want OAI-SearchBot allowed, give it its own group; it won't inherit rules from the * block. Replace yoursite.com with your domain and drop the file at /robots.txt.
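You can watch the group-selection rule in action with Python's urllib.robotparser. This is a sketch using a trimmed copy of the file above; SomeOtherBot is a made-up name standing in for any crawler without its own group.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /cgi-bin/

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A bot with no group of its own falls back to the * group.
generic_admin = parser.can_fetch("SomeOtherBot", "/wp-admin/")
# OAI-SearchBot matches its own group, so the * rules are ignored for it.
search_admin = parser.can_fetch("OAI-SearchBot", "/wp-admin/")

print(generic_admin, search_admin)  # False True
```

One consequence worth knowing: a crawler with its own group is no longer kept out of /wp-admin/ by the * rules. If that matters to you, repeat the Disallow lines inside each named group.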

The WordPress trap

If you're on WordPress, your robots.txt is often generated dynamically by your SEO plugin (Yoast, Rank Math, All in One SEO), so a file you edited by hand can get overwritten. And security plugins — Wordfence, Solid Security, Sucuri — apply bad-bot rules that can rate-limit or outright 403 AI crawlers because those bots were filed under "scrapers" in early 2024. Some managed hosts inject AI-bot blocks at the platform level that don't even appear in your site's robots.txt. If the Robots Check says you're blocked but your file looks clean, the block is probably coming from one of these layers.
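One rough way to tell these layers apart is to compare what robots.txt says with what the server actually returns to a request carrying a crawler's name. The sketch below is an assumption-heavy probe, not a definitive test: real crawlers send longer User-Agent strings than the short token you would pass here, and many firewalls also check IP ranges, so a clean result from your own machine does not prove the real bot gets through. Both function names are hypothetical.

```python
import urllib.error
import urllib.request

def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status the server gives to this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 4xx/5xx responses arrive as exceptions

def diagnose(robots_allows: bool, status: int) -> str:
    """Guess which layer is responsible, given both signals."""
    if not robots_allows:
        return "blocked by robots.txt"
    if status in (403, 429):
        return "robots.txt is clean, but a firewall, plugin, or host is refusing the bot"
    if status >= 400:
        return f"unexpected HTTP {status}; check server logs"
    return "reachable"

# Example (makes a network request, so it is left commented out):
# status = fetch_status("https://yoursite.com/", "OAI-SearchBot")
# print(diagnose(robots_allows=True, status=status))
```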

Fix yours in ten minutes

  1. Run your domain through the Robots Check and note which crawlers are blocked.
  2. Open your current robots.txt at yoursite.com/robots.txt.
  3. Remove any Disallow: / lines under the citation crawlers (OAI-SearchBot, Claude-SearchBot, PerplexityBot) and confirm Googlebot and Bingbot aren't blocked.
  4. If you're on WordPress, make the change in your SEO plugin's robots.txt editor, and check whether a security plugin is throttling bots.
  5. Re-run the check. Give crawlers a couple of days to re-fetch, then test your brand name in ChatGPT and Perplexity.

It's the cheapest AI visibility win there is. Most sites need zero changes; some need one line removed; a few have been invisible for months and never knew. Run it, and if the result looks strange, hello@rankinglocal.ai reaches me directly.

Frequently asked questions

Does blocking GPTBot stop ChatGPT from recommending me?

No. That's the most common misconception. GPTBot is OpenAI's training crawler; it has nothing to do with citations. The crawler that controls whether ChatGPT can surface and cite you in search is OAI-SearchBot. To appear in ChatGPT's web search you need to allow OAI-SearchBot (and Bingbot, since ChatGPT search also draws on the Bing index). Blocking GPTBot only opts you out of training. Run the Robots Check (/tools/robots-checker/) to see which crawlers your site actually blocks.

Which AI crawlers does the Robots Check tool look for?

The citation-grade crawlers that decide whether you can be cited: OAI-SearchBot and ChatGPT-User (OpenAI), Claude-SearchBot (Anthropic), PerplexityBot (Perplexity), plus Googlebot (which powers Google AI Overviews) and Bingbot (which powers Copilot and ChatGPT's web search via the Bing index). It also flags broad blocks like 'User-agent: * / Disallow: /' and shows you the exact line causing any block.
