
The One robots.txt Line That Can Hide You From ChatGPT

Most of the advice about blocking AI crawlers is aimed at the wrong bot. Get the distinction wrong and you can vanish from ChatGPT search without ever knowing it.

In 2023, a wave of advice told website owners to "protect your content from AI" by blocking the bots in robots.txt. A lot of people did it. The problem is that most of them blocked the wrong crawler — and some of them, trying to keep AI out, accidentally locked themselves out of being recommended. This is the most consequential robots.txt mistake of the AI era, and it hinges on a distinction almost nobody explains.

Two bots, two completely different jobs

Every major AI provider runs more than one crawler, and they do different things. Lump them together and the advice falls apart.

For OpenAI, the training crawler is GPTBot and the citation crawler is OAI-SearchBot. Here's the part that catches people: blocking GPTBot does not remove you from ChatGPT's search results. ChatGPT can still cite you. The only thing that removes you from ChatGPT search is blocking OAI-SearchBot. OpenAI says this plainly: the settings are independent, and you can allow OAI-SearchBot to appear in search while disallowing GPTBot to opt out of training.

So all those sites that "blocked AI" by disallowing GPTBot gave up exactly nothing on the citation side — they only opted out of training. And the sites that went further and blocked OAI-SearchBot, or the regular Googlebot, made themselves invisible in AI answers while believing they'd done something clever.
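To make the independence concrete, here is a minimal sketch that parses a two-group robots.txt with Python's standard-library urllib.robotparser. The file contents and the example.com URL are illustrative assumptions, and Python's parser only approximates how any real crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of training, stay visible in search.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each bot is matched against its own group, so the two answers differ.
for bot in ("GPTBot", "OAI-SearchBot"):
    allowed = parser.can_fetch(bot, "https://example.com/services/")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
# GPTBot: blocked
# OAI-SearchBot: allowed
```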

The same pattern repeats across providers

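Once you see the split, every engine reads the same way: one token for training (GPTBot, ClaudeBot, Google-Extended) and a separate crawler for search and citation (OAI-SearchBot, Claude-SearchBot, Googlebot).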

The takeaway: the bots that matter for being recommended are the search/citation crawlers and the classic search crawlers — OAI-SearchBot, Claude-SearchBot, PerplexityBot, Googlebot, Bingbot. Those are the ones to keep open.
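A quick way to see whether those five are open is to run them all against a robots.txt in one pass. This is a rough sketch rather than a substitute for a proper checker: blocked_citation_bots and the SAMPLE file are hypothetical names invented for the example, and Python's parser uses first-match rules and substring matching on agent names, so edge cases may differ from how a given crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The search/citation crawlers named above.
CITATION_BOTS = [
    "OAI-SearchBot",
    "Claude-SearchBot",
    "PerplexityBot",
    "Googlebot",
    "Bingbot",
]

def blocked_citation_bots(robots_txt: str, path: str = "/") -> list[str]:
    """Return the citation crawlers that this robots.txt text blocks for path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in CITATION_BOTS if not parser.can_fetch(bot, path)]

# Hypothetical file: classic search engines allowed, everything else shut out.
# The wildcard group silently catches the AI search crawlers too.
SAMPLE = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
"""

print(blocked_citation_bots(SAMPLE))
# ['OAI-SearchBot', 'Claude-SearchBot', 'PerplexityBot']
```

The sample shows the common trap: nobody wrote a line naming OAI-SearchBot, yet the catch-all group blocks it.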

The Perplexity wrinkle

Worth knowing if you're deciding what to block: blocking a crawler doesn't always equal total disappearance. Perplexity has said that even when you disallow PerplexityBot, it may still surface a blocked page's domain, headline, and a brief factual summary. And in August 2025, Cloudflare reported that after some sites disallowed Perplexity's declared bots and added firewall blocks, it observed fetches from an undeclared user agent that looked like an ordinary browser. (Cloudflare noted that ChatGPT's crawler, by contrast, fetched robots.txt and stopped when disallowed.) The honest summary: robots.txt is a request that well-behaved crawlers honour, not a wall.

Note

The goal for a local business is almost never to block AI. It's the opposite: you want to be read and recommended. The job is to make sure you haven't accidentally slammed a door you meant to leave open.

Where the accidental blocks come from

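If you've never touched your robots.txt, you can still be blocked, because the file often isn't yours to control directly. A security plugin can block bots on your behalf, and a host can block AI bots at the platform level.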

This is why a site can look perfectly open in its robots.txt and still be unreachable to AI crawlers — the block lives a layer up.
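One rough way to probe for a block that lives above robots.txt is to request a page with a crawler's name in the User-Agent header and look at the status code. The sketch below makes several assumptions: fetch_status and diagnose are hypothetical helpers, the short token stands in for the full user-agent string a real crawler sends, and firewalls may also filter by IP address, so a 200 from your own machine does not prove the real crawler gets through.

```python
import urllib.error
import urllib.request

def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status the server gives a request with this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses are raised as HTTPError; the code is what we want.
        return error.code

def diagnose(robots_allows: bool, status: int) -> str:
    """Combine the robots.txt verdict with the HTTP status into a plain answer."""
    if not robots_allows:
        return "blocked in robots.txt"
    if status in (401, 403, 429):
        return "open in robots.txt, refused by the server or firewall"
    if status >= 400:
        return "open in robots.txt, but the page returned an error"
    return "reachable"

# Usage (makes a network request; example.com is a placeholder):
#   status = fetch_status("https://example.com/", "OAI-SearchBot")
#   print(diagnose(robots_allows=True, status=status))
```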

What to actually do

  1. Check which crawlers can reach you. Run your domain through the Robots Check. It reports the citation-grade crawlers specifically and shows the exact line behind any block.
  2. Make sure the citation bots are open: OAI-SearchBot, Claude-SearchBot, PerplexityBot, Googlebot, Bingbot. If you want to opt out of training while staying citable, disallow only GPTBot, ClaudeBot, and Google-Extended.
  3. If the file looks clean but you're still blocked, check your security plugin and ask your host whether it blocks AI bots at the platform level.
  4. Re-test, then verify in the wild. Give crawlers a couple of days, then ask ChatGPT and Perplexity about your brand and category.
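For step 2, the file you end up with is short. The snippet below generates one possible version, assuming you want to opt out of training with all three providers and leave everything else open. build_robots_txt is a hypothetical helper, and the output is a starting point to adapt, not a file to paste over existing rules.

```python
# Training-side tokens named in step 2.
TRAINING_TOKENS = ["GPTBot", "ClaudeBot", "Google-Extended"]

def build_robots_txt(disallowed: list[str]) -> str:
    """Build a robots.txt that blocks only the listed tokens."""
    # One group per blocked token, each separated by a blank line.
    groups = [f"User-agent: {token}\nDisallow: /\n" for token in disallowed]
    # Catch-all group: every other crawler, citation bots included, stays open.
    groups.append("User-agent: *\nAllow: /\n")
    return "\n".join(groups)

print(build_robots_txt(TRAINING_TOKENS))
```

Because nothing in the file names a search or citation crawler, they all fall through to the open catch-all group.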

If you want the deeper picture of how each engine actually selects who to cite, the per-engine breakdowns — like how ChatGPT cites local businesses — walk through it. But start with the file. It's the cheapest fix in AI visibility, and the one most likely to be quietly costing you. hello@rankinglocal.ai reaches me directly.

Frequently asked questions

I blocked GPTBot to keep AI out. Am I invisible in ChatGPT now?

Not from search. GPTBot is OpenAI's training crawler only — blocking it opts you out of model training but does not remove you from ChatGPT's web search or its citations. Those are controlled by a separate crawler, OAI-SearchBot. If you want to be recommended in ChatGPT search, keep OAI-SearchBot (and Bingbot) allowed; block GPTBot only if you specifically want to opt out of training. Confirm what your site allows at /tools/robots-checker/.

Does robots.txt actually stop AI engines from using my content?

Well-behaved crawlers honour it, but it's a request, not a wall. Perplexity has said it may still show a blocked page's domain, headline, and a brief summary, and in August 2025 Cloudflare reported observing Perplexity fetch content via an undeclared, browser-like user agent after its declared bots were blocked. For a local business the usual goal isn't to block AI anyway — it's to make sure you haven't accidentally blocked the citation crawlers you want reading you.
