I blocked GPTBot to keep AI out. Am I invisible in ChatGPT now?

Not from search. GPTBot is OpenAI's training crawler only — blocking it opts you out of model training but does not remove you from ChatGPT's web search or its citations. Those are controlled by a separate crawler, OAI-SearchBot. If you want to be recommended in ChatGPT search, keep OAI-SearchBot (and Bingbot) allowed; block GPTBot only if you specifically want to opt out of training. Confirm what your site allows at /tools/robots-checker/.

Does robots.txt actually stop AI engines from using my content?

Well-behaved crawlers honour it, but it's a request, not a wall. Perplexity has said it may still show a blocked page's domain, headline, and a brief summary, and in August 2025 Cloudflare reported observing Perplexity fetch content via an undeclared, browser-like user agent after its declared bots were blocked. For a local business the usual goal isn't to block AI anyway — it's to make sure you haven't accidentally blocked the citation crawlers you want reading you.

The One robots.txt Line That Can Hide You From ChatGPT

In 2023, a wave of advice told website owners to "protect your content from AI" by blocking the bots in robots.txt. A lot of people did it. The problem is that most of them blocked the wrong crawler — and some of them, trying to keep AI out, accidentally locked themselves out of being recommended. This is the most consequential robots.txt mistake of the AI era, and it hinges on a distinction almost nobody explains.

Two bots, two completely different jobs

Every major AI provider runs more than one crawler, and they do different things. Lump them together and the advice falls apart.

One crawler exists to train the model. Whether you allow or block it changes whether your content feeds future training. It has nothing to do with whether you appear in answers today.
A different crawler exists to power live search and citations. This is the one that decides whether the engine can read your page and name you in a response.

For OpenAI, the training crawler is GPTBot and the citation crawler is OAI-SearchBot. Here's the part that catches people: blocking GPTBot does not remove you from ChatGPT's search results. ChatGPT can still cite you. The OpenAI setting that removes you from ChatGPT search is a block on OAI-SearchBot. OpenAI says this plainly: the settings are independent, and you can allow OAI-SearchBot to appear in search while disallowing GPTBot to opt out of training.

So all those sites that "blocked AI" by disallowing GPTBot gave up exactly nothing on the citation side — they only opted out of training. And the sites that went further and blocked OAI-SearchBot, or the regular Googlebot, made themselves invisible in AI answers while believing they'd done something clever.
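To make the independence of the two settings concrete, here is a minimal sketch using Python's standard-library robots.txt parser. The robots.txt text and the example.com URL are made up for illustration, and Python's parser is only an approximation of how any real crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of training, stay open to search.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str = "https://example.com/") -> bool:
    """Return True if this robots.txt text lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The training crawler is refused, the search crawler is not.
print(allowed(ROBOTS_TXT, "GPTBot"), allowed(ROBOTS_TXT, "OAI-SearchBot"))  # False True
```

The two groups never interact: each crawler reads only the group that names it.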

The same pattern repeats across providers

Once you see the split, every engine reads the same way:

Anthropic: ClaudeBot trains the model; Claude-SearchBot powers Claude's search citations. Block the first, keep the second, and Claude can still cite you.
Google: Google-Extended governs only whether your content is used to train Gemini/Vertex. It does not control AI Overviews. Those run on ordinary Googlebot, so the way to disappear from Google's AI answers is to block Googlebot, which also tanks your normal search. Blocking Google-Extended costs you nothing in AI Overviews.
Microsoft: one crawler, Bingbot, feeds the Bing index — which powers both Copilot and ChatGPT's web search. Block Bingbot and you lose multiple AI surfaces at once.

The takeaway: the bots that matter for being recommended are the search/citation crawlers and the classic search crawlers — OAI-SearchBot, Claude-SearchBot, PerplexityBot, Googlebot, Bingbot. Those are the ones to keep open.
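As a rough illustration of that split, the sketch below sorts the crawler names from this article into two lists and reports which ones a given robots.txt shuts out. The sample "block all AI" file is hypothetical, and the check relies on Python's built-in parser, which may not match every crawler's own matching rules.

```python
from urllib.robotparser import RobotFileParser

# Crawler names as given in this article, grouped by job.
CITATION_BOTS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "Googlebot", "Bingbot"]
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended"]

# Hypothetical example of an overzealous "block AI" file.
OVERZEALOUS = """\
User-agent: GPTBot
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /
"""


def blocked_bots(robots_txt: str, bots: list[str], url: str = "https://example.com/") -> list[str]:
    """Return the bots from `bots` that this robots.txt refuses for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


print("citation bots blocked:", blocked_bots(OVERZEALOUS, CITATION_BOTS))
print("training bots blocked:", blocked_bots(OVERZEALOUS, TRAINING_BOTS))
```

Anything that shows up in the first list is a door you probably meant to leave open.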

The Perplexity wrinkle

Worth knowing if you're deciding what to block: blocking a crawler doesn't always equal total disappearance. Perplexity has said that even when you disallow PerplexityBot, it may still surface a blocked page's domain, headline, and a brief factual summary. And in August 2025, Cloudflare reported that after some sites disallowed Perplexity's declared bots and added firewall blocks, it observed fetches from an undeclared user agent that looked like an ordinary browser. (Cloudflare noted that ChatGPT's crawler, by contrast, fetched robots.txt and stopped when disallowed.) The honest summary: robots.txt is a request that well-behaved crawlers honour, not a wall.

Note

The goal for a local business is almost never to block AI. It's the opposite: you want to be read and recommended. The job is to make sure you haven't accidentally slammed a door you meant to leave open.

Where the accidental blocks come from

If you've never touched your robots.txt, you can still be blocked, because the file often isn't yours to control directly:

SEO plugins generate it. On WordPress, Yoast, Rank Math, and All in One SEO usually generate robots.txt dynamically — so a hand-edited file can be silently overwritten by the plugin's settings.
Security plugins block bots. Wordfence, Solid Security, and Sucuri ship "bad bot" rules that, in early 2024, categorised AI crawlers as scrapers and rate-limited or 403-blocked them. That block may not even appear in your robots.txt.
Managed hosts inject blocks. Some managed-WordPress platforms add AI-bot blocks at the server level, invisible from your site's own files.

This is why a site can look perfectly open in its robots.txt and still be unreachable to AI crawlers — the block lives a layer up.
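One way to spot a block that lives above robots.txt is to compare what the file says with how the server actually answers a request carrying a crawler's User-Agent string. The sketch below is an assumption-heavy illustration: the function names and the status-code buckets are my own, and a request sent from your own machine is only a hint, because some firewalls decide on IP address rather than on the User-Agent header.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server gives to this User-Agent string."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx and 5xx responses arrive as exceptions


def diagnose(robots_allows: bool, status: int) -> str:
    """Combine the robots.txt verdict with the observed HTTP status (illustrative buckets)."""
    if not robots_allows:
        return "blocked in robots.txt"
    if status in (401, 403, 429) or status >= 500:
        return "open in robots.txt, refused by the server"
    if 200 <= status < 300:
        return "reachable"
    return "inconclusive"


# Example with a made-up result: robots.txt is open, but the server answered 403.
print(diagnose(robots_allows=True, status=403))
```

To try it against a real site you would call something like `fetch_status("https://your-domain.example/", "OAI-SearchBot")` and pass the result to `diagnose`. A "refused by the server" verdict points at the security plugin, CDN, or host rather than at the file.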

What to actually do

Check which crawlers can reach you. Run your domain through the Robots Check. It reports the citation-grade crawlers specifically and shows the exact line behind any block.
Make sure the citation bots are open — OAI-SearchBot, Claude-SearchBot, PerplexityBot, Googlebot, Bingbot. If you want to opt out of training while staying citable, disallow only GPTBot, ClaudeBot, and Google-Extended.
If the file looks clean but you're still blocked, check your security plugin and ask your host whether it blocks AI bots at the platform level.
Re-test, then verify in the wild. Give crawlers a couple of days, then ask ChatGPT and Perplexity about your brand and category.
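The first step above mentions seeing the exact line behind a block. As a rough idea of how such a lookup could work, here is a simplified sketch that scans a robots.txt for full-site `Disallow: /` rules that apply to a given bot. It is not how the Robots Check itself is implemented, it ignores partial-path rules and wildcards, and the sample file is hypothetical.

```python
def blocking_lines(robots_txt: str, bot: str) -> list[tuple[int, str]]:
    """Return (line_number, line_text) for each full-site Disallow that applies to `bot`."""
    groups = []        # each group: {"agents": [...], "blocks": [(lineno, text)]}
    current = None
    in_rules = False   # True once the current group has seen a rule line
    for lineno, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            # A User-agent line that follows rules starts a new group.
            if current is None or in_rules:
                current = {"agents": [], "blocks": []}
                groups.append(current)
                in_rules = False
            current["agents"].append(value.lower())
        elif field in ("allow", "disallow") and current is not None:
            in_rules = True
            if field == "disallow" and value == "/":
                current["blocks"].append((lineno, line))
    name = bot.lower()
    # Prefer groups that name the bot; otherwise fall back to the "*" group.
    specific = [g for g in groups if name in g["agents"]]
    chosen = specific or [g for g in groups if "*" in g["agents"]]
    return [hit for g in chosen for hit in g["blocks"]]


# Hypothetical file: line 6 is the rule that hides the site from ChatGPT search.
SAMPLE = """\
User-agent: *
Disallow: /wp-admin/

User-agent: GPTBot
User-agent: OAI-SearchBot
Disallow: /
"""

print(blocking_lines(SAMPLE, "OAI-SearchBot"))  # [(6, 'Disallow: /')]
print(blocking_lines(SAMPLE, "Googlebot"))      # []
```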

If you want the deeper picture of how each engine actually selects who to cite, the per-engine breakdowns — like how ChatGPT cites local businesses — walk through it. But start with the file. It's the cheapest fix in AI visibility, and the one most likely to be quietly costing you. hello@rankinglocal.ai reaches me directly.
