A lot of business owners have a Cloudflare account they barely think about. They set it up years ago for HTTPS and DDoS protection, and have not opened the dashboard since.
But in 2024 Cloudflare added a feature called AI Scrapers and Crawlers, with a single toggle that promises to "block bots that scrape your content to train AI models." It sounds sensible. Many people clicked it without thinking. Some never clicked it but had it enabled by default in newer setups.
If you are one of them, your business is probably invisible to ChatGPT, Claude, and Perplexity right now. And you would never know unless you went looking.
What the toggle actually does
When the AI scrapers setting is on, Cloudflare does two things at the edge, before any request reaches your server:
- It serves a managed robots.txt that disallows a curated list of AI user agents.
- It blocks those bots at the firewall layer, even if they ignore robots.txt.
Cloudflare keeps the list updated. It currently includes GPTBot (OpenAI's training crawler), ClaudeBot (Anthropic), PerplexityBot, CCBot (Common Crawl, whose archives are widely used to train large language models), Bytespider (ByteDance, TikTok's parent company), and a growing number of others. The list does not include Googlebot or bingbot, because Cloudflare is careful not to break traditional search.
So if you have this on, here is what is happening:
- Google still crawls you. Bing still crawls you. SEO still works.
- ChatGPT, Claude, and Perplexity get a 403 when they try to read your site.
- When a customer asks one of those AI tools "who does IT consulting in Adelaide" or "best accountants for tradies", you cannot be the answer. The AI cannot quote a site it cannot read.
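The robots.txt half of this is easy to check yourself. The sketch below uses Python's standard urllib.robotparser to report which crawlers a given robots.txt turns away. The SAMPLE text is a hypothetical file written in the style of a managed AI block, not Cloudflare's actual output, and the agent lists contain only the names mentioned in this article. It says nothing about the firewall layer, which robots.txt cannot show.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from this article; the real managed list is longer and changes.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Bytespider"]
SEARCH_AGENTS = ["Googlebot", "bingbot"]


def blocked_agents(robots_txt: str, agents: list[str], path: str = "/") -> list[str]:
    """Return the agents that this robots.txt text disallows from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


# Hypothetical robots.txt in the style of a managed AI block (an assumption,
# not a copy of what Cloudflare serves).
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""

if __name__ == "__main__":
    print("AI crawlers blocked:", blocked_agents(SAMPLE, AI_AGENTS))
    print("Search crawlers blocked:", blocked_agents(SAMPLE, SEARCH_AGENTS))
```

To run it against your own site, paste the contents of your live robots.txt in place of SAMPLE.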
Why this matters now and not three years ago
Three years ago, the people asking those questions were typing them into Google. Now a meaningful chunk of them ask ChatGPT first. The shift is gradual but real, and it is happening fastest in younger demographics and in B2B research.
The crawlers Cloudflare blocks are the ones that read your site so that the AI can quote you in those answers. Block them, and you are invisible exactly where new customers are starting to look.
It gets worse. Some businesses have the setting on without realising. New Cloudflare zones in 2024 and 2025 sometimes had it enabled as part of a default security profile. If you never explicitly turned it off, it might be on right now.
The opt-in trap
If you want to be in AI answers, the easy fix is to turn the setting off. But this is where it gets nuanced.
OpenAI runs more than one crawler. GPTBot collects your content so OpenAI can train new models on it. OAI-SearchBot crawls your site so ChatGPT's search results can surface and cite you when someone asks a question. From your server's point of view they behave alike and differ mainly in their user-agent string, but they serve different purposes for you. Most businesses want the citation. They are less sure about the training scrape.
Cloudflare's toggle does not separate the two. It is on or off for the whole AI category. If you want to allow OAI-SearchBot but block GPTBot, you have to turn the managed setting off and write your own robots.txt or your own WAF rules. The same goes for Anthropic: ClaudeBot is its training crawler, and its documentation lists separate user agents for search and for user-initiated fetches, so check which ones you want to admit.
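As a rough illustration of the do-it-yourself route, the sketch below generates a robots.txt that blocks GPTBot, explicitly allows OAI-SearchBot, and leaves everyone else alone, then checks the result with Python's standard robot parser. The helper names build_robots_txt and can_read are made up for this example. Remember that robots.txt is advisory: it only replaces the managed file, so a firewall-level block has to be switched off or rewritten separately.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked: list[str], allowed: list[str]) -> str:
    """Build robots.txt text: one group per named agent, then a catch-all allow."""
    groups = []
    for agent in blocked:
        groups.append(f"User-agent: {agent}\nDisallow: /\n")
    for agent in allowed:
        groups.append(f"User-agent: {agent}\nAllow: /\n")
    groups.append("User-agent: *\nAllow: /\n")
    # Each group already ends in a newline, so joining on "\n" leaves a blank
    # line between groups, which is how robots.txt separates them.
    return "\n".join(groups)


def can_read(robots_txt: str, agent: str, path: str = "/") -> bool:
    """True if `agent` may fetch `path` under this robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Block the training crawler, keep the search crawler.
ROBOTS = build_robots_txt(blocked=["GPTBot"], allowed=["OAI-SearchBot"])

if __name__ == "__main__":
    print(ROBOTS)
    for agent in ["GPTBot", "OAI-SearchBot", "Googlebot"]:
        print(agent, "allowed" if can_read(ROBOTS, agent) else "blocked")
```

The explicit Allow group for OAI-SearchBot is redundant next to the catch-all, but it records the decision in the file where the next person to edit it will see it.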
This is a real decision, not a tech-support task. You are deciding what your content is for: marketing visibility, training data, or both. There is no objectively right answer, but there is definitely a wrong one, which is leaving the default in place because nobody looked.
What to do
Three steps, in order:
- Check the setting. In Cloudflare, go to Security, then Bots. Look for "AI Scrapers and Crawlers" or check Super Bot Fight Mode. Note whether it is on.
- Run your site through our GEO Checker. It tests access for seven major AI crawlers and tells you which ones can actually reach your site, including any that Cloudflare is silently blocking.
- Make a deliberate choice. Turn it off if you want full AI visibility. Leave it on if you genuinely do not want your content training the next generation of AI models, and accept that the cost is invisibility in AI answers. Or get specific about which bots you allow, which takes more work but gives you the cleanest outcome.
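For a quick first look at the firewall side, you can request your own homepage while presenting different user agents and compare the status codes. The sketch below is an approximation under stated assumptions: the user-agent strings are simplified stand-ins rather than the exact strings the real crawlers send, and Cloudflare may weigh signals other than the user agent (source IP, for example), so a spoofed request from your laptop will not always be treated like the genuine bot.

```python
import urllib.error
import urllib.request

# Simplified stand-in user agents (assumptions for illustration only).
AGENTS = {
    "browser": "Mozilla/5.0",
    "GPTBot": "GPTBot",
    "OAI-SearchBot": "OAI-SearchBot",
    "ClaudeBot": "ClaudeBot",
    "PerplexityBot": "PerplexityBot",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the site gives this user agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx responses arrive as exceptions; the code is what we want.
        return error.code


def likely_blocked(statuses: dict[str, int], baseline: str = "browser") -> list[str]:
    """Agents refused with 401/403 while the baseline agent gets a 2xx."""
    if not 200 <= statuses[baseline] < 300:
        return []  # the site is failing for everyone, so nothing can be concluded
    return [
        name
        for name, code in statuses.items()
        if name != baseline and code in (401, 403)
    ]


# Example use (makes real network requests; replace the placeholder URL):
# statuses = {name: fetch_status("https://example.com/", ua) for name, ua in AGENTS.items()}
# print(likely_blocked(statuses))
```

Comparing against a browser-like baseline matters: a 403 only tells you something about AI crawlers if an ordinary visitor gets through at the same moment.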
The worst position is the one most businesses are in right now: a default they never set, costing them visibility they never measured.
If you want a hand working through it, get in touch. We will read your robots.txt, look at your Cloudflare configuration, and tell you exactly which AI surfaces can see you today.