Cloudflare's New Tool: Free Defense Against AI Content Scraping

2024-07-08

Cloudflare, which says it proxies about 20% of internet traffic, has launched a new tool that blocks all AI bots from crawling website content. The company says the tool is available to all customers, including those on the free plan.


With the rise of generative AI, companies need large amounts of text to train their chatbots. Many of them use web crawlers to scrape text from websites (ChatGPT scraping your Reddit posts, for example). Some companies run their crawlers openly and identify them honestly; others do not.


Last September, Cloudflare introduced a feature that let users block "malicious" AI web crawlers, meaning those that scrape website content without permission. However, some companies have found ways around this restriction by disguising their crawlers as legitimate ones. That is why the new tool blocks all AI crawlers, including those that follow proper crawling protocols.
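The "proper crawling protocol" a well-behaved crawler follows is usually robots.txt: a site lists which user agents may fetch which paths, and a compliant crawler checks that file before requesting a page. The sketch below uses Python's standard-library `urllib.robotparser` to show that check. The robots.txt rules, the example URL, and the helper names `build_parser` and `may_crawl` are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that opts out of two AI crawlers
# while leaving the site open to every other user agent.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """The check a compliant crawler performs before requesting a URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("GPTBot", "ClaudeBot", "ExampleBrowser"):
        print(agent, may_crawl(rp, agent, "https://example.com/articles/1"))
```

Nothing forces a crawler to run this check. A scraper can skip it entirely or claim a different user agent, which is the loophole described above.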


According to Cloudflare, as of June 2024, AI bots accessed about 39% of the top one million "internet properties" using Cloudflare, but fewer than 3% of these properties had taken steps to block them. By request volume, the four most active AI crawlers across Cloudflare's network are Bytespider, Amazonbot, ClaudeBot, and GPTBot.
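To see why disguised crawlers are hard to stop, consider the simplest possible detector: matching the User-Agent header against the crawler names Cloudflare listed. The following is a hypothetical illustration, not how Cloudflare's tool works internally. The function name `is_declared_ai_crawler` and the token list are assumptions, and the sample User-Agent strings are only examples. The check catches bots that announce themselves, while a scraper sending a browser-like User-Agent passes straight through.

```python
import re

# Hypothetical token list built from the four crawlers named in the article.
# It is illustrative, not an exhaustive list of AI crawlers.
AI_CRAWLER_TOKENS = ("Bytespider", "Amazonbot", "ClaudeBot", "GPTBot")

# Match any token as a whole word, ignoring case.
_AI_UA_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in AI_CRAWLER_TOKENS) + r")\b",
    re.IGNORECASE,
)


def is_declared_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string names one of the listed AI crawlers.

    Only crawlers that identify themselves honestly are caught; a crawler
    that sends a browser-like User-Agent is not detected by this check.
    """
    return _AI_UA_PATTERN.search(user_agent or "") is not None


if __name__ == "__main__":
    samples = [
        # Example of a self-identifying crawler User-Agent.
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
        "GPTBot/1.0; +https://openai.com/gptbot)",
        # Example of an ordinary desktop browser User-Agent.
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "Chrome/126.0 Safari/537.36",
    ]
    for ua in samples:
        print(is_declared_ai_crawler(ua), ua)
```

A blocker that goes beyond self-reported names has to look at other signals of the request, which is why a name list alone cannot cover crawlers that pose as legitimate browsers.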


If you are a Cloudflare user, enabling the tool is simple: in the Cloudflare dashboard, go to Security > Bots and turn on the toggle labeled "AI Scrapers and Crawlers." Once it is on, Cloudflare will block AI bots from accessing your content.