Cloudflare launches no-code, AI-powered protection against AI scraping bots

2024-07-04

Cloudflare has launched a no-code feature to combat unauthorized scraping of web content by AI developers. The feature is built into Cloudflare's Content Delivery Network (CDN) service, which many websites worldwide use to speed up access for their visitors, and it is available to both free and paid customers.

Many AI companies rely on large volumes of publicly available web content to train their large language models (LLMs). Companies such as OpenAI and Google give website operators a way to opt out of scraping, but not all LLM developers do. That gap is what Cloudflare's scraping protection tool is meant to address.

The feature uses machine learning to identify and block automated content extraction. Cloudflare says its system can identify bots that attempt to scrape data for LLM training even when they disguise themselves as normal browsers. In a recent blog post, Cloudflare engineers wrote, "We have noticed that some bot operators try to disguise themselves as real users by forging user agent information. After long-term monitoring, we are proud to announce that our global machine learning model has always recognized this activity as coming from a bot."

Cloudflare says it has identified and blocked a bot that scrapes content for Perplexity AI, a well-funded search startup. According to a Wired report last month, the bot caused problems for website operators by disguising its requests so that they looked like those of ordinary users.

Cloudflare assigns each website visit it handles a score from 1 to 99, with a lower score indicating a higher likelihood that the request came from a bot. Requests from the bot serving Perplexity AI reportedly scored below 30 consistently.

"When bad actors attempt to scrape website information on a large scale, they often use a series of tools and technical frameworks that we can quickly identify," the Cloudflare engineers explained. "For each 'fingerprint' we capture, we evaluate its credibility using Cloudflare's massive network, which processes over 57 million requests per second."

Cloudflare has promised to keep updating the feature as AI scraping bots evolve and new crawlers emerge. As part of that plan, the company will also launch a tool that lets website operators report new types of bots they encounter.
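
To make the scoring scheme described above concrete, here is a minimal Python sketch of how a site operator might act on a 1 to 99 score in which lower values mean a request is more likely automated. This is not Cloudflare's implementation or API: the `Visit` type, the function names, and the cutoff of 30 are all hypothetical, with the cutoff chosen only because the article reports that the Perplexity-related bot scored below 30.

```python
from dataclasses import dataclass

# Hypothetical cutoff for illustration only. The article says the bot serving
# Perplexity AI scored below 30; it does not say which threshold Cloudflare uses.
LIKELY_BOT_THRESHOLD = 30


@dataclass
class Visit:
    """One request with its score (hypothetical type, not a Cloudflare object)."""
    path: str
    user_agent: str
    bot_score: int  # 1 to 99; lower means more likely to be a bot


def is_likely_bot(visit: Visit, threshold: int = LIKELY_BOT_THRESHOLD) -> bool:
    """Return True when the score falls below the chosen threshold."""
    if not 1 <= visit.bot_score <= 99:
        raise ValueError(f"bot_score must be between 1 and 99, got {visit.bot_score}")
    return visit.bot_score < threshold


def split_visits(visits, threshold: int = LIKELY_BOT_THRESHOLD):
    """Separate visits into (allowed, blocked) lists based on their scores."""
    allowed, blocked = [], []
    for visit in visits:
        (blocked if is_likely_bot(visit, threshold) else allowed).append(visit)
    return allowed, blocked


if __name__ == "__main__":
    sample = [
        # A browser-like user agent does not help if the score is low.
        Visit("/article", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)", 12),
        Visit("/article", "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_5)", 87),
    ]
    allowed, blocked = split_visits(sample)
    print("allowed:", [v.bot_score for v in allowed])
    print("blocked:", [v.bot_score for v in blocked])
```

Note that the decision depends only on the score, not on the user agent string, which mirrors the article's point that a forged user agent alone does not hide a bot.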