"Abundance of AI-Generated Content in Google Search Results"

2024-04-08

Do you use Google Books to search for books on certain topics, or Google Scholar to delve into academic literature? Here's something you should know: these services, which let users "search the world's most comprehensive full-text book index" and find academic literature across any discipline, have started indexing low-quality, AI-generated works that are presented as if real human authors wrote them.

This disturbing trend was first discovered by 404 Media, which used a simple trick to track down AI-generated books. If you ask ChatGPT about current events, you often see phrases like "based on my last knowledge update." This is the chatbot's way of telling you that its knowledge stops at a cutoff date and that it cannot access more recent information.

If you search for "based on my last knowledge update" in Google Books, you will come across books that clearly contain content generated word for word by ChatGPT. Searching for this phrase yields page after page of titles. Some books are about ChatGPT and include the phrase to discuss the limitations of the chatbot, but there are dozens of other books that attempt to pass off AI-generated works as the work of human authors.
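The same trick can be applied to any text you already have on hand, such as a book preview or a paper abstract. The snippet below is a minimal sketch, not 404 Media's actual method: the function name `find_telltale_phrases` and the `context` parameter are hypothetical, and the phrase list contains only the one phrase quoted in this article.

```python
import re

# Only this phrase comes from the article; extend the list at your own discretion.
TELLTALE_PHRASES = [
    "based on my last knowledge update",
]


def find_telltale_phrases(text, phrases=TELLTALE_PHRASES, context=40):
    """Return (phrase, snippet) pairs for every telltale phrase found in text.

    Hypothetical helper: matching is case-insensitive and ignores line breaks.
    """
    # Collapse runs of whitespace so a phrase split across lines still matches.
    normalized = re.sub(r"\s+", " ", text)
    hits = []
    for phrase in phrases:
        pattern = re.escape(phrase)
        for match in re.finditer(pattern, normalized, flags=re.IGNORECASE):
            # Keep a little surrounding text so a human can judge the hit.
            start = max(0, match.start() - context)
            end = min(len(normalized), match.end() + context)
            hits.append((phrase, normalized[start:end].strip()))
    return hits


sample = (
    "Based on my last\nknowledge update in September 2021, "
    "the case is ongoing."
)
for phrase, snippet in find_telltale_phrases(sample):
    print(f"{phrase!r} found in: {snippet!r}")
```

A hit is only a signal that deserves a closer look. As noted above, some books quote the phrase on purpose to discuss ChatGPT's limitations, so the surrounding snippet still needs to be read by a person.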

For example, a book about the Boston Marathon bombing uses the phrase "based on my last knowledge update in September 2021, the case is still undergoing legal proceedings, and the final outcome is uncertain." The "author" of the book also has 50 other works, including books on the Cold War, 9/11, Founding Fathers of the United States, Ancient Rome, famous boxers, and Native Americans.

These titles were all published in 2023 and range from 50 to 100 pages in length. Browsing through them, I found that each offers only a shallow narrative: at best it resembles a Wikipedia article, and at worst it reads like ChatGPT spewing out facts. A quick online search also revealed that these books are for sale on Amazon and at other retailers.

When I input the same phrase into Google Scholar, which is supposed to be a repository of human research, it returned 19 pages of results, including papers on at-risk youth, diabetes, autism, COVID-19, and airline pilot fatigue.

The spread of AI-generated content on the internet is nothing new. What is concerning is that this content now appears alongside human-authored works in resources that people treat as reliable, such as Google Books and Google Scholar.

In a conversation with 404 Media, Google stated that it will "continue to evaluate our approach as the publishing landscape evolves," but did not mention removing these results from these two services.