WordPress and Tumblr Plan to Sell User Content to AI Firms

2024-02-28

According to reports, Automattic, the parent company of WordPress and Tumblr, is negotiating with AI companies such as Midjourney and OpenAI to let them use content from its platforms for training. The details of the deal remain unclear, but Automattic is trying to assure users that they can opt out at any time.

The plan has reportedly caused conflict inside Automattic, because some of the content being collected for AI companies includes private content that the company did not intend to retain. To complicate matters further, some advertisements that do not belong to Automattic, including ads from previous Apple Music campaigns, have also made their way into the training dataset. The internal controversy has been significant enough that one product manager removed their own photos from Tumblr to ensure they would not be used for AI training.

Since OpenAI introduced ChatGPT at the end of 2022, generative AI has become a major business, and several companies have since launched text-to-image generators. The technology works by "training" on large amounts of data in order to generate seemingly original videos, images, or text. Major publishers have objected, and some have filed lawsuits, claiming that much of the data used to train these systems is either pirated or does not qualify as "fair use" under existing copyright law.

Automattic plans to introduce a new setting as early as this Wednesday that lets users choose not to participate in AI training. It is unclear, however, whether the setting will be enabled or disabled by default for most users. Squarespace, a WordPress competitor, introduced a similar setting last year that lets users keep their data from being used for AI training.

"AI is rapidly changing almost every aspect of our world, including how we create and consume content. At Automattic, we have always believed in a free, open web and individual choice. Like other tech companies, we closely monitor these advancements, including how to collaborate with AI companies in a way that respects user preferences," the company's blog post stated.

The lengthy statement reads as defensive, however. It notes that "there is no legal requirement that crawlers must follow these preferences" and implies that, by letting users decide whether their content can be used for AI training, the company is simply following industry best practice.

"Regardless of location, we want to provide you with as many control tools as possible. Since reputable companies do follow these settings, they are the best way to enforce how web content is crawled," Automattic's statement said. "Our partnerships will respect all opt-out settings. We also plan to go further by regularly updating partners with new opt-out choices and requesting them to remove their content from past sources and future training."