Meta has faced a copyright infringement lawsuit since 2023, and a court document disclosed on Wednesday, January 8th, revealed additional shocking allegations that could further embarrass the company and its CEO, Mark Zuckerberg. The plaintiffs claim that Meta used pirated content, including copyrighted books and articles, without permission to train its Llama AI model, possibly with Mark Zuckerberg's knowledge.
The dispute began in 2023 when a group of authors sued the social media giant for using their copyrighted books and articles without consent or approval to train its large language model, Llama. Notable authors and journalists such as Ta-Nehisi Coates and comedian-actress Sarah Silverman were among the plaintiffs.
However, in November 2023, U.S. District Judge Vince Chhabria of the Northern District of California dismissed much of the AI copyright lawsuit against Meta. The court rejected the authors' claims that the text generated by Meta's chatbot infringed their copyrights and that Meta had illegally removed copyright management information (CMI) from their books.
The matter did not end there. Recently, the authors filed an updated complaint with the U.S. District Court for the Northern District of California. In the new filing, they alleged that internal documents provided by Meta during discovery showed the company was aware that the content used for AI training was pirated. They also pointed to new evidence indicating that Meta used a dataset called LibGen, known to contain millions of pirated works. More shockingly, they accused Meta of distributing this dataset via peer-to-peer seeding, a method that lets users share files directly with one another without a central server.
The plaintiffs now cite internal communications from Meta, claiming that Mark Zuckerberg was fully aware of the situation and approved the use of the LibGen dataset despite knowing it contained pirated content. LibGen, which primarily offers access to copyrighted materials from major publishers such as Macmillan Learning, McGraw Hill, and Cengage Learning, has long been embroiled in copyright infringement lawsuits and fines.
The new filing also highlighted another significant allegation: Meta may have attempted to cover up its alleged infringement by removing credits or attribution information from the LibGen data used.
Although Meta has consistently denied illegally using content to train its language models, the company has yet to issue any official statement regarding the authors' latest filing. Separately, Meta and its platforms Facebook, Instagram, WhatsApp, and Threads have recently drawn criticism over changes to their content moderation policies. Under the latest policy updates, the company led by Mark Zuckerberg decided to "eliminate fact-checkers" and replace them with community notes similar to those on X.