Can The New York Times Prevail in Lawsuit Against OpenAI?

2024-01-02

The New York Times is escalating its copyright dispute with OpenAI. Since the case was announced, the Times has assembled substantial evidence against OpenAI, and this could be the first AI copyright infringement case to hold up in court. The outcome, however, remains to be seen.


The lawsuit was filed on Wednesday, December 27th, in Manhattan federal court, accusing OpenAI of using millions of copyrighted articles without permission to train its ChatGPT models.





The New York Times claims that this practice not only infringes its copyrights but also positions these artificial intelligence models as direct competitors to its news products. While such allegations have been difficult to prove in the past, the plaintiffs have provided striking evidence.


Can The New York Times succeed?


The Times had approached Microsoft and OpenAI in April, hoping to reach an amicable resolution, establish a commercial agreement, and add technical guardrails around its content. However, no agreement was reached.


Before signaling its intention to sue OpenAI in August, The New York Times changed its policy: on August 3rd it updated its terms of service to restrict the use of its content for AI training.


The updated terms cover all forms of content and explicitly prohibit the use of automated data-collection tools without written permission. OpenAI and Microsoft have since introduced complementary measures for their services, allowing websites to block their crawlers from scraping data. This reflects a broader industry trend toward controlling how web-sourced data is used for AI development.
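As an illustration (the specific mechanism goes beyond what this article states), OpenAI's GPTBot crawler honors the standard robots.txt protocol, so a publisher can opt out of scraping with an entry such as:

```
# robots.txt, served at the site root
# Block OpenAI's GPTBot crawler from the entire site
User-agent: GPTBot
Disallow: /
```

A site can also disallow only specific paths, or add similar rules for other AI crawlers it wishes to exclude.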


This case has stronger evidence


One major challenge in copyright lawsuits against AI companies has been that plaintiffs cannot easily prove an AI model actually used their data. This creates a legal gray area: when an AI produces content similar to a copyrighted work, it is unclear whether that constitutes infringement or mere coincidence.


In this case, The New York Times makes four allegations against OpenAI: including its articles in the training dataset, encoding copies of those articles in the parameters of the large language model (LLM), outputting memorized articles in response to queries, and reproducing articles through a browsing plugin.


Considering the evidence and the identity of the plaintiff, this case may not only end differently from previous similar cases but also set an important precedent for future legal disputes in this field. An LLM does not generate content by retrieving stored data; rather, it imitates patterns from a large training corpus, a process sometimes described as "approximate retrieval." This means the output is not a direct copy, but it can closely mirror the style and structure of text in the training set.


This is notable because the lawsuit does not disclose what prompts were used to obtain these outputs, and as one user pointed out, reproducing similar responses in court could be challenging.


Against this background, a more collaborative solution may emerge, similar to the partnerships OpenAI has previously formed with other companies.


Such collaborations could involve licensing agreements under which content creators are compensated for their contributions to AI training datasets. This approach would not only address copyright concerns but also foster a mutually beneficial relationship: content creators receive compensation and recognition, while AI developers obtain high-quality, legitimately sourced data.


The outcome of The New York Times' lawsuit against OpenAI could shape the future relationship between AI companies and content creators. The case will either spur more legal disputes or pave the way for partnerships, significantly influencing both industries. Either way, it carries major implications for the future of content creation, especially in the news industry, and deserves close attention.