In the past year, the number of generative AI systems billed as open has increased dramatically, but how open are they really? New research suggests that companies like Meta and Google engage in "openwashing": claiming openness while evading actual scrutiny.
In the context of the EU's Artificial Intelligence Act, the question of what constitutes openness in generative AI has become particularly important, as the Act regulates "open-source" models and calls for actual openness assessments.
Almost all major tech companies claim to offer "open" models, but in reality they rarely do. Researchers Andreas Liesenfeld and Mark Dingemanse of Radboud University's Centre for Language Studies surveyed 45 text and text-to-image models that present themselves as open, giving a clear picture of claimed versus actual openness in current generative AI.
Their research was recently published at the ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT 2024) and featured in a Nature News Briefing.
Evasion of Scrutiny
The researchers found that companies like Meta, Microsoft, and Mistral often use the terms "open" and "open-source" while effectively shielding their models from scientific and regulatory scrutiny. These labels serve largely as marketing: the companies provide little meaningful information about source code, training data, fine-tuning data, or system architecture.
Building on previous work, the researchers surveyed 45 models, including text-to-image generators, and found that openness is unevenly distributed and often overstated. By contrast, smaller players like AllenAI (maker of the language model OLMo) and the BigScience Workshop together with HuggingFace (makers of BloomZ) put more effort into documenting their systems and opening them up to scrutiny.
EU's Artificial Intelligence Act
The recently adopted EU Artificial Intelligence Act grants special exemptions to "open-source" models but does not clearly define the term "open-source." This creates an incentive for "openwashing": providers of models perceived as open face less stringent requirements and less public and scientific scrutiny. Liesenfeld notes, "This makes it even more important for us to have a clear understanding of what constitutes openness in the field of generative AI. We do not see openness as a binary phenomenon but rather as a composite (consisting of multiple elements) and gradient (having different degrees) phenomenon."
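To make that composite-and-gradient idea concrete, here is a minimal Python sketch of what such an assessment could look like. The dimension names, the three-level scale, and the equal weighting are hypothetical illustrations of the idea, not the authors' published framework.

```python
from typing import Dict

# Hypothetical three-level scale: each dimension is "open", "partial", or "closed".
LEVELS = {"open": 1.0, "partial": 0.5, "closed": 0.0}

# Hypothetical assessment of a fictional model across several dimensions.
example_assessment: Dict[str, str] = {
    "source_code": "partial",
    "training_data": "closed",
    "fine_tuning_data": "closed",
    "model_weights": "open",
    "documentation": "partial",
    "license": "open",
    "api_access": "open",
}

def openness_score(assessment: Dict[str, str]) -> float:
    """Average the per-dimension levels into a single gradient score in [0, 1]."""
    return sum(LEVELS[level] for level in assessment.values()) / len(assessment)

print(f"Composite openness: {openness_score(example_assessment):.2f}")  # 0.57
```

Under a scheme like this, a model that publishes its weights but withholds its training data lands somewhere in the middle, rather than being labeled simply "open" or "closed."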
While the EU's Artificial Intelligence Act intensifies the urgency, the importance of openness for innovation, science, and society has long been recognized. By demystifying AI, openness can also build trust and understanding of AI. Dingemanse states, "If companies like OpenAI claim that their AI can 'pass the bar exam,' how impressive that is depends on what was in the training data. OpenAI has been vague about this, possibly to avoid legal risks, but the sheer scale of the training data means that ChatGPT and similar next-word prediction engines can take most exams in 'open-book' mode, which makes their performance far less impressive."
This work contributes to establishing meaningful openness in the field of AI and surveys the growing landscape of alternatives to ChatGPT. Radboud University's Faculty of Humanities recently released guidelines on generative AI and research integrity, calling on researchers to exercise stronger critical AI literacy when considering the use of generative AI.