Root Causes of Bias in Large Language Models

2024-01-16

As artificial intelligence models comb through hundreds of gigabytes of training data to learn the subtleties of language, they also absorb the biases woven into that text.

Computer science researchers at Dartmouth College are designing methods to study the parts of the models that encode these biases, paving the way to mitigate or even eliminate them.

In a recent paper published in the Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, co-authors Weicheng Ma, a computer science Ph.D. candidate in Dartmouth's Guarini School of Graduate and Advanced Studies, and Soroush Vosoughi, assistant professor of computer science, investigated how stereotypes are encoded in pre-trained large language models.

Large language models are deep neural networks trained on large datasets to process, understand, and generate text and other content.

Vosoughi said that pre-trained models carry biases such as stereotypes, which can be seemingly positive (for example, implying that a particular group excels at certain skills) or negative (such as assuming someone's occupation based on their gender).
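To make this concrete, here is a minimal sketch, not drawn from the paper, of how such a stereotype can surface in a masked language model: Hugging Face's fill-mask pipeline is asked which pronoun best completes two occupation sentences. The model choice (bert-base-uncased) and the prompts are illustrative assumptions.

```python
# Minimal sketch (not the paper's method): probe a masked language model
# for gender-occupation associations with the fill-mask pipeline.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# Compare which pronoun the model prefers for different occupations.
for sentence in ["[MASK] is a nurse.", "[MASK] is an engineer."]:
    predictions = fill(sentence, targets=["he", "she"])
    print(sentence)
    for p in predictions:
        print(f"  {p['token_str']}: {p['score']:.3f}")
```

A large gap between the two pronoun scores for a given occupation is one simple signal of the kind of stereotyped association the researchers study.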

Machine learning models are expected to permeate everyday life in various ways. They can assist hiring managers in sifting through piles of resumes, facilitate faster approval or rejection of bank loans, and provide recommendations during parole decisions.

However, built-in stereotypes based on demographics can lead to unfair and unwelcome outcomes. To mitigate this impact, "we asked ourselves what we can do after model training to address these stereotypes," Vosoughi said.

The researchers first hypothesized that stereotypes, like other linguistic features and patterns, are encoded in specific parts of the neural network called "attention heads." These are loosely analogous to groups of neurons: they let the model weigh the relationships among the words it receives as input, and they capture other features, some of which are still not fully understood.
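For readers who want to see what these attention heads look like in practice, the sketch below, assuming a Hugging Face BERT model (one of the model families studied), retrieves the per-head attention patterns for a sentence; it is illustrative, not the paper's code.

```python
# Inspecting the attention heads of a pre-trained BERT model.
# Each layer of bert-base has 12 heads, each producing its own
# attention pattern over the input tokens.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The doctor asked the nurse a question.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer, each of shape
# (batch, num_heads, seq_len, seq_len).
print(len(outputs.attentions))      # 12 layers
print(outputs.attentions[0].shape)  # torch.Size([1, 12, seq_len, seq_len])
```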

Ma, Vosoughi, and their collaborators created a dataset filled with stereotypes and used it to repeatedly fine-tune 60 different pre-trained large language models, including BERT and T5. By amplifying the stereotypes in the models, the dataset acted as a detector, surfacing the attention heads most responsible for encoding these biases.
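The paper's actual detection procedure is more involved, but the hypothetical sketch below shows the general idea of attributing bias to individual heads: silence one head at a time with a head mask and measure how much a crude bias gap between a stereotyped and an anti-stereotyped sentence shrinks. The sentence pair, the scoring rule, and the model are all assumptions made for illustration.

```python
# Hypothetical head-scoring sketch (not the paper's procedure):
# mask out one attention head at a time and see how much a simple
# bias gap between two sentences changes.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def sentence_loss(text, head_mask=None):
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"], head_mask=head_mask)
    return out.loss.item()

stereo, anti = "She is a nurse.", "He is a nurse."
layers = model.config.num_hidden_layers
heads = model.config.num_attention_heads
baseline_gap = sentence_loss(anti) - sentence_loss(stereo)

scores = {}
for layer in range(layers):
    for head in range(heads):
        head_mask = torch.ones(layers, heads)
        head_mask[layer, head] = 0.0  # silence a single head
        gap = sentence_loss(anti, head_mask) - sentence_loss(stereo, head_mask)
        scores[(layer, head)] = baseline_gap - gap  # larger = more bias removed

# The five heads whose removal most reduces this crude bias gap.
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:5])
```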

In their paper, the researchers demonstrated that pruning the attention heads that most strongly encode stereotypes markedly reduces those stereotypes in large language models without meaningfully degrading their language capabilities.
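As a rough illustration of the pruning step, the sketch below removes a placeholder set of heads with the prune_heads utility in Hugging Face Transformers; the layer and head indices are made up, not the ones identified in the paper.

```python
# Hedged sketch of head pruning; the heads listed are placeholders.
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Map layer index -> list of head indices to remove (placeholder values).
heads_to_prune = {2: [5, 11], 7: [3]}
model.prune_heads(heads_to_prune)

# The pruned model keeps the rest of its parameters and can then be
# evaluated on both bias benchmarks and standard language tasks to check
# that its capabilities are retained.
```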

"Our findings challenge the conventional view that advancements in artificial intelligence and natural language processing require extensive training or complex algorithmic interventions," Ma said. According to Ma, this technique will have broad applicability as it is not inherently specific to language or models.

Importantly, Vosoughi added that the dataset can be adjusted to reveal certain stereotypes while retaining others - "it's not a one-size-fits-all approach."

So a medical diagnostic model, where age or gender differences may be clinically relevant to patient assessment, would use a different version of the dataset than a model screening job candidates, which should ignore such attributes.

The technique works only with full access to a trained model's internals, so it does not apply to black-box models such as OpenAI's chatbot ChatGPT, whose inner workings are hidden from users and researchers.

Adapting this approach to black-box models is their next immediate goal, Ma said.