Unveiling a New Approach: Accessing Massive Training Data from AI Models

2023-11-30

A new research paper claims that large language models may inadvertently expose a significant amount of training data through a technique referred to as "extractable memory" by researchers.

The paper details how researchers developed methods to extract terabytes of text data from training sets of popular open-source natural language models, including models from companies such as Anthropic, EleutherAI, Google, and OpenAI. Katherine Lee, a senior research scientist at Google Brain, CornellCIS, and former Princeton University, explained on Twitter that previous data extraction techniques did not work on OpenAI's chat models:

When we ran the same attack on ChatGPT, it seemed to have almost no memory because ChatGPT has been "tuned" to behave like a chat model. But by running our new attack, we can make it output training data with three times the probability of any other model we studied.

The core technique involves prompting the model to continue a random sequence of text fragments and checking if the generated continuation contains field-by-field matches from publicly available datasets, totaling over 9TB of text.

Obtaining Training Data through Ranking

Using this strategy, they extracted over 1 million unique training examples of 50+ tokens from smaller models like Pythia and GPT-Neo. From the massive OPT-175B model with 175 billion parameters, they extracted over 100,000 training examples.

More concerning is that this technique has also been shown to efficiently extract training data from commercial deployment systems such as Anthropic's Claude and OpenAI's industry-leading ChatGPT, indicating potential issues even in widely used production systems.

By prompting ChatGPT to repeat a single vocabulary word like "the" hundreds of times, the researchers demonstrated how they could "steer" the model away from its standard conversational output and generate more typical text continuations, resembling its original training distribution - a complete character-by-character distribution.

Some AI Models Seek to Protect Training Data through Encryption

While companies like Anthropic and OpenAI aim to protect training data through techniques such as data filtering, encryption, and model alignment, these findings suggest that more work may be needed to mitigate the privacy risks posed by large parameter models. Nevertheless, researchers not only view memory as a privacy compliance issue but also propose it as a model efficiency problem, implying that memory consumes a significant portion of the model's capacity that could be allocated to utility.