Researchers at Anthropic PBC have released two new papers today, uncovering fresh insights into how large language models process information.
According to the company, these findings contribute to a deeper understanding of the reasoning mechanisms in LLMs. Furthermore, this research could enhance how developers evaluate the reliability of their models. The ability to assess an LLM's capacity to generate accurate outputs is a critical requirement for enterprise machine learning projects.
Multistep Reasoning
In this research initiative, Anthropic's team posed a question to one of the company's Claude LLM variants: "What is the opposite of 'small'?" They then repeated the question in multiple languages. The goal was to determine how the LLM processes prompts.
Anthropic discovered that some internal components used by Claude to answer questions understand only one language, while others are language-agnostic. Additionally, Claude appears to have significantly more of these language-independent components than smaller LLMs do.
These language-independent components provide "additional evidence of a universal concept—an abstract shared space where meaning resides and thinking can occur before being translated into a specific language," explained Anthropic researchers in a blog post. "More practically, this suggests that Claude can learn something in one language and apply that knowledge when using another."
This capability is important because the ability to transfer concepts from one domain to another is a key aspect of reasoning. "Studying how models share their knowledge across contexts is essential for understanding their most advanced reasoning abilities," the researchers elaborated.
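The idea of a shared conceptual space can be illustrated outside Claude entirely. The sketch below is not Anthropic's methodology, which inspects the model's internal features; it simply uses an off-the-shelf multilingual sentence encoder from the sentence-transformers library to show that translations of the same question land close together in a single embedding space. The model name and phrases are placeholders chosen for illustration.

```python
# Illustrative sketch only: Anthropic's findings come from analyzing Claude's
# internal components, not from embedding similarity. This just demonstrates
# the general idea of a language-independent semantic space.
from sentence_transformers import SentenceTransformer, util

# Hypothetical choice of encoder; any multilingual sentence model works here.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

phrases = {
    "en": "the opposite of small",
    "fr": "le contraire de petit",   # French: "the opposite of small"
    "zh": "小的反义词",               # Chinese: "the antonym of small"
}

embeddings = {lang: model.encode(text) for lang, text in phrases.items()}

# If meaning lives in a shared, language-independent space, translations of
# the same question should produce nearby vectors.
for lang, emb in embeddings.items():
    sim = util.cos_sim(embeddings["en"], emb).item()
    print(f"en vs {lang}: cosine similarity = {sim:.3f}")
```

If the encoder captures language-independent meaning, all three similarity scores come out high, which is the intuition behind the shared space the researchers describe, even though Claude's internals are probed very differently.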
Another crucial element of advanced reasoning is the ability to plan ahead. Anthropic found that Claude also possesses this skill. Researchers uncovered this by examining how the LLM generates poetry.
In theory, Claude should generate the first line of a poem, write the second line word by word, and only at the end find a way to make it rhyme. In practice, however, the model settles on a candidate rhyming word before it begins the second line and then composes the line to lead toward that ending. This indicates that Claude can plan ahead rather than simply predicting one word at a time.
Anthropic also determined that LLMs can adjust their plans when necessary. After the researchers disabled a component in Claude that was responsible for generating rhymes, the model found a way to create rhymes using a different component. "This demonstrates both planning ability and adaptive flexibility—Claude can adjust its approach when the expected outcome changes," the researchers explained.
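Anthropic has not published the tooling it used to disable that component, but the general mechanic of ablating part of a network and comparing its output can be sketched with a forward hook in PyTorch. The toy model, layer, and unit index below are placeholders for illustration only; in the real research the "component" is a learned feature inside a large language model.

```python
# Illustrative sketch only: zero out one hidden activation ("ablation") with a
# forward hook and compare the model's output before and after.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in network; not Claude and not Anthropic's tooling.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(1, 8)

baseline = model(x)

ABLATED_UNIT = 3  # hypothetical index of the component being disabled

def ablate(module, inputs, output):
    # Returning a modified tensor from a forward hook replaces the layer's output.
    output = output.clone()
    output[:, ABLATED_UNIT] = 0.0  # silence one hidden unit
    return output

# Hook the hidden layer so its activation is altered on the forward pass.
handle = model[1].register_forward_hook(ablate)
ablated = model(x)
handle.remove()

print("output shift after ablation:", (ablated - baseline).abs().max().item())
```

Researchers use this kind of intervention to ask what a component contributes: if the output changes in a characteristic way when the component is silenced, that is evidence the component was doing the work, and watching how the model compensates reveals the flexibility described above.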
In another evaluation, Anthropic studied how Claude handles questions that could be answered by "memorizing" training data. They found that instead of simply recalling information, the model generated answers through a multistep reasoning workflow.
LLM Reliability
One method developers use to check the reliability of an LLM is to ask it to explain how it responded to a prompt. While studying Claude's reasoning capabilities, Anthropic discovered that the explanations provided by the model did not always reflect its actual thought process.
The researchers asked the LLM to solve a series of simple arithmetic problems. Claude claimed it had worked through them using the standard method taught in school. However, upon closer inspection, Anthropic found that the model employed a completely different internal approach from the one it described.
"This may reflect the fact that the model learned to explain mathematics by simulating the kinds of explanations people write, but it had to learn to do math directly in its 'mind,' without any prompting, developing its own internal strategies to achieve this," detailed Anthropic's researchers.
Currently, tracking how Claude responds to a prompt with dozens of words requires several hours of manual effort. According to Anthropic, understanding how LLMs handle more complex requests will require advancements in the observation methods outlined today. The company’s researchers believe AI might help accelerate this workflow.