Although large language models are built by humans, they remain surprisingly mysterious. The high-powered algorithms driving the current AI boom work in ways that are not obvious even to the people observing them most closely. This is why artificial intelligence is often called a "black box": a system whose inner workings cannot easily be understood from the outside.
A new study from Anthropic, one of the leading companies in the AI industry, attempts to shed light on some of the more perplexing aspects of algorithmic behavior. On Tuesday, Anthropic published a research paper aimed at explaining why its AI chatbot, Claude, chooses to generate content about certain subjects rather than others.
AI systems are loosely modeled on the human brain: layered neural networks that take in information, process it, and then make "decisions" or predictions based on it. Such systems are "trained" on large bodies of data, which allows them to form algorithmic connections. But when an AI system produces an output from that training, human observers do not always know how the algorithm arrived at it.
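For readers who want a concrete picture of what "layered" means, here is a minimal sketch of a tiny feed-forward network in Python with NumPy. The weights are random placeholders rather than trained values, and every size and name is invented for illustration, but the basic structure (input flowing through successive layers to become a prediction) is the same in principle.

```python
import numpy as np

# A toy two-layer feed-forward network: input -> hidden -> output.
# All weights are random placeholders; a real model learns them from
# training data. Sizes and names here are purely illustrative.

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Layer weights: 4 input values -> 8 hidden "neurons" -> 3 output classes.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)

def forward(x):
    hidden = relu(x @ W1 + b1)   # intermediate neuron activations
    logits = hidden @ W2 + b2
    return softmax(logits)       # the network's "decision": class probabilities

x = rng.normal(size=4)           # a stand-in for some processed input
print(forward(x))                # prints three probabilities summing to 1
```

Notice that the interesting action happens in the hidden activations, numbers that the network computes internally and that no one directly programmed. That is exactly the part outside observers struggle to interpret.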
This mystery has given rise to the field of AI "interpretability," in which researchers attempt to trace a machine's decision-making path in order to understand its output. In interpretability work, "features" are patterns of activated "neurons" within the neural network, essentially concepts the algorithm may be referencing. The more features researchers can identify inside a network, the better they can understand how particular inputs lead it to produce particular outputs.
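To make that idea concrete, here is a small, purely hypothetical sketch: it treats a "feature" as a direction in the space of neuron activations and measures how strongly each input's activations point along it. Every array and name below is invented for illustration; it shows the general notion, not any lab's actual tooling.

```python
import numpy as np

# Illustrative only: a "feature" as a direction in activation space.

rng = np.random.default_rng(1)

# Pretend activations: 5 different inputs, each yielding 8 neuron values.
activations = rng.normal(size=(5, 8))

# A hypothetical learned feature vector (unit length), e.g. one that an
# interpretability method claims corresponds to some concept.
feature = rng.normal(size=8)
feature /= np.linalg.norm(feature)

# How strongly each input "lights up" the feature: the projection of
# its activations onto the feature direction.
strengths = activations @ feature
print("per-input feature strength:", np.round(strengths, 3))
print("input that triggers the feature most:", int(np.argmax(strengths)))
```

Hunting for inputs that reliably light up a given feature is, in spirit, how researchers connect internal patterns to human-recognizable concepts.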
In a blog post about their findings, Anthropic's researchers explain how they used a technique called "dictionary learning" to decipher which parts of Claude's neural network map to specific concepts. Using this approach, the researchers were able to "begin to understand the model's behavior by observing which features respond to specific inputs, allowing us to gain insight into how the model 'reasons' to produce a given response."
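Anthropic's actual method operates at a vastly larger scale and differs in its details, but the core idea of dictionary learning can be sketched with off-the-shelf tools. The snippet below is a toy illustration on random stand-in data (not Claude's real activations): it uses scikit-learn's DictionaryLearning to rewrite each activation vector as a sparse combination of learned dictionary elements, and the handful of elements that switch on for a given input are the candidate "features."

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Toy sketch of dictionary learning on fake "activation" data. The goal:
# express each activation vector as a sparse mix of dictionary elements,
# so that individual elements can be inspected as candidate features.

rng = np.random.default_rng(2)
activations = rng.normal(size=(200, 16))   # stand-in for recorded neuron activations

learner = DictionaryLearning(
    n_components=32,                  # size of the learned "dictionary"
    transform_algorithm="lasso_lars", # encourages sparse codes
    transform_alpha=0.5,
    random_state=0,
)
codes = learner.fit_transform(activations)  # one sparse code per input

# For a given input, the few nonzero entries indicate which learned
# dictionary elements ("features") were active for that input.
active = np.flatnonzero(codes[0])
print("active dictionary elements for input 0:", active)
```

The sparsity is the point: if only a few elements fire for any given input, each element has a fighting chance of corresponding to one interpretable concept rather than a tangle of many.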
It is worth noting that, like any for-profit company, Anthropic may write and publish its research in ways that align with its business interests. Still, the team's paper is publicly available, which means you can read it yourself and draw your own conclusions about its findings and methodology.