Over the past few weeks, researchers from Google and Sakana have unveiled two cutting-edge neural network designs that could revolutionize the AI industry.
These technologies aim to challenge transformers, the neural network architecture that weighs every part of an input against every other part to capture context, and the design that has defined AI for the past six years.
The new approaches are Google's "Titans" and Sakana's "Transformers Squared." Sakana is a Tokyo-based AI startup inspired by nature. Both Google and Sakana have studied the human brain to address transformer limitations. Their models leverage different stages of memory and activate distinct expert modules independently, rather than calling the entire model each time.
The end result is AI systems that are smarter, faster, and more versatile without needing to be larger or more costly to operate.
Background: The transformer architecture, which gives ChatGPT its "T," was designed for sequence-to-sequence tasks such as language modeling and translation, and has since been extended to areas like image processing. Transformers rely on an "attention mechanism" to weigh the importance of each piece of input based on context, allowing them to process data in parallel rather than sequentially, as recurrent neural networks (RNNs) do. This approach gave models contextual understanding and marked a watershed moment in AI development.
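To make "attention" concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation described above. The toy sizes and random inputs are purely illustrative and not taken from any production model.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Weigh every value by how relevant its key is to each query."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise relevance of all tokens
    weights = softmax(scores, axis=-1)   # normalize into attention weights
    return weights @ V                   # context-aware mix of the values

# Toy example: 4 tokens, 8-dimensional embeddings.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(tokens, tokens, tokens)
print(out.shape)  # (4, 8) -- every token now carries context from the others
```

Because every token's relevance to every other token is computed in one matrix product, the whole sequence can be processed in parallel, which is the key departure from RNNs.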
However, despite their success, transformers face significant challenges in scalability and adaptability: making a model more flexible and versatile has generally meant making it bigger. Once trained, improvements can only come from building new models or bolting on third-party tools, which has fed the widespread belief that "bigger is better" in AI.
This could soon change, thanks to Google and Sakana.
Titans: A New Memory Architecture for Simplified AI
Google's Titans architecture enhances AI adaptability by altering how models store and access information rather than how they process it. It introduces a neural long-term memory module capable of learning during testing, similar to human memory.
Currently, models read the entire prompt plus everything they have generated so far, predict one token, then reread all of it to predict the next, and so on. They excel at short-term memory but struggle with long-term retention: they can fail when asked to recall information that falls outside their context window, or to pick it out amid substantial noise.
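A rough sketch of that loop makes the limitation visible. The `predict_next_token` function below is a made-up stand-in for a real model, and the tiny window size is only for illustration.

```python
CONTEXT_WINDOW = 8  # illustrative; real models use thousands to millions of tokens

def predict_next_token(context):
    # Stand-in for a real language model; here it just returns a placeholder.
    return f"tok{len(context)}"

def generate(prompt_tokens, n_new):
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        visible = tokens[-CONTEXT_WINDOW:]   # the model only "sees" this slice
        tokens.append(predict_next_token(visible))
    return tokens

print(generate(["the", "cat", "sat"], 5))
# Anything older than CONTEXT_WINDOW tokens never reaches the model again.
```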
Titans combines three memory systems: short-term memory (similar to traditional transformer attention), long-term memory (for storing historical context), and persistent memory (for task-specific knowledge). This multi-level approach lets models handle sequences of more than 2 million tokens, far exceeding current transformers' capabilities.
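The toy class below is not Google's implementation; it only sketches the idea of routing a query through three memory pools. The class name, the "surprise" threshold, and the similarity-based recall are all invented for illustration.

```python
import numpy as np

class ThreeTierMemory:
    """Illustrative only: a short-term window, a long-term store that grows at
    test time, and a fixed persistent (task) memory, queried by similarity."""

    def __init__(self, dim, window=16):
        self.window = window
        self.short_term = []                       # recent token embeddings
        self.long_term = np.zeros((0, dim))        # grows as the model 'reads'
        self.persistent = np.random.randn(4, dim)  # frozen task knowledge

    def observe(self, embedding, surprise):
        self.short_term = (self.short_term + [embedding])[-self.window:]
        if surprise > 0.5:                         # keep only 'surprising' items
            self.long_term = np.vstack([self.long_term, embedding])

    def recall(self, query):
        pools = [np.array(self.short_term), self.long_term, self.persistent]
        pools = [p for p in pools if len(p)]
        candidates = np.vstack(pools)
        scores = candidates @ query                # similarity to the query
        return candidates[scores.argmax()]         # best match across all tiers

mem = ThreeTierMemory(dim=8)
x = np.random.randn(8)
mem.observe(x, surprise=0.9)   # stored in both short-term and long-term pools
print(mem.recall(x).shape)     # (8,)
```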
According to the research paper, Titans shows significant improvements across various tasks, including language modeling, common-sense reasoning, and genomics. The architecture excels particularly at pinpointing specific information within extensive contexts.
The system mimics the human brain by activating specific regions for different tasks and dynamically reconfiguring its network based on changing needs.
In other words, just as different neurons in your brain specialize in different functions and are activated depending on the task at hand, Titans mimics this through interconnected memory systems. Short-term, long-term, and persistent memory work together to store, retrieve, and process information dynamically based on the current task.
Transformers Squared: Adaptive AI Arrives
Two weeks after Google's paper, Sakana AI and the Institute of Science Tokyo introduced Transformers Squared, a framework enabling AI models to modify their behavior in real time based on the task at hand. The system selectively adjusts components of its weight matrices during inference, making it more efficient than traditional fine-tuning methods.
According to the research paper, Transformers Squared employs a two-pass mechanism: first, a dispatch system identifies the properties of the incoming task; then task-specific 'expert' vectors, trained via reinforcement learning, are dynamically mixed to produce the targeted behavior for the prompt.
It trades off inference time (thinking more) for specialization (knowing which expertise to apply).
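As a rough sketch of that two-pass idea, consider the snippet below. The keyword dispatcher and the expert dictionary are invented for illustration and are not Sakana's code; in the real system, the dispatch and the expert vectors come from the trained model itself.

```python
import numpy as np

# Hypothetical expert vectors, one per skill, learned ahead of time.
EXPERTS = {
    "math":  np.array([1.2, 0.8, 1.0]),
    "code":  np.array([0.9, 1.3, 1.1]),
    "prose": np.array([1.0, 1.0, 0.7]),
}

def dispatch(prompt):
    """Pass 1: guess what kind of task the prompt is (toy keyword version)."""
    scores = {
        "math": prompt.count("="),
        "code": prompt.count("def"),
        "prose": 1,  # fallback weight so the mix is never empty
    }
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}

def blend_experts(weights):
    """Pass 2: mix the expert vectors according to the dispatcher's weights."""
    return sum(w * EXPERTS[name] for name, w in weights.items())

prompt = "def area(r): return 3.14 * r ** 2"
z = blend_experts(dispatch(prompt))
print(z)  # this blended vector would then modulate the model's weights
```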
The innovation of Transformers Squared lies in its ability to adapt without extensive retraining. The system uses Singular Value Fine-Tuning (SVF), focusing on modifying only essential components for specific tasks. This method significantly reduces computational demands while matching or improving upon current performance levels.
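As the name suggests, SVF adjusts only the singular values of each weight matrix. The NumPy sketch below shows that single step under that assumption; the scaling vector `z` is random here, standing in for the values the paper trains with reinforcement learning.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(6, 4))                # one pretrained weight matrix

# Decompose the matrix once; U and Vt stay frozen.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

z = 1.0 + 0.1 * rng.normal(size=s.shape)   # learned per-singular-value scales
W_adapted = U @ np.diag(s * z) @ Vt        # only the singular values change

# The adapter is just len(s) numbers per matrix, versus W.size parameters for
# full fine-tuning -- which is why it is so much cheaper to train and store.
print(s.shape, W.size)
```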
In tests, Transformers Squared demonstrated notable versatility across different tasks and model architectures. The framework showed particular promise in handling out-of-distribution applications, suggesting it could help AI systems become more flexible and responsive to new situations.
To illustrate, consider learning a new skill: your brain forms new neural connections without rewriting everything. When you learn to play the piano, your brain doesn't overwrite all of its existing knowledge; it adjusts the specific circuits needed for that task while leaving other abilities intact. Sakana's idea is that developers don't need to retrain the entire network to adapt it to a new task; adjusting a small set of task-specific components is enough.