Google has announced the release of Gemini 1.5, the latest version of its conversational AI model for developers. The upgrade significantly improves efficiency and performance through a new Mixture-of-Experts (MoE) architecture.
The MoE architecture enables Gemini 1.5 to execute complex tasks faster while maintaining quality at a lower computational cost. Essentially, it works like a collection of "expert" neural networks, of which only the most relevant are activated for a given input, so each request uses only a fraction of the model. Google says the model also shows more sophisticated reasoning and problem-solving capabilities than previous models.
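To make the idea of selective activation concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not Google's implementation, which the announcement does not describe in detail: the function names (`moe_forward`, `make_expert`), the toy "experts", and the gate weights are all hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_expert(scale):
    # Hypothetical toy expert: scales its input. A real expert would be
    # a feed-forward sub-network with learned parameters.
    return lambda x: [scale * v for v in x]

def moe_forward(x, gate_weights, experts, top_k=2):
    # One gate score per expert: dot product of the input with that
    # expert's gate vector (assumed, illustrative gating function).
    scores = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    # Keep only the top_k highest-scoring experts; the rest are never run.
    chosen = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Renormalize the selected scores so the mixing weights sum to 1.
    weights = softmax([scores[i] for i in chosen])
    output = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)
        output = [o + w * v for o, v in zip(output, y)]
    return output, chosen

experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [2.0, 0.0]]
output, chosen = moe_forward([1.0, 0.0], gate_weights, experts, top_k=2)
print(chosen)  # [2, 3]: only two of the four experts were evaluated
```

The efficiency gain comes from the fact that the unselected experts are skipped entirely, so the cost per input depends on `top_k` rather than on the total number of experts.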
However, the most prominent feature of Gemini 1.5 is its ability to understand long contexts. The model can handle up to 1 million tokens, setting a new milestone for large-scale foundation models. Google has tested up to 10 million tokens in its research.
To put this in perspective, Gemini 1.5's million-token context window is five times the 200,000 tokens offered by Anthropic's Claude and roughly ten times the window of most other state-of-the-art models.
This means that Gemini 1.5 Pro can process a large amount of information in a single request: 1 hour of video, 11 hours of audio, a 700,000-word document, or a codebase with over 30,000 lines of code.
"This breakthrough in long-context understanding will open up new possibilities for people, developers, and businesses to create, discover, and build with artificial intelligence." - Demis Hassabis, CEO of Google DeepMind.
For developers and enterprise customers, this opens up a wide range of possibilities. Context windows of this size allow for more detailed and complex AI applications, in domains ranging from content analysis to solving intricate coding problems.
In benchmark tests, Gemini 1.5 Pro outperforms its predecessor, Gemini 1.0 Pro, in 87% of text, code, image, audio, and video evaluations. Despite using less compute, it performs on par with the larger 1.0 Ultra model.
With the launch of Gemini 1.5, Google emphasizes its ongoing commitment to secure and ethical AI development. The model has undergone extensive ethical and safety testing to ensure compliance with Google's AI principles. Given the novel capabilities and potential wide-ranging impact of this model, such scrutiny is crucial.
Google is initially offering a limited preview of Gemini 1.5 Pro through its AI Studio platform and Vertex AI. This allows early testers to try out the model and provide feedback before a broader release.
Developers can now register on AI Studio to try the 1.5 Pro model, which has a standard token context length of 128,000. Google plans to add pricing tiers soon, with the highest tier scalable up to 1 million tokens.
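As a rough illustration of how the two context sizes might matter in practice, the sketch below estimates a token count from a word count and checks which window it fits in. The 10/7 tokens-per-word ratio is an assumption derived only from the article's own figures (a 700,000-word document against a 1-million-token window); real token counts depend on the tokenizer and the content. The function names and tier labels are hypothetical and are not part of any Google API.

```python
STANDARD_CONTEXT = 128_000    # standard context length quoted in the article
EXTENDED_CONTEXT = 1_000_000  # experimental long-context window

def estimate_tokens_from_words(word_count):
    # Assumed ratio: 1,000,000 tokens / 700,000 words = 10/7 tokens per word.
    # Integer arithmetic with rounding up avoids floating-point surprises.
    return (word_count * 10 + 6) // 7

def pick_tier(token_count):
    # Hypothetical tier labels, used only for this illustration.
    if token_count <= STANDARD_CONTEXT:
        return "standard"
    if token_count <= EXTENDED_CONTEXT:
        return "extended"
    return "too_large"

print(pick_tier(estimate_tokens_from_words(50_000)))   # standard
print(pick_tier(estimate_tokens_from_words(700_000)))  # extended
```

An estimate like this is only useful for a quick sanity check before sending a large input; an exact count has to come from the model's own tokenizer.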
During the preview period, testers have free access to the experimental million-token context window. However, Google notes that users should expect higher latency for now, and says significant speed optimizations are underway to improve response times.