Dark Side of the Moon Releases K1, a New Visual Thinking Model for Its Domestic Large Model Kimi

2024-12-16

Today, Dark Side of the Moon (Moonshot AI), a leading Chinese artificial intelligence company, officially launched K1, the latest visual reasoning model in its in-house Kimi large model series. Built with reinforcement learning, K1 natively supports end-to-end image understanding and incorporates chain-of-thought techniques, extending its reach beyond mathematics to other fundamental scientific disciplines.

With the release of K1, the latest version of the Kimi intelligent assistant is now available in the Android and iPhone apps and on the official website, kimi.com. Users can find the Kimi Visual Reasoning feature in the Kimi+ section of the latest mobile app or web version and try it by taking a photo or uploading an image.

Reportedly, in benchmark tests covering fundamental scientific disciplines such as mathematics, physics, and chemistry, K1 outperformed widely used reference models including OpenAI's o1 and GPT-4o and Anthropic's Claude 3.5 Sonnet. The company presents this result as evidence of K1's strength in the basic sciences and as a foundation for broader applications.

According to Dark Side of the Moon, K1 is genuinely end-to-end in visual reasoning and image comprehension. Users feed images directly to the model, which processes them and reasons its way to an answer on its own, without relying on external OCR or other vision models for auxiliary processing.

K1 was trained in two stages: pre-training produced a base model, and reinforcement learning was then applied to that base model in post-training. In the base-model stage, the team focused on strengthening character recognition. The base model scored 903 on OCRBench and 69.1, 66.7, and 96.9 on MathVista-testmini, MMMU-val, and DocVQA respectively, placing it among the global leaders.

The reinforcement learning post-training stage also saw significant progress. The team improved data quality and learning efficiency and reports breakthroughs in scaling reinforcement learning. The company identifies these advances as the key factors behind the K1 visual reasoning model's industry-leading benchmark results.

However, Dark Side of the Moon acknowledged that internal testing revealed limitations in the K1 visual reasoning model. Its out-of-distribution generalization, its success rate on more complex problems, its accuracy on noisier inputs, and its performance in multi-turn question answering all leave significant room for improvement. In some scenarios and in generalization ability, K1 still lags behind OpenAI's o1 series models.