Google Open-Sources Magika: AI-Powered File Format and Content Type Identification

2024-02-17

Google recently announced on its official blog that it has open-sourced Magika, an AI-based tool that quickly and accurately identifies file formats and content types. The source code is available on GitHub, where developers can use it and contribute to it.

At the core of Magika is a custom, highly optimized deep learning model. The model runs on a CPU and identifies a file's type within milliseconds. According to the performance data Google shared, Magika outperforms existing tools by approximately 20% in a benchmark of over 1 million files covering more than 100 formats, and it achieves precision and recall of over 99%.

Inside Google, Magika is already widely used to improve user security. It has been deployed at scale to route files from Gmail, Drive, and Safe Browsing to the appropriate security and content policy scanners. Google says that, compared with traditional rule-based systems, Magika has improved the accuracy of file type identification by 50%.

Google also revealed that VirusTotal has integrated Magika. There, Magika serves as a pre-filter before files are analyzed by Code Insight, a VirusTotal feature that uses Google's generative AI to detect malicious code.

With Magika now open source, developers worldwide can take part in improving and optimizing the tool. We look forward to seeing more applications and developments of Magika in the field of file format and content recognition.
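The routing role described above, in which a file's content type is identified first and the file is then sent to a matching scanner, can be illustrated with a minimal Python sketch. It is not Google's implementation and does not call Magika's API. The detector is a hypothetical stand-in that checks only a few well-known magic-byte prefixes, and the scanner names and the routing table are assumptions made for illustration.

```python
from typing import Callable, Dict


def detect_content_type(data: bytes) -> str:
    """Hypothetical stand-in for a content-type detector such as Magika.

    It only checks a few well-known magic-byte prefixes; a real
    deployment would ask the model for a label instead.
    """
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    if data.startswith(b"#!"):
        return "shell"
    return "unknown"


# Hypothetical scanners: each one just reports which pipeline handled the file.
def scan_document(data: bytes) -> str:
    return "document-scanner"


def scan_archive(data: bytes) -> str:
    return "archive-scanner"


def scan_script(data: bytes) -> str:
    return "script-scanner"


def scan_generic(data: bytes) -> str:
    return "generic-scanner"


# Assumed routing table mapping a detected content type to a scanner.
SCANNERS: Dict[str, Callable[[bytes], str]] = {
    "pdf": scan_document,
    "zip": scan_archive,
    "shell": scan_script,
}


def route(data: bytes) -> str:
    """Detect the content type, then dispatch to the matching scanner."""
    content_type = detect_content_type(data)
    scanner = SCANNERS.get(content_type, scan_generic)  # fall back if unmapped
    return scanner(data)


if __name__ == "__main__":
    print(route(b"%PDF-1.7\n..."))
    print(route(b"#!/bin/sh\necho hi\n"))
    print(route(b"plain text"))
```

The more accurate the detection step, the less often a file ends up with the wrong scanner or the generic fallback, which is why identification accuracy matters in a pipeline of this shape.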