Microsoft releases multimodal small AI model Phi-3-vision

2024-05-22

At the 2024 Microsoft Build conference, Microsoft announced the newest member of its Phi-3 family of small open models: Phi-3-vision, a multimodal model that combines language and vision capabilities. With 4.2 billion parameters, it can generate insights from charts and diagrams, making it a useful building block for a range of applications.

Key points include:

- Phi-3-vision: A multimodal model that combines language and vision capabilities, allowing it to understand text and images, including charts and diagrams, and generate insights from them.
- Phi-3-small and Phi-3-medium: These previously announced models are now available on Microsoft Azure, giving developers more options for building generative AI applications.
- Phi-3-mini: The first model in the Phi-3 family, now available through Azure AI's model-as-a-service offering, which makes it easier for users to get started.

Phi-3-vision excels at tasks such as optical character recognition (OCR), chart analysis, and diagram understanding. It is designed to process and reason over real-world images, which makes it useful for developers working with visual data.

The Phi-3 models combine strong performance with lower cost than larger language models. For example, Phi-3-small outperforms larger models, including GPT-3.5 Turbo, despite having only 7 billion parameters. Phi-3-vision continues this trend by surpassing larger models such as Claude-3 Haiku and Gemini 1.0 Pro V in visual reasoning tasks.

Because the Phi-3 models are compact, they can be deployed on devices, enabling low-latency AI experiences without a network connection. They are also more cost-effective: according to Sébastien Bubeck, Vice President of GenAI Research at Microsoft, the cost of Phi-3 has been "significantly reduced."

As the range of available models continues to grow, choosing the right one will depend on the specific use case and business needs. The expansion of the Phi-3 family gives developers a versatile set of tools for building generative AI applications, and the models' combination of performance, cost-effectiveness, and versatility illustrates the potential of small language models.