Google Bets on the Multimodal AI Market With BigQuery at the Core of Its Strategy

2025-01-20

With multiple research firms forecasting that the multimodal artificial intelligence market will grow rapidly over the next few years at a compound annual growth rate of more than 35%, Google LLC is moving aggressively to stake out a leading position in this emerging field.

Google's cloud division says multimodal AI, a technology that combines unstructured data such as text, images, video, and audio with generative AI processing, is expected to rank among the top five AI trends of 2025.

At the heart of Google's multimodal AI strategy is the BigQuery data warehouse. According to Yasmeen Ahmad, head of product management for data, analytics, and AI at Google, the company is repositioning BigQuery as a data lakehouse that can process and analyze many different types of data.
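To make the lakehouse idea concrete, the sketch below shows one way unstructured files in Cloud Storage can be exposed inside BigQuery as an object table and queried next to ordinary structured tables. It is a minimal illustration rather than Google's reference implementation: the project, dataset, connection, and bucket names are hypothetical, and it assumes the google-cloud-bigquery Python client plus an existing Cloud Storage connection.

```python
from google.cloud import bigquery

# Hypothetical project, dataset, bucket, and connection names for illustration.
client = bigquery.Client(project="example-project")

# An object table makes unstructured files in Cloud Storage (here, drive-thru
# video frames) queryable as rows alongside ordinary structured tables.
ddl = """
CREATE OR REPLACE EXTERNAL TABLE `example-project.media.drive_thru_frames`
WITH CONNECTION `example-project.us.gcs_connection`
OPTIONS (
  object_metadata = 'SIMPLE',
  uris = ['gs://example-bucket/drive-thru/*.jpg']
)
"""
client.query(ddl).result()

# The object table's metadata columns (uri, size, updated, ...) can then be
# filtered and joined like any other BigQuery table.
rows = client.query("""
    SELECT uri, size, updated
    FROM `example-project.media.drive_thru_frames`
    ORDER BY updated DESC
    LIMIT 10
""").result()
for row in rows:
    print(row.uri, row.size, row.updated)
```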

Ahmad said in an interview that Google estimates 90% of the data enterprises store is unstructured. By applying technologies such as image recognition and speech recognition, and augmenting retrieval and training with structured data, companies can extract valuable insights from data that was previously hard to put to use.

For example, fast-food chain Wendy's is piloting an application that brings together BigQuery, Google's Vision AI, and Gemini to analyze drive-thru video footage and spot potential bottlenecks. By pairing the video imagery with employee scheduling data to fine-tune staffing levels, the pilot puts video data and operational data on the same platform.
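The article does not describe Wendy's actual data model, but a hedged sketch of what "video data and operational data on the same platform" might look like is a simple in-place join between a table of model-generated queue counts and a staffing schedule. Both tables, and all names below, are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project

# Hypothetical tables: `queue_counts` holds per-interval car counts extracted
# from drive-thru video by a vision model; `staff_schedule` is ordinary
# operational data. Joining them in place is the point of the lakehouse pitch.
sql = """
SELECT
  q.store_id,
  q.interval_start,
  q.cars_in_queue,
  s.staff_on_shift,
  SAFE_DIVIDE(q.cars_in_queue, s.staff_on_shift) AS cars_per_worker
FROM `example-project.ops.queue_counts` AS q
JOIN `example-project.ops.staff_schedule` AS s
  ON q.store_id = s.store_id
 AND q.interval_start BETWEEN s.shift_start AND s.shift_end
ORDER BY cars_per_worker DESC
LIMIT 20
"""
for row in client.query(sql).result():
    print(row.store_id, row.interval_start, row.cars_per_worker)
```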

United Parcel Service built a dashboard that uses sensor data from its trucks to give drivers real-time guidance on optimizing delivery routes. Bell Canada, meanwhile, uses AI-generated transcripts of customer service calls to train a coaching assistant that gives feedback to its support agents.

Multimodal AI lets retailers gather customer sentiment from sources such as call centers, social media comments, and mobile app feedback, then feed it into a generative AI engine that identifies new market segments for targeted marketing campaigns. The approach delivers a degree of personalization and scale that was previously out of reach.

Gemini can run directly on data in BigQuery, with no separate data transfer step, which greatly accelerates application development. Ahmad noted that many organizations can now get a pilot project up and running within weeks.
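As a rough illustration of Gemini running directly against warehouse data, the sketch below calls BigQuery ML's ML.GENERATE_TEXT with a remote model that points at Gemini through Vertex AI, so the prompts are built from rows that never leave BigQuery. The model, dataset, and table names are hypothetical, and the remote model is assumed to have been created beforehand.

```python
from google.cloud import bigquery

client = bigquery.Client(project="example-project")  # hypothetical project

# ML.GENERATE_TEXT sends each row's `prompt` column to the remote Gemini model
# and returns the generated text alongside the original columns.
sql = """
SELECT
  ticket_id,
  ml_generate_text_llm_result AS summary
FROM ML.GENERATE_TEXT(
  MODEL `example-project.ai.gemini_remote_model`,
  (
    SELECT
      ticket_id,
      CONCAT('Summarize this customer call transcript: ', transcript) AS prompt
    FROM `example-project.support.call_transcripts`
    LIMIT 100
  ),
  STRUCT(0.2 AS temperature, 256 AS max_output_tokens, TRUE AS flatten_json_output)
)
"""
for row in client.query(sql).result():
    print(row.ticket_id, row.summary[:120])
```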

So far, most early deployments have focused on internal use, and companies remain cautious about applying generative AI to external customer service. Behind the corporate firewall, however, plenty of opportunity remains untapped. As Ahmad put it: "The low-hanging fruit is that customers have years of accumulated data they have never fully exploited. With a multimodal data architecture backed by BigQuery and seamless integration with Vision AI and Gemini, that becomes far more achievable."
