Nvidia has launched a new microservice designed to help AI engineers build generative AI applications that can store and retrieve data across multiple languages, making it easier for those applications to operate across national boundaries.
To improve the accuracy of data retrieval for generative AI in multilingual environments, Nvidia has added multilingual support to its NeMo Retriever software, available through the company's API catalog for developers. The software can understand and process data in a range of languages and formats, converting it into text to provide context-aware search results.
NeMo Retriever lets developers build information ingestion and retrieval pipelines for AI models, extracting structured and unstructured data from text, documents, and tables while avoiding duplicate content. It uses embeddings to convert data into numerical representations that AI models can work with and stores them in a vector database.
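As a rough illustration of how such a pipeline fits together (this is not NeMo Retriever's actual API; the toy_embed function below is a stand-in for a real multilingual embedding model, and an in-memory list stands in for a vector database), the flow is: embed each document at ingestion time, then embed the query and rank documents by vector similarity.

```python
import hashlib
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in embedder: hashes words into a fixed-size vector.
    It only captures literal word overlap; a real multilingual embedding
    model would place semantically similar sentences close together
    regardless of language."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# "Ingestion": embed documents (in any language) and keep the vectors.
documents = [
    "Der Vertrag wurde im März unterzeichnet.",   # German
    "The contract was signed in March.",          # English
    "請求書は4月に送付されました。",                  # Japanese
]
index = [(doc, toy_embed(doc)) for doc in documents]

# "Retrieval": embed the query and rank documents by similarity.
query_vec = toy_embed("When was the contract signed?")
ranked = sorted(index, key=lambda item: float(item[1] @ query_vec), reverse=True)
for doc, vec in ranked:
    print(f"{float(vec @ query_vec):.3f}  {doc}")
```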
An embedding is a mathematical representation that captures the attributes of, and relationships between, pieces of data such as words and phrases. It lets a system measure how "close" two words or sentences are in meaning during search or reasoning. For example, "cat" and "dog" are considered close because both are animals and pets, while "toaster" and "dog" are far apart: both are common household items, but they belong to entirely different categories.
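That notion of closeness is usually measured with cosine similarity between embedding vectors. The sketch below uses hand-made three-dimensional vectors purely for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors along invented dimensions (animal-ness, pet-ness, appliance-ness).
cat     = np.array([0.9, 0.8, 0.0])
dog     = np.array([0.9, 0.9, 0.0])
toaster = np.array([0.0, 0.0, 1.0])

print(cosine_similarity(cat, dog))      # high: close in meaning
print(cosine_similarity(dog, toaster))  # near zero: unrelated concepts
```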
Kari Briski, Vice President of Generative AI Software at Nvidia, said in an interview that using Retriever to embed and retrieve data in its native language improves accuracy, chiefly because most AI training datasets are predominantly English. Translation introduces "information loss": each conversion can strip away context or accuracy.
Briski noted that when Retriever was first released, customers urgently requested multilingual support because translation software was hurting accuracy. Businesses often operate in multiple languages, embedding English documents, German texts, Japanese materials, or Russian research reports. All of that information needs to be searchable by the same model, but the more tools sit in between, the lower the accuracy becomes.
In addition to ingestion, NeMo Retriever can "evaluate and re-rank" results to ensure the accuracy of answers. When a query passes through Retriever, it examines the candidates returned from the vector database and ranks the retrieved information by its relevance to the query.
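A generic re-ranking stage can be sketched as follows: the vector search returns a shortlist by embedding similarity, and a second scoring pass reorders it before the answer is generated. The rerank_score function here is a placeholder (crude word overlap) standing in for a real re-ranking model that scores each query-passage pair directly.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    vector_score: float  # similarity from the first-pass vector search

def rerank_score(query: str, passage: str) -> float:
    """Placeholder scorer: fraction of query words found in the passage.
    A real system would use a re-ranking model here."""
    q_words, p_words = set(query.lower().split()), set(passage.lower().split())
    return len(q_words & p_words) / max(len(q_words), 1)

def rerank(query: str, candidates: list[Candidate], top_k: int = 3) -> list[Candidate]:
    """Re-order the vector-search shortlist by the second-pass score."""
    return sorted(candidates, key=lambda c: rerank_score(query, c.text), reverse=True)[:top_k]

shortlist = [
    Candidate("Quarterly revenue grew 12 percent year over year.", 0.81),
    Candidate("The revenue figures for the quarter were published Tuesday.", 0.79),
    Candidate("Office relocation is planned for next quarter.", 0.77),
]
for c in rerank("quarterly revenue figures", shortlist):
    print(c.text)
```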
Nvidia collaborated with DataStax to use NeMo Retriever to convert 10 million Wikipedia entries into an AI-ready format in less than three days, a process that typically takes 30 days.
Furthermore, Nvidia partners including Cohesity, Cloudera, SAP, and VAST Data are integrating support for the new microservices to handle large multilingual data sources. This includes retrieval-augmented generation, which lets a pre-trained generative AI model draw on richer, more relevant information from real-time data sources. Multilingual support means businesses can put far more of their data to use.
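In a retrieval-augmented generation setup, the retrieved passages are simply folded into the prompt that the generative model sees. The sketch below shows only that prompt-assembly step; build_rag_prompt is a hypothetical helper, and the passages would come from a retriever such as the pipeline sketched earlier.

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that grounds the model's answer in retrieved text."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "When was the contract signed?",
    ["Der Vertrag wurde im März unterzeichnet.",
     "The invoice was sent in April."],
)
print(prompt)  # this string would then be passed to the generative model
```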
For now, the multilingual version of NeMo Retriever supports only text retrieval and responses. Briski said the company is exploring future support for multimodal data such as images, PDFs, and video. "We are currently focusing on text because if you can do well with text, you can achieve good results with other forms of data as well," she said.