Nvidia has officially launched the complete suite of NeMo microservices, a collection of tools designed to assist developers in accelerating the deployment of AI agents by leveraging large-scale AI inference and information systems.
Agents have become a key focus for creating "digital team members" that can boost productivity for knowledge and service workers by following instructions, retrieving information, and proactively carrying out tasks.
Unlike AI chatbots, agents can operate autonomously with minimal or no human supervision. However, they require data to make accurate and effective decisions as part of their reasoning process. This is especially true for proprietary knowledge that may be locked behind corporate firewalls or when working with fast-changing real-time information.
"Without a continuous stream of high-quality inputs—whether from databases, user interactions, or real-world signals—an agent's understanding may degrade, leading to less reliable responses and reduced productivity," said Joey Conway, Senior Director of Generative AI Software at Nvidia.
To help developers build and deploy agents more efficiently, Nvidia introduced NeMo microservices, including Customizer, Evaluator, Guardrails, Retriever, and Curator. These tools aim to streamline the experience for enterprise AI engineers when scaling and accessing data to create AI-driven agent experiences.
Customizer speeds up large language model fine-tuning, delivering up to 1.8 times higher training throughput. It exposes an application programming interface (API) that lets developers fine-tune models on their own datasets and quickly ready them for deployment. Evaluator simplifies the assessment of AI models and workflows against custom and industry benchmarks with just five API calls.
Guardrails keeps AI models and agents operating safely and within bounds, improving compliance safeguards by 1.4 times while adding only half a second of latency. Retriever, unveiled at GTC 2025, enables developers to build agents that extract and process data accurately, allowing them to construct complex AI data pipelines such as retrieval-augmented generation.
"NeMo microservices are easy to use, run on any accelerated computing infrastructure—on-premises or in the cloud—and provide enterprise-grade security, stability, and support," Conway added.
Nvidia designed the NeMo tools to be accessible to developers with general AI knowledge through API calls, enabling them to launch and run AI agents. Enterprises are now building sophisticated multi-agent systems where hundreds of expert agents collaborate toward unified goals while working alongside human team members.
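Since the microservices are driven through API calls, launching a task like a fine-tuning job amounts to sending a small JSON request. The sketch below is purely illustrative: the endpoint path, field names, and model identifier are assumptions, not the documented NeMo API.

```python
import json

# Hypothetical endpoint for a locally deployed Customizer microservice
# (the real path and port depend on the actual deployment).
NEMO_CUSTOMIZER_URL = "http://localhost:8000/v1/customization/jobs"  # assumed

def build_finetune_job(base_model: str, dataset: str, epochs: int = 3) -> dict:
    """Assemble an illustrative fine-tuning job request body.

    Field names here are placeholders chosen for readability, not the
    microservice's actual schema.
    """
    return {
        "config": base_model,
        "dataset": dataset,
        "hyperparameters": {"epochs": epochs},
    }

payload = build_finetune_job("meta/llama-3.1-8b-instruct", "support-tickets-v1")
print(json.dumps(payload, indent=2))

# In a live deployment, one would then submit the job over HTTP, e.g.:
# requests.post(NEMO_CUSTOMIZER_URL, json=payload).json()
```

The point of the pattern, as Conway describes, is that a developer with general AI knowledge can drive fine-tuning, evaluation, and guardrailing through plain API requests rather than bespoke training infrastructure.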
Extensive Support for Numerous Models and Partners
NeMo microservices support a wide range of popular open AI models, including Meta Platforms Inc.'s Llama, Microsoft Corp.'s Phi family of small language models, Google LLC's Gemma, and Mistral AI's Mistral.
Nvidia's Llama Nemotron Ultra, which currently ranks first in scientific reasoning, coding, and complex mathematics benchmarks, is also accessible via these microservices.
Leading AI service providers such as Cloudera Inc., Datadog Inc., Dataiku, DataRobot Inc., DataStax Inc., SuperAnnotate AI Inc., and Weights & Biases Inc. have integrated NeMo microservices into their platforms. Developers can now start incorporating these microservices into their workflows using popular AI frameworks like CrewAI, Haystack by Deepset, LangChain, LlamaIndex, and Llama Stack.
With the new NeMo microservices, Nvidia partners and tech companies have built AI agent platforms that enable digital team members to accomplish more.
For instance, AT&T Inc. uses NeMo Customizer and Evaluator to improve the accuracy of its AI agents by fine-tuning the Mistral 7B model for personalized services, fraud prevention, and network performance optimization. BlackRock Inc. is integrating the microservices into its Aladdin technology platform to unify investment management through a universal data language.