AI Agents Bring Sci-Fi Plots to Life

2024-05-28

If you've been following Google I/O, OpenAI's spring update, or Microsoft's Build conference this month, you've probably heard the term "AI agents" a lot. They are quickly becoming the next big thing in the tech industry, but what exactly are AI agents, and why is everyone suddenly talking about them?


At Google I/O, Google CEO Sundar Pichai described an artificial intelligence system that can return a pair of shoes for you. Microsoft announced a Copilot AI system that can act independently, like a virtual employee. Meanwhile, OpenAI released GPT-4o (the "o" stands for "omni"), an AI system that can see, hear, and speak. OpenAI CEO Sam Altman previously told MIT Technology Review that useful agents are where the technology's biggest potential lies. Systems like these have become the goal every AI company is chasing, but building them is easier said than done.


In simple terms, AI agents are AI models that can perform certain actions independently. Think of Jarvis in "Iron Man," TARS in "Interstellar," or HAL 9000 in "2001: A Space Odyssey." They go beyond the familiar chatbots because they can actually carry out actions. For now, Google, Microsoft, and OpenAI are all trying to build agents that handle digital tasks, which means teaching them to work with the various APIs on a computer. Ideally, such agents can press buttons, make decisions, monitor channels on their own, and send requests.


"I agree that the future belongs to agents," said Alexander Kvamme, founder and CEO of Echo AI. His company develops AI agents that analyze conversations between businesses and customers and provide insights on improving customer experience. "This industry has been talking about it for years, but it hasn't been realized yet. It's a very difficult problem."


Kvamme said a true agent system needs to make dozens or even hundreds of decisions on its own, which is hard to automate. Take Pichai's example of returning a pair of shoes: an AI agent may need to scan your emails for the receipt, extract your order number and address, fill out a return form, and take various other actions on your behalf. Along the way there are many decisions that you make subconsciously, often without even noticing.


As we've seen, large language models are far from perfect even in controlled environments. Ask an LLM to work independently on the open internet, and it becomes prone to errors. But that is exactly the problem countless startups (including Echo AI) and large companies like Google, OpenAI, and Microsoft are working hard to solve.


Once you can build agents in the digital world, nothing in principle stops you from building agents that interact with the physical world: you would just need to program the tasks into robots. That is when we truly enter the realm of science fiction, with AI agents assigned tasks like "take the order at that table" or "install all the tiles on this roof." We are still far from that goal, but the first step is teaching AI agents to perform simple digital tasks.


In the world of AI agents, a frequently discussed concern is making sure you don't build an agent that is too good at its task. If you build an agent to return shoes, you have to make sure it doesn't return all of your shoes, or even every item with a receipt in your Gmail inbox. This may sound silly, but a small group of AI researchers worry that AI agents could bring disaster to human civilization. Given that what is being built comes straight out of science fiction, I think it's a reasonable concern.


On the other side are optimists, like Echo AI, who believe the technology will be empowering. The AI community is clearly divided on this, but the optimists see AI agents as liberating, much as personal computers were.


"I firmly believe that agents will take on many tasks that humans don't want to do," said Kvamme. "And people will have more valuable time in their lives. But they will have to adapt."


Self-driving cars are another application of AI agents. Tesla and Waymo are currently leading the way, using AI to navigate city streets and highways. Although it is a niche field, autonomous driving is one of the more advanced areas of AI agents, and it is where we have already seen such AI in action in the real world.


So, what will get us to a future where AI can return your shoes? First, the underlying AI models will likely need to become better and more accurate, which means updates to ChatGPT, Gemini, and Copilot may arrive before fully functional agent systems do. AI chatbots still need to overcome their massive hallucination problem, a challenge many researchers have yet to solve. But the agent systems themselves also need to improve. OpenAI's GPT Store is currently the most ambitious attempt to build a network of agents, yet even it is not very advanced.