Exploring New Architectures in Enterprise AI: The Transformer Isn't the Only Option

2024-10-14

As more businesses chase the so-called agent-based future, the way AI models are built may become a significant hurdle. To deliver more efficient AI agents, enterprise AI developers need to explore alternative model architectures.

In an interview, AI21 co-CEO Ori Goshen said that the currently dominant Transformer architecture has limitations that could impede the development of a multi-agent ecosystem. Transformers generate output one token at a time, he pointed out, and producing large numbers of tokens is costly.
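To make the cost concern concrete, here is a back-of-the-envelope sketch of how token generation costs compound in a multi-agent pipeline. All prices, token counts, and hop counts below are illustrative assumptions, not AI21 figures:

```python
# Rough cost model for a chained multi-agent task (illustrative numbers):
# every hop re-sends context to the model and generates new tokens,
# so token spend compounds with the number of agents involved.

PRICE_PER_1K_TOKENS = 0.01   # assumed blended price per 1k tokens, USD
CONTEXT_TOKENS = 4_000       # context re-sent at each hop (assumption)
OUTPUT_TOKENS = 800          # tokens generated per hop (assumption)
AGENT_HOPS = 5               # number of agents chained in one task

tokens_per_task = AGENT_HOPS * (CONTEXT_TOKENS + OUTPUT_TOKENS)
cost_per_task = tokens_per_task / 1_000 * PRICE_PER_1K_TOKENS
print(f"{tokens_per_task:,} tokens ≈ ${cost_per_task:.2f} per task")
# -> 24,000 tokens ≈ $0.24 per task; at 100k tasks/day, roughly $24k/day
```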

AI21 specializes in enterprise AI solutions and has previously argued that Transformers should be treated as one option among several model architectures rather than the default choice. The company builds its foundation models on Jamba (short for Joint Attention and Mamba Architecture), which layers Transformer attention on top of the Mamba architecture developed by researchers at Princeton University and Carnegie Mellon University, delivering faster inference and longer context windows.
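As a rough illustration of the idea, the sketch below interleaves attention blocks with state-space blocks in a single stack, the way a Jamba-style hybrid does. The layer ratio, dimensions, and the simplified `SSMBlock` stand-in (a gated causal convolution rather than a true selective scan) are assumptions for illustration, not AI21's actual implementation:

```python
import torch
import torch.nn as nn

class SSMBlock(nn.Module):
    """Stand-in for a Mamba block: a gated causal depthwise convolution.
    A real Mamba block uses a selective state-space recurrence instead."""
    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=4,
                              padding=3, groups=d_model)  # left-padded -> causal
        self.gate = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        h = self.norm(x)
        c = self.conv(h.transpose(1, 2))[..., : x.size(1)].transpose(1, 2)
        return x + self.out(c * torch.sigmoid(self.gate(h)))

class AttnBlock(nn.Module):
    """Standard pre-norm self-attention block."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        y, _ = self.attn(h, h, h, need_weights=False)
        return x + y

class HybridStack(nn.Module):
    """One attention block for every `ratio` layers; the rest are SSM blocks
    (the ratio here is an assumption, not Jamba's published layout)."""
    def __init__(self, d_model: int = 256, n_layers: int = 8, ratio: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            AttnBlock(d_model) if i % ratio == ratio - 1 else SSMBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 128, 256)       # (batch, seq, d_model)
print(HybridStack()(x).shape)      # torch.Size([2, 128, 256])
```

The design intuition is that occasional attention layers preserve precise long-range retrieval while the cheaper state-space layers carry most of the sequence mixing.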

Goshen said that alternative architectures like Mamba and Jamba typically make agent systems more efficient and, crucially, more cost-effective. Because Mamba-based models maintain a fixed-size recurrent state rather than an ever-growing attention cache, they handle memory more efficiently, which helps agents, especially those wired to other models, operate effectively.
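One way to see the memory claim: a Transformer's key-value cache grows linearly with context length, while a state-space layer carries a fixed-size recurrent state. The sketch below compares the two under assumed model dimensions; all numbers are hypothetical, chosen only to show the scaling:

```python
# Illustrative memory comparison (assumed shapes, not AI21's numbers).

n_layers, n_heads, head_dim = 32, 32, 128   # assumed Transformer shape
d_model = n_heads * head_dim
d_state = 16                                # assumed SSM state size per channel
bytes_per_val = 2                           # fp16

def kv_cache_bytes(context_len: int) -> int:
    # 2 tensors (K and V) per layer, each (context_len, n_heads, head_dim)
    return 2 * n_layers * context_len * n_heads * head_dim * bytes_per_val

def ssm_state_bytes() -> int:
    # fixed recurrent state per layer, independent of context length
    return n_layers * d_model * d_state * bytes_per_val

for ctx in (4_000, 128_000):
    print(f"{ctx:>7} tokens: KV cache {kv_cache_bytes(ctx)/2**30:.1f} GiB, "
          f"SSM state {ssm_state_bytes()/2**20:.1f} MiB")
# ->   4,000 tokens: KV cache  2.0 GiB, SSM state 4.0 MiB
# -> 128,000 tokens: KV cache 62.5 GiB, SSM state 4.0 MiB
```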

He believes AI agents are only now gaining traction and that most are not yet production-ready, a shortcoming he attributes to their reliance on Transformer-based large language models (LLMs). Transformer models sample their output stochastically, Goshen noted, so errors keep creeping in and reliability falls short of what production use demands.

This year, AI agents have emerged as one of the top trends in enterprise AI. Several companies have launched agents and platforms to simplify agent development: ServiceNow announced updates to its Now Assist AI platform, including an AI agent library; Salesforce introduced Agentforce, a suite of AI agents; and Slack began letting users integrate agents from companies such as Salesforce, Cohere, Workday, Asana, and Adobe.

Although the Transformer architecture has become the default, even standard, choice for building foundation models such as OpenAI's GPT series, experts like Goshen continue to press the case for alternatives like Mamba, precisely because Transformer models are expensive to run and cumbersome to operate.

The Mamba architecture selectively prioritizes incoming data, assigning weights to inputs as it updates a fixed-size internal state, which keeps memory usage in check and maps well onto GPU hardware. In recent months, other open-source AI developers have begun releasing Mamba-based models as well, such as Mistral's Codestral Mamba 7B and the Technology Innovation Institute's Falcon Mamba 7B.
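Here is a minimal sketch of that idea, in the spirit of Mamba's selective state-space recurrence: input-dependent projections decide, token by token, what gets written into a fixed-size state. The parameterization and discretization below are deliberately simplified and are not the paper's exact formulation:

```python
import torch

def selective_scan(x, W_dt, W_B, W_C, A):
    """Simplified selective SSM over x: (seq, d_model) -> (seq, d_model)."""
    seq, d = x.shape
    n = A.shape[1]                    # state size per channel
    h = torch.zeros(d, n)             # fixed-size recurrent state
    ys = []
    for t in range(seq):
        # input-dependent parameters: the "selective" part
        dt = torch.nn.functional.softplus(x[t] @ W_dt)   # (d,) step sizes
        B = x[t] @ W_B                                   # (n,) write gate
        C = x[t] @ W_C                                   # (n,) read-out proj
        # simplified discretized update: decay old state, write gated input
        h = torch.exp(dt[:, None] * A) * h + dt[:, None] * x[t][:, None] * B
        ys.append(h @ C)                                 # (d,) output
    return torch.stack(ys)

d_model, d_state, seq = 8, 4, 16
x = torch.randn(seq, d_model)
A = -torch.rand(d_model, d_state)     # negative entries -> decaying memory
out = selective_scan(x, torch.randn(d_model, d_model),
                     torch.randn(d_model, d_state),
                     torch.randn(d_model, d_state), A)
print(out.shape)  # torch.Size([16, 8])
```

Because the state `h` never grows with sequence length, per-token compute and memory stay constant, which is the property that makes these models attractive for long-context, cost-sensitive agent workloads.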

When selecting an AI architecture, however, enterprises must weigh not only cost-effectiveness and reliability but also stay wary of flashy demonstrations that promise to solve everything. Goshen warns that the field is still at a stage where impressive demos are easy to produce, yet a considerable distance remains before they become products: businesses can use AI for research, but it is not yet reliable enough to depend on for decision-making.