OctoAI Introduces OctoStack Platform to Streamline Enterprise AI Model Hosting
OctoAI has introduced the OctoStack software platform, which enables companies to host artificial intelligence models on their internal infrastructure.
Many large language models are delivered through cloud-based application programming interfaces (APIs). These models are hosted on the respective developers' infrastructure, requiring customers to send their data to that infrastructure for processing. Hosting neural networks on internal hardware eliminates the need to share data with external vendors, simplifying network security and regulatory compliance for enterprises.
OctoAI says its newly launched OctoStack platform makes it easier to host AI models on internal infrastructure. The platform runs on on-premises hardware, the major public clouds, and AI-optimized infrastructure-as-a-service platforms such as CoreWeave. OctoStack also supports multiple AI accelerators from Nvidia and Advanced Micro Devices, as well as the Inferentia chips available in Amazon Web Services.
The platform is built in part on Apache TVM, an open-source compiler framework created by OctoAI's founders. TVM simplifies the task of optimizing AI models to run on many different chips.
After creating the initial version of a neural network, developers can optimize it in various ways to improve performance. One technique is operator fusion, which combines several of a model's mathematical operations into fewer, more efficient hardware computations. Another is quantization, which lowers the numerical precision of a model's weights, reducing the compute and memory needed to produce results with little loss of accuracy.
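As a concrete illustration, the sketch below shows what symmetric int8 quantization of a weight matrix looks like in principle. This is a generic Python example, not OctoAI's implementation: each float32 value is mapped to an 8-bit integer plus a shared scale factor, shrinking storage fourfold at the cost of a small reconstruction error.

    import numpy as np

    def quantize_int8(weights):
        # Symmetric quantization: map float32 values onto the int8 range
        # [-127, 127] using a single shared scale factor.
        scale = np.abs(weights).max() / 127.0
        q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        # Recover approximate float32 values from the int8 representation.
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_int8(w)
    print(np.abs(w - dequantize(q, s)).max())  # small reconstruction error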
These optimizations do not carry over cleanly from one type of hardware to another: an AI model optimized for one vendor's graphics card will not necessarily run efficiently on another chipmaker's processor. The TVM technology underpinning OctoStack automates the work of re-optimizing neural networks for each chip.
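The snippet below sketches that workflow using TVM's standard Python API: a toy model expressed in TVM's Relay intermediate representation is compiled for a chosen hardware target, and retargeting it to a different chip is a matter of changing one string. The model itself is a made-up example for illustration.

    import numpy as np
    import tvm
    from tvm import relay

    # A toy model in TVM's Relay IR: y = relu(x @ W^T)
    x = relay.var("x", shape=(1, 8), dtype="float32")
    w = relay.const(np.random.randn(4, 8).astype("float32"))
    mod = tvm.IRModule.from_expr(
        relay.Function([x], relay.nn.relu(relay.nn.dense(x, w))))

    # Compile the same model for different chips by switching the target
    # string ("llvm" targets a CPU; "cuda" would target an Nvidia GPU).
    for target in ["llvm"]:
        with tvm.transform.PassContext(opt_level=3):  # level 3 enables operator fusion
            lib = relay.build(mod, target=target)
        lib.export_library(f"model_{target}.so")  # one deployable artifact per chip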
OctoAI claims that its platform helps customers run their AI infrastructure more efficiently. According to the company, an inference environment powered by OctoStack can quadruple GPU utilization compared with an AI cluster built from scratch, and cut operating costs by 50%.
Luis Ceze, co-founder and CEO of OctoAI, said, "Building viable, future-proof generative AI applications for customers requires more than cost-effective inference in the cloud. Hardware portability, model integration, fine-tuning, optimization, load balancing: these are all full-stack problems that require comprehensive solutions."
OctoStack supports popular open-source LLMs such as Meta Platforms Inc.'s Llama and Mixtral, a mixture-of-experts model developed by startup Mistral AI. The platform can also run customers' internally developed neural networks. According to OctoAI, OctoStack lets customers roll updated AI models into the inference environment over time without significant changes to the applications those models power.
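Inference platforms commonly achieve this by decoupling applications from models behind a stable HTTP API, so that swapping in a new model is a one-field change on the client side. The sketch below illustrates the pattern; the endpoint, model name, and OpenAI-style request schema are all assumptions for illustration, not OctoStack's documented interface.

    import requests

    # Hypothetical endpoint; the OpenAI-style chat schema is an assumption,
    # not OctoStack's documented API.
    ENDPOINT = "https://octostack.internal.example.com/v1/chat/completions"

    def ask(model: str, prompt: str) -> str:
        resp = requests.post(ENDPOINT, json={
            "model": model,  # operators can point this name at an updated model
            "messages": [{"role": "user", "content": prompt}],
        })
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    # Application code stays the same when the model behind the name is updated.
    print(ask("llama-3-8b-instruct", "Summarize this quarter's results."))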