Anthropic has introduced a prompt caching feature in its API that lets developers reuse contextual information across API calls instead of resending it with every request. The feature is currently available in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, while support for the more powerful Claude 3 Opus model is still in the works.
A 2023 paper describes the technique behind prompt caching in detail: frequently needed background information is retained and reused within a session rather than reprocessed from scratch. Because the model can recall these cached prompts, users can supply rich background context without paying full input-token costs on every call. This is particularly useful when a large amount of contextual information must be included in a prompt and then referenced repeatedly across dialogue turns, giving developers and other users more flexibility in shaping model responses.
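As an illustration, the sketch below shows how a large, stable block of context might be marked for caching, assuming Anthropic's Python SDK; the model name, beta header, and `cache_control` field follow the public beta documentation at launch and may change, and the knowledge-base text is a placeholder.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

LARGE_KNOWLEDGE_BASE = "..."  # placeholder: the large, reusable context to cache

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    # Opt in to the public beta of prompt caching.
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {"type": "text", "text": "You answer questions about the product documentation."},
        {
            "type": "text",
            "text": LARGE_KNOWLEDGE_BASE,
            # Mark this block so it is written to (and later read from) the cache.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
print(response.content[0].text)
```

Subsequent calls that repeat the same cached blocks read them back at the reduced token rate rather than reprocessing them in full.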
Anthropic says early adopters have seen significant speed improvements and cost savings across a range of scenarios, whether embedding a complete knowledge base, hundreds of examples, or every turn of a conversation directly in the prompt.
On pricing, prompt caching shows a clear economic advantage: Anthropic notes that reading cached prompts costs far less than paying for base input tokens. For Claude 3.5 Sonnet, writing a prompt to the cache costs $3.75 per million tokens (MTok), while reading it back from the cache costs only $0.30 per million tokens. Against the base input price of $3 per million tokens, that means paying a small premium up front can yield up to a 10x saving on subsequent use.
For Claude 3 Haiku, writing to the cache costs $0.30 per million tokens and reading from it costs as little as $0.03 per million tokens. Claude 3 Opus does not yet support the feature, but Anthropic has already announced its pricing: $18.75 per million tokens to write prompts to the cache and $1.50 per million tokens to read them back.
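To make the arithmetic concrete, here is an illustrative calculation under an assumed scenario (a 100,000-token context reused across 50 Claude 3.5 Sonnet requests), using the prices quoted above.

```python
# Assumed scenario for illustration: a 100,000-token context reused in 50 requests.
CONTEXT_TOKENS = 100_000
REQUESTS = 50

BASE_INPUT = 3.00 / 1_000_000    # $ per base input token (Claude 3.5 Sonnet)
CACHE_WRITE = 3.75 / 1_000_000   # $ per token written to the cache
CACHE_READ = 0.30 / 1_000_000    # $ per token read back from the cache

# Without caching, the full context is billed at the base rate on every request.
without_cache = CONTEXT_TOKENS * REQUESTS * BASE_INPUT
# With caching, the context is written once, then read from the cache 49 times.
with_cache = CONTEXT_TOKENS * (CACHE_WRITE + (REQUESTS - 1) * CACHE_READ)

print(f"without caching: ${without_cache:.2f}")  # $15.00
print(f"with caching:    ${with_cache:.2f}")     # roughly $1.85; each cached read is 10x cheaper
```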
One caveat, as AI industry expert Simon Willison pointed out on social media, is that Anthropic's cache only lives for 5 minutes, with the lifetime refreshed each time the cached content is used.
This is not the first time Anthropic has competed on price. Before releasing the Claude 3 series, the company had already cut token prices, and it is now locked in fierce competition with rivals such as Google and OpenAI to offer low-cost options to third-party developers.
Prompt caching is not unique to Anthropic. Lamina, a large language model inference system, for example, uses key-value (KV) caching to reduce GPU costs, and the OpenAI developer community has also discussed ways to cache prompts. It is important, though, not to confuse prompt caching with the built-in memory features of large language models: while models like OpenAI's GPT-4, through ChatGPT's memory feature, can remember user preferences or details, they do not directly store the history of prompts and responses, which is a fundamentally different concept from prompt caching.
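For intuition about the difference, the toy sketch below (not Lamina's actual code, and greatly simplified) illustrates the idea behind KV/prefix caching: the expensive work of processing a shared prompt prefix is done once and reused, while only the new suffix is processed per request; a memory feature, by contrast, stores distilled facts rather than reusable computation.

```python
from functools import lru_cache

@lru_cache(maxsize=32)
def process_prefix(prefix: str) -> list[int]:
    """Stand-in for the costly forward pass that builds attention
    key/value state for the prefix on a real model."""
    return [hash(token) for token in prefix.split()]

def run_request(prefix: str, suffix: str) -> int:
    cached_state = process_prefix(prefix)           # cache hit if the prefix is unchanged
    new_state = [hash(token) for token in suffix.split()]
    return len(cached_state) + len(new_state)       # total context actually processed

knowledge_base = "LARGE STABLE SYSTEM PROMPT SHARED BY EVERY REQUEST"  # placeholder
run_request(knowledge_base, "How do I reset my password?")
run_request(knowledge_base, "What is the refund policy?")  # prefix work is reused, not recomputed
```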