Palmyra LLM by Writer Shines in Enterprise-Level AI Performance Benchmarks

2024-01-10

Writer is a three-year-old San Francisco startup that raised $100 million in September 2023 to bring its proprietary, enterprise-focused large language models to more companies. Although it makes fewer headlines than high-profile LLM players such as OpenAI, Anthropic, Meta, or France's Mistral AI, Writer's family of smaller models, called Palmyra, is showing promise in enterprise use cases. Companies including Accenture, Vanguard Group, HubSpot, and Pinterest are Writer clients, using the company's writing and productivity platform powered by the Palmyra models.

Last month, Stanford HAI's Center for Research on Foundation Models added new models to its benchmark suite and introduced a new benchmark called HELM Lite, which tests in-context learning. For LLMs, in-context learning means picking up a new task from a small set of examples presented directly in the prompt.
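To make the idea concrete, here is a minimal sketch of what a few-shot, in-context prompt looks like. The task, example reviews, and helper function are all hypothetical illustrations, not part of HELM Lite or any specific model's API; the point is simply that the worked examples in the prompt are the model's only "training" for the task.

```python
# Illustrative sketch of in-context (few-shot) learning: the model receives
# a handful of worked examples in the prompt and must infer the task pattern.
# No specific model, benchmark, or API is assumed here.

def build_few_shot_prompt(examples, query):
    """Assemble a prompt from (input, output) example pairs plus a new query."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")  # the model is expected to complete this line
    return "\n".join(lines)

examples = [
    ("The battery lasts all day and the screen is gorgeous.", "positive"),
    ("It broke after two days and support never replied.", "negative"),
]
prompt = build_few_shot_prompt(examples, "Setup was painless and it just works.")
print(prompt)
```

A benchmark like HELM Lite scores how well a model completes prompts of this general shape across many tasks, rather than how well it was fine-tuned for any one of them.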

Writer's LLM performed "surprisingly" well in AI benchmark tests.

Although GPT-4 ranked highly on the new benchmark, Writer's Palmyra X V2 and X V3 models performed "surprisingly" well "despite being smaller models," wrote Percy Liang, director of Stanford's Center for Research on Foundation Models.
Palmyra's performance was particularly strong in machine translation, where it ranked first. May Habib, CEO of Writer, said in a LinkedIn post: "Writer's Palmyra X performs even better than the classic benchmarks suggest. We are not only the top model on the MMLU benchmark but the top model overall, second only to the GPT-4 preview version analyzed. And on the translation benchmark, which is new, we rank first."

Enterprises need models that are economical to build with

In an interview, Habib said it would be economically challenging for enterprises to run models like GPT-4, which are trained on 1.2 trillion tokens, in their own environments. "Generative AI use cases in 2024 now need to make economic sense," she said.

She also pointed out that enterprises build use cases on top of GPT models and then, "two to three months later, the prompts no longer work because the models have been fine-tuned, and their own serving costs are too high." Referring to the Stanford HAI HELM Lite leaderboard, she maintained that GPT-4 (0613) is rate-limited, so "it will be fine-tuned," while GPT-4 Turbo is just "a preview version, and we don't know their plans for this model."

Habib added that she believes Stanford HAI's benchmarking work is "closest to real-world enterprise use cases and real enterprise practitioners," unlike rankings from platforms such as Hugging Face. "Their scenarios are closer to actual usage," she said.