Princeton University and Meta AI jointly launch the Lory model, reshaping autoregressive language model pre-training.

2024-05-13

A research team from Princeton University and Meta AI recently announced a research breakthrough: the Lory model. By extending the mixture-of-experts (MoE) architecture to autoregressive language model pre-training, the model delivers significant performance gains in natural language processing.

Thanks to its sparse activation, the MoE architecture has long been an effective way to scale model size while keeping training and inference efficient. However, traditional MoE models rely on discrete expert routing, a non-differentiable objective that is hard to optimize during training. To address this, researchers from Princeton University and Meta AI developed the Lory model, which works around the limitations of traditional MoE models through two key techniques.
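To make the contrast concrete, below is a minimal sketch (PyTorch-style Python, with all shapes, names, and the mean-based routing input invented for illustration) of the fully differentiable expert-merging idea that Lory builds on: instead of dispatching tokens to a discretely selected expert, the expert weight matrices themselves are averaged with soft routing weights, so gradients flow through the router.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MergedExpertFFN(nn.Module):
    """Illustrative soft-merging MoE feed-forward layer. Rather than a
    non-differentiable top-1 dispatch, expert parameters are blended with
    softmax routing weights, keeping the whole layer differentiable.
    Shapes and names are assumptions, not the paper's code."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        # One FFN (up/down projection) per expert, stored as stacked parameters.
        self.w_up = nn.Parameter(torch.randn(n_experts, d_model, d_ff) * 0.02)
        self.w_down = nn.Parameter(torch.randn(n_experts, d_ff, d_model) * 0.02)

    def forward(self, x: torch.Tensor, routing_input: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); routing_input: (batch, d_model) summary vector.
        gates = F.softmax(self.router(routing_input), dim=-1)     # (batch, n_experts)
        # Merge expert parameters with the soft gates (fully differentiable).
        w_up = torch.einsum("be,edf->bdf", gates, self.w_up)      # (batch, d_model, d_ff)
        w_down = torch.einsum("be,efd->bfd", gates, self.w_down)  # (batch, d_ff, d_model)
        h = F.gelu(torch.einsum("bsd,bdf->bsf", x, w_up))
        return torch.einsum("bsf,bfd->bsd", h, w_down)
```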

One of Lory's core techniques is a causal segment routing strategy. The input token sequence is divided into fixed-length segments, and routing weights computed from the preceding segment are used to merge the experts applied to the following segment. This allows experts to be combined efficiently while preserving the autoregressive nature of the language model.
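A rough sketch of how such segment-level causal routing could look in code is shown below; it reuses the merged-expert layer sketched above, and the mean-pooled segment summary plus the zero-initialized summary for the first segment are illustrative assumptions, not details taken from the paper.

```python
import torch

def causal_segment_forward(x: torch.Tensor, moe_layer, segment_len: int) -> torch.Tensor:
    """Sketch of causal segment routing: routing weights for each segment are
    derived from the PREVIOUS segment's hidden states, so no token's routing
    depends on its own or later tokens. `moe_layer` is assumed to be a
    merged-expert layer like the MergedExpertFFN sketch above."""
    batch, seq, d_model = x.shape
    outputs = []
    # Neutral summary for the very first segment, which has no preceding context.
    prev_summary = x.new_zeros(batch, d_model)
    for start in range(0, seq, segment_len):
        seg = x[:, start:start + segment_len]          # current segment's tokens
        outputs.append(moe_layer(seg, prev_summary))   # experts merged from the previous segment
        prev_summary = seg.mean(dim=1)                 # summary used to route the NEXT segment
    return torch.cat(outputs, dim=1)
```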

The other key technique is similarity-based data batching. By grouping similar documents to form consecutive segments during training, Lory counteracts the weak expert specialization that segment-level routing would otherwise produce. This makes expert routing substantially more effective to train and underpins the model's strong results.
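As a hedged illustration of this idea: given precomputed document embeddings, one could greedily chain each document to its most similar unused neighbor, so that consecutive segments in a training sequence stay topically related. The greedy nearest-neighbor ordering below is an assumption for illustration, not the paper's exact procedure.

```python
import numpy as np

def similarity_ordered_docs(doc_embeddings: np.ndarray, doc_ids: list[int]) -> list[int]:
    """Order documents so each one is followed by its most similar unused
    neighbor, producing training sequences whose consecutive segments are
    topically related. Greedy nearest-neighbor chaining is illustrative only."""
    # Normalize so dot products equal cosine similarity.
    emb = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    remaining = set(range(len(doc_ids)))
    order = [remaining.pop()]                      # start from an arbitrary document
    while remaining:
        candidates = list(remaining)
        sims = emb[candidates] @ emb[order[-1]]    # similarity to the last chosen document
        nxt = candidates[int(np.argmax(sims))]
        remaining.remove(nxt)
        order.append(nxt)
    return [doc_ids[i] for i in order]
```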


Lory excels in several respects:

  • Training efficiency and convergence: Lory reaches an equivalent loss level with fewer than half the training tokens, yielding better performance at equal training compute for the 0.3B and 1.5B models.
  • Language modeling: The MoE model outperforms dense baselines across all domains, with markedly lower perplexity; for example, the 0.3B/32E model improves on the 0.3B dense model by a relative 13.9% in the book domain.
  • Downstream tasks: The 0.3B/32E model improves on downstream tasks including commonsense reasoning and reading comprehension, with average gains of +3.7%, +3.3%, +1.5%, and +11.1% across the four task categories evaluated.

The result has drawn wide attention in the industry. Experts believe the Lory model will advance the field of natural language processing and provide more efficient, more accurate solutions across a range of application scenarios.

The researchers from Princeton University and Meta AI say they plan to scale Lory further, to improve its performance by developing efficient decoding methods and by combining token-level and segment-level routing, and to explore its potential applications in other fields as a contribution to the broader development of artificial intelligence.