JPMorgan Chase Launches DocLLM for Multimodal Document Understanding

2024-01-04

JPMorgan Chase has launched DocLLM, a generative language model designed for multimodal document understanding. DocLLM is a lightweight extension of large language models (LLMs) that stands out for its ability to analyze enterprise documents such as tables, invoices, reports, and contracts, whose rich semantics lie at the intersection of textual and spatial patterns.

Unlike existing multimodal LLMs, DocLLM strategically avoids expensive image encoders and focuses on incorporating spatial layout structures using bounding box information. The model introduces a decoupled spatial attention mechanism by decomposing the attention mechanism in classical transformers into a set of independent matrices.
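For intuition, the sketch below shows what such a decoupled attention score could look like in PyTorch: text embeddings and bounding-box embeddings get separate query/key projections, and their text-text, text-spatial, spatial-text, and spatial-spatial interaction terms are combined with weighting factors. The module name, the `lambdas` hyperparameters, and the projection setup are illustrative assumptions, not the released DocLLM implementation.

```python
import torch
import torch.nn as nn


class DisentangledSpatialAttention(nn.Module):
    """Minimal sketch of attention decomposed into text and spatial terms."""

    def __init__(self, d_model, lambdas=(1.0, 1.0, 1.0)):
        super().__init__()
        # Separate projections for text embeddings and bounding-box embeddings
        self.q_t = nn.Linear(d_model, d_model)
        self.k_t = nn.Linear(d_model, d_model)
        self.v_t = nn.Linear(d_model, d_model)
        self.q_s = nn.Linear(d_model, d_model)
        self.k_s = nn.Linear(d_model, d_model)
        self.lambdas = lambdas  # weights for the cross and spatial terms

    def forward(self, text_emb, box_emb):
        # text_emb, box_emb: (batch, seq_len, d_model)
        qt, kt, vt = self.q_t(text_emb), self.k_t(text_emb), self.v_t(text_emb)
        qs, ks = self.q_s(box_emb), self.k_s(box_emb)
        scale = qt.size(-1) ** 0.5
        l_ts, l_st, l_ss = self.lambdas
        # Attention score = text-text + text-spatial + spatial-text + spatial-spatial
        scores = (qt @ kt.transpose(-2, -1)
                  + l_ts * (qt @ ks.transpose(-2, -1))
                  + l_st * (qs @ kt.transpose(-2, -1))
                  + l_ss * (qs @ ks.transpose(-2, -1))) / scale
        weights = torch.softmax(scores, dim=-1)
        return weights @ vt
```

Keeping the spatial terms as separate projections, rather than encoding page images, is what keeps the extension lightweight: only the bounding boxes of OCR tokens are needed as extra input.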

DocLLM tackles the challenges of irregular layouts and heterogeneous content in visual documents by adopting a pretraining objective that learns to infill text segments rather than predict only the next token.
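Conceptually, segment infilling means hiding a contiguous block of tokens and asking the model to regenerate it from the surrounding context (plus, in DocLLM's case, the layout cues). The following sketch illustrates the idea; the function name and the span-selection policy are illustrative assumptions, not the paper's exact recipe.

```python
import random


def make_infilling_example(tokens, mask_token="[MASK]", span_len=3):
    """Hide a contiguous block of tokens and return (corrupted input, target block)."""
    start = random.randrange(0, max(1, len(tokens) - span_len))
    target = tokens[start:start + span_len]                      # block to predict
    corrupted = tokens[:start] + [mask_token] + tokens[start + span_len:]
    return corrupted, target


# Example: the model sees the corrupted sequence and must generate the missing block.
corrupted, target = make_infilling_example(
    ["Invoice", "No.", "1042", "Total", "Due", ":", "$", "980.00"])
print(corrupted, "->", target)
```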

Together, the decoupled spatial attention mechanism facilitates cross-alignment between the text and layout modalities, while the infilling pretraining objective lets the model cope with irregular layouts.

To pretrain DocLLM, data was collected from two main sources: the IIT-CDIP Test Collection 1.0 and DocBank. The former contains over 5 million documents related to legal litigation against the tobacco industry in the 1990s, while the latter consists of 500,000 documents with diverse layouts.

Extensive evaluations on various document intelligence tasks have demonstrated that DocLLM outperforms existing state-of-the-art LLMs. The model surpasses equivalent models on 14 out of 16 known datasets and exhibits strong generalization capabilities on 4 out of 5 unseen datasets.

Looking ahead, JPMorgan Chase has expressed its commitment to further enhancing the capabilities of DocLLM by incorporating visual elements in a lightweight manner.