Music generation is the task of creating music with deep learning models that learn the patterns and structures of existing music; commonly used architectures include RNNs, LSTMs, and transformers. This article explores a novel music generation method that uses a non-autoregressive transformer model to generate music audio conditioned on musical context. Unlike traditional methods based on abstract conditioning signals such as text, this approach focuses on listening and responding. The article also reviews the latest advances in the field and the improvements made to the model architecture.
Researchers from ByteDance's SAMI team proposed a non-autoregressive transformer model that can listen and respond to musical context, built on the publicly available audio encoder checkpoint released with the MusicGen model. They evaluated the model with standard metrics and music information retrieval descriptor methods, including Fréchet Audio Distance (FAD) and the music information retrieval descriptor distance (MIRDD), demonstrating the model's audio quality and musical alignment through objective metrics and subjective MOS (mean opinion score) testing.
This study summarizes the latest advances in end-to-end music audio generation, drawing inspiration from techniques used in image and language processing. It highlights the alignment problems in music composition and critiques traditional methods based on abstract conditioning such as text descriptions. It then proposes a new training method built around a non-autoregressive transformer model that can respond to musical context, using two conditioning sources and framing the task as conditional generation. The model is evaluated with objective metrics, music information retrieval descriptors, and listening tests.
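To make the conditional-generation framing concrete, the following is a minimal sketch of what a masked-token training step for such a context-conditioned model could look like. The names here (the `model` signature, `MASK_ID`, `mask_tokens`) are illustrative assumptions rather than details from the paper; the only part carried over from the text is that both the context audio and the target stem are represented as discrete codec tokens.

```python
import torch
import torch.nn.functional as F

MASK_ID = 1024      # assumed id of a special [MASK] token (illustrative)
VOCAB_SIZE = 1025   # codec codebook size plus the mask token (assumption)


def mask_tokens(target, mask_prob):
    """Randomly replace a fraction of target tokens with MASK_ID."""
    mask = torch.rand(target.shape) < mask_prob
    return target.masked_fill(mask, MASK_ID), mask


def training_step(model, context_tokens, target_tokens, optimizer):
    # Sample a masking ratio for this batch, as in mask-based generative training.
    mask_prob = torch.rand(1).item()
    masked_target, mask = mask_tokens(target_tokens, mask_prob)

    # The (hypothetical) model consumes the context tokens and the partially
    # masked target jointly, returning logits over the vocabulary for every
    # target position: shape (batch, time, VOCAB_SIZE).
    logits = model(context_tokens, masked_target)

    # The loss is computed only on the masked positions.
    loss = F.cross_entropy(logits[mask], target_tokens[mask])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```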
The method generates music with a non-autoregressive transformer model that operates on the tokens of an audio encoding model built around a residual vector quantizer (RVQ). Multiple audio channels are combined into a single sequence element by concatenating their embeddings. Training uses a masking procedure, and token sampling applies classifier-free guidance to improve alignment with the musical context. Performance is evaluated with FAD and MIRDD, and generated output samples are compared against real stems across a variety of metrics.
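To illustrate these two ideas, the sketch below shows how embeddings from several parallel token streams could be concatenated into a single sequence element, and how classifier-free guidance might be applied when sampling tokens. The tensor shapes, the null-conditioning mechanism, and the guidance scale are assumptions for illustration, not values from the paper.

```python
import torch


def combine_streams(embeddings):
    """Concatenate per-stream embeddings into one sequence element.

    embeddings: list of tensors of shape (batch, time, dim), one per audio
    channel / RVQ level. The result has shape (batch, time, dim * num_streams),
    so the transformer sees a single combined token per time step.
    """
    return torch.cat(embeddings, dim=-1)


def guided_logits(model, inputs, null_inputs, guidance_scale=3.0):
    """Classifier-free guidance at sampling time (illustrative only).

    Runs the model once with the real conditioning and once with the
    conditioning replaced by a "null" input, then pushes the logits
    away from the unconditional prediction.
    """
    cond_logits = model(inputs)          # conditioned on the musical context
    uncond_logits = model(null_inputs)   # conditioning dropped / masked out
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)


def sample_tokens(logits, temperature=1.0):
    """Temperature sampling from the guided token distribution."""
    probs = torch.softmax(logits / temperature, dim=-1)
    flat = probs.reshape(-1, probs.shape[-1])
    return torch.multinomial(flat, 1).reshape(probs.shape[:-1])
```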
The study evaluates the trained model using standard metrics and music information retrieval descriptor methods, including FAD and MIRDD. Comparison with real stems shows that the model achieves audio quality comparable to state-of-the-art text-conditioned models and exhibits strong musical coherence with its context. MOS testing with participants who have musical training further demonstrates the model's ability to generate plausible musical results. MIRDD, which measures the distributional alignment between generated and real stems, provides an assessment of musical consistency and alignment.
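For reference, FAD compares the statistics of embeddings extracted from generated and real audio with a pretrained audio model (commonly VGGish). Below is a minimal sketch of the distance computation itself, assuming the embeddings have already been extracted; it is not the exact evaluation code used in the study.

```python
import numpy as np
from scipy import linalg


def frechet_audio_distance(emb_real, emb_gen):
    """Fréchet distance between two sets of audio embeddings.

    emb_real, emb_gen: arrays of shape (num_clips, dim) containing embeddings
    of real and generated audio from a pretrained audio model.
    FAD = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 * (S_r S_g)^{1/2}).
    """
    mu_r, mu_g = emb_real.mean(axis=0), emb_gen.mean(axis=0)
    sigma_r = np.cov(emb_real, rowvar=False)
    sigma_g = np.cov(emb_gen, rowvar=False)

    covmean = linalg.sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real   # discard small imaginary parts from sqrtm

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))
```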
In summary, the main contributions of this study are as follows:
- Proposing a new training method for generative models capable of responding to musical context.
- Introducing a non-autoregressive transformer model with two novel improvements: multi-source classifier-free guidance and causal bias in the iterative decoding process (see the sketch after this list).
- Training the model on open-source and proprietary datasets, achieving state-of-the-art audio quality.
- Validating the audio quality of the model using standard metrics and music information retrieval descriptors.
- Verifying the model's ability to generate realistic musical results through MOS testing.
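The causal bias mentioned in the second contribution refers to encouraging the iterative, mask-based decoding loop to commit earlier time steps before later ones. The sketch below shows one hypothetical way such a bias could be layered on top of confidence-based unmasking; the model signature, bias strength, and unmasking schedule are illustrative assumptions, not the paper's exact procedure.

```python
import torch

MASK_ID = 1024   # assumed id of the [MASK] token (illustrative)


def iterative_decode(model, context_tokens, length, steps=8, causal_bias=1.0):
    """Mask-based iterative decoding with a simple causal bias (sketch).

    Starts from an all-masked target sequence and, over `steps` rounds,
    commits the most confident predictions, with earlier positions given
    a confidence bonus so they tend to be finalized first.
    """
    batch = context_tokens.shape[0]
    target = torch.full((batch, length), MASK_ID, dtype=torch.long)

    for step in range(steps):
        logits = model(context_tokens, target)        # (batch, length, vocab)
        probs = torch.softmax(logits, dim=-1)
        confidence, candidates = probs.max(dim=-1)    # both (batch, length)

        # Causal bias: linearly favor earlier time steps.
        positions = torch.arange(length, dtype=torch.float)
        confidence = confidence + causal_bias * (1.0 - positions / length)

        # Only still-masked positions compete for unmasking in this round.
        still_masked = target.eq(MASK_ID)
        confidence = confidence.masked_fill(~still_masked, float("-inf"))

        # Unmask a growing fraction of the sequence each round.
        num_to_unmask = int(length * (step + 1) / steps) - int(length * step / steps)
        if num_to_unmask == 0:
            continue
        idx = confidence.topk(num_to_unmask, dim=-1).indices   # (batch, k)
        target.scatter_(1, idx, candidates.gather(1, idx))

    return target
```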