Meta releases a series of new AI models covering audio, text, and watermarking

2024-06-19

Meta's Fundamental AI Research (FAIR) team recently announced a series of new AI models and tools for researchers. The releases span audio generation, text and image understanding, and watermarking.

"We hope that by openly sharing our early research results, we can inspire more researchers to participate in the iterative development of AI and promote the progress of AI in a responsible manner," Meta said in a press release.


Audio creation model JASCO and watermarking tool AudioSeal

First, Meta introduced a new AI model called JASCO, which stands for Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation.

JASCO can accept different conditioning inputs, such as chords or beats, to refine the sound of AI-generated music. According to the FAIR researchers' paper, JASCO lets users control features of the generated music, such as chords, drums, and melodies, alongside a text prompt to shape the final composition.

FAIR plans to release JASCO's inference code as part of the AudioCraft AI audio model library under the MIT license, while the pre-trained models will be provided under a non-commercial Creative Commons license.

Meta also introduced AudioSeal, a tool designed to add watermarks to AI-generated speech. It is one of the technologies Meta uses to identify AI-produced content.

"AudioSeal is the first audio watermarking technology designed specifically for localizing AI-generated speech, and it can identify the parts generated by AI in longer audio clips," Meta described in the press release.

Meta states that this localized detection "enables faster and more efficient detection" and speeds up detection by as much as 485 times compared with previous methods. Unlike the other models in this release, AudioSeal will be released under a commercial license.

Meta encourages researchers to innovate on other image and text models

FAIR will also release two versions of its multimodal model Chameleon, for research purposes only.

The Chameleon 7B and 34B models can be applied to tasks that require both visual and text understanding, such as image captioning.

However, Meta explicitly stated in the press release that it will not release Chameleon's image generation models for now. Only the text-related models are available to researchers.

The company will also provide researchers with its multi-token prediction method, which trains language models to predict multiple future words at once rather than one at a time. The method will be available only under a non-commercial, research-only license.
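
To illustrate the difference between predicting one word at a time and predicting several future words at once, here is a minimal sketch of how the training targets might be built in each case. It is not Meta's code: the function names, the `n_future` parameter, and the toy sentence are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; names and parameters are hypothetical, not Meta's API.

def next_token_targets(tokens):
    """Standard setup: each context is paired with the single word that follows it."""
    return [(tokens[:i + 1], tokens[i + 1]) for i in range(len(tokens) - 1)]


def multi_token_targets(tokens, n_future=4):
    """Multi-token setup: each context is paired with the next n_future words."""
    examples = []
    # Stop early enough that every context still has n_future words after it.
    for i in range(len(tokens) - n_future):
        context = tokens[:i + 1]
        targets = tokens[i + 1:i + 1 + n_future]
        examples.append((context, targets))
    return examples


sentence = ["the", "cat", "sat", "on", "the", "mat"]
single = next_token_targets(sentence)
multi = multi_token_targets(sentence, n_future=2)
```

In this sketch, the first multi-token example pairs the context "the" with two targets, "cat" and "sat", whereas the standard setup pairs it with "cat" alone. A real training pipeline would operate on token IDs and compute a loss over each predicted position.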