Resemble AI Upgrades Detection Model: Detect-2B Accurately Identifies Fake Audio

2024-07-01

Resemble AI, a voice cloning company, has released Detect-2B, an upgraded version of its deepfake detection model that reaches an accuracy of roughly 94%. Detect-2B uses a set of pre-trained sub-models, combined with fine-tuning, to examine an audio clip and determine whether it is AI-generated.

"Building on our original Detect model, Detect-2B has made significant progress in model architecture, training data, and overall performance. The model has been evaluated on a large dataset of real and fake audio clips and demonstrates impressive performance," Resemble stated in an official blog post.

According to Resemble, Detect-2B's sub-models consist of frozen audio representation models with adaptation modules inserted into key layers. These adapters focus on the subtle differences between real and fake audio, such as the unintentional sound traces left behind during a genuine recording, which most AI-generated clips lack because they sound "too perfect." Detect-2B can predict the AI-generated components of a clip without the model being retrained for every new recording, and the sub-models themselves have been trained extensively on large datasets (a sketch of this adapter setup appears below).

To judge whether a recording is authentic, Detect-2B aggregates the prediction scores from its sub-models and compares the result against "carefully tuned thresholds." Resemble says this design also makes Detect-2B fast to train and light on computing power at deployment time.

"A stochastic architecture enables more flexible processing of the audio signal," Resemble emphasized. The model's architecture is based on Mamba-SSM, a family of state-space models: rather than relying on static data or fixed, repetitive patterns, it uses a stochastic formulation that responds better to varying conditions (also sketched below). Resemble says this makes the architecture particularly well suited to audio detection, since it can capture the changing dynamics within a clip and adapt to shifts between audio signal states, even when the recording quality is poor.

To evaluate Detect-2B, Resemble tested it on a set containing unseen speakers, deepfake audio, and multiple languages. The company claims the model correctly detected deepfake audio in six different languages with at least 93% accuracy.

Resemble launched its AI voice platform, Rapid Voice Cloning, in April. Detect-2B will be available through an API and can be integrated into a variety of applications.

"Identifying deepfakes has become increasingly crucial as the 2024 US presidential election approaches. AI-generated voices can heighten the risk of misleading voters and spreading misinformation. Concerns about AI deepfakes, whether that means impersonating politicians, putting celebrities' voices into songs, or simply using AI to misrepresent something, have eroded public trust in brands," Resemble stated. Tools like Detect-2B can go a long way toward identifying deepfake content and exposing it as fake before it reaches the public.

Resemble is not the only company working to detect AI voice clones, however. McAfee launched Project Mockingbird in January to detect AI-generated audio, while Meta is developing a method for watermarking AI-generated audio.
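To make the adapter setup described above more concrete, here is a minimal sketch in PyTorch-style Python: a frozen audio encoder with a small adapter and scoring head on top, plus a helper that aggregates the scores of several such sub-models against a threshold. The class names, bottleneck size, pooling strategy, and the 0.5 threshold are illustrative assumptions, not details published by Resemble.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Small bottleneck module inserted on top of a frozen encoder layer."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: only the small adapter weights are trained.
        return x + self.up(torch.relu(self.down(x)))


class SubModel(nn.Module):
    """Frozen audio representation model with a trainable adapter and scoring head."""

    def __init__(self, encoder: nn.Module, dim: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False              # keep the pre-trained encoder frozen
        self.adapter = Adapter(dim)
        self.head = nn.Linear(dim, 1)            # clip-level "fake" logit

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, dim) frame-level inputs for the encoder
        with torch.no_grad():
            h = self.encoder(features)           # frozen representation
        h = self.adapter(h)                      # adapted features highlight subtle artifacts
        return torch.sigmoid(self.head(h.mean(dim=1)))  # pool over time -> "fake" probability


def is_fake(scores: list[torch.Tensor], threshold: float = 0.5) -> bool:
    """Aggregate per-sub-model scores and compare against a tuned threshold."""
    return bool(torch.stack(scores).mean() > threshold)


# Toy usage with a stand-in encoder; a real system would load a pre-trained audio model.
encoder = nn.Sequential(nn.Linear(128, 128), nn.GELU())
clip = torch.randn(1, 200, 128)                  # 200 frames of 128-dim features
scores = [SubModel(encoder, dim=128)(clip) for _ in range(3)]
print(is_fake(scores))
```

The appeal of this general pattern is that only the lightweight adapters and scoring heads are trained, so detectors can be updated without retraining the large pre-trained encoders, which is consistent with Resemble's claims about fast training and low deployment cost.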
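The state-space idea can also be grounded with a small example. The sketch below implements the basic discrete state-space recurrence that Mamba-style models build on, in which a hidden state is carried from frame to frame so the model can track changing signal dynamics. The matrices and dimensions are made up for illustration; real Mamba layers additionally make these parameters input-dependent and use an efficient parallel scan rather than a Python loop.

```python
import torch


def ssm_scan(u: torch.Tensor, A: torch.Tensor, B: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    """Discrete state-space recurrence: x_t = A x_{t-1} + B u_t,  y_t = C x_t."""
    x = torch.zeros(A.shape[0])
    outputs = []
    for u_t in u:                       # one step per audio frame
        x = A @ x + B @ u_t             # state update carries context across frames
        outputs.append(C @ x)           # readout at this time step
    return torch.stack(outputs)


# Toy usage: 16 frames of 8-dim audio features, a 4-dim hidden state, 2-dim output.
frames = torch.randn(16, 8)
A = 0.9 * torch.eye(4)                  # stable state-transition matrix
B = 0.1 * torch.randn(4, 8)
C = torch.randn(2, 4)
y = ssm_scan(frames, A, B, C)           # y has shape (16, 2)
```

Because the state evolves step by step, a model of this kind can in principle adapt to shifts in the signal over time, which matches the behavior Resemble highlights for noisy or low-quality recordings.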
"But our work is far from over. As the capabilities of generative AI continue to advance, our detection capabilities must keep pace. We have planned several exciting research directions to further optimize Detect-2B, focusing on representation learning, advanced model architectures, and data expansion," Resemble stated.