Today, Google unveiled DolphinGemma, an open-source AI model designed to decode dolphin communication by analyzing their clicks, whistles, and burst pulses.
This model, developed in collaboration with Georgia Tech and the Wild Dolphin Project (WDP), learns the structure of dolphin vocalizations and can generate sequences resembling dolphin sounds.
This breakthrough could help determine whether dolphin communication reaches the level of language.
DolphinGemma was trained using data from the world’s longest-running underwater dolphin research project, leveraging decades of meticulously labeled audio and video data collected by WDP since 1985.
The project studies Atlantic spotted dolphins in the Bahamas across generations through a non-invasive approach they call "in their world, on their terms."
"By identifying recurring sound patterns, clusters, and reliable sequences, the model can assist researchers in uncovering hidden structures and potential meanings in dolphin natural communication—a task that previously required immense manual effort," Google stated in its announcement.
The AI model contains approximately 400 million parameters, compact enough to run on Pixel phones used by researchers in the field. It uses Google's SoundStream tokenizer to process dolphin sounds and predict subsequent sounds in a sequence, much like how human language models predict the next word in a sentence.
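As a rough illustration of that pipeline, the sketch below fakes both stages with toy numpy code: a SoundStream-style tokenizer reduces to nearest-neighbor lookup against a codebook of audio frames, and the language model reduces to a bigram counter over the resulting tokens. The codebook size, frame length, and bigram predictor are all stand-ins chosen for clarity, not details of DolphinGemma itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def tokenize(audio, codebook, frame=160):
    """Map each audio frame to the index of its nearest codebook vector."""
    n = len(audio) // frame
    frames = audio[: n * frame].reshape(n, frame)
    # squared distance from every frame to every codebook entry
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)          # one discrete token per frame

# Hypothetical 64-entry codebook over 160-sample frames (10 ms at 16 kHz)
codebook = rng.normal(size=(64, 160))
audio = rng.normal(size=16000)       # stand-in for 1 s of dolphin sound
tokens = tokenize(audio, codebook)

# A language model over these tokens then predicts the next one, just as
# a text LLM predicts the next word. A count-based bigram model makes
# the idea concrete:
counts = np.ones((64, 64))           # Laplace-smoothed bigram counts
for a, b in zip(tokens[:-1], tokens[1:]):
    counts[a, b] += 1
probs = counts / counts.sum(axis=1, keepdims=True)
next_token = probs[tokens[-1]].argmax()
print(f"most likely next token after {tokens[-1]}: {next_token}")
```

DolphinGemma's actual predictor is a far larger transformer, but the interface is the same: discrete sound tokens in, a probability distribution over the next token out.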
DolphinGemma does not operate alone. It works alongside the CHAT (Cetacean Hearing Augmentation Telemetry) system, which associates synthetic whistles with objects dolphins favor, such as sargassum, seagrass, or scarves, potentially establishing a shared interactive vocabulary.
"Ultimately, these patterns, combined with synthetic sounds created by researchers to refer to objects dolphins enjoy playing with, may establish a shared interactive communication vocabulary with dolphins," Google explained.
Field researchers currently use Pixel 6 phones for real-time analysis of dolphin sounds.
The team plans to upgrade to Pixel 9 devices during the 2025 summer research season, integrating speaker and microphone functions while running deep learning models and template-matching algorithms.
The shift toward smartphone technology significantly reduces the need for custom hardware, a key advantage for marine fieldwork. DolphinGemma's predictive capabilities could also help researchers identify potential mimics earlier in a vocalization sequence, making interactions more fluid.
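Template matching, one of the techniques the article mentions running on the phones, can be sketched in a few lines. The example below is purely illustrative, with synthetic whistle templates standing in for CHAT's real ones: it slides each stored template over incoming audio, scores alignment with normalized cross-correlation, and reports the best-matching object label.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical whistle templates, one per object in the shared vocabulary.
# In practice these would be the synthetic whistles CHAT plays to dolphins.
templates = {
    "sargassum": np.sin(2 * np.pi * 5 * np.linspace(0, 1, 400)),
    "seagrass":  np.sin(2 * np.pi * 9 * np.linspace(0, 1, 400)),
    "scarf":     np.sin(2 * np.pi * 13 * np.linspace(0, 1, 400)),
}

def ncc(window, template):
    """Normalized cross-correlation between one window and a template."""
    w = (window - window.mean()) / (window.std() + 1e-9)
    t = (template - template.mean()) / (template.std() + 1e-9)
    return float(w @ t) / len(t)

def best_match(signal, templates):
    """Slide every template over the signal; return the top-scoring label."""
    best = ("none", -1.0)
    for label, t in templates.items():
        for i in range(0, len(signal) - len(t) + 1, 20):  # 20-sample hop
            score = ncc(signal[i : i + len(t)], t)
            if score > best[1]:
                best = (label, score)
    return best

# Bury the "seagrass" whistle in noise and see if it is recovered.
signal = rng.normal(scale=0.5, size=2000)
signal[600:1000] += templates["seagrass"]
label, score = best_match(signal, templates)
print(f"detected: {label} (score {score:.2f})")
```

On real hardware, this scoring would likely run over spectrogram features rather than raw samples, and continuously over a streaming buffer, but the matching logic is the same.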
Deciphering the Indecipherable
DolphinGemma joins several AI projects aimed at cracking the code of animal communication.
The Earth Species Project (ESP), a nonprofit organization, recently developed NatureLM, an audio language model that can identify an animal's species, approximate its age, and determine whether a sound signals distress or play. That falls short of decoding true language, but it is still a way to establish rudimentary communication.
Trained on a mix of human speech, environmental sounds, and animal vocalizations, the model has shown promising results even with species it has not encountered before.
Project CETI, the Cetacean Translation Initiative, is another significant effort in this field.
Led by researchers including Michael Bronstein from Imperial College London, it focuses on sperm whale communication, analyzing the complex click patterns they use for long-distance interaction.
The team has identified 143 click combinations that may form a phoneme-like alphabet, which they are now studying using deep neural networks and natural language processing techniques.
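While CETI's actual pipeline uses deep neural networks, the underlying idea of grouping click patterns into a discrete alphabet can be shown with something far simpler. The sketch below, on entirely synthetic data, represents each coda by its inter-click intervals and recovers coda types with a small k-means clustering loop; the feature choice and cluster count are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic codas: each row holds one coda's inter-click intervals (seconds).
# Real codas vary in length; fixing four intervals keeps the sketch simple.
codas = np.vstack([
    rng.normal([0.2, 0.2, 0.2, 0.2], 0.01, size=(50, 4)),  # evenly spaced
    rng.normal([0.1, 0.1, 0.1, 0.4], 0.01, size=(50, 4)),  # long final gap
])

def kmeans(x, k, iters=20):
    """Plain k-means: alternate nearest-center assignment and re-centering."""
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = ((x[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.array([
            x[labels == j].mean(0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    return labels, centers

labels, centers = kmeans(codas, k=2)
print("recovered coda types (centroid inter-click intervals):")
print(centers.round(2))
```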
While these projects focus on decoding animal sounds, researchers at New York University draw inspiration from infant development for AI learning.
Their Child’s View for Contrastive Learning (CVCL) model learns language by viewing the world through footage captured by head-mounted cameras worn by infants aged six months to two years.
The NYU team found that their AI efficiently learns from natural data, similar to how human infants do, contrasting sharply with traditional AI models requiring trillions of words for training.
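CVCL pairs what the infant sees with the words the infant hears, using a CLIP-style contrastive objective: embeddings of matching frame/word pairs are pushed together, and mismatched pairs apart. The minimal numpy sketch below computes that loss on random stand-in embeddings; the dimensions and temperature are arbitrary, and the real model learns the embeddings with neural encoders.

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """CLIP-style loss: matched image/word embeddings (same row index)
    should score higher than every mismatched pair in the batch."""
    # L2-normalize, then take all pairwise similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    # softmax cross-entropy where the diagonal holds the true pairs
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 32))  # stand-ins for head-cam frame embeddings
words = frames + rng.normal(scale=0.1, size=(8, 32))  # paired word embeddings
print(f"loss on matched pairs: {contrastive_loss(frames, words):.3f}")
```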
Google plans to share an updated version of DolphinGemma this summer, potentially expanding its use beyond Atlantic spotted dolphins. However, the model may require fine-tuning for vocalizations of different species.
WDP's broader research focuses on associating dolphin sounds with specific behaviors, including the signature whistles mothers and calves use to reunite, burst-pulse "screams" heard during conflicts, and clicking "buzzes" used during courtship or when chasing sharks.
"We're no longer just listening," Google noted. "We’re beginning to understand patterns within the sounds, paving the way for a future where the communication gap between humans and dolphins may narrow."