Data scientists at New York University have made an intriguing discovery: an AI model can learn words from the everyday sights and sounds of a single baby. While humans have long been recognized for their remarkable facility with language acquisition, this research shows that AI can also learn from limited data.
"We did conduct this experiment. We trained a neural network (which we call CVCL, it uses contrastive objectives similar to CLIP) using videos captured by a wearable camera, recording everything the child saw and heard from 6 to 25 months old," said Wai Keen Vong, one of the researchers of the study "Foundations of Language Acquisition through a Single Child's Eyes and Ears".
"This was an unprecedented opportunity to observe a child's experience, but the data was still limited: only 61 hours of transcribed video, roughly 1% of the child's waking hours," Vong added.
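The training setup Vong describes pairs each camera frame with the utterance heard at the same moment, pulling matched frame-utterance pairs together in a shared embedding space while pushing mismatched pairs apart. The sketch below illustrates that idea with a CLIP-style symmetric contrastive loss; the stub encoders, feature dimensions, and temperature are illustrative assumptions rather than the published CVCL implementation.

```python
# Minimal sketch of a CLIP-style contrastive objective over paired
# (video frame, transcribed utterance) embeddings. The encoders here are
# stand-in modules, not the CVCL architecture described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StubEncoder(nn.Module):
    """Placeholder encoder: maps precomputed features into a shared embedding space."""
    def __init__(self, in_dim: int, embed_dim: int = 256):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.proj(x), dim=-1)  # unit-length embeddings

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matched frame/utterance pairs attract, mismatched ones repel."""
    logits = img_emb @ txt_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(logits.size(0))          # i-th frame matches i-th utterance
    loss_i2t = F.cross_entropy(logits, targets)     # frame -> utterance direction
    loss_t2i = F.cross_entropy(logits.t(), targets) # utterance -> frame direction
    return 0.5 * (loss_i2t + loss_t2i)

if __name__ == "__main__":
    frame_encoder = StubEncoder(in_dim=512)      # e.g. features from a vision backbone
    utterance_encoder = StubEncoder(in_dim=300)  # e.g. pooled word embeddings
    frames = torch.randn(8, 512)                 # a batch of 8 co-occurring pairs
    utterances = torch.randn(8, 300)
    loss = contrastive_loss(frame_encoder(frames), utterance_encoder(utterances))
    loss.backward()
    print(f"contrastive loss: {loss.item():.3f}")
```

Note that in this setup the model never receives explicit word-object labels; co-occurrence in the child's experience is the only supervision signal.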
This aligns with the concept of autonomous machine intelligence championed by Meta's Chief AI Scientist Yann LeCun. The Turing Award winner has long argued that teaching AI systems to observe the world the way children do could be the path to more intelligent systems. He predicts that "world models," inspired by how the human brain works, could be the right approach for AI systems to achieve intelligence.
Learning from a Child's Experience
Despite the limited data, the study suggests that an AI model can learn the associations between words and their visual referents from only dozens to hundreds of examples, generalize to new visual instances, and align its visual and linguistic representations.
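One way to picture what multimodal alignment buys: once words and images live in the same embedding space, a word can be matched to its referent by a simple similarity search. The snippet below sketches that evaluation idea; the embeddings and candidate set are toy placeholders, not the study's actual test protocol.

```python
# Sketch of probing word-referent alignment once two encoders share an
# embedding space: embed a word and several candidate images, then pick the
# image whose embedding is closest. All inputs below are placeholders.
import torch
import torch.nn.functional as F

def best_referent(word_emb: torch.Tensor, image_embs: torch.Tensor) -> int:
    """Return the index of the candidate image most similar to the word embedding."""
    word_emb = F.normalize(word_emb, dim=-1)
    image_embs = F.normalize(image_embs, dim=-1)
    similarities = image_embs @ word_emb    # cosine similarity per candidate
    return int(similarities.argmax())

if __name__ == "__main__":
    word = torch.randn(256)                 # e.g. embedding of the word "ball"
    candidates = torch.randn(4, 256)        # embeddings of 4 candidate images
    print("chosen referent index:", best_referent(word, candidates))
```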
"Our findings address a long-standing philosophical and cognitive science debate: what elements are necessary for children to learn vocabulary? Do they (or any learners) need specific language induction biases or innate knowledge to start learning, given their everyday experiences? Or is joint representation and associative memory sufficient? Our research shows that we can acquire more than what is commonly believed through learning alone," added Vong.
Despite this progress, the current model, Child's View Contrastive Learning (CVCL), still falls well short of a typical two-year-old in vocabulary size and in how efficiently it learns new words.

Several factors contribute to this gap: CVCL lacks sensory experiences such as taste, touch, and smell; it learns passively, whereas a child actively engages with the world; and it has no social-cognitive abilities.
Unlike children, CVCL cannot perceive desires, goals, or social cues, nor does it understand language as a means of achieving them.
Child's Play: The Path to Smarter Systems
Observing how children come to understand the physical world has also proven invaluable for advancing artificial intelligence. Researchers at Google DeepMind noted that developmental psychologists have identified the key concepts of infants' intuitive physics and have designed methods, such as violation-of-expectation paradigms, to measure them.
Inspired by this work, the team created PLATO (Physics Learning through Auto-encoding and Tracking Objects). The model represents a scene as a set of objects that evolve over time and predicts what will happen next based on how those objects interact.
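To make the object-based idea concrete, the toy sketch below keeps a state vector per object, predicts each object's next state from its interactions with the others, and treats the prediction error as a "surprise" signal, echoing the violation-of-expectation paradigm mentioned above. The architecture, dimensions, and names are illustrative assumptions, not DeepMind's published PLATO model.

```python
# Toy object-centric predictor: per-object states, pairwise interaction effects,
# and a violation-of-expectation "surprise" read out as prediction error.
import torch
import torch.nn as nn

class InteractionPredictor(nn.Module):
    """Predicts each object's next state from pairwise interactions with the others."""
    def __init__(self, state_dim: int = 16, hidden: int = 64):
        super().__init__()
        # effect of one object on another, computed from their concatenated states
        self.pairwise = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, state_dim))
        # update an object from its own state plus the aggregated effects on it
        self.update = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, state_dim))

    def forward(self, objects: torch.Tensor) -> torch.Tensor:
        """objects: (num_objects, state_dim) -> predicted next states, same shape."""
        n = objects.size(0)
        next_states = []
        for i in range(n):
            effect = sum((self.pairwise(torch.cat([objects[i], objects[j]]))
                          for j in range(n) if j != i),
                         torch.zeros_like(objects[i]))
            next_states.append(self.update(torch.cat([objects[i], effect])))
        return torch.stack(next_states)

def surprise(predicted: torch.Tensor, observed: torch.Tensor) -> float:
    """Violation-of-expectation signal: how far reality departs from the prediction."""
    return float(((predicted - observed) ** 2).mean())

if __name__ == "__main__":
    model = InteractionPredictor()
    scene_t = torch.randn(3, 16)    # three objects at time t
    scene_t1 = torch.randn(3, 16)   # the objects as actually observed at time t+1
    print("surprise:", surprise(model(scene_t), scene_t1))
```

A high surprise score on a physically impossible clip is the model-side analogue of an infant looking longer at an event that violates expectations.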
When trained on videos of simple physical interactions, PLATO outperformed models lacking object-based representations, highlighting the importance of this structure for learning intuitive physics.
PLATO could learn from just 28 hours of visual experience and generalized to new stimuli without retraining. The work underscores the potential of child-development research for building AI systems that can understand and cope with the complexity of the physical world.
AI Helping Children Too!
Researchers at the University of California, Los Angeles have made another notable advance with Chatterbaby, an AI application that interprets infant cries and offers insight into what babies are trying to communicate.
Dr. Ariana Anderson and her team collected audio samples of about 2,000 infant cries and used AI algorithms to distinguish cries caused by hunger, pain, and stimulation; the resulting model predicts the reason for crying with roughly 90% accuracy.
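A pipeline in that spirit can be sketched with standard tools: extract acoustic features from labeled cry recordings and fit an off-the-shelf classifier. The feature choice (MFCCs), file layout, labels, and model below are assumptions for illustration, not ChatterBaby's actual system.

```python
# Illustrative cry-classification sketch: acoustic features + a standard classifier.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def cry_features(path: str) -> np.ndarray:
    """Summarize a recording as its mean MFCC vector across time."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=20)  # shape (20, num_frames)
    return mfcc.mean(axis=1)

def train_cry_classifier(paths: list[str], labels: list[str]) -> RandomForestClassifier:
    """Fit a classifier mapping acoustic features to cry categories and report accuracy."""
    X = np.stack([cry_features(p) for p in paths])
    X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2,
                                                        random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
    print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
    return clf

# Hypothetical usage with labeled recordings (paths and labels are placeholders):
# clf = train_cry_classifier(["cry_001.wav", "cry_002.wav", ...],
#                            ["hunger", "pain", ...])
```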