Human peripheral vision gives us a distinct advantage: we can notice and recognize shapes outside our direct line of gaze, even when their details are unclear. This ability plays a crucial role in many situations, such as detecting a vehicle approaching from the side while driving. Artificial intelligence systems, by contrast, lack this peripheral vision capability. Researchers at MIT, however, have recently developed a method that simulates human peripheral vision and applies it to machine learning models.
The research team built a specialized image dataset for training machine learning models to simulate human peripheral vision. Models trained with it became significantly better at detecting objects at the edge of the visual field, although their performance still did not reach human levels.
Notably, and unlike human vision, the models' performance was relatively insensitive to the size of objects and the amount of clutter in the visual scene. This finding prompted the researchers to explore how artificial and human visual mechanisms differ.
"Despite training various models and improving their performance, we found that they still cannot fully match humans," said Vasha DuTell, co-author of the study. "This makes us wonder what key elements are missing in these models."
Answering this question is crucial for building machine learning models that see more like humans do. Beyond improving driving safety, such models could inform the design of displays that are easier for people to view. A deeper understanding of peripheral vision in AI models could also help researchers predict human behavior more accurately.
"If we can gain a deep understanding of the nature of peripheral vision and successfully model it, it will help us understand which features in a visual scene prompt our eyes to move and gather more information," added Anne Harrington, the lead author of the study.
The research team also includes Mark Hamilton, a graduate student in electrical engineering and computer science; Ayush Tewari, a postdoctoral researcher; Simon Stent, a research manager at the Toyota Research Institute; and several senior professors and researchers. The work was presented at the International Conference on Learning Representations (ICLR).
"Understanding what humans can see is crucial whenever we interact with machines, whether it's driving vehicles, interacting with robots, or using user interfaces," pointed out Ruth Rosenholtz, another member of the research team. "Peripheral vision plays a central role in this understanding."
To simulate human peripheral vision, the researchers started from a technique known as the texture tiling model, which transforms images to mimic the information loss that occurs in the periphery of human vision. The team modified the model so that it can transform images more flexibly, without needing to know in advance where a human or an AI will direct its gaze.
Using this modified technique, the team generated a large image dataset. The transformed images appear more textured in certain regions, mimicking the loss of detail toward the edge of human vision. The researchers then trained several computer vision models on this dataset and compared their object detection performance with that of humans.
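To make this kind of transform concrete, the sketch below shows one simplified way to impose eccentricity-dependent information loss: it blends in progressively blurrier copies of an image as pixels get farther from an assumed fixation point. This is a crude stand-in for illustration only, not the texture tiling model the researchers used; the function name, fixation point, and blur schedule are all assumptions.

```python
# Simplified illustration of eccentricity-dependent information loss.
# NOT the researchers' texture tiling model; all parameters are assumptions.
import numpy as np
from PIL import Image, ImageFilter

def simulate_peripheral_loss(image, fixation=(0.5, 0.5),
                             max_blur=8.0, n_levels=4):
    """Blend progressively blurrier copies of `image` with eccentricity."""
    w, h = image.size
    fx, fy = fixation[0] * w, fixation[1] * h

    # Normalized distance of every pixel from the assumed fixation point.
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    ecc = np.hypot(xs - fx, ys - fy)
    ecc /= ecc.max()

    result = np.array(image)  # writable copy; innermost band stays sharp
    for level in range(1, n_levels):
        # Outer bands get progressively stronger Gaussian blur.
        radius = max_blur * level / (n_levels - 1)
        blurred = np.array(image.filter(ImageFilter.GaussianBlur(radius)))
        band = ecc >= level / n_levels
        result[band] = blurred[band]
    return Image.fromarray(result)

# Example: degrade the periphery of a scene around its center.
periph = simulate_peripheral_loss(Image.open("scene.jpg").convert("RGB"))
periph.save("scene_peripheral.jpg")
```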
"We designed a series of experiments to test the effect of peripheral vision in machine learning models," explained Harrington. "We didn't want the models to perform tasks they are not good at."
In the experiments, humans and models were shown pairs of transformed images that were identical except that one contained a target object in the peripheral region; each observer was then asked to select the image containing the target.
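As a rough illustration of how a classifier can stand in for a participant in such a forced-choice test, a model can score each image of a pair by its confidence in the target class and "choose" the higher-scoring one. The sketch below assumes a generic PyTorch classifier; the interface and scoring rule are illustrative, not the authors' evaluation code.

```python
# Hedged sketch of a two-alternative forced-choice (2AFC) trial for a
# generic PyTorch classifier; not the study's actual evaluation code.
import torch

@torch.no_grad()
def run_2afc_trial(model, img_with_target, img_without_target, target_class):
    """Return True when the model picks the image containing the target.

    The classifier "chooses" whichever image yields higher confidence for
    the target class, a standard way to map a model onto a forced choice.
    """
    batch = torch.stack([img_with_target, img_without_target])
    probs = model(batch).softmax(dim=-1)[:, target_class]
    return bool(probs[0] > probs[1])

def two_afc_accuracy(model, trials, target_class):
    # `trials` is a sequence of (image_with_target, image_without_target)
    # tensor pairs, preprocessed to the model's expected input format.
    hits = sum(run_2afc_trial(model, pos, neg, target_class)
               for pos, neg in trials)
    return hits / len(trials)
```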
"We were surprised by the strong ability of humans to detect peripheral objects," added Harrington. "We tested multiple sets of images, but people were always able to easily identify the target object. We even had to use smaller objects to increase the difficulty."
The research team found that training models from scratch on their dataset significantly improved the models' ability to detect and recognize objects, and even fine-tuning pretrained models yielded some gains. In every case, however, the machines still fell short of human performance, especially at detecting objects far from the center of the visual field. And unlike human vision, the models' performance was largely unaffected by object size or scene clutter.
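For readers who want to picture the two training regimes, the sketch below shows a minimal fine-tuning setup under stated assumptions: an ImageNet-pretrained ResNet adapted to peripherally transformed images, framed as a target-present/target-absent classification. The dataset path, architecture choice, and schedule are hypothetical; training from scratch would simply start from random weights (`weights=None`) instead.

```python
# Minimal fine-tuning sketch; dataset path and hyperparameters are
# assumptions, not the researchers' actual training configuration.
import torch
from torch import nn
from torchvision import datasets, models, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Start from ImageNet weights; pass weights=None to train from scratch.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # target present / absent
model = model.to(device)

# Hypothetical folder of peripherally transformed training images.
data = datasets.ImageFolder(
    "peripheral_dataset/train",
    transform=transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ]))
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```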
"This may indicate that the models do not fully utilize contextual information like humans do when performing these detection tasks. Their strategies may be different," explained Harrington.
This research offers the AI field a new insight: building machine learning models that see more like humans will require a deeper understanding and more faithful simulation of human peripheral vision. As the technique develops and improves, AI systems could play a greater role in areas such as driving assistance and human-computer interaction, bringing more convenience and safety to everyday life.