A study from Purdue University indicates that autonomous vehicles (AVs) can use AI chatbots built on large language models, such as ChatGPT, to understand and carry out passenger commands more accurately. The research is set to be presented on September 25 at the 27th IEEE International Conference on Intelligent Transportation Systems and may be among the first experiments to test how a real AV uses large language models to interpret passenger instructions and drive accordingly.
Assistant Professor Ziran Wang of Purdue's Lyles School of Civil and Construction Engineering led the research. He stresses that full vehicle autonomy requires understanding everything a passenger asks for, including implied intentions. For instance, if a passenger says, "I'm in a hurry," the AV should automatically choose the fastest route to the destination, just as a taxi driver would immediately pick up on the passenger's urgency.
Today's AVs do offer ways for passengers to communicate with them, but they typically require very explicit commands, which falls well short of natural human conversation. Large language models, by contrast, can interpret a much wider range of expressions because they are trained on vast amounts of text, enabling more intuitive responses.
In this study, the large language models do not drive the AV directly; they serve as auxiliary tools that enhance the driving experience through the AV's existing capabilities. The researchers first trained models such as ChatGPT to respond to a spectrum of commands, from direct instructions (e.g., "Please go faster") to indirect expressions (e.g., "I'm feeling a bit carsick"). They then integrated the models into the AV system, supplying parameters such as traffic rules, road conditions, weather, and other information detected by the vehicle's sensors.
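To make that setup concrete, the sketch below shows one way such context could be folded into a prompt for the model. It is a minimal illustration only: the DrivingContext fields, the build_prompt function, and the JSON reply schema are assumptions made for this example, not details from the Purdue system.

```python
from dataclasses import dataclass

# Hypothetical sketch: DrivingContext and build_prompt are illustrative
# names, not components of the Purdue system described in the article.

@dataclass
class DrivingContext:
    """Context the AV's sensors and map stack might expose to the LLM."""
    speed_limit_mph: float
    road_condition: str   # e.g., "dry", "wet"
    weather: str          # e.g., "clear", "rain"
    traffic_density: str  # e.g., "light", "heavy"

def build_prompt(ctx: DrivingContext, passenger_utterance: str) -> str:
    """Fold traffic rules, road/weather state, and the passenger's words
    into one prompt, asking the model for structured driving advice."""
    return (
        "You assist a Level 4 autonomous vehicle. Obey all traffic rules.\n"
        f"Speed limit: {ctx.speed_limit_mph} mph. "
        f"Road: {ctx.road_condition}. Weather: {ctx.weather}. "
        f"Traffic: {ctx.traffic_density}.\n"
        f'Passenger says: "{passenger_utterance}"\n'
        "Reply with JSON: {\"target_speed_mph\": float, "
        "\"driving_style\": \"gentle|normal|assertive\", \"reason\": str}"
    )

if __name__ == "__main__":
    ctx = DrivingContext(55.0, "wet", "rain", "light")
    # An indirect command: the model must infer "slow down, drive smoothly."
    print(build_prompt(ctx, "I'm feeling a bit carsick."))
```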
During the experiments, the trained models ran in the cloud and were connected to a test vehicle meeting SAE International's Level 4 automation standard. When the vehicle's speech recognition system detected a passenger command, the cloud-based model analyzed the instruction against the preset parameters and sent operational commands to the vehicle's drive-by-wire system, which controls the throttle, brakes, gear selection, and steering.
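The pipeline from spoken command to actuation might look roughly like the following sketch. Everything here is hypothetical: query_cloud_llm is a stand-in stub for the round trip to the cloud-hosted model, and plan_to_actuation is a toy speed controller. The article does not describe the team's actual control logic; a real vehicle would use a proper controller (e.g., PID) and a full steering pipeline.

```python
import json
from dataclasses import dataclass

@dataclass
class DriveByWireCommand:
    """Low-level actuation, mirroring the actuators named in the article."""
    throttle: float      # 0.0 to 1.0
    brake: float         # 0.0 to 1.0
    gear: str            # e.g., "D", "R", "P"
    steering_deg: float  # steering-wheel angle

def query_cloud_llm(prompt: str) -> str:
    """Placeholder for the round trip to the cloud-hosted LLM.
    Returns a canned JSON response for illustration only."""
    return ('{"target_speed_mph": 35.0, "driving_style": "gentle", '
            '"reason": "passenger feels carsick"}')

def plan_to_actuation(plan: dict, current_speed_mph: float) -> DriveByWireCommand:
    """Translate the model's high-level plan into actuation. This toy
    proportional rule stands in for a real speed controller."""
    speed_error = plan["target_speed_mph"] - current_speed_mph
    if speed_error >= 0:  # need to speed up
        return DriveByWireCommand(throttle=min(speed_error / 20.0, 1.0),
                                  brake=0.0, gear="D", steering_deg=0.0)
    # need to slow down
    return DriveByWireCommand(throttle=0.0,
                              brake=min(-speed_error / 20.0, 1.0),
                              gear="D", steering_deg=0.0)

if __name__ == "__main__":
    plan = json.loads(query_cloud_llm("prompt built from context and utterance"))
    print(plan_to_actuation(plan, current_speed_mph=45.0))
```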
The team also tested a memory module that stores passengers' historical preferences, helping the model take those preferences into account when responding to commands. Most experiments took place at a proving ground in Columbus, Indiana, a former airport runway that provides a safe environment for high-speed driving and for handling two-way intersections. The study also evaluated the vehicle's parking performance in the Ross-Ade Stadium parking lot at Purdue University.
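A preference memory of this kind could be as simple as a per-passenger store whose recent entries are prepended to each prompt, as in the hypothetical sketch below; the PreferenceMemory class and its methods are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict

class PreferenceMemory:
    """Toy per-passenger memory of preference strings (illustrative only)."""

    def __init__(self) -> None:
        # passenger_id -> list of remembered preference strings
        self._store: dict[str, list[str]] = defaultdict(list)

    def remember(self, passenger_id: str, preference: str) -> None:
        self._store[passenger_id].append(preference)

    def recall(self, passenger_id: str, limit: int = 5) -> list[str]:
        # Return the most recent preferences, oldest first.
        return self._store[passenger_id][-limit:]

def prompt_with_memory(memory: PreferenceMemory, passenger_id: str,
                       base_prompt: str) -> str:
    """Prepend remembered preferences so the model can personalize."""
    prefs = memory.recall(passenger_id)
    if not prefs:
        return base_prompt
    return "Known passenger preferences: " + "; ".join(prefs) + "\n" + base_prompt

if __name__ == "__main__":
    mem = PreferenceMemory()
    mem.remember("alice", "prefers gentle braking")
    mem.remember("alice", "dislikes highways in rain")
    print(prompt_with_memory(mem, "alice", 'Passenger says: "I\'m in a hurry."'))
```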
The results showed that participants riding in LLM-assisted AVs reported less discomfort with the vehicle's decisions than participants in AVs without such assistance. The vehicle's responses to a variety of commands, including ones the models had never encountered during training, also outperformed baseline values on safety and comfort driving metrics.