Embodied AI-Enhanced Vehicular Networks: An Integrated Vision Language Models and Reinforcement Learning Method

Ruichen Zhang, Changyuan Zhao, Hongyang Du, Dusit Niyato, Jiacheng Wang, Suttinee Sawadsitang, Xuemin Shen, Dong In Kim

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

This paper investigates adaptive transmission strategies in embodied AI-enhanced vehicular networks by integrating vision language models (VLMs) for semantic information extraction and deep reinforcement learning (DRL) for decision-making. The proposed framework aims to optimize both data transmission efficiency and decision accuracy by formulating an optimization problem that incorporates the Weber-Fechner law, which serves as a metric for balancing bandwidth utilization and quality of experience (QoE). Specifically, we employ the large language and vision assistant (LLAVA) model to extract critical semantic information from raw image data captured by embodied AI agents (i.e., vehicles), reducing the transmitted data size by more than 90% while retaining the content essential for vehicular communication and decision-making. To cope with the dynamic vehicular environment, we employ a generalized advantage estimation-based proximal policy optimization (GAE-PPO) method that stabilizes decision-making under uncertainty. Simulation results show that attention maps from LLAVA highlight the model’s focus on relevant image regions, enhancing semantic representation accuracy. Additionally, the proposed transmission strategy improves QoE by up to 36% compared to DDPG and accelerates convergence, reducing the required steps by up to 47% compared to pure PPO. Further analysis indicates that adapting the semantic symbol length provides an effective trade-off between transmission quality and bandwidth, achieving up to a 61.4% improvement in QoE when scaling from 4 to 8 vehicles.
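As an illustration of the two building blocks named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the textbook form of the Weber-Fechner law (perceived quality grows with the logarithm of the stimulus) and the standard generalized advantage estimation recursion. The function names, the `threshold` and `k` parameters, and the default `gamma` and `lam` values are hypothetical choices made for this example; the abstract does not give the paper's exact QoE formulation.

```python
import math


def weber_fechner_qoe(stimulus, threshold=1.0, k=1.0):
    """Textbook Weber-Fechner law: perception = k * ln(stimulus / threshold).

    `stimulus` could stand for an allocated resource such as bandwidth;
    that mapping is an assumption made for this sketch.
    """
    if stimulus <= 0 or threshold <= 0:
        raise ValueError("stimulus and threshold must be positive")
    return k * math.log(stimulus / threshold)


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard generalized advantage estimation over one trajectory.

    `values` must hold len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state reached after the final reward.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

In a PPO-style training loop, a per-step reward derived from a QoE measure such as `weber_fechner_qoe` would be collected along a trajectory, and `gae_advantages` would turn those rewards and the critic's value estimates into the advantages used in the clipped policy objective. Setting `lam` closer to 0 lowers variance at the cost of more bias; setting it closer to 1 does the opposite.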

Original language: English
Pages (from-to): 11494-11510
Number of pages: 17
Journal: IEEE Transactions on Mobile Computing
Volume: 24
Issue number: 11
DOIs
State: Published - 2025
Externally published: Yes

Keywords

  • Embodied AI
  • LLAVA
  • LLM
  • PPO
  • QoE
  • vehicular networks
  • VLM
