TY - GEN
T1 - RT-BEV
T2 - 45th IEEE Real-Time Systems Symposium, RTSS 2024
AU - Liu, Liangkai
AU - Lee, Jinkyu
AU - Shin, Kang G.
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Vision-centric Bird's Eye View (BEV) perception has become popular for enhancing the situational awareness of autonomous vehicles (AVs). It uses multiple cameras to create a 360° view, capturing essential details for the vehicle's navigation and decision-making. However, reducing the end-to-end (e2e) BEV perception latency without sacrificing accuracy is challenging due to the lack of co-optimization of message communication and object detection. Prior work either compresses the dense detection model to reduce computation, which can hurt accuracy, and assumes images are well-synchronized, or focuses on worst-case communication delay without considering the characteristics of object detection. To meet this challenge, we propose RT-BEV, the first framework designed to co-optimize message communication and object detection to improve real-time e2e BEV perception without sacrificing accuracy. The main insight of RT-BEV lies in generating traffic environment- and context-aware Regions of Interest (ROIs) for AV safety, combined with ROI-aware message communication. RT-BEV features an ROI-aware Camera Synchronizer that adaptively determines message groups and allowable delays based on ROIs' coverage. We also develop an ROIs Generator to model context-aware ROIs and a Feature Split & Merge component to handle variable-sized ROIs effectively. Furthermore, a Time Predictor forecasts timelines for processing ROIs, and a Coordinator jointly optimizes latency and accuracy for the entire e2e pipeline. We have implemented RT-BEV in a ROS-based BEV perception pipeline and evaluated it with the nuScenes dataset. RT-BEV is shown to significantly enhance real-time BEV perception, reducing average e2e latency by 1.5×, maintaining high mean Average Precision (mAP), doubling the number of processed frames, and improving the frame efficiency score (FES) by 2.9× compared to existing approaches. Moreover, RT-BEV is shown to reduce the worst-case e2e latency by 19.3×.
AB - Vision-centric Bird's Eye View (BEV) perception has become popular for enhancing the situational awareness of autonomous vehicles (AVs). It uses multiple cameras to create a 360° view, capturing essential details for the vehicle's navigation and decision-making. However, reducing the end-to-end (e2e) BEV perception latency without sacrificing accuracy is challenging due to the lack of co-optimization of message communication and object detection. Prior work either compresses the dense detection model to reduce computation, which can hurt accuracy, and assumes images are well-synchronized, or focuses on worst-case communication delay without considering the characteristics of object detection. To meet this challenge, we propose RT-BEV, the first framework designed to co-optimize message communication and object detection to improve real-time e2e BEV perception without sacrificing accuracy. The main insight of RT-BEV lies in generating traffic environment- and context-aware Regions of Interest (ROIs) for AV safety, combined with ROI-aware message communication. RT-BEV features an ROI-aware Camera Synchronizer that adaptively determines message groups and allowable delays based on ROIs' coverage. We also develop an ROIs Generator to model context-aware ROIs and a Feature Split & Merge component to handle variable-sized ROIs effectively. Furthermore, a Time Predictor forecasts timelines for processing ROIs, and a Coordinator jointly optimizes latency and accuracy for the entire e2e pipeline. We have implemented RT-BEV in a ROS-based BEV perception pipeline and evaluated it with the nuScenes dataset. RT-BEV is shown to significantly enhance real-time BEV perception, reducing average e2e latency by 1.5×, maintaining high mean Average Precision (mAP), doubling the number of processed frames, and improving the frame efficiency score (FES) by 2.9× compared to existing approaches. Moreover, RT-BEV is shown to reduce the worst-case e2e latency by 19.3×.
KW - BEV perception
KW - regions of interest (ROIs)
UR - https://www.scopus.com/pages/publications/85217621150
U2 - 10.1109/RTSS62706.2024.00031
DO - 10.1109/RTSS62706.2024.00031
M3 - Conference contribution
AN - SCOPUS:85217621150
T3 - Proceedings - Real-Time Systems Symposium
SP - 267
EP - 279
BT - Proceedings - 2024 IEEE Real-Time Systems Symposium, RTSS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 10 December 2024 through 13 December 2024
ER -