TY - JOUR
T1 - Timing guarantees for inference of AI models in embedded systems
AU - Lee, Seunghoon
AU - Kang, Woosung
AU - Bertogna, Marko
AU - Chwa, Hoon Sung
AU - Lee, Jinkyu
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
PY - 2025/6
Y1 - 2025/6
N2 - Machine learning (ML) is increasingly being integrated into real-time embedded systems, enabling intelligent decision-making in applications such as autonomous driving and industrial automation. However, ensuring predictable execution of deep neural network (DNN) inference remains a major challenge, as real-time systems must meet strict timing constraints to guarantee safety and reliability. This paper identifies key challenges in achieving real-time AI inference in embedded systems, including limited memory capacity, high energy consumption, efficient multi-DNN scheduling, and heterogeneous resource management. To address these challenges, we emphasize the need for advanced scheduling algorithms to efficiently allocate heterogeneous computing resources across multiple DNNs, hierarchical memory management to reduce memory bottlenecks, and real-time neural architecture search and optimization techniques to enhance AI model performance under strict timing constraints. Furthermore, we discuss future research directions aimed at improving real-time AI execution, including time-predictable scheduling frameworks to ensure consistent inference latency, cross-device AI workload management to optimize resource utilization across heterogeneous processors, and benchmarking methodologies to systematically evaluate performance, timing guarantees, and energy efficiency in real-time AI systems. Advancing these research areas will enhance the reliability, efficiency, and scalability of AI-driven embedded systems, bridging the gap between ML advancements and real-time system requirements.
KW - Embedded systems
KW - Inference
KW - Machine learning
KW - Timing guarantees
UR - https://www.scopus.com/pages/publications/105008410537
U2 - 10.1007/s11241-025-09445-9
DO - 10.1007/s11241-025-09445-9
M3 - Article
AN - SCOPUS:105008410537
SN - 0922-6443
VL - 61
SP - 259
EP - 267
JO - Real-Time Systems
JF - Real-Time Systems
IS - 2
ER -