TY - GEN
T1 - DynaPP
T2 - 2024 International Joint Conference on Neural Networks, IJCNN 2024
AU - So, Changrok
AU - Woo, Simon S.
AU - Ko, Jong Hwan
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Online video detection becomes more challenging with higher resolution as computational costs increase proportionally with increasing resolution. To address this issue, we present a novel approach, DynaPP, which arranges object candidate regions into a compact form. DynaPP performs resource-intensive whole-image inference only on sparse key frames, employing reduced resolutions for inference on other frames. Additionally, we propose transforming a 1-stage detector into a dynamic resolution model to facilitate frame inference at reduced resolutions. Here, the dynamic resolution model signifies a model capable of inferring all resolutions, distinguishing itself from typical models by not having restricted inferable resolutions. Unlike prior studies introducing new model structures for multi-resolution models, our work demonstrates that slight modifications to existing models can convert them to dynamic resolution models. DynaPP showcases substantial acceleration in video detection across four representative video datasets: AU-AIR (5.5×), UAVDT (3.67×), VisDrone (2.73×), and ImageNet VID (3.69×), while maintaining a mean average precision with a small loss (≤2.2). Furthermore, we observed that our method achieves a detection acceleration of up to 8.84×, depending on the video clip.
KW - Acceleration
KW - Convolutional Neural Networks
KW - Deep Neural Networks
KW - Dynamic Resolution
KW - Object Detection
KW - Online Detection
KW - Patch Packing
KW - Video Detection
UR - https://www.scopus.com/pages/publications/85204959700
U2 - 10.1109/IJCNN60899.2024.10649922
DO - 10.1109/IJCNN60899.2024.10649922
M3 - Conference contribution
AN - SCOPUS:85204959700
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2024 International Joint Conference on Neural Networks, IJCNN 2024 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 30 June 2024 through 5 July 2024
ER -