TY - GEN
T1 - Hybrid Transformer-CNN-Based Attention in Video Turbulence Mitigation (HATM)
AU - Kiasari, Mohammad Ahangar
AU - Muhammad, Khan
AU - Bakshi, Sambit
AU - Lee, Ik Hyun
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - This study introduces a hybrid deep learning framework for turbulence mitigation (HATM) in videos, integrating a transformer-based attention module followed by a CNN-based attention module. Because of the computational demands of transformers, we propose a simple technique within the transformer module to improve computational efficiency. Additionally, to better exploit spatial and channel information, we introduce a CNN-attention module that captures global and local inter- and intra-frame dependencies. The overall structure of the model follows U-Net, with the skip connections replaced by our attention blocks to further explore local, spatial, and temporal dependencies. Our model is trained on a simulated turbulence dataset and evaluated on both simulated and real-world datasets to gauge its generalization performance. The effectiveness of each component of our model is also evaluated through ablation studies. Experimental results show that our model improves PSNR and SSIM scores and notably enhances the reconstruction of text images, making restored text images cleaner and more readable. Overall, our HATM framework represents an advance toward addressing turbulence distortion in video sequences, showing qualitative and quantitative improvements and offering promising solutions for applications that require enhanced video restoration and mitigation of turbulence-induced artifacts.
AB - This study introduces a hybrid deep learning framework for turbulence mitigation (HATM) in videos, integrating a transformer-based attention module followed by a CNN-based attention module. Because of the computational demands of transformers, we propose a simple technique within the transformer module to improve computational efficiency. Additionally, to better exploit spatial and channel information, we introduce a CNN-attention module that captures global and local inter- and intra-frame dependencies. The overall structure of the model follows U-Net, with the skip connections replaced by our attention blocks to further explore local, spatial, and temporal dependencies. Our model is trained on a simulated turbulence dataset and evaluated on both simulated and real-world datasets to gauge its generalization performance. The effectiveness of each component of our model is also evaluated through ablation studies. Experimental results show that our model improves PSNR and SSIM scores and notably enhances the reconstruction of text images, making restored text images cleaner and more readable. Overall, our HATM framework represents an advance toward addressing turbulence distortion in video sequences, showing qualitative and quantitative improvements and offering promising solutions for applications that require enhanced video restoration and mitigation of turbulence-induced artifacts.
KW - Hybrid Attention
KW - Transformer
KW - Video turbulence mitigation
UR - https://www.scopus.com/pages/publications/85212277657
U2 - 10.1007/978-3-031-78305-0_16
DO - 10.1007/978-3-031-78305-0_16
M3 - Conference contribution
AN - SCOPUS:85212277657
SN - 9783031783043
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 242
EP - 256
BT - Pattern Recognition - 27th International Conference, ICPR 2024, Proceedings
A2 - Antonacopoulos, Apostolos
A2 - Chaudhuri, Subhasis
A2 - Chellappa, Rama
A2 - Liu, Cheng-Lin
A2 - Bhattacharya, Saumik
A2 - Pal, Umapada
PB - Springer Science and Business Media Deutschland GmbH
T2 - 27th International Conference on Pattern Recognition, ICPR 2024
Y2 - 1 December 2024 through 5 December 2024
ER -