TY - GEN
T1 - Efficient Recurrent Optical Flow Refinement Using Mamba and Multi-Scale Loss
AU - Park, Minseon
AU - Shin, Jitae
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Optical flow estimation plays a critical role in various computer vision tasks, including video understanding and autonomous driving. Recent models such as RAFT and FlowFormer refine flow predictions iteratively using recurrent modules based on the Convolutional Gated Recurrent Unit (ConvGRU). However, ConvGRU has limitations in modeling long-range dependencies and requires a large number of parameters for decoder refinement. In this paper, we propose replacing the ConvGRU module in FlowFormer's decoder with Mamba, a state space sequence model optimized for efficient and expressive temporal modeling. Additionally, we introduce a multi-scale loss structure that incorporates low-resolution supervision to encourage global motion consistency and improve training stability. Our method maintains the original input structure of FlowFormer while improving both temporal modeling and multi-scale learning. Experiments on the KITTI benchmark show that our Mamba-based decoder achieves significant improvements over the original FlowFormer, reducing average end-point error (AEPE) by 5.81% and F1-All by 13.41%, while also reducing decoder parameters by 32.65% and FLOPs by 22.88%. These results demonstrate that Mamba, combined with multi-scale loss, is a strong and lightweight alternative to ConvGRU for optical flow refinement.
AB - Optical flow estimation plays a critical role in various computer vision tasks, including video understanding and autonomous driving. Recent models such as RAFT and FlowFormer refine flow predictions iteratively using recurrent modules based on the Convolutional Gated Recurrent Unit (ConvGRU). However, ConvGRU has limitations in modeling long-range dependencies and requires a large number of parameters for decoder refinement. In this paper, we propose replacing the ConvGRU module in FlowFormer's decoder with Mamba, a state space sequence model optimized for efficient and expressive temporal modeling. Additionally, we introduce a multi-scale loss structure that incorporates low-resolution supervision to encourage global motion consistency and improve training stability. Our method maintains the original input structure of FlowFormer while improving both temporal modeling and multi-scale learning. Experiments on the KITTI benchmark show that our Mamba-based decoder achieves significant improvements over the original FlowFormer, reducing average end-point error (AEPE) by 5.81% and F1-All by 13.41%, while also reducing decoder parameters by 32.65% and FLOPs by 22.88%. These results demonstrate that Mamba, combined with multi-scale loss, is a strong and lightweight alternative to ConvGRU for optical flow refinement.
KW - mamba
KW - multi-scale loss
KW - optical flow
UR - https://www.scopus.com/pages/publications/105016413034
U2 - 10.1109/ITC-CSCC66376.2025.11137675
DO - 10.1109/ITC-CSCC66376.2025.11137675
M3 - Conference contribution
AN - SCOPUS:105016413034
T3 - 2025 International Technical Conference on Circuits/Systems, Computers, and Communications, ITC-CSCC 2025
BT - 2025 International Technical Conference on Circuits/Systems, Computers, and Communications, ITC-CSCC 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 International Technical Conference on Circuits/Systems, Computers, and Communications, ITC-CSCC 2025
Y2 - 7 July 2025 through 10 July 2025
ER -