TY - GEN
T1 - Robust Training Framework via Multi-Stage Feature Rectification
AU - Bae, Jungwoo
AU - Shin, Jitae
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Learning robust representations in vision models is essential for reliable performance under diverse real-world conditions, such as weather-induced noise or distribution shifts. In this paper, we propose a novel robust training framework that encourages feature-level rectification directly within the backbone network during training. Unlike existing approaches that rely on external modules or post-processing, our method introduces minimal overhead while enhancing the inherent robustness of the encoder itself. To achieve this, we construct a paired dataset of clean and task-specific noisy images, and apply three complementary training strategies: (1) a reconstruction decoder to align the pixel-space outputs of clean and noisy inputs; (2) contrastive learning to enforce latent similarity between the two views; and (3) a quantization module that constrains latent features to discrete clean representations using vector quantization with a rotation trick. We validate our framework on the KITTI-360 dataset under various weather perturbations, showing significant performance gains in object detection without degrading clean performance. Our approach is lightweight, modular, and applicable to any multi-scale feature-extracting backbone, making it ideal for safety-critical applications such as autonomous driving.
AB - Learning robust representations in vision models is essential for reliable performance under diverse real-world conditions, such as weather-induced noise or distribution shifts. In this paper, we propose a novel robust training framework that encourages feature-level rectification directly within the backbone network during training. Unlike existing approaches that rely on external modules or post-processing, our method introduces minimal overhead while enhancing the inherent robustness of the encoder itself. To achieve this, we construct a paired dataset of clean and task-specific noisy images, and apply three complementary training strategies: (1) a reconstruction decoder to align the pixel-space outputs of clean and noisy inputs; (2) contrastive learning to enforce latent similarity between the two views; and (3) a quantization module that constrains latent features to discrete clean representations using vector quantization with a rotation trick. We validate our framework on the KITTI-360 dataset under various weather perturbations, showing significant performance gains in object detection without degrading clean performance. Our approach is lightweight, modular, and applicable to any multi-scale feature-extracting backbone, making it ideal for safety-critical applications such as autonomous driving.
KW - Feature Rectification
KW - Object Detection
KW - Robust Training
UR - https://www.scopus.com/pages/publications/105016329651
U2 - 10.1109/ITC-CSCC66376.2025.11137732
DO - 10.1109/ITC-CSCC66376.2025.11137732
M3 - Conference contribution
AN - SCOPUS:105016329651
T3 - 2025 International Technical Conference on Circuits/Systems, Computers, and Communications, ITC-CSCC 2025
BT - 2025 International Technical Conference on Circuits/Systems, Computers, and Communications, ITC-CSCC 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 International Technical Conference on Circuits/Systems, Computers, and Communications, ITC-CSCC 2025
Y2 - 7 July 2025 through 10 July 2025
ER -