TY - JOUR
T1 - Semantic video segmentation with dynamic keyframe selection and distortion-aware feature rectification
AU - Awan, Mehwish
AU - Shin, Jitae
N1 - Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/6
Y1 - 2021/6
N2 - The per-frame segmentation methods have a high computational cost, thereby, these methods are insufficient to cope with the fast inference need of semantic video segmentation. To efficaciously reuse the extracted features by feature propagation, in this paper, we present distortion-aware feature rectification and online selection of keyframes for fast and accurate video segmentation. The proposed dynamic keyframe scheduling scheme is based on the extent of temporal variations using reinforcement learning. We employ policy gradient reinforcement strategy to learn policy function for maximizing the expected reward. The policy network has two actions (key and non-key) in the action space. State information is derived from the element-wise difference frame of the current frame and the warped current frame generated by the propagated previous frame. Afterward, an adaptive partial feature rectification with distortion-aware corrections is performed for the warped features of the non-key frames. Precise feature propagation is a critical task to uphold the temporal updates in the video sequence since it enormously affects the accuracy as well as the throughput of the whole video analysis framework. The distorted feature maps are revised with the light-weight feature extractor by the guidance of the distortion map while the correctly propagated features are not influenced. Deep feature flow approach is adopted for feature propagation. We evaluate our scheme on the Cityscapes and CamVid datasets with DeepLabv3 as segmentation network and LiteFlowNet for computing flow fields. Experimental results show that the proposed method outperforms the previous state-of-the-art methods significantly both in terms of accuracy and throughput.
AB - The per-frame segmentation methods have a high computational cost, thereby, these methods are insufficient to cope with the fast inference need of semantic video segmentation. To efficaciously reuse the extracted features by feature propagation, in this paper, we present distortion-aware feature rectification and online selection of keyframes for fast and accurate video segmentation. The proposed dynamic keyframe scheduling scheme is based on the extent of temporal variations using reinforcement learning. We employ policy gradient reinforcement strategy to learn policy function for maximizing the expected reward. The policy network has two actions (key and non-key) in the action space. State information is derived from the element-wise difference frame of the current frame and the warped current frame generated by the propagated previous frame. Afterward, an adaptive partial feature rectification with distortion-aware corrections is performed for the warped features of the non-key frames. Precise feature propagation is a critical task to uphold the temporal updates in the video sequence since it enormously affects the accuracy as well as the throughput of the whole video analysis framework. The distorted feature maps are revised with the light-weight feature extractor by the guidance of the distortion map while the correctly propagated features are not influenced. Deep feature flow approach is adopted for feature propagation. We evaluate our scheme on the Cityscapes and CamVid datasets with DeepLabv3 as segmentation network and LiteFlowNet for computing flow fields. Experimental results show that the proposed method outperforms the previous state-of-the-art methods significantly both in terms of accuracy and throughput.
KW - Deep learning
KW - Distortion-aware feature correction
KW - Dynamic keyframe selection scheme
KW - Feature warping
KW - Policy network
KW - Reinforcement learning
KW - Semantic video segmentation
UR - https://www.scopus.com/pages/publications/85105696541
U2 - 10.1016/j.imavis.2021.104184
DO - 10.1016/j.imavis.2021.104184
M3 - Article
AN - SCOPUS:85105696541
SN - 0262-8856
VL - 110
JO - Image and Vision Computing
JF - Image and Vision Computing
M1 - 104184
ER -