TY - JOUR
T1 - Conflux LSTMs Network
T2 - A Novel Approach for Multi-View Action Recognition
AU - Ullah, Amin
AU - Muhammad, Khan
AU - Hussain, Tanveer
AU - Baik, Sung Wook
N1 - Publisher Copyright:
© 2021 The Authors
PY - 2021/5/7
Y1 - 2021/5/7
N2 - Multi-view action recognition (MVAR) acquires complementary cues from data captured at different viewpoints for effective action recognition; however, the domain is not yet well explored. MVAR poses several challenges, such as divergence in viewpoints, invisible regions, and different scales of appearance in each view, which require better solutions for real-world applications. In this paper, we present a conflux long short-term memory (LSTMs) network to recognize actions from multi-view cameras. The proposed framework has four major steps: 1) frame-level feature extraction, 2) propagation of the features through the conflux LSTMs network to learn view self-reliant patterns, 3) learning of view inter-reliant patterns and computation of their correlations, and 4) action classification. First, we extract deep features from a sequence of frames for each view using a pre-trained VGG19 CNN model. Second, we forward the extracted features to the conflux LSTMs network to learn the view self-reliant patterns. In the next step, we compute the inter-view correlations by taking the pairwise dot product of the LSTMs network outputs corresponding to different views, thereby learning the view inter-reliant patterns. In the final step, we use flatten layers followed by a SoftMax classifier for action recognition. Experimental results on benchmark datasets report an increase of 3% and 2% over the state-of-the-art on the Northwestern-UCLA and MCAD datasets, respectively.
AB - Multi-view action recognition (MVAR) acquires complementary cues from data captured at different viewpoints for effective action recognition; however, the domain is not yet well explored. MVAR poses several challenges, such as divergence in viewpoints, invisible regions, and different scales of appearance in each view, which require better solutions for real-world applications. In this paper, we present a conflux long short-term memory (LSTMs) network to recognize actions from multi-view cameras. The proposed framework has four major steps: 1) frame-level feature extraction, 2) propagation of the features through the conflux LSTMs network to learn view self-reliant patterns, 3) learning of view inter-reliant patterns and computation of their correlations, and 4) action classification. First, we extract deep features from a sequence of frames for each view using a pre-trained VGG19 CNN model. Second, we forward the extracted features to the conflux LSTMs network to learn the view self-reliant patterns. In the next step, we compute the inter-view correlations by taking the pairwise dot product of the LSTMs network outputs corresponding to different views, thereby learning the view inter-reliant patterns. In the final step, we use flatten layers followed by a SoftMax classifier for action recognition. Experimental results on benchmark datasets report an increase of 3% and 2% over the state-of-the-art on the Northwestern-UCLA and MCAD datasets, respectively.
KW - Action recognition
KW - Artificial intelligence
KW - CNN
KW - Deep learning
KW - LSTM
KW - Multi-view action recognition
KW - Multi-view video analytics
KW - Sequence learning
UR - https://www.scopus.com/pages/publications/85101397395
U2 - 10.1016/j.neucom.2019.12.151
DO - 10.1016/j.neucom.2019.12.151
M3 - Article
AN - SCOPUS:85101397395
SN - 0925-2312
VL - 435
SP - 321
EP - 329
JO - Neurocomputing
JF - Neurocomputing
ER -