TY - GEN
T1 - IVIST
T2 - 28th International Conference on MultiMedia Modeling, MMM 2022
AU - Lee, Sangmin
AU - Park, Sungjune
AU - Ro, Yong Man
N1 - Publisher Copyright: © 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - This paper presents the Interactive VIdeo Search Tool (IVIST), a video retrieval tool proposed for the Video Browser Showdown (VBS) 2022. Retrieving desired videos from a multimedia database requires effectively matching human queries to video shots in the database. To improve this matching, we propose a multi-modal retrieval scheme that utilizes the various modal features of the multimedia data and jointly considers the matching relationships between modalities. IVIST maps human-made queries (e.g., language) and database features (e.g., visual and sound) into a multi-modal matching latent space through deep neural networks. In this latent space, videos with high similarity to the query feature are suggested as candidate shots. Prior knowledge-based filtering can further refine the candidate shots. Moreover, the user interface is designed to be user-friendly for interactive video searching.
KW - Interactive video retrieval
KW - Multi-modal matching
KW - Video Browser Showdown
UR - https://www.scopus.com/pages/publications/85127204580
U2 - 10.1007/978-3-030-98355-0_49
DO - 10.1007/978-3-030-98355-0_49
M3 - Conference contribution
AN - SCOPUS:85127204580
SN - 9783030983543
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 524
EP - 529
BT - MultiMedia Modeling - 28th International Conference, MMM 2022, Proceedings
A2 - Jónsson, Björn Þór
A2 - Gurrin, Cathal
A2 - Tran, Minh-Triet
A2 - Dang-Nguyen, Duc-Tien
A2 - Hu, Anita Min-Chun
A2 - Huynh Thi Thanh, Binh
A2 - Huet, Benoit
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 6 June 2022 through 10 June 2022
ER -