High-level Image Classification by Synergizing Image Captioning with BERT

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Conventional image classification methods mostly aim to classify a single object that occupies a large area of the image. Images on social network services (SNS), however, are more complicated: they usually contain multiple objects and carry rich information about people, environments, and actions. In this work, we aim to understand SNS images and classify them into categories such as fashion, traveling, education, beauty, and animals. To improve classification accuracy in such complicated scenarios, we propose a new framework for high-level image classification that synergizes image captioning with a Natural Language Processing (NLP) model. First, an image captioning model generates text descriptions of the images. Second, an NLP model classifies the generated descriptions. Each image is then assigned the category predicted for its description. The framework comprises two models. The image captioning model is a TensorFlow-based visual attention model that uses the Inception V3 model for pre-processing and extracting image features. The NLP model is Bidirectional Encoder Representations from Transformers (BERT). We built a labeled image dataset from Instagram, a popular SNS platform, to test the framework. Our results show that the proposed method achieves promising classification accuracy.
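
The two-stage flow described in the abstract (caption the image, then classify the caption) can be sketched as below. This is a minimal illustration, not the authors' implementation: `classify_image`, `toy_captioner`, and `toy_text_classifier` are hypothetical names, and the two toy functions are simple stand-ins for the visual attention captioner and the BERT classifier so that the sketch runs without model weights. Only the category names come from the abstract.

```python
from typing import Callable, Sequence

# Assumption: each stage is modelled as a plain callable. In the described
# framework, stage 1 would be a visual attention captioner over Inception V3
# features and stage 2 a BERT text classifier.
Captioner = Callable[[str], str]        # image path -> caption text
TextClassifier = Callable[[str], str]   # caption text -> category label

# Example categories named in the abstract.
CATEGORIES: Sequence[str] = ("fashion", "traveling", "education", "beauty", "animals")


def classify_image(image_path: str,
                   captioner: Captioner,
                   text_classifier: TextClassifier) -> str:
    """Assign an image the category predicted for its generated caption."""
    caption = captioner(image_path)      # stage 1: image -> text description
    label = text_classifier(caption)     # stage 2: text description -> category
    if label not in CATEGORIES:
        raise ValueError(f"unexpected category: {label!r}")
    return label


def toy_captioner(image_path: str) -> str:
    """Hypothetical stand-in for the captioning model: returns canned captions."""
    canned = {
        "dog.jpg": "a dog running on the grass",
        "beach.jpg": "a person standing on a beach with a suitcase",
    }
    return canned[image_path]


def toy_text_classifier(caption: str) -> str:
    """Hypothetical stand-in for BERT: counts keyword hits per category."""
    cues = {
        "animals": ("dog", "cat"),
        "traveling": ("beach", "suitcase"),
    }
    words = caption.lower().split()
    scores = {cat: sum(w in kws for w in words) for cat, kws in cues.items()}
    # Ties (including all-zero scores) go to the first listed category.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    for path in ("dog.jpg", "beach.jpg"):
        print(path, "->", classify_image(path, toy_captioner, toy_text_classifier))
```

Keeping the two stages behind separate callables mirrors the framework's structure: either model can be swapped for a real one without changing how the image label is derived from the caption label.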

Original language: English
Title of host publication: ICTC 2021 - 12th International Conference on ICT Convergence
Subtitle of host publication: Beyond the Pandemic Era with ICT Convergence Innovation
Publisher: IEEE Computer Society
Pages: 1686-1690
Number of pages: 5
ISBN (Electronic): 9781665423830
State: Published - 2021
Event: 12th International Conference on Information and Communication Technology Convergence, ICTC 2021 - Jeju Island, Korea, Republic of
Duration: 20 Oct 2021 to 22 Oct 2021

Publication series

Name: International Conference on ICT Convergence
Volume: 2021-October
ISSN (Print): 2162-1233
ISSN (Electronic): 2162-1241

Conference

Conference: 12th International Conference on Information and Communication Technology Convergence, ICTC 2021
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 20/10/21 to 22/10/21

Keywords

  • BERT
  • COCO Dataset
  • High-level Image Classification
  • Image Captioning
  • SNS
