Classification of Sewer Defects Using Point Clouds Based on a Novel Sewer Vision Transformer With Cross-Modal In-Domain Knowledge

  • Shuju Jing
  • Xiangyang Li
  • Daniel Asefa Beyene
  • Gichun Cha
  • Seunghee Park

Research output: Contribution to journal › Article › peer-review

Abstract

The high-precision geometric measurement capabilities of sensor-based point clouds provide significant advantages for sewer defect detection. To enhance the classification of valuable yet data-scarce sewer-defect knowledge within the point cloud community, this study proposes a cross-modal framework that combines self-supervised pretraining with supervised fine-tuning. The proposed sewer vision transformer (Sewer-ViT) integrates key-edge sampling, neighborhood dilation learning, dual-domain feature fusion, and inverted bottleneck structures to reinforce defect feature embedding and inductive bias. These features are subsequently processed by a transformer encoder pretrained with 2-D in-domain knowledge, and the latent representations are further optimized through weight fusion within a unified vector space, thereby improving classification performance. The method achieved average precision, recall, and F1-scores of 75.87%, 76.73%, and 75.44% on the overall test set and 65.09%, 62.47%, and 62.58% on a real-world test set, respectively, surpassing existing approaches. These results highlight the practical potential of this method for sewer defect detection and point to a promising future for multimodal fusion research.
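To make the cross-modal idea in the abstract concrete, the sketch below shows a minimal point-cloud classifier whose transformer encoder can be initialized from weights pretrained on 2-D data. It is an illustrative assumption only, not the authors' Sewer-ViT implementation: the module names (e.g., PointPatchEmbed), dimensions, and the simple fixed-size point grouping stand in for the paper's key-edge sampling, neighborhood dilation, and fusion components.

```python
# Hypothetical sketch of a cross-modal point-cloud classifier (PyTorch).
# Not the paper's code; all names and hyperparameters are illustrative.
import torch
import torch.nn as nn


class PointPatchEmbed(nn.Module):
    """Split a point cloud into fixed-size groups and embed each group as a token.

    A simplified stand-in for the paper's key-edge sampling / neighborhood
    dilation feature embedding.
    """

    def __init__(self, points_per_patch: int = 32, embed_dim: int = 384):
        super().__init__()
        self.points_per_patch = points_per_patch
        self.mlp = nn.Sequential(
            nn.Linear(points_per_patch * 3, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, 3); num_points divisible by points_per_patch
        b, n, _ = points.shape
        patches = points.reshape(b, n // self.points_per_patch, -1)
        return self.mlp(patches)  # (batch, num_patches, embed_dim)


class PointCloudClassifier(nn.Module):
    """Transformer classifier over point-cloud tokens with a [CLS]-style token."""

    def __init__(self, num_classes: int, embed_dim: int = 384,
                 depth: int = 6, num_heads: int = 6):
        super().__init__()
        self.embed = PointPatchEmbed(embed_dim=embed_dim)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads,
            dim_feedforward=4 * embed_dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def load_2d_pretrained_encoder(self, state_dict: dict) -> None:
        # Cross-modal transfer: reuse encoder weights pretrained on 2-D data.
        # strict=False tolerates modality-specific (embedding/head) mismatches.
        self.encoder.load_state_dict(state_dict, strict=False)

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        tokens = self.embed(points)
        cls = self.cls_token.expand(tokens.shape[0], -1, -1)
        x = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(x[:, 0])  # classify from the [CLS] token


if __name__ == "__main__":
    model = PointCloudClassifier(num_classes=5)
    logits = model(torch.randn(2, 1024, 3))  # two clouds of 1024 points each
    print(logits.shape)  # torch.Size([2, 5])
```

In this hedged reading, the 2-D "in-domain knowledge" enters only through the encoder weights, while the point-specific embedding and classification head are trained during fine-tuning; the paper's weight fusion in a unified vector space would refine this further.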

Original language: English
Pages (from-to): 40188-40202
Number of pages: 15
Journal: IEEE Sensors Journal
Volume: 25
Issue number: 21
DOIs
State: Published - 2025

Keywords

  • Cross-modal learning
  • point clouds
  • self-supervised learning (SSL)
  • sewer-defect classification
  • vision transformer (ViT)
