TY - JOUR
T1 - LyFormer
T2 - A context-aware transformer with progressive preprocessing for accurate detection of small, dense components in SMT manufacturing
AU - Jeong, Jongpil
AU - Kim, Jaesung
AU - Park, Jinwoo
AU - Koh, Jeong Seog
AU - Yoon, Taehwi
N1 - Publisher Copyright:
© 2025 The Author(s)
PY - 2025/12
Y1 - 2025/12
N2 - Accurate detection and counting of small electronic components on printed circuit boards (PCBs) are critical for ensuring product quality and operational efficiency in surface mount technology (SMT) assembly lines. In particular, reliable counting of semiconductor components inside reels using X-ray inspection is essential, as counting errors directly impact downstream manufacturing and quality assurance. However, existing YOLO-based detection frameworks, while effective in general contexts, often fail under complex SMT conditions with low contrast, high density, and noisy imagery. To address this limitation, we propose LyFormer, a YOLOv8s-based framework integrating four specialized modules: the Adaptive Multi-level Preprocessing Module (AMPM) for dynamic image preprocessing, the Spatial Relation-aware Image Segmentation Patch (SRISP) for precise localization, the Fine-grained Cue Extraction Module (FCEM) for enhancing subtle texture cues, and the Context-aware Transformer (CaT) for global–local context integration. Unlike conventional approaches such as FPN, Deformable DETR, and SAHI, LyFormer represents a modular backbone specifically designed for the low-contrast, high-density, and noise-prone characteristics of SMT X-ray imagery. Unlike prior improvements to YOLOv8s, LyFormer introduces four modules explicitly derived from SMT X-ray failure modes: AMPM integrates ROI-aware masking with contrast enhancement, going beyond global methods such as histogram equalization or Retinex; SRISP replaces SAHI's tiling with efficient relation-aware patching inside the backbone; FCEM compensates for the sensitivity of IoU and NWD to localization errors by reinforcing fine-grained cues; and CaT jointly leverages global–local context through ROI-biased attention and variable patch sizing, unlike standard Transformer-based detectors. 
Experiments on real-world SMT reel X-ray datasets show that LyFormer achieves a mean Average Precision (mAP@0.5) of 0.672, significantly surpassing the YOLOv8s baseline (0.399), while maintaining real-time performance. These results support LyFormer's accuracy, robustness, and practical value for small-object detection and counting in challenging industrial environments.
AB - Accurate detection and counting of small electronic components on printed circuit boards (PCBs) are critical for ensuring product quality and operational efficiency in surface mount technology (SMT) assembly lines. In particular, reliable counting of semiconductor components inside reels using X-ray inspection is essential, as counting errors directly impact downstream manufacturing and quality assurance. However, existing YOLO-based detection frameworks, while effective in general contexts, often fail under complex SMT conditions with low contrast, high density, and noisy imagery. To address this limitation, we propose LyFormer, a YOLOv8s-based framework integrating four specialized modules: the Adaptive Multi-level Preprocessing Module (AMPM) for dynamic image preprocessing, the Spatial Relation-aware Image Segmentation Patch (SRISP) for precise localization, the Fine-grained Cue Extraction Module (FCEM) for enhancing subtle texture cues, and the Context-aware Transformer (CaT) for global–local context integration. Unlike conventional approaches such as FPN, Deformable DETR, and SAHI, LyFormer represents a modular backbone specifically designed for the low-contrast, high-density, and noise-prone characteristics of SMT X-ray imagery. Unlike prior improvements to YOLOv8s, LyFormer introduces four modules explicitly derived from SMT X-ray failure modes: AMPM integrates ROI-aware masking with contrast enhancement, going beyond global methods such as histogram equalization or Retinex; SRISP replaces SAHI's tiling with efficient relation-aware patching inside the backbone; FCEM compensates for the sensitivity of IoU and NWD to localization errors by reinforcing fine-grained cues; and CaT jointly leverages global–local context through ROI-biased attention and variable patch sizing, unlike standard Transformer-based detectors. 
Experiments on real-world SMT reel X-ray datasets show that LyFormer achieves a mean Average Precision (mAP@0.5) of 0.672, significantly surpassing the YOLOv8s baseline (0.399), while maintaining real-time performance. These results support LyFormer's accuracy, robustness, and practical value for small-object detection and counting in challenging industrial environments.
KW - Industrial vision
KW - Progressive preprocessing
KW - Semiconductor reel counting
KW - Small object detection
KW - SMT assembly
KW - Transformer
KW - X-ray inspection
KW - YOLOv8
UR - https://www.scopus.com/pages/publications/105017417707
U2 - 10.1016/j.rineng.2025.107413
DO - 10.1016/j.rineng.2025.107413
M3 - Article
AN - SCOPUS:105017417707
SN - 2590-1230
VL - 28
JO - Results in Engineering
JF - Results in Engineering
M1 - 107413
ER -