Pretrained models are commonly employed to improve finetuning performance in metallic surface defect detection, especially in data-scarce environments. However, models pretrained on ImageNet often underperform due to data distribution gaps and misaligned training objectives. To address this, we propose Anomaly-Guided Self-Supervised Pretraining (AGSSP), which pretrains on a large industrial dataset of 120,000 images. AGSSP adopts a two-stage framework: (1) anomaly-map-guided backbone pretraining, which integrates domain-specific knowledge into feature learning through anomaly maps, and (2) anomaly-box-guided detector pretraining, where pseudo-defect boxes derived from anomaly maps serve as targets for detector training. The anomaly maps are generated with a knowledge-enhanced anomaly detection method. Additionally, we present two small-scale, pixel-level labeled metallic surface defect datasets for validation. Extensive experiments demonstrate that AGSSP consistently improves performance across various settings, achieving up to a 10% gain in mAP@0.5 and 11.4% in mAP@0.5:0.95 over ImageNet-based models.
Dataset composition. Half of the dataset is aggregated from 20 publicly available industrial surface defect datasets: AITEX, APDDD, BSD Cls, BSData, BTAD, DAGM2007, DTD-Synthetic, FSSD-12, CR7-DET, KolektorSDD, KolektorSDD2, Magnetic Tile, MPDD, MSD Seg, MVTec_AD, NEU Seg, road_crack_dataset, RSDDs, severstal-steel-defect-detection, and SSGD. The remaining half comprises proprietary metallic surface data collected from 14 steel plants and production lines, consisting of unlabeled samples of multiple metallic substrates, including aluminum plates, steel sheets/strips, pipes, and rails. Each facility contributed data with distinct materials, acquisition conditions, production processes, and imaging equipment. While confidentiality restrictions prevent full disclosure of this portion of the dataset, we provide representative samples to showcase its diversity, and all pretrained weights are publicly available in our GitHub repository. Our taxonomy further organizes the combined dataset (34 distinct sources) into 61 finely differentiated categories based on material type and acquisition context.
| Name | Number of Images | Image Resolution | Defect Types |
|---|---|---|---|
| Casting Billet | 1,060 (780 defective) | 96×106 to 3,228×492 | Scratch, Weld slag, Cutting opening, Water slag mark, Slag skin, Longitudinal crack |
| Steel Pipe | 1,227 (554 defective) | 728×544 | Warp, External fold, Wrinkle, Scratch |
Here, we detail the anomaly text prompt generation process for the casting billet and steel pipe datasets. We first selected a representative sample for each defect type and fed it to GPT-4o, which was asked to locate and describe the defect in the image, yielding reference descriptions of its visual characteristics. In parallel, we incorporated expert knowledge gathered from industrial fieldwork (insights from field workers and relevant professional books) that outlines the common characteristics of each defect. The two sources are complementary: GPT-4o provides textual descriptions grounded in the specific sample, while the expert knowledge covers common characteristics that a single representative sample may not fully capture. The final text prompts were then composed with human judgment. During the anomaly detection phase, all candidate defect types and their associated features are supplied as input when assessing an image. For the private dataset used in pretraining, we also showcase sample data and their corresponding text prompts; note that these datasets are fully unlabeled.
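As a rough illustration, the sketch below shows one way such per-defect prompts could be assembled before being handed to the anomaly detector. The `DefectKnowledge` fields, the prompt template, and `build_prompt` are hypothetical names introduced here for illustration; they are not the exact format used in our code.

```python
# Hypothetical sketch: merging GPT-4o descriptions and expert notes into one
# text prompt per image. Field names and template are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DefectKnowledge:
    defect_type: str        # e.g. "scratch", "longitudinal crack"
    gpt4o_description: str  # reference description from a representative sample
    expert_notes: str       # common characteristics from fieldwork / literature

def build_prompt(entries: list[DefectKnowledge]) -> str:
    """Combine all candidate defect types into a single anomaly text prompt."""
    lines = [
        f"- {e.defect_type}: {e.gpt4o_description} "
        f"Typical characteristics: {e.expert_notes}"
        for e in entries
    ]
    return "Possible surface defects:\n" + "\n".join(lines)

if __name__ == "__main__":
    print(build_prompt([
        DefectKnowledge(
            "scratch",
            "a thin, elongated bright line running along the surface",
            "shallow linear marks caused by mechanical contact during handling",
        ),
        DefectKnowledge(
            "water slag mark",
            "an irregular dark patch with diffuse edges",
            "residue left where cooling water and slag meet the strip surface",
        ),
    ]))
```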
The framework comprises two key phases: anomaly-map-guided backbone pretraining (b) and anomaly-box-guided detector pretraining (c). Anomaly maps are generated by the Knowledge-Enhanced Anomaly Detection algorithm (a), which incorporates detailed defect descriptions as prior knowledge. In the backbone pretraining phase, anomaly map information is transferred into the network via a distillation loss; this process can be seamlessly combined with existing pretraining tasks. In the detector pretraining phase, the pretrained backbone is kept frozen while the anomaly maps are used to generate pseudo-defect boxes, which then supervise pretraining of the detector components of the object detection model.
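To make the two uses of the anomaly maps concrete, here is a minimal sketch assuming an MSE-based distillation term with a 1×1 projection head and a simple threshold-plus-connected-components rule for pseudo-boxes. The head design, loss form, `thresh`, and `min_area` are our assumptions for illustration, not necessarily the paper's exact choices.

```python
# Minimal sketch of how anomaly maps could drive the two pretraining phases.
# The 1x1 projection head, MSE loss, and thresholds are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from scipy import ndimage

class AnomalyDistillHead(nn.Module):
    """Projects backbone features to a 1-channel map and compares it to the anomaly map."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, feat: torch.Tensor, anomaly_map: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, h, w) backbone features; anomaly_map: (B, 1, H, W) in [0, 1]
        pred = torch.sigmoid(self.proj(feat))
        pred = F.interpolate(pred, size=anomaly_map.shape[-2:],
                             mode="bilinear", align_corners=False)
        return F.mse_loss(pred, anomaly_map)  # distillation loss term

def anomaly_map_to_boxes(anomaly_map: np.ndarray, thresh: float = 0.5,
                         min_area: int = 25):
    """Threshold the anomaly map and return connected components as
    pseudo-defect boxes in (x1, y1, x2, y2) pixel coordinates."""
    labeled, _ = ndimage.label(anomaly_map >= thresh)
    boxes = []
    for sl in ndimage.find_objects(labeled):
        ys, xs = sl
        if (ys.stop - ys.start) * (xs.stop - xs.start) >= min_area:
            boxes.append((xs.start, ys.start, xs.stop, ys.stop))
    return boxes
```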
Experimental results across four downstream datasets demonstrate that AGSSP's pretraining framework—compatible with multiple backbone architectures, detection methods, and pretraining approaches—consistently outperforms baseline models pretrained on ImageNet and COCO.
@misc{liu2025advancingmetallicsurfacedefect,
title={Advancing Metallic Surface Defect Detection via Anomaly-Guided Pretraining on a Large Industrial Dataset},
author={Chuni Liu and Hongjie Li and Jiaqi Du and Yangyang Hou and Qian Sun and Lei Jin and Ke Xu},
year={2025},
eprint={2509.18919},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.18919},
}