MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects

Lei Fan1*, Dongdong Fan2, Zhiguang Hu3, Yiwen Ding2, Donglin Di4, Kai Yi5, Maurice Pagnucco1, Yang Song1
ArXiv 2024
1CSE, UNSW Sydney,
2Gaozhe Technology, 3South China Agricultural University, 4Li Auto, 5University of Cambridge

*Corresponding Author

Overview of MANTA. The dataset consists of a visual component and a text component. The visual component includes over 137K multi-view images spanning five domains. The text component is divided into two subsets: Declarative Knowledge, comprising 875 words describing common anomalies, and Constructivist Learning, which includes 2K image-text multiple-choice questions.

Abstract

We present MANTA, a visual-text anomaly detection dataset for tiny objects. The visual component comprises over 137.3K images across 38 object categories spanning five typical domains, of which 8.6K images are labeled as anomalous with pixel-level annotations. Each image is captured from five distinct viewpoints to ensure comprehensive object coverage. The text component consists of two subsets: Declarative Knowledge, including 875 words that describe common anomalies across various domains and specific categories, with detailed explanations for <what, why, how>, including causes and visual characteristics; and Constructivist Learning, providing 2K multiple-choice questions with varying levels of difficulty, each paired with images and corresponding answer explanations. We also propose a baseline for visual-text tasks and conduct extensive benchmarking experiments to evaluate advanced methods across different settings, highlighting the challenges and efficacy of our dataset.

Dataset Information

The MANTA dataset provides a large collection of multi-view images of tiny objects for anomaly detection. Its visual component includes over 137.3K images across 38 object categories in five domains: agriculture, medicine, electronics, mechanics, and groceries.

The MANTA dataset is licensed under the Creative Commons BY-NC-SA 4.0 license. Note that the data must not be used for commercial purposes.

Visual Component

You can access the dataset using the figshare links below:

  • Agriculture: part1 (3 classes, 13.6G), part2 (1 class, 14.3G)
  • Medicine: part1 (4 classes, 16.1G), part2 (5 classes, 16.7G)
  • Electronics: part1 (11 classes, 18.3G)
  • Mechanics: part1 (5 classes, 14.5G), part2 (6 classes, 16.7G)
  • Groceries: part1 (3 classes, 7.6G)
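
As a quick sanity check before downloading, the per-archive class counts and sizes listed above can be tallied. The sketch below is only illustrative: the `PARTS` dictionary and the `summarize` helper are hypothetical names, not an official manifest, and it assumes that "G" denotes gigabytes.

```python
# Class counts and archive sizes copied from the list above.
# The dictionary layout is an illustrative choice, not an official manifest,
# and sizes are assumed to be in gigabytes.
PARTS = {
    "Agriculture": [(3, 13.6), (1, 14.3)],
    "Medicine": [(4, 16.1), (5, 16.7)],
    "Electronics": [(11, 18.3)],
    "Mechanics": [(5, 14.5), (6, 16.7)],
    "Groceries": [(3, 7.6)],
}


def summarize(parts):
    """Return per-domain (classes, size) totals and the overall totals."""
    per_domain = {
        domain: (
            sum(classes for classes, _ in archives),
            round(sum(size for _, size in archives), 1),
        )
        for domain, archives in parts.items()
    }
    total_classes = sum(classes for classes, _ in per_domain.values())
    total_size = round(sum(size for _, size in per_domain.values()), 1)
    return per_domain, total_classes, total_size


per_domain, total_classes, total_size = summarize(PARTS)
for domain, (classes, size) in per_domain.items():
    print(f"{domain}: {classes} classes, {size} G")
print(f"Total: {total_classes} classes, {total_size} G")
```

The class total comes to 38, which agrees with the 38 object categories stated in the abstract.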

We additionally provide a tiny version of MANTA, in which each class includes 800 images and all images are resized to 256 × (256 × 5). The detailed distribution can be found in the distribution.md file:

  • MANTA-Tiny: part1 (38 classes, 10.1G)
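
The image size given for the tiny version suggests that the five views of an object are stored in a single image. The following is a minimal sketch of how such an image might be divided into per-view crop boxes. It assumes the views sit side by side, left to right, with equal widths (256 pixels high and 256 × 5 pixels wide); this layout is an assumption, not something the page confirms, and `view_boxes` is a hypothetical helper rather than part of any MANTA tooling.

```python
def view_boxes(width, height, num_views=5):
    """Return one (left, upper, right, lower) crop box per view.

    Assumes the views are concatenated left to right with equal widths.
    This layout is a guess from the stated image size, not a documented fact.
    """
    if num_views <= 0:
        raise ValueError("num_views must be positive")
    if width % num_views != 0:
        raise ValueError("image width is not divisible by the number of views")
    view_width = width // num_views
    return [
        (i * view_width, 0, (i + 1) * view_width, height)
        for i in range(num_views)
    ]


# Boxes for an image that is 256 pixels high and 256 * 5 pixels wide.
boxes = view_boxes(width=256 * 5, height=256)
for index, box in enumerate(boxes):
    print(f"view {index}: {box}")
```

Each tuple follows the (left, upper, right, lower) box convention accepted by Pillow's `Image.crop`, so a loaded image could be cut into views with `image.crop(box)`. If the views turn out to be stacked vertically instead, the roles of width and height would need to be swapped.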

Text Component

You can access the Declarative Knowledge part using the figshare links below:

  • Declarative Knowledge: coming soon

You can access the Constructivist Learning part using the figshare links below:

  • Constructivist Learning: coming soon

BibTeX

    
            @article{fan2024manta,
              title={MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects},
              author={Fan, Lei and Fan, Dongdong and Hu, Zhiguang and Ding, Yiwen and Di, Donglin and Yi, Kai and Pagnucco, Maurice and Song, Yang},
              journal={arXiv preprint arXiv:2412.04867},
              year={2024}
            }