GuansongPang / ADRepository-Anomaly-detection-datasets

Licence: GPL-3.0 License
ADRepository: Real-world anomaly detection datasets

Projects that are alternatives to, or similar to, ADRepository-Anomaly-detection-datasets

kenchi
A scikit-learn compatible library for anomaly detection
Stars: ✭ 36 (-53.25%)
Mutual labels:  outlier-detection, anomaly-detection, novelty-detection
outliertree
(Python, R, C++) Explainable outlier/anomaly detection through decision tree conditioning
Stars: ✭ 40 (-48.05%)
Mutual labels:  outlier-detection, anomaly-detection
Anomaly Detection Resources
Anomaly detection related books, papers, videos, and toolboxes
Stars: ✭ 5,306 (+6790.91%)
Mutual labels:  outlier-detection, anomaly-detection
Awesome Ts Anomaly Detection
List of tools & datasets for anomaly detection on time-series data.
Stars: ✭ 2,027 (+2532.47%)
Mutual labels:  outlier-detection, anomaly-detection
deviation-network
Source code of the KDD19 paper "Deep anomaly detection with deviation networks", weakly/partially supervised anomaly detection, few-shot anomaly detection
Stars: ✭ 94 (+22.08%)
Mutual labels:  outlier-detection, anomaly-detection
DCSO
Supplementary material for KDD 2018 workshop "DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles"
Stars: ✭ 20 (-74.03%)
Mutual labels:  outlier-detection, anomaly-detection
Pyod
A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)
Stars: ✭ 5,083 (+6501.3%)
Mutual labels:  outlier-detection, anomaly-detection
pytod
TOD: GPU-accelerated Outlier Detection via Tensor Operations
Stars: ✭ 131 (+70.13%)
Mutual labels:  outlier-detection, anomaly-detection
f anogan pytorch
Code for reproducing f-AnoGAN in Pytorch
Stars: ✭ 28 (-63.64%)
Mutual labels:  outlier-detection, anomaly-detection
drama
Main component extraction for outlier detection
Stars: ✭ 17 (-77.92%)
Mutual labels:  outlier-detection, anomaly-detection
deviation-network-image
Official PyTorch implementation of the paper “Explainable Deep Few-shot Anomaly Detection with Deviation Networks”, weakly/partially supervised anomaly detection, few-shot anomaly detection, image defect detection.
Stars: ✭ 47 (-38.96%)
Mutual labels:  outlier-detection, anomaly-detection
DGFraud-TF2
A Deep Graph-based Toolbox for Fraud Detection in TensorFlow 2.X
Stars: ✭ 84 (+9.09%)
Mutual labels:  outlier-detection, anomaly-detection
XGBOD
Supplementary material for IJCNN paper "XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning"
Stars: ✭ 59 (-23.38%)
Mutual labels:  outlier-detection, anomaly-detection
ManTraNet-pytorch
Implementation of the famous Image Manipulation/Forgery Detector "ManTraNet" in Pytorch
Stars: ✭ 47 (-38.96%)
Mutual labels:  anomaly-detection
Anomaly Detection
anomaly detection with anomalize and Google Trends data
Stars: ✭ 38 (-50.65%)
Mutual labels:  anomaly-detection
Meta-GDN AnomalyDetection
Implementation of TheWebConf 2021 -- Few-shot Network Anomaly Detection via Cross-network Meta-learning
Stars: ✭ 22 (-71.43%)
Mutual labels:  anomaly-detection
mvts-ano-eval
A repository for code accompanying the manuscript 'An Evaluation of Anomaly Detection and Diagnosis in Multivariate Time Series' (published at TNNLS)
Stars: ✭ 26 (-66.23%)
Mutual labels:  anomaly-detection
benfordslaw
benfordslaw is about the frequency distribution of leading digits.
Stars: ✭ 29 (-62.34%)
Mutual labels:  anomaly-detection
MemStream
MemStream: Memory-Based Streaming Anomaly Detection
Stars: ✭ 58 (-24.68%)
Mutual labels:  anomaly-detection
Mean-Shifted-Anomaly-Detection
Mean-Shifted Contrastive Loss for Anomaly Detection
Stars: ✭ 61 (-20.78%)
Mutual labels:  anomaly-detection

ADRepository: Real-world anomaly detection datasets

In this repository, we provide a continuously updated collection of popular real-world datasets used for anomaly detection in the literature. Some of the datasets are converted from imbalanced classification datasets, while others contain real anomalies.
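
A common way to convert an imbalanced classification dataset into an anomaly detection dataset is to treat one minority class as the anomaly class and all other classes as normal. The sketch below assumes that convention; the function name is hypothetical, and when no anomaly class is given it simply picks the least frequent class.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def to_anomaly_labels(
    class_labels: Sequence[Hashable],
    anomaly_class: Optional[Hashable] = None,
) -> List[int]:
    """Map multi-class labels to binary anomaly labels (1 = anomaly, 0 = normal).

    If anomaly_class is None, the least frequent class is used as the
    anomaly class; ties go to the class that appears first.
    """
    counts = Counter(class_labels)
    if not counts:
        return []
    if anomaly_class is None:
        # Counter keeps first-seen order, and min() returns the first minimum.
        anomaly_class = min(counts, key=counts.__getitem__)
    return [1 if label == anomaly_class else 0 for label in class_labels]


labels = ["no", "no", "yes", "no", "yes", "no"]
print(to_anomaly_labels(labels))  # [0, 0, 1, 0, 1, 0]
```

Passing `anomaly_class` explicitly is safer when the intended anomaly class is known, as it is for the categorical datasets listed later in this README.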

This repository extends the datasets presented in our recent survey paper on deep anomaly detection. If you use the datasets below, you may cite the survey paper or the specific papers listed in the following sections to acknowledge their use.

@article{pang2021deep,
  title={Deep learning for anomaly detection: A review},
  author={Pang, Guansong and Shen, Chunhua and Cao, Longbing and Hengel, Anton Van Den},
  journal={ACM Computing Surveys (CSUR)},
  volume={54},
  number={2},
  pages={1--38},
  year={2021},
  publisher={ACM New York, NY, USA}
}

A continuously updated repository of state-of-the-art (SOTA) deep anomaly detection implementations is publicly available at https://github.com/GuansongPang/SOTA-Deep-Anomaly-Detection.

Numerical Datasets

Seven datasets from the KDD'19 DevNet paper are available in the DevNet_datasets folder of this repository. Their basic statistics are shown below.

Dataset    Data size    Dimensionality
donors     619,326      10
census     299,285      500
fraud      284,807      29
celeba     202,599      39
backdoor   95,329       196
campaign   41,188       62
thyroid    7,200        21
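
The Data size and Dimensionality columns can be recomputed directly from the data files. The sketch below assumes, hypothetically, that each file is a CSV with a header row, numeric features, and a 0/1 label (1 = anomaly) in the last column; check the actual files in DevNet_datasets before relying on this layout. It also reports the anomaly rate, which the table does not list.

```python
import csv
import io
from typing import TextIO


def dataset_stats(f: TextIO) -> dict:
    """Compute size, dimensionality and anomaly rate of a labelled CSV file.

    Assumed layout (hypothetical): header row, feature columns, and a
    binary label (1 = anomaly, 0 = normal) in the last column.
    """
    reader = csv.reader(f)
    header = next(reader)
    n_rows = 0
    n_anomalies = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        n_rows += 1
        n_anomalies += int(float(row[-1]))
    return {
        "data_size": n_rows,
        "dimensionality": len(header) - 1,  # exclude the label column
        "anomaly_rate": n_anomalies / n_rows if n_rows else 0.0,
    }


sample = io.StringIO(
    "f1,f2,f3,class\n0.1,0.2,0.3,0\n0.5,0.1,0.9,1\n0.2,0.2,0.2,0\n0.3,0.4,0.1,0\n"
)
print(dataset_stats(sample))
# {'data_size': 4, 'dimensionality': 3, 'anomaly_rate': 0.25}
```

For a real file, pass an open handle (e.g. the result of `open(...)` on one of the CSVs) instead of the in-memory sample.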

Detailed descriptions of these datasets and some performance benchmarks can be found in the DevNet paper. The source code of DevNet is available here.

@inproceedings{pang2019deep,
  title={Deep anomaly detection with deviation networks},
  author={Pang, Guansong and Shen, Chunhua and van den Hengel, Anton},
  booktitle={Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery \& data mining},
  pages={353--362},
  year={2019}
}
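
Threshold-free ranking metrics such as AUC-ROC and AUC-PR are widely used to benchmark anomaly detectors on labelled datasets like these. The dependency-free sketch below is an illustration, not the evaluation code used in the paper: AUC-ROC is computed via the Mann-Whitney U statistic with average ranks for ties, and AUC-PR as average precision, i.e. the sum over score thresholds of (R_n - R_{n-1}) * P_n, the same definition documented for scikit-learn's `average_precision_score`.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC via the Mann-Whitney U statistic; tied scores share average ranks."""
    pairs = sorted(zip(scores, labels))
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one anomaly and one normal sample")
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of this tie group
        rank_sum += avg_rank * sum(1 for k in range(i, j + 1) if pairs[k][1] == 1)
        i = j + 1
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-PR as average precision over distinct score thresholds."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one anomaly")
    tp = fp = 0
    ap = prev_recall = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # All samples tied at this score are admitted by the same threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_score))  # 0.75
```

Higher scores are assumed to mean "more anomalous"; flip the sign of a detector's output if it follows the opposite convention.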

Categorical Datasets

Fourteen widely used categorical datasets for anomaly detection are available in the Categorical data folder. Their basic statistics are shown below; the Anomaly class column gives the class value treated as anomalous in each dataset.

Dataset      Data size  Dimensionality  Anomaly class
bank         41,188     10              yes
census       299,285    33              50K+
AID362       4,279      114             active
w7a          49,749     300             yes
CMC          1,473      8               child>10
APAS         12,695     64              train
CelebA       202,599    39              bald
Chess        28,056     6               zero
AD           3,279      1,555           ad.
Solar-flare  1,066      11              F
Probe        64,759     6               attack
U2R          60,821     6               attack
R10          12,897     100             corn
CoverType    581,012    44              cottonwood

Detailed descriptions of these datasets and some state-of-the-art performance benchmarks can be found in the following papers:

@inproceedings{pang2016outlier,
  title={Outlier detection in complex categorical data by modelling the feature value couplings},
  author={Pang, Guansong and Cao, Longbing and Chen, Ling},
  booktitle={IJCAI International Joint Conference on Artificial Intelligence},
  year={2016}
}
@inproceedings{xu2018exploring,
  title={Exploring a high-quality outlying feature value set for noise-resilient outlier detection in categorical data},
  author={Xu, Hongzuo and Wang, Yongjun and Cheng, Li and Wang, Yijie and Ma, Xingkong},
  booktitle={Proceedings of the 27th ACM International Conference on Information and Knowledge Management},
  pages={17--26},
  year={2018}
}
@article{pang2021homophily,
  title={Homophily outlier detection in non-IID categorical data},
  author={Pang, Guansong and Cao, Longbing and Chen, Ling},
  journal={Data Mining and Knowledge Discovery},
  pages={1--62},
  year={2021},
  publisher={Springer}
}
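
Many detectors expect numeric input, so categorical records are often one-hot encoded before use. The sketch below assumes, hypothetically, that each record lists its feature values followed by its class value in the last column, and it derives binary labels from the Anomaly class column of the table above (e.g. records of bank whose class is `yes` become anomalies). Note that the papers above model categorical data directly rather than through this encoding; this is only a generic preprocessing example.

```python
from __future__ import annotations

from typing import Sequence


def encode_categorical(
    rows: Sequence[Sequence[str]], anomaly_class: str
) -> tuple[list[list[int]], list[int], list[tuple[int, str]]]:
    """One-hot encode categorical features and derive binary anomaly labels.

    Assumes each row holds feature values followed by the class value in
    the last column. Returns (X, y, vocab), where vocab[i] is the
    (feature index, value) pair encoded by column i of X.
    """
    n_features = len(rows[0]) - 1
    # Sorted (feature index, value) pairs give a deterministic column order.
    vocab = sorted({(j, row[j]) for row in rows for j in range(n_features)})
    column = {key: i for i, key in enumerate(vocab)}
    X: list[list[int]] = []
    y: list[int] = []
    for row in rows:
        vec = [0] * len(vocab)
        for j in range(n_features):
            vec[column[(j, row[j])]] = 1
        X.append(vec)
        y.append(1 if row[-1] == anomaly_class else 0)
    return X, y, vocab


records = [["red", "small", "no"], ["blue", "small", "no"], ["red", "large", "yes"]]
X, y, vocab = encode_categorical(records, anomaly_class="yes")
print(y)  # [0, 0, 1]
```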

Video Datasets

Two popular weakly supervised video anomaly detection datasets, ShanghaiTech Campus and UCF-Crime, have been added to the video data folder. They are provided as features extracted with an I3D backbone rather than as raw videos. Weakly supervised video anomaly detection assumes that only video-level labels are available and aims to detect frame-level anomalies. The datasets can also be reorganized for the semi-supervised setting, in which the training data contains normal videos only. More information about the datasets can be found in the following papers.

@inproceedings{sultani2018real,
  title={Real-world anomaly detection in surveillance videos},
  author={Sultani, Waqas and Chen, Chen and Shah, Mubarak},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={6479--6488},
  year={2018}
}
@inproceedings{tian2021weakly,
  title={Weakly-supervised Video Anomaly Detection with Robust Temporal Feature Magnitude Learning},
  author={Tian, Yu and Pang, Guansong and Chen, Yuanhong and Singh, Rajvinder and Verjans, Johan W and Carneiro, Gustavo},
  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
  year={2021}
}
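
Learning frame-level detectors from video-level labels is commonly framed as multiple-instance learning (MIL): each video is a bag of snippet features and only the bag label is known. Below is a minimal sketch of the MIL ranking loss of Sultani et al. (a ranking hinge on the top-scoring segments plus smoothness and sparsity terms), written over precomputed segment scores rather than a trainable network; it is not the feature-magnitude method of Tian et al. The default weights of 8e-5 and the assumption that each I3D feature covers 16 frames are assumptions to verify against the papers and the feature files.

```python
from typing import List, Optional, Sequence


def mil_ranking_loss(
    anomalous: Sequence[float],
    normal: Sequence[float],
    lambda_smooth: float = 8e-5,  # assumed default
    lambda_sparse: float = 8e-5,  # assumed default
) -> float:
    """MIL ranking loss for one (anomalous video, normal video) pair of segment scores."""
    # Ranking hinge: the top anomalous segment should beat the top normal one by a margin of 1.
    hinge = max(0.0, 1.0 - max(anomalous) + max(normal))
    # Temporal smoothness between adjacent segments of the anomalous video.
    smooth = sum((a - b) ** 2 for a, b in zip(anomalous, anomalous[1:]))
    # Sparsity: anomalies should occupy only a few segments.
    sparse = sum(anomalous)
    return hinge + lambda_smooth * smooth + lambda_sparse * sparse


def snippet_to_frame_scores(
    snippet_scores: Sequence[float],
    frames_per_snippet: int = 16,  # assumed snippet length
    n_frames: Optional[int] = None,
) -> List[float]:
    """Repeat each snippet score over the frames it covers to get frame-level scores."""
    frames = [s for s in snippet_scores for _ in range(frames_per_snippet)]
    return frames if n_frames is None else frames[:n_frames]
```

Frame-level scores obtained this way can then be compared with frame-level ground truth, for example using AUC-ROC.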

Image Datasets

We have added nine publicly available image datasets with real anomalies from diverse application domains, including defect detection, novelty detection in rover-based planetary exploration, and lesion detection in medical images. See the image data folder for more details. The following paper uses these datasets to evaluate a wide range of detection models in different settings:

@article{pang2021explainable,
  title={Explainable Deep Few-shot Anomaly Detection with Deviation Networks},
  author={Pang, Guansong and Ding, Choubo and Shen, Chunhua and Hengel, Anton van den},
  journal={arXiv preprint arXiv:2108.00462},
  year={2021}
}
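
The paper above targets few-shot anomaly detection, where only a handful of labelled anomalies are available for training. The hypothetical helper below sketches how such a training set might be assembled from one of the image datasets: all normal training samples plus `k` randomly chosen labelled anomalies, with the remaining anomalies held out. The inputs (e.g. image file paths), the value of `k`, and the use of the held-out anomalies are assumptions, not the paper's exact protocol.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def few_shot_split(
    normal_train: Sequence[T],
    anomalies: Sequence[T],
    k: int,
    seed: int = 0,
) -> Tuple[List[Tuple[T, int]], List[T]]:
    """Return (train, held_out): normal samples labelled 0 plus k random
    anomalies labelled 1; the remaining anomalies are held out."""
    if not 0 <= k <= len(anomalies):
        raise ValueError("k must be between 0 and the number of anomalies")
    rng = random.Random(seed)  # fixed seed for a reproducible split
    order = list(range(len(anomalies)))
    rng.shuffle(order)
    labelled = [anomalies[i] for i in order[:k]]
    held_out = [anomalies[i] for i in order[k:]]
    train = [(x, 0) for x in normal_train] + [(x, 1) for x in labelled]
    return train, held_out
```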

Graph Datasets

Sixteen real-world datasets for graph-level anomaly detection have been added. Their basic statistics are shown below; see the graph data folder for more details.

Dataset        #Graphs   Avg. #Nodes   Avg. #Edges
PROTEINS_full  1,113     39.06         72.82
ENZYMES        600       32.63         62.14
AIDS           2,000     15.69         16.2
DHFR           467       42.43         44.54
BZR            405       35.75         38.36
COX2           467       41.22         43.45
DD             1,178     284.32        715.66
NCI1           4,110     29.87         32.3
IMDB           1,000     19.77         96.53
REDDIT         2,000     429.63        497.75
HSE            8,417     16.89         17.23
MMP            7,558     17.62         17.98
p53            8,903     17.92         18.34
PPAR-gamma     8,451     17.38         17.72
COLLAB         5,000     74.49         2,457.78
hERG           655       26.48         28.79

The datasets were used and made available by the authors of the following paper.

@inproceedings{ma2022deep,
  title={Deep Graph-level Anomaly Detection by Glocal Knowledge Distillation},
  author={Ma, Rongrong and Pang, Guansong and Chen, Ling and van den Hengel, Anton},
  booktitle={The Fifteenth ACM International Conference on Web Search and Data Mining},
  year={2022}
}
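
The statistics in the graph table can be recomputed from the raw files. The sketch below assumes the TU Dortmund graph benchmark text format, in which DS_graph_indicator.txt holds one graph id per node (line i belongs to node i) and DS_A.txt lists one "src, dst" node pair per line, with each undirected edge stored once per direction. Whether the files in the graph data folder follow exactly this layout is an assumption.

```python
from typing import Iterable, Tuple


def graph_stats(
    graph_indicator: Iterable[str],
    adjacency: Iterable[str],
    undirected: bool = True,
) -> Tuple[int, float, float]:
    """Return (#graphs, average #nodes, average #edges) per graph."""
    node_graph = [int(line) for line in graph_indicator if line.strip()]
    n_graphs = len(set(node_graph))
    n_edge_entries = sum(1 for line in adjacency if line.strip())
    # Each undirected edge is assumed to appear once per direction.
    n_edges = n_edge_entries / 2 if undirected else n_edge_entries
    return n_graphs, len(node_graph) / n_graphs, n_edges / n_graphs


indicator = ["1", "1", "1", "2", "2"]  # graph 1: nodes 1-3, graph 2: nodes 4-5
edges = ["1, 2", "2, 1", "2, 3", "3, 2", "4, 5", "5, 4"]
print(graph_stats(indicator, edges))  # (2, 2.5, 1.5)
```

With real files, pass the opened file objects directly, since iterating over a file yields its lines.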