yzhao062 / XGBOD

Licence: other
Supplementary material for IJCNN paper "XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning"


Projects that are alternatives to or similar to XGBOD

DCSO
Supplementary material for KDD 2018 workshop "DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles"
Stars: ✭ 20 (-66.1%)
Mutual labels:  outlier-detection, anomaly-detection, outlier-ensembles
Anomaly Detection Resources
Anomaly detection related books, papers, videos, and toolboxes
Stars: ✭ 5,306 (+8893.22%)
Mutual labels:  outlier-detection, anomaly-detection, outlier-ensembles
Pyod
A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)
Stars: ✭ 5,083 (+8515.25%)
Mutual labels:  outlier-detection, anomaly-detection, outlier-ensembles
pytod
TOD: GPU-accelerated Outlier Detection via Tensor Operations
Stars: ✭ 131 (+122.03%)
Mutual labels:  outlier-detection, anomaly-detection
f anogan pytorch
Code for reproducing f-AnoGAN in Pytorch
Stars: ✭ 28 (-52.54%)
Mutual labels:  outlier-detection, anomaly-detection
DGFraud-TF2
A Deep Graph-based Toolbox for Fraud Detection in TensorFlow 2.X
Stars: ✭ 84 (+42.37%)
Mutual labels:  outlier-detection, anomaly-detection
ADRepository-Anomaly-detection-datasets
ADRepository: Real-world anomaly detection datasets
Stars: ✭ 77 (+30.51%)
Mutual labels:  outlier-detection, anomaly-detection
deviation-network
Source code of the KDD19 paper "Deep anomaly detection with deviation networks", weakly/partially supervised anomaly detection, few-shot anomaly detection
Stars: ✭ 94 (+59.32%)
Mutual labels:  outlier-detection, anomaly-detection
deviation-network-image
Official PyTorch implementation of the paper “Explainable Deep Few-shot Anomaly Detection with Deviation Networks”, weakly/partially supervised anomaly detection, few-shot anomaly detection, image defect detection.
Stars: ✭ 47 (-20.34%)
Mutual labels:  outlier-detection, anomaly-detection
outliertree
(Python, R, C++) Explainable outlier/anomaly detection through decision tree conditioning
Stars: ✭ 40 (-32.2%)
Mutual labels:  outlier-detection, anomaly-detection
kenchi
A scikit-learn compatible library for anomaly detection
Stars: ✭ 36 (-38.98%)
Mutual labels:  outlier-detection, anomaly-detection
Awesome Ts Anomaly Detection
List of tools & datasets for anomaly detection on time-series data.
Stars: ✭ 2,027 (+3335.59%)
Mutual labels:  outlier-detection, anomaly-detection
drama
Main component extraction for outlier detection
Stars: ✭ 17 (-71.19%)
Mutual labels:  outlier-detection, anomaly-detection
CCD
Code for 'Constrained Contrastive Distribution Learning for Unsupervised Anomaly Detection and Localisation in Medical Images' [MICCAI 2021]
Stars: ✭ 30 (-49.15%)
Mutual labels:  anomaly-detection
deepAD
Detection of Accounting Anomalies in the Latent Space using Adversarial Autoencoder Neural Networks - A lab we prepared for the KDD'19 Workshop on Anomaly Detection in Finance that will walk you through the detection of interpretable accounting anomalies using adversarial autoencoder neural networks. The majority of the lab content is based on J…
Stars: ✭ 65 (+10.17%)
Mutual labels:  anomaly-detection
ailia-models
The collection of pre-trained, state-of-the-art AI models for ailia SDK
Stars: ✭ 1,102 (+1767.8%)
Mutual labels:  anomaly-detection
msda
Library for multi-dimensional, multi-sensor, uni/multivariate time series data analysis, unsupervised feature selection, unsupervised deep anomaly detection, and prototype of explainable AI for anomaly detector
Stars: ✭ 80 (+35.59%)
Mutual labels:  anomaly-detection
FSSD OoD Detection
Feature Space Singularity for Out-of-Distribution Detection. (SafeAI 2021)
Stars: ✭ 66 (+11.86%)
Mutual labels:  anomaly-detection
anomagram
Interactive Visualization to Build, Train and Test an Autoencoder with Tensorflow.js
Stars: ✭ 152 (+157.63%)
Mutual labels:  anomaly-detection
kubervisor
The Kubervisor allows you to control which pods should receive traffic based on anomaly detection. It is a new kind of health check system.
Stars: ✭ 35 (-40.68%)
Mutual labels:  anomaly-detection

XGBOD (Extreme Boosting Based Outlier Detection)


Zhao, Y. and Hryniewicki, M.K., "XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning," International Joint Conference on Neural Networks (IJCNN), IEEE, 2018.

Please cite the paper as:

@inproceedings{zhao2018xgbod,
  title={XGBOD: improving supervised outlier detection with unsupervised representation learning},
  author={Zhao, Yue and Hryniewicki, Maciej K},
  booktitle={2018 International Joint Conference on Neural Networks (IJCNN)},
  pages={1--8},
  year={2018},
  organization={IEEE}
}

PDF | IEEE Xplore | API Documentation | Example with PyOD

Update (Dec 25th, 2018): XGBOD has been officially released in Python Outlier Detection (PyOD) V0.6.6.

Update (Dec 6th, 2018): XGBOD has been implemented in Python Outlier Detection (PyOD), to be released in pyod V0.6.6.


Additional notes:

  1. Two versions of the code are provided:
    1. The demo version (xgbod_demo.py) is refactored for fast execution and reproduction as a proof of concept. The key difference from the full version is that the transformed outlier scores (TOS) are built once on both the training and test data, which can be regarded as static unsupervised feature engineering. In practice, however, the test data should not be exposed or used while building TOS.
    2. The production version (Python Outlier Detection (PyOD)) is released with full optimization and testing as a framework. It is intended for real applications, requires fewer dependencies, and executes faster.
  2. Small variations in the results are expected due to random processes, such as xgboost and random TOS selection, so running the demo code gives similar but not identical results. Additionally, the specific setups differ slightly across datasets and have not been published yet.
  3. While running L1_Comb and L2_Comb, EasyEnsemble is used to construct balanced bags. For efficiency, the demo code uses 10 bags instead of 50; increasing to 50 bags would not change the results much but would bring slightly better stability. You are welcome to change the "BalancedBaggingClassifier" parameters to use 50 bags (see the sketch below), although this is much slower -- which is one of the reasons we propose XGBOD: it is far more efficient :)
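
For reference, a minimal sketch of switching to 50 balanced bags is shown below. Parameter names follow the pinned imbalanced-learn 0.3.x release; the base estimator and other settings are illustrative rather than the exact configuration used in xgbod_demo.py.

# Sketch: EasyEnsemble-style balanced bagging with 50 bags instead of 10.
# Parameter names follow imbalanced-learn 0.3.x; newer releases rename
# 'ratio' to 'sampling_strategy'. The base estimator is illustrative.
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bbc = BalancedBaggingClassifier(base_estimator=DecisionTreeClassifier(),
                                n_estimators=50,   # 50 balanced bags (demo uses 10)
                                ratio='auto',
                                replacement=False,
                                random_state=42)
# bbc.fit(X_train, y_train) and bbc.predict_proba(X_test) as usual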

Introduction

XGBOD is a three-phase framework (see the figure below). In the first phase, it generates new data representations: various unsupervised outlier detection methods are applied to the original data to obtain transformed outlier scores (TOS) as new data representations. In the second phase, a selection process is performed on the newly generated outlier scores to keep the useful ones. The selected outlier scores are then combined with the original features to form the new feature space. Finally, an XGBoost model is trained on the new feature space, and its output determines the final outlier prediction.

XGBOD Flowchart
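
For a quick start, a minimal usage sketch with the PyOD implementation is given below. It assumes pyod >= 0.6.6 is installed; the synthetic data and settings are purely illustrative.

# Minimal sketch of XGBOD through PyOD (assumes pyod >= 0.6.6 is installed).
# The data below is synthetic and purely illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from pyod.models.xgbod import XGBOD

rng = np.random.RandomState(42)
X = np.vstack([rng.randn(500, 6),          # 500 inliers
               rng.randn(25, 6) + 4])      # 25 shifted outliers
y = np.hstack([np.zeros(500), np.ones(25)])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42)

clf = XGBOD(random_state=42)   # builds TOS with a set of unsupervised detectors internally
clf.fit(X_train, y_train)      # then trains XGBoost on the augmented feature space
test_scores = clf.decision_function(X_test)   # raw outlier scores
test_labels = clf.predict(X_test)             # binary outlier predictions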

Dependency

The experiment code is written in Python 3 and builds on a number of Python packages:

  • matplotlib==2.0.2
  • xgboost==0.7
  • pandas==0.21.0
  • imbalanced_learn==0.3.2
  • scipy==0.19.1
  • numpy==1.13.1
  • PyNomaly==0.1.7
  • imblearn==0.0
  • scikit_learn==0.19.1

Batch installation is possible using the supplied "requirements.txt":

pip install -r requirements.txt

Datasets

Seven datasets are used (see dataset folder):

Datasets      Sample Size   Dimension   Number of Outliers
Arrhythmia    351           274         126 (36%)
Letter        1600          32          100 (6.25%)
Cardio        1831          21          176 (9.6%)
Speech        3686          600         61 (1.65%)
Satellite     6435          36          2036 (31.64%)
Mnist         7603          100         700 (9.21%)
Mammography   11863         6           260 (2.32%)

All datasets are accessible at http://odds.cs.stonybrook.edu/. For citing the datasets, please refer to:

Shebuti Rayana (2016). ODDS Library [http://odds.cs.stonybrook.edu]. Stony Brook, NY: Stony Brook University, Department of Computer Science.
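
The .mat files in the dataset folder can be read with SciPy. A minimal loading sketch is shown below, assuming the standard ODDS layout with an 'X' feature matrix and a 'y' label vector (the file name and path are illustrative):

# Sketch: load one ODDS .mat dataset (assumes the standard ODDS keys
# 'X' for the feature matrix and 'y' for the binary outlier labels).
from scipy.io import loadmat

mat = loadmat('datasets/cardio.mat')    # file name/path is illustrative
X = mat['X']                            # shape: (n_samples, n_features)
y = mat['y'].ravel()                    # 1 = outlier, 0 = inlier
print(X.shape, int(y.sum()), 'outliers')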


Usage and Sample Output (Demo Version)

The experiments can be reproduced by running xgbod_demo.py directly. Simply download/clone the entire repository and execute the code with "python xgbod_demo.py".

The first part of the code reads in the datasets using SciPy. Five public outlier datasets are supplied. Various TOS are then built by seven different algorithms:

  1. KNN
  2. K-Median
  3. AvgKNN
  4. LOF
  5. LoOP
  6. One-Class SVM
  7. Isolation Forest

Please note that more TOS may be included.

Taking KNN as an example, the code is as follows:

# Generate TOS using KNN based algorithms
feature_list, roc_knn, prc_n_knn, result_knn = get_TOS_knn(X_norm, y, k_range, feature_list)
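
Conceptually, each TOS is simply the score vector produced by one unsupervised detector on the data. The sketch below illustrates a single kNN-based TOS column using PyOD's KNN detector; it is an illustration of the idea, not the repository's get_TOS_knn implementation, which builds these scores with its own helpers.

# Sketch: one kNN-based TOS column (illustrative; not get_TOS_knn itself).
from pyod.models.knn import KNN

detector = KNN(n_neighbors=10, method='largest')  # 'mean' ~ AvgKNN, 'median' ~ K-Median
detector.fit(X_norm)                   # unsupervised: labels are not used
tos_knn = detector.decision_scores_    # one new feature: an outlier score per sample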

Then three TOS selection methods are used to select p TOS:

p = 10  # number of selected TOS
# random selection
X_train_new_rand, X_train_all_rand = random_select(X, X_train_new_orig, roc_list, p)
# accurate selection
X_train_new_accu, X_train_all_accu = accurate_select(X, X_train_new_orig, feature_list, roc_list, p)
# balance selection
X_train_new_bal, X_train_all_bal = balance_select(X, X_train_new_orig, roc_list, p)
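
After selection, the chosen TOS are appended to the original features and the final classifier is trained on the augmented space. A minimal sketch of this step with xgboost's scikit-learn wrapper is shown below; the train/test split and settings are illustrative, not the exact evaluation protocol of the demo.

# Sketch: final XGBOD step -- XGBoost trained on [original features | selected TOS].
# The split and settings are illustrative; xgbod_demo.py wires this up differently.
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X_tr, X_te, y_tr, y_te = train_test_split(
    X_train_all_accu, y, test_size=0.4, stratify=y, random_state=42)

clf = XGBClassifier()          # default settings, for illustration
clf.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print('ROC-AUC on the augmented feature space: %.4f' % auc)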

Finally, various classification methods are applied to the datasets. Sample outputs are provided below:

Sample Outputs on Arrhythmia

Figures

Running plots.py generates the figures for the various TOS selection algorithms, showing the effect of the number of TOS and of the selection method.
