
yzhao062 / DCSO

Licence: other
Supplementary material for KDD 2018 workshop "DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles"


Projects that are alternatives of or similar to DCSO

Pyod
A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)
Stars: ✭ 5,083 (+25315%)
Mutual labels:  outlier-detection, anomaly-detection, outlier-ensembles
XGBOD
Supplementary material for IJCNN paper "XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning"
Stars: ✭ 59 (+195%)
Mutual labels:  outlier-detection, anomaly-detection, outlier-ensembles
Anomaly Detection Resources
Anomaly detection related books, papers, videos, and toolboxes
Stars: ✭ 5,306 (+26430%)
Mutual labels:  outlier-detection, anomaly-detection, outlier-ensembles
outliertree
(Python, R, C++) Explainable outlier/anomaly detection through decision tree conditioning
Stars: ✭ 40 (+100%)
Mutual labels:  outlier-detection, anomaly-detection
deviation-network
Source code of the KDD19 paper "Deep anomaly detection with deviation networks", weakly/partially supervised anomaly detection, few-shot anomaly detection
Stars: ✭ 94 (+370%)
Mutual labels:  outlier-detection, anomaly-detection
Awesome Ts Anomaly Detection
List of tools & datasets for anomaly detection on time-series data.
Stars: ✭ 2,027 (+10035%)
Mutual labels:  outlier-detection, anomaly-detection
ADRepository-Anomaly-detection-datasets
ADRepository: Real-world anomaly detection datasets
Stars: ✭ 77 (+285%)
Mutual labels:  outlier-detection, anomaly-detection
pytod
TOD: GPU-accelerated Outlier Detection via Tensor Operations
Stars: ✭ 131 (+555%)
Mutual labels:  outlier-detection, anomaly-detection
f anogan pytorch
Code for reproducing f-AnoGAN in Pytorch
Stars: ✭ 28 (+40%)
Mutual labels:  outlier-detection, anomaly-detection
drama
Main component extraction for outlier detection
Stars: ✭ 17 (-15%)
Mutual labels:  outlier-detection, anomaly-detection
kenchi
A scikit-learn compatible library for anomaly detection
Stars: ✭ 36 (+80%)
Mutual labels:  outlier-detection, anomaly-detection
DGFraud-TF2
A Deep Graph-based Toolbox for Fraud Detection in TensorFlow 2.X
Stars: ✭ 84 (+320%)
Mutual labels:  outlier-detection, anomaly-detection
deviation-network-image
Official PyTorch implementation of the paper “Explainable Deep Few-shot Anomaly Detection with Deviation Networks”, weakly/partially supervised anomaly detection, few-shot anomaly detection, image defect detection.
Stars: ✭ 47 (+135%)
Mutual labels:  outlier-detection, anomaly-detection
detection-rules
Threat Detection & Anomaly Detection rules for popular open-source components
Stars: ✭ 34 (+70%)
Mutual labels:  anomaly-detection
az-ml-batch-score
Deploying a Batch Scoring Pipeline for Python Models
Stars: ✭ 17 (-15%)
Mutual labels:  anomaly-detection
singular-spectrum-transformation
fast implementation of singular spectrum transformation (change point detection algorithm)
Stars: ✭ 41 (+105%)
Mutual labels:  anomaly-detection
visualqc
VisualQC : assistive tool to ease the quality control workflow of neuroimaging data.
Stars: ✭ 56 (+180%)
Mutual labels:  outlier-detection
MStream
Anomaly Detection on Time-Evolving Streams in Real-time. Detecting intrusions (DoS and DDoS attacks), frauds, fake rating anomalies.
Stars: ✭ 68 (+240%)
Mutual labels:  anomaly-detection
Anomaly Detection
anomaly detection with anomalize and Google Trends data
Stars: ✭ 38 (+90%)
Mutual labels:  anomaly-detection
Meta-GDN AnomalyDetection
Implementation of TheWebConf 2021 -- Few-shot Network Anomaly Detection via Cross-network Meta-learning
Stars: ✭ 22 (+10%)
Mutual labels:  anomaly-detection

DCSO (Dynamic Combination of Detector Scores for Outlier Ensembles)

Supplementary materials: datasets, demo source code, and sample outputs.

Y. Zhao and M.K. Hryniewicki, "DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles," ACM KDD Workshop on Outlier Detection De-constructed (ODD v5.0), 2018.

Please cite the paper as:

@conference{zhao2018dcso,
    author     = {Zhao, Yue and Hryniewicki, Maciej K},
    title      = {{DCSO:} Dynamic Combination of Detector Scores for Outlier Ensembles},
    booktitle  = {ACM SIGKDD ODD Workshop},
    year       = {2018},
    address    = {London, UK},
    timestamp  = {Mon, 22 Oct 2018 13:07:32 +0200},
}

[ PDF | Presentation Slides ]

Note: LSCP, an upgraded version of DCSO, has been accepted at SDM '19.


Additional notes:

  1. Three versions of the code are (going to be) provided:
    1. The demo version (demo_lof.py and demo_knn.py) is created for fast reproduction of the experimental results. It only compares the baseline algorithms with the DCSO algorithms; the effect of parameters, e.g., the choice of k, is not included.
    2. The full version (tba) will be released after moderate code cleanup and optimization. In contrast to the demo version, it also considers the impact of parameter settings and is therefore relatively slow; further optimization is planned. Since the demo version is sufficient to demonstrate the idea, we suggest using it while the full version is being optimized.
    3. The production version (tba) will be released as a framework after full optimization and testing. This version is intended for use in real applications and should require fewer dependencies and execute faster.
  2. Small variations in the results are expected due to randomness, e.g., in splitting the training and test sets. Running the demo code therefore yields results similar, but not identical, to those reported in the paper; see the reproducibility snippet after this list.
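
For instance, fixing the random seed of the train/test split makes a single run deterministic. The snippet below is a hypothetical illustration; the demo scripts may split the data differently:

    import numpy as np
    from sklearn.model_selection import train_test_split

    X = np.random.rand(100, 5)        # placeholder feature matrix
    y = np.random.randint(0, 2, 100)  # placeholder outlier labels

    # Without a fixed random_state, each run produces a different split,
    # hence slightly different ROC / P@m numbers.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.4, random_state=42)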

Introduction

In this paper, an unsupervised outlier detector combination framework called DCSO (Dynamic Combination of Detector Scores for Outlier Ensembles) is proposed, demonstrated, and assessed for the dynamic selection of the most competent base detectors, with an emphasis on data locality. The proposed DCSO framework first defines the local region of a test instance by its k nearest neighbors and then identifies the top-performing base detectors within that local region. Like classification ensembles, DCSO has two key stages. In the Generation stage, the chosen base detector algorithm is initialized with distinct parameters to build a pool of diversified detectors, all of which are fitted on the entire training dataset. In the Combination stage, DCSO picks the most competent detector in the local region defined by the test instance. Finally, the selected detector is used to predict the outlier score for that instance.
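
As a rough illustration of these two stages, the sketch below builds a pool of LOF detectors and, for each test point, selects the detector best correlated with a pseudo ground truth (here, simply the pool average) over the local region. The pseudo-target choice, the Pearson competence measure, and all parameter values are illustrative assumptions rather than the paper's exact configuration, and a recent scikit-learn is assumed:

    import numpy as np
    from scipy.stats import pearsonr
    from sklearn.neighbors import NearestNeighbors, LocalOutlierFactor

    def dcso_predict(X_train, X_test, n_detectors=10, k=10):
        # Generation stage: a pool of diversified LOF detectors,
        # all fitted on the entire training set.
        pool = [LocalOutlierFactor(n_neighbors=n, novelty=True)
                for n in range(5, 5 + 5 * n_detectors, 5)]
        for det in pool:
            det.fit(X_train)

        # Training-set outlier scores per detector (higher = more outlying).
        train_scores = np.column_stack(
            [-det.negative_outlier_factor_ for det in pool])
        # Pseudo ground truth: here, simply the pool average.
        pseudo_target = train_scores.mean(axis=1)

        knn = NearestNeighbors(n_neighbors=k).fit(X_train)
        y_scores = np.zeros(len(X_test))
        for i, x in enumerate(X_test):
            # Combination stage: local region = k nearest training points.
            _, idx = knn.kneighbors(x.reshape(1, -1))
            local = idx.ravel()
            # Competence: correlation with the pseudo target over the region.
            comp = [pearsonr(train_scores[local, j], pseudo_target[local])[0]
                    for j in range(len(pool))]
            best = int(np.argmax(comp))
            # The most competent detector scores the test instance.
            y_scores[i] = -pool[best].score_samples(x.reshape(1, -1))[0]
        return y_scores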

Flowchart

[Figure: DCSO flowchart]

Dependency

The experiment code is written in Python 3.6 and builds on a number of Python packages:

  • numpy>=1.13
  • scipy>=0.19
  • scikit_learn>=0.19

Batch installation is possible using the supplied "requirements.txt" with pip or conda.
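
For example, with pip:

    pip install -r requirements.txt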


Datasets

Ten datasets are used (see the dataset folder):

Dataset      # Points (n)   Dimension (d)   # Outliers   % Outliers
Pima         768            8               268          34.8958
Vowels       1456           12              50           3.4341
Letter       1600           32              100          6.2500
Cardio       1831           21              176          9.6122
Thyroid      3772           6               93           2.4655
Satellite    6435           36              2036         31.6394
Pendigits    6870           16              156          2.2707
Annthyroid   7200           6               534          7.4167
Mnist        7603           100             700          9.2069
Shuttle      49097          9               3511         7.1511

All datasets are accessible from http://odds.cs.stonybrook.edu/. For citing the datasets, please refer to:

Shebuti Rayana (2016). ODDS Library [http://odds.cs.stonybrook.edu]. Stony Brook, NY: Stony Brook University, Department of Computer Science.

To replicate the demo, you should download the datasets from http://odds.cs.stonybrook.edu/ and place them in ./datasets/. We do not redistribute the datasets.
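
For reference, ODDS datasets typically ship as MATLAB .mat files containing a feature matrix X and a label vector y; a minimal loading sketch (the cardio.mat filename is only an example) might look like:

    from scipy.io import loadmat

    mat = loadmat('./datasets/cardio.mat')
    X = mat['X']              # shape: (n_points, n_dimensions)
    y = mat['y'].ravel()      # 0 = normal, 1 = outlier
    print(X.shape, y.mean())  # outlier fraction should match the table above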


Usage and Sample Output (Demo Version)

The experiments can be reproduced by running demo_lof.py and demo_knn.py directly. Simply download or clone the entire repository and execute:

python demo_lof.py

demo_lof.py and demo_knn.py differ only in the choice of base detector: the former uses LOF, while the latter uses kNN. We use two evaluation methods (a small computation sketch follows the list):

  1. The area under receiver operating characteristic curve (ROC)
  2. Precision at rank m (P@m)
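
A minimal sketch of how both metrics can be computed, using synthetic placeholder labels and scores rather than the demo's actual outputs:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.RandomState(42)
    y_test = rng.randint(0, 2, 200)           # placeholder binary labels
    y_scores = rng.rand(200) + 0.5 * y_test   # placeholder outlier scores

    # 1) Area under the ROC curve.
    roc = roc_auc_score(y_test, y_scores)

    # 2) Precision at rank m, with m = number of true outliers.
    m = int(y_test.sum())
    top_m = np.argsort(y_scores)[-m:]         # indices of the m highest scores
    p_at_m = y_test[top_m].mean()
    print(roc, p_at_m)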

The results of demo_lof.py and demo_knn.py are presented below. Tables 1 and 2 show the results when LOF is used as the base detector, while Tables 3 and 4 show the results when kNN is used. The highest score is highlighted in bold, and the lowest is marked with an asterisk (*).

[Result tables: LOF_ROC, LOF_PRC, KNN_ROC, KNN_PRC]

Visualizations (based on demo_lof.py)

The figure below visually compares the performance of the SG and DCSO methods on Cardio, Thyroid, and Letter using t-distributed stochastic neighbor embedding (t-SNE). Normal and outlying points are denoted as orange dots and red squares, respectively. Normal points that are correctly detected only by SG methods are named SG_N (green triangle-down), and those detected only by DCSO are named DCSO_N (blue cross sign). Similarly, outliers are denoted as SG_O (green triangle-up) and DCSO_O (blue plus sign) when they can only be detected by SG or DCSO methods, respectively.

[Figure: t-SNE visualizations]

The full set of visualizations can be found at t-SNE. To replicate them, please use "viz_tsne.py". Note that this script is not fully optimized and can be cumbersome.
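
For orientation, here is a rough sketch of this kind of comparison plot, assuming matplotlib is available; the data and the SG/DCSO hit masks are synthetic placeholders, whereas the real script derives them from the detectors' predictions:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    rng = np.random.RandomState(0)
    X_test = rng.rand(300, 10)        # placeholder features
    y_test = rng.randint(0, 2, 300)   # 1 = outlier
    sg_hit = rng.rand(300) > 0.5      # placeholder: SG correct on point
    dcso_hit = rng.rand(300) > 0.5    # placeholder: DCSO correct on point

    emb = TSNE(n_components=2, random_state=0).fit_transform(X_test)
    normal, outlier = (y_test == 0), (y_test == 1)
    # Points correctly handled by only one of the two methods.
    sg_only, dcso_only = sg_hit & ~dcso_hit, dcso_hit & ~sg_hit

    plt.scatter(*emb[normal].T, c='orange', marker='.', label='normal')
    plt.scatter(*emb[outlier].T, c='red', marker='s', label='outlier')
    plt.scatter(*emb[normal & sg_only].T, c='green', marker='v', label='SG_N')
    plt.scatter(*emb[normal & dcso_only].T, c='blue', marker='x', label='DCSO_N')
    plt.scatter(*emb[outlier & sg_only].T, c='green', marker='^', label='SG_O')
    plt.scatter(*emb[outlier & dcso_only].T, c='blue', marker='+', label='DCSO_O')
    plt.legend()
    plt.show()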
