All Projects โ†’ kLabUM โ†’ Rrcf

kLabUM / Rrcf

Licence: mit
๐ŸŒฒ Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Rrcf

EurekaTrees
Visualizes the Random Forest debug string from the MLLib in Spark using D3.js
Stars: โœญ 37 (-87.2%)
Mutual labels:  tree, random-forest
Pysad
Streaming Anomaly Detection Framework in Python (Outlier Detection for Streaming Data)
Stars: โœญ 87 (-69.9%)
Mutual labels:  streaming-data, anomaly-detection
2018 Machinelearning Lectures Esa
Machine Learning Lectures at the European Space Agency (ESA) in 2018
Stars: โœญ 280 (-3.11%)
Mutual labels:  anomaly-detection, random-forest
linear-tree
A python library to build Model Trees with Linear Models at the leaves.
Stars: โœญ 128 (-55.71%)
Mutual labels:  tree, random-forest
Data Structures Algorithms
My implementation of 85+ popular data structures and algorithms and interview questions in Python 3 and C++
Stars: โœญ 273 (-5.54%)
Mutual labels:  tree
FloydHub-Anomaly-Detection-Blog
Contains the thorough experiments made for a FloydHub article on Anomaly Detection
Stars: โœญ 15 (-94.81%)
Mutual labels:  anomaly-detection
Thio
Thio - a playground for real-time anomaly detection
Stars: โœญ 38 (-86.85%)
Mutual labels:  anomaly-detection
T3
[EMNLP 2020] "T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack" by Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li
Stars: โœญ 25 (-91.35%)
Mutual labels:  tree
Ggeditor
A visual graph editor based on G6 and React
Stars: โœญ 3,220 (+1014.19%)
Mutual labels:  tree
Dgfraud
A Deep Graph-based Toolbox for Fraud Detection
Stars: โœญ 281 (-2.77%)
Mutual labels:  anomaly-detection
Anomalize
Tidy anomaly detection
Stars: โœญ 263 (-9%)
Mutual labels:  anomaly-detection
MVTec-Anomaly-Detection
This project proposes an end-to-end framework for semi-supervised Anomaly Detection and Segmentation in images based on Deep Learning.
Stars: โœญ 161 (-44.29%)
Mutual labels:  anomaly-detection
Cloudflow
Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: โœญ 278 (-3.81%)
Mutual labels:  streaming-data
Merlion
Merlion: A Machine Learning Framework for Time Series Intelligence
Stars: โœญ 2,368 (+719.38%)
Mutual labels:  anomaly-detection
tree
This is a mirror of the source available on the Web site.
Stars: โœญ 31 (-89.27%)
Mutual labels:  tree
2020plus
Classifies genes as an oncogene, tumor suppressor gene, or as a non-driver gene by using Random Forests
Stars: โœญ 44 (-84.78%)
Mutual labels:  random-forest
Benthos
Fancy stream processing made operationally mundane
Stars: โœญ 3,705 (+1182.01%)
Mutual labels:  streaming-data
fgeo
[Meta R-package on CRAN] Analyse forest diversity and dynamics
Stars: โœญ 22 (-92.39%)
Mutual labels:  tree
tiny-wheels
ไธ€ๅฅ—ๅŸบไบŽๅŽŸ็”ŸJavaScriptๅผ€ๅ‘็š„็ป„ไปถๅบ“๏ผŒๆ— ไพ่ต–ใ€ไฝ“็งฏๅฐใ€็ฎ€ๅ•ๆ˜“็”จ
Stars: โœญ 60 (-79.24%)
Mutual labels:  tree
Javascript Datastructures Algorithms
๐Ÿ“š collection of JavaScript and TypeScript data structures and algorithms for education purposes. Source code bundle of JavaScript algorithms and data structures book
Stars: โœญ 3,221 (+1014.53%)
Mutual labels:  tree

rrcf ๐ŸŒฒ๐ŸŒฒ๐ŸŒฒ

Build Status Coverage Status Python 3.6 GitHub status

Implementation of the Robust Random Cut Forest Algorithm for anomaly detection by Guha et al. (2016).

S. Guha, N. Mishra, G. Roy, & O. Schrijvers, Robust random cut forest based anomaly detection on streams, in Proceedings of the 33rd International conference on machine learning, New York, NY, 2016 (pp. 2712-2721).

About

The Robust Random Cut Forest (RRCF) algorithm is an ensemble method for detecting outliers in streaming data. RRCF offers a number of features that many competing anomaly detection algorithms lack. Specifically, RRCF:

  • Is designed to handle streaming data.
  • Performs well on high-dimensional data.
  • Reduces the influence of irrelevant dimensions.
  • Gracefully handles duplicates and near-duplicates that could otherwise mask the presence of outliers.
  • Features an anomaly-scoring algorithm with a clear underlying statistical meaning.

This repository provides an open-source implementation of the RRCF algorithm and its core data structures for the purposes of facilitating experimentation and enabling future extensions of the RRCF algorithm.

Documentation

Read the docs here ๐Ÿ“–.

Installation

Use pip to install rrcf via pypi:

$ pip install rrcf

Currently, only Python 3 is supported.

Dependencies

The following dependencies are required to install and use rrcf:

The following optional dependencies are required to run the examples shown in the documentation:

Listed version numbers have been tested and are known to work (this does not necessarily preclude older versions).

Robust random cut trees

A robust random cut tree (RRCT) is a binary search tree that can be used to detect outliers in a point set. A RRCT can be instantiated from a point set. Points can also be added and removed from an RRCT.

Creating the tree

import numpy as np
import rrcf

# A (robust) random cut tree can be instantiated from a point set (n x d)
X = np.random.randn(100, 2)
tree = rrcf.RCTree(X)

# A random cut tree can also be instantiated with no points
tree = rrcf.RCTree()

Inserting points

tree = rrcf.RCTree()

for i in range(6):
    x = np.random.randn(2)
    tree.insert_point(x, index=i)
โ”€+
 โ”œโ”€โ”€โ”€+
 โ”‚   โ”œโ”€โ”€โ”€+
 โ”‚   โ”‚   โ”œโ”€โ”€(0)
 โ”‚   โ”‚   โ””โ”€โ”€โ”€+
 โ”‚   โ”‚       โ”œโ”€โ”€(5)
 โ”‚   โ”‚       โ””โ”€โ”€(4)
 โ”‚   โ””โ”€โ”€โ”€+
 โ”‚       โ”œโ”€โ”€(2)
 โ”‚       โ””โ”€โ”€(3)
 โ””โ”€โ”€(1)

Deleting points

tree.forget_point(2)
โ”€+
 โ”œโ”€โ”€โ”€+
 โ”‚   โ”œโ”€โ”€โ”€+
 โ”‚   โ”‚   โ”œโ”€โ”€(0)
 โ”‚   โ”‚   โ””โ”€โ”€โ”€+
 โ”‚   โ”‚       โ”œโ”€โ”€(5)
 โ”‚   โ”‚       โ””โ”€โ”€(4)
 โ”‚   โ””โ”€โ”€(3)
 โ””โ”€โ”€(1)

Anomaly score

The likelihood that a point is an outlier is measured by its collusive displacement (CoDisp): if including a new point significantly changes the model complexity (i.e. bit depth), then that point is more likely to be an outlier.

# Seed tree with zero-mean, normally distributed data
X = np.random.randn(100,2)
tree = rrcf.RCTree(X)

# Generate an inlier and outlier point
inlier = np.array([0, 0])
outlier = np.array([4, 4])

# Insert into tree
tree.insert_point(inlier, index='inlier')
tree.insert_point(outlier, index='outlier')
tree.codisp('inlier')
>>> 1.75
tree.codisp('outlier')
>>> 39.0

Batch anomaly detection

This example shows how a robust random cut forest can be used to detect outliers in a batch setting. Outliers correspond to large CoDisp.

import numpy as np
import pandas as pd
import rrcf

# Set parameters
np.random.seed(0)
n = 2010
d = 3
num_trees = 100
tree_size = 256

# Generate data
X = np.zeros((n, d))
X[:1000,0] = 5
X[1000:2000,0] = -5
X += 0.01*np.random.randn(*X.shape)

# Construct forest
forest = []
while len(forest) < num_trees:
    # Select random subsets of points uniformly from point set
    ixs = np.random.choice(n, size=(n // tree_size, tree_size),
                           replace=False)
    # Add sampled trees to forest
    trees = [rrcf.RCTree(X[ix], index_labels=ix) for ix in ixs]
    forest.extend(trees)

# Compute average CoDisp
avg_codisp = pd.Series(0.0, index=np.arange(n))
index = np.zeros(n)
for tree in forest:
    codisp = pd.Series({leaf : tree.codisp(leaf) for leaf in tree.leaves})
    avg_codisp[codisp.index] += codisp
    np.add.at(index, codisp.index.values, 1)
avg_codisp /= index

Image

Streaming anomaly detection

This example shows how the algorithm can be used to detect anomalies in streaming time series data.

import numpy as np
import rrcf

# Generate data
n = 730
A = 50
center = 100
phi = 30
T = 2*np.pi/100
t = np.arange(n)
sin = A*np.sin(T*t-phi*T) + center
sin[235:255] = 80

# Set tree parameters
num_trees = 40
shingle_size = 4
tree_size = 256

# Create a forest of empty trees
forest = []
for _ in range(num_trees):
    tree = rrcf.RCTree()
    forest.append(tree)
    
# Use the "shingle" generator to create rolling window
points = rrcf.shingle(sin, size=shingle_size)

# Create a dict to store anomaly score of each point
avg_codisp = {}

# For each shingle...
for index, point in enumerate(points):
    # For each tree in the forest...
    for tree in forest:
        # If tree is above permitted size, drop the oldest point (FIFO)
        if len(tree.leaves) > tree_size:
            tree.forget_point(index - tree_size)
        # Insert the new point into the tree
        tree.insert_point(point, index=index)
        # Compute codisp on the new point and take the average among all trees
        if not index in avg_codisp:
            avg_codisp[index] = 0
        avg_codisp[index] += tree.codisp(index) / num_trees

Image

Contributing

We welcome contributions to the rrcf repo. To contribute, submit a pull request to the dev branch.

Types of contributions

Some suggested types of contributions include:

  • Bug fixes
  • Documentation improvements
  • Performance enhancements
  • Extensions to the algorithm

Check the issue tracker for any specific issues that need help. If you encounter a problem using rrcf, or have an idea for an extension, feel free to raise an issue.

Guidelines for contributors

Please consider the following guidelines when contributing to the codebase:

  • Ensure that any new methods, functions or classes include docstrings. Docstrings should include a description of the code, as well as descriptions of the inputs (arguments) and outputs (returns). Providing an example use case is recommended (see existing methods for examples).
  • Write unit tests for any new code and ensure that all tests are passing with no warnings. Please ensure that overall code coverage does not drop below 80%.

Running unit tests

To run unit tests, first ensure that pytest and pytest-cov are installed:

$ pip install pytest pytest-cov

To run the tests, navigate to the root directory of the repo and run:

$ pytest --cov=rrcf/

Citing

If you have used this codebase in a publication and wish to cite it, please use the Journal of Open Source Software article.

M. Bartos, A. Mullapudi, & S. Troutman, rrcf: Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams, in: Journal of Open Source Software, The Open Journal, Volume 4, Number 35. 2019

@article{bartos_2019_rrcf,
  title={{rrcf: Implementation of the Robust Random Cut Forest algorithm for anomaly detection on streams}},
  authors={Matthew Bartos and Abhiram Mullapudi and Sara Troutman},
  journal={{The Journal of Open Source Software}},
  volume={4},
  number={35},
  pages={1336},
  year={2019}
}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].