
hibayesian / spark-lof

License: Apache-2.0
A parallel implementation of local outlier factor based on Spark

Programming language: Scala

Spark-LOF

In anomaly detection, the local outlier factor (LOF) algorithm is based on the concept of local density, where locality is given by the k nearest neighbors, whose distances are used to estimate the density. By comparing the local density of an object with the local densities of its neighbors, one can identify regions of similar density, as well as points whose density is substantially lower than that of their neighbors; those points are treated as outliers. Because the approach is local, LOF can flag a point as an outlier relative to its own neighborhood even if the same point would not be an outlier in another area of the data set. Spark-LOF is a parallel implementation of the local outlier factor algorithm based on Spark.
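To make the density comparison above concrete, here is a minimal single-machine sketch of the LOF computation in plain Scala. It is only an illustration of the general algorithm, not the code Spark-LOF runs: the names `LofSketch`, `dist`, and `lof` are hypothetical, `minPts` plays the role of k, neighbors are found by brute force, and ties at the k-th distance and duplicate points are not handled.

```scala
object LofSketch {
  type Point = Array[Double]

  // Euclidean distance between two points of equal dimension.
  def dist(a: Point, b: Point): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  /** Returns one LOF score per input point (hypothetical helper, brute force). */
  def lof(points: IndexedSeq[Point], minPts: Int): IndexedSeq[Double] = {
    require(minPts >= 1 && minPts < points.length,
      "need 1 <= minPts < number of points")
    val n = points.length

    // For each point: its minPts nearest neighbors as (index, distance).
    val neighbors = (0 until n).map { i =>
      (0 until n).filter(_ != i)
        .map(j => (j, dist(points(i), points(j))))
        .sortBy(_._2)
        .take(minPts)
    }

    // k-distance: distance from each point to its minPts-th nearest neighbor.
    val kDist = neighbors.map(_.last._2)

    // Local reachability density: inverse of the mean reachability distance,
    // where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    val lrd = neighbors.map { nbrs =>
      val meanReach = nbrs.map { case (j, d) => math.max(kDist(j), d) }.sum / nbrs.size
      1.0 / meanReach
    }

    // LOF: mean ratio of the neighbors' density to the point's own density.
    (0 until n).map { i =>
      neighbors(i).map { case (j, _) => lrd(j) / lrd(i) }.sum / neighbors(i).size
    }
  }

  def main(args: Array[String]): Unit = {
    // Four corners of a unit square plus one point far away from them.
    val points = Vector(
      Array(0.0, 0.0), Array(0.0, 1.0), Array(1.0, 0.0), Array(1.0, 1.0),
      Array(10.0, 10.0))
    lof(points, minPts = 2).zip(points).foreach { case (score, p) =>
      println(p.mkString("(", ", ", ")") + " -> LOF " + score)
    }
  }
}
```

In this toy data set the four corners receive scores close to 1 (their density matches that of their neighbors), while the distant point scores well above 1. The all-pairs distance step is quadratic in the number of points, which is the kind of cost that motivates a parallel implementation.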

Examples

Scala API

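import org.apache.spark.ml.feature.VectorAssembler
// Assumption: LOF is in this package in the Spark-LOF sources; adjust if your build differs.
import org.apache.spark.ml.outlier.LOF
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc
import org.apache.spark.sql.types.{DataTypes, StructField, StructType}

val spark = SparkSession
  .builder()
  .appName("LOFExample")
  .master("local[4]")
  .getOrCreate()

// Read a two-column CSV of doubles.
val schema = new StructType(Array(
  new StructField("col1", DataTypes.DoubleType),
  new StructField("col2", DataTypes.DoubleType)))
val df = spark.read.schema(schema).csv("data/outlier.csv")

// Combine all input columns into a single "features" vector column.
val assembler = new VectorAssembler()
  .setInputCols(df.columns)
  .setOutputCol("features")
val data = assembler.transform(df).repartition(4)

val startTime = System.currentTimeMillis()
val result = new LOF()
  .setMinPts(5)
  .transform(data)
// Run an action before stopping the clock so that any lazily
// evaluated work is included in the timing.
result.count()
val endTime = System.currentTimeMillis()

// Outliers have a much higher LOF value than normal data
result.sort(desc("lof")).head(10).foreach { row =>
  println(s"${row.get(0)} | ${row.get(1)} | ${row.get(2)}")
}
println(s"Total time = ${(endTime - startTime) / 1000.0}s")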

Requirements

Spark-LOF is built against Spark 3.1.1.

Build From Source

sbt assembly

License

Spark-LOF is available under the Apache License 2.0.

Contact & Feedback

If you encounter bugs, feel free to submit an issue or pull request. You can also send mail to:
