pranab / Beymani
Hadoop, Spark and Storm based anomaly detection implementations for data quality, cyber security, fraud detection etc.
Stars: ✭ 106
Introduction
Beymani is a set of Hadoop, Spark and Storm based tools for outlier and anomaly detection. They can be used for fraud detection, intrusion detection and similar problems.
Philosophy
- Simple to use
- Input and output in CSV format
- Metadata defined in a simple JSON file
- Extremely configurable with tons of configuration knobs
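The CSV plus JSON metadata convention above can be pictured with a minimal Python sketch. The metadata layout, the field names (`txn_id`, `amount`) and the `parse_records` helper are all hypothetical, chosen only for illustration; they are not Beymani's actual schema or API, which is described in the tutorial documents.

```python
import csv
import io
import json

# Hypothetical metadata: NOT Beymani's real schema, just an illustration
# of describing CSV columns in a small JSON document.
METADATA_JSON = """
{
  "fields": [
    {"name": "txn_id", "ordinal": 0, "type": "string"},
    {"name": "amount", "ordinal": 1, "type": "double"}
  ]
}
"""

# Hypothetical CSV input with no header row.
CSV_TEXT = "T001,25.50\nT002,980.00\n"

# Maps the type names used in the hypothetical metadata to Python casts.
CASTS = {"string": str, "int": int, "double": float}


def parse_records(csv_text, metadata):
    """Turn CSV rows into dicts, using the metadata for names and types."""
    fields = metadata["fields"]
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        record = {
            f["name"]: CASTS[f["type"]](row[f["ordinal"]]) for f in fields
        }
        records.append(record)
    return records


metadata = json.loads(METADATA_JSON)
records = parse_records(CSV_TEXT, metadata)
```

Keeping column names and types in the metadata file means the same job can be pointed at a different CSV layout by editing JSON rather than code.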
Blogs
The following blog posts of mine are a good source of details about Beymani:
- http://pkghosh.wordpress.com/2012/01/02/fraudsters-outliers-and-big-data-2/
- http://pkghosh.wordpress.com/2012/02/18/fraudsters-are-not-model-citizens/
- http://pkghosh.wordpress.com/2012/06/18/its-a-lonely-life-for-outliers/
- http://pkghosh.wordpress.com/2012/10/18/relative-density-and-outliers/
- http://pkghosh.wordpress.com/2013/10/21/real-time-fraud-detection-with-sequence-mining/
- https://pkghosh.wordpress.com/2018/09/18/contextual-outlier-detection-with-statistical-modeling-on-spark/
- https://pkghosh.wordpress.com/2018/10/15/learning-alarm-threshold-from-user-feedback-using-decision-tree-on-spark/
- https://pkghosh.wordpress.com/2019/07/25/time-series-sequence-anomaly-detection-with-markov-chain-on-spark/
- https://pkghosh.wordpress.com/2020/09/27/time-series-change-point-detection-with-two-sample-statistic-on-spark-with-application-for-retail-sales-data/
- https://pkghosh.wordpress.com/2020/12/24/concept-drift-detection-techniques-with-python-implementation-for-supervised-machine-learning-models/
- https://pkghosh.wordpress.com/2021/01/20/customer-service-quality-monitoring-with-autoencoder-based-anomalous-case-detection/
Algorithms
- Univariate distribution model
- Multivariate sequence or multi gram distribution model
- Average instance Distance
- Relative instance Density
- Markov chain with sequence data
- Spectral residue for sequence data
- Quantized symbol mapping for sequence data
- Local outlier factor for multivariate data
- Instance clustering
- Sequence clustering
- Change point detection
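As an illustration of the first item in the list, here is a minimal single-machine sketch of a histogram-based univariate distribution model in Python. It is not Beymani's Hadoop or Spark implementation. The function names, the bin width and the threshold are assumptions made for this example. The idea is that values falling in rarely seen (or never seen) histogram bins receive a high outlier score.

```python
import math
from collections import Counter


def fit_histogram(values, bin_width):
    """Return a dict mapping bin index to relative frequency."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = sum(counts.values())
    return {bin_index: count / total for bin_index, count in counts.items()}


def outlier_score(value, hist, bin_width):
    """Score in [0, 1]: one minus the estimated probability of the value's bin.

    A bin never seen during fitting has probability 0, so its score is 1.
    """
    return 1.0 - hist.get(math.floor(value / bin_width), 0.0)


def find_outliers(values, hist, bin_width, threshold):
    """Return the values whose outlier score exceeds the threshold."""
    return [v for v in values if outlier_score(v, hist, bin_width) > threshold]


# Hypothetical training data: mostly values in the 20s, one rare value.
train = [20, 22, 21, 23, 25, 24, 22, 21, 20, 23, 95]
hist = fit_histogram(train, bin_width=10)

# Score new values against the fitted distribution.
outliers = find_outliers([21, 24, 95, 140], hist, bin_width=10, threshold=0.9)
```

In this sketch, 21 and 24 land in the dominant bin and score low, while 95 (a rare bin) and 140 (an unseen bin) exceed the threshold. In practice the threshold and bin width are tuning choices that depend on the data.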
Getting started
The project's resource directory contains tutorial documents for the use cases described in the blogs.
Build
For Hadoop 1
- mvn clean install
For Hadoop 2 (non yarn)
- git checkout nuovo
- mvn clean install
For Hadoop 2 (yarn)
- git checkout nuovo
- mvn clean install -P yarn
For Spark
- mvn clean install
- sbt publishLocal
- sbt clean package (run from the ./spark directory)
Help
Please feel free to email me at [email protected]
Contribution
Contributors are welcome. Please email me at [email protected]