
MentatInnovations / Datastream.io

Licence: apache-2.0
An open-source framework for real-time anomaly detection using Python, Elasticsearch and Kibana

Programming language: Python

Projects that are alternatives of or similar to Datastream.io

Sentinl
Kibana Alert & Report App for Elasticsearch
Stars: ✭ 1,233 (+51.47%)
Mutual labels:  elasticsearch, timeseries, anomaly-detection, kibana
Anomaly Detection
anomaly detection with anomalize and Google Trends data
Stars: ✭ 38 (-95.33%)
Mutual labels:  datascience, machinelearning, anomalydetection, anomaly-detection
Openuba
A robust, and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]
Stars: ✭ 127 (-84.4%)
Mutual labels:  datascience, elasticsearch, anomaly-detection, sklearn
Covid19za
Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
Stars: ✭ 208 (-74.45%)
Mutual labels:  data-science, dataset, dashboard
Responsible Ai Widgets
This project provides responsible AI user interfaces for Fairlearn, interpret-community, and Error Analysis, as well as foundational building blocks that they rely on.
Stars: ✭ 107 (-86.86%)
Mutual labels:  data-science, jupyter, machinelearning
Boostaroota
A fast xgboost feature selection algorithm
Stars: ✭ 165 (-79.73%)
Mutual labels:  data-science, datascience, machinelearning
Datacamp Python Data Science Track
All the slides, accompanying code and exercises are stored in this repo. 🎈
Stars: ✭ 250 (-69.29%)
Mutual labels:  data-science, datascience, machinelearning
Igel
a delightful machine learning tool that allows you to train, test, and use models without writing code
Stars: ✭ 2,956 (+263.14%)
Mutual labels:  data-science, machinelearning, sklearn
Notebooks Statistics And Machinelearning
Jupyter Notebooks from the old UnsupervisedLearning.com (RIP) machine learning and statistics blog
Stars: ✭ 270 (-66.83%)
Mutual labels:  data-science, datascience, machinelearning
Hastic Server
Hastic data management server for analyzing patterns and anomalies from Grafana
Stars: ✭ 292 (-64.13%)
Mutual labels:  elasticsearch, timeseries, anomaly-detection
Code
Compilation of R and Python programming codes on the Data Professor YouTube channel.
Stars: ✭ 287 (-64.74%)
Mutual labels:  data-science, datascience, machinelearning
Tensorwatch
Debugging, monitoring and visualization for Python Machine Learning and Data Science
Stars: ✭ 3,191 (+292.01%)
Mutual labels:  data-science, jupyter, machinelearning
Repo2docker Action
GitHub Action for repo2docker
Stars: ✭ 88 (-89.19%)
Mutual labels:  data-science, jupyter, datascience
Data Science Resources
👨🏽‍🏫You can learn about what data science is and why it's important in today's modern world. Are you interested in data science?🔋
Stars: ✭ 171 (-78.99%)
Mutual labels:  data-science, dataset, datascience
Elastic
A project for building dashboards with the Elastic Stack (6.2.4)
Stars: ✭ 121 (-85.14%)
Mutual labels:  elasticsearch, dashboard, kibana
Bowtie
Create a dashboard with python!
Stars: ✭ 724 (-11.06%)
Mutual labels:  data-science, jupyter, dashboard
Hastic Grafana App
Hastic data management server for labeling patterns and anomalies in Grafana
Stars: ✭ 166 (-79.61%)
Mutual labels:  timeseries, anomaly-detection, dashboard
Pivot Kibana
Flexmonster Pivot plugin for Kibana
Stars: ✭ 58 (-92.87%)
Mutual labels:  elasticsearch, dashboard, kibana
Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (-65.23%)
Mutual labels:  data-science, dataset, datascience
Kbn network
Network Plugin for Kibana
Stars: ✭ 339 (-58.35%)
Mutual labels:  elasticsearch, dashboard, kibana

datastream.io

An open-source framework for real-time anomaly detection using Python, Elasticsearch and Kibana.

Installation

The recommended installation method is to use pip within a Python 3.x virtualenv.

virtualenv --python=python3 dsio-env
source dsio-env/bin/activate
pip install -e git+https://github.com/MentatInnovations/datastream.io#egg=dsio

Usage

You can use dsio through the command line or import it in your Python code. You can visualize your data streams using the built-in Bokeh server or you can restream them to Elasticsearch and visualize them with Kibana. In either case, dsio will generate an appropriate dashboard for your stream. Also, if you invoke dsio through a Jupyter notebook, it will embed the streaming Bokeh dashboard within the same notebook.


Examples

For this section, it is best to run commands from inside the examples directory. If you have installed dsio via pip as demonstrated above, you'd need to run the following command:

cd dsio-env/src/dsio/examples

If you cloned the GitHub repo instead, cd dsio/examples is enough.

You can use the example CSV datasets or provide your own. If the dataset includes a time dimension, dsio will attempt to detect it automatically. Alternatively, you can use the --timefield argument to specify which field holds the time dimension. If no such field exists, dsio will assume the data is a time series starting from now, with 1-second intervals between samples.
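The README does not describe how the automatic detection works. As a rough illustration only, the sketch below assumes a simple name-based heuristic for finding a time column and shows the documented fallback: timestamps starting now, one second apart. Both helper names (`detect_time_field`, `default_timestamps`) are hypothetical and are not part of dsio.

```python
from datetime import datetime, timedelta, timezone

def detect_time_field(columns):
    """Return the first column whose name looks like a time field, else None.

    Hypothetical heuristic: dsio's real detection may also inspect the values.
    """
    hints = ("time", "date", "timestamp")
    for name in columns:
        if any(hint in name.lower() for hint in hints):
            return name
    return None

def default_timestamps(n_rows, start=None, step_seconds=1):
    """Assign one timestamp per row, starting at `start` (default: now),
    spaced `step_seconds` apart, mirroring the fallback described above."""
    if start is None:
        start = datetime.now(timezone.utc)
    return [start + timedelta(seconds=i * step_seconds) for i in range(n_rows)]
```

For a dataset with columns such as engine_speed and accelerator_pedal_position, the heuristic finds no time field, so every row would receive a synthetic timestamp.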

dsio data/cardata_sample.csv

The above command loads the cardata sample CSV and uses the default Gaussian1D anomaly detector to score every numeric column. It then generates an appropriate Bokeh dashboard and restreams the data. A browser window pointing to the generated dashboard should open.
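The README does not spell out how Gaussian1D computes its scores. As a rough illustration only, the sketch below assumes a detector that fits a normal distribution to a single column online (Welford's running mean and variance) and maps each value's distance from the mean into a score between 0 and 1. The class name `Gaussian1DSketch` is hypothetical and this is not dsio's implementation.

```python
import math

class Gaussian1DSketch:
    """Hypothetical 1D Gaussian detector (illustration, not dsio's code).

    Tracks a running mean and variance with Welford's algorithm and scores a
    value by how many standard deviations it sits from the mean.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Welford's online update of mean and sum of squared deviations
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def score(self, x):
        if self.n < 2:
            return 0.0  # not enough data to estimate a spread yet
        std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
        if std == 0:
            return 0.0 if x == self.mean else 1.0
        z = abs(x - self.mean) / std
        # erf(z / sqrt(2)) equals P(|Z| <= z) for a standard normal Z
        return math.erf(z / math.sqrt(2))
```

Since dsio scores every numeric column, this picture would keep one such object per column.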


You can experiment with different datasets and anomaly detectors, for example:

dsio --detector percentile1d path_to_my_dataset/my_dataset.csv
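To give a feel for what a percentile-style detector does, here is a minimal sketch that assumes the score is a value's empirical percentile rank among the values seen so far. The class name `Percentile1DSketch` is hypothetical, and dsio's percentile1d detector may be defined differently.

```python
import bisect

class Percentile1DSketch:
    """Hypothetical percentile-based 1D detector (not dsio's implementation).

    Keeps a sorted history of observed values; a new value's score is the
    fraction of the history that is <= it, so values above everything seen
    so far score 1.0.
    """

    def __init__(self):
        self.history = []

    def update(self, x):
        bisect.insort(self.history, x)  # keep the history sorted

    def score(self, x):
        if not self.history:
            return 0.0
        return bisect.bisect_right(self.history, x) / len(self.history)
```

This score is one-sided (it only flags unusually high values), and the history grows without bound. A streaming detector would more likely keep a sliding window or an approximate quantile summary.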

You can select specific columns using the --sensors argument and you can increase or decrease the streaming speed using the --speed argument.

dsio --sensors accelerator_pedal_position engine_speed --detector gaussian1d --speed 5 data/cardata_sample.csv
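The --sensors and --speed arguments come from dsio itself, but the function below is only a hypothetical sketch of the behaviour they describe. It assumes numeric timestamps in seconds, keeps only the selected columns, and divides the gap between consecutive samples by the speed factor. The name `restream` and the injectable `sleep` parameter are illustrative assumptions.

```python
import time

def restream(rows, sensors, timestamps, speed=1.0, sleep=time.sleep):
    """Yield (timestamp, {sensor: value}) pairs paced like a live stream.

    Hypothetical sketch: only the columns listed in `sensors` are kept, and
    the wait between samples is the original gap divided by `speed`, so
    speed=5 replays the data five times faster.
    """
    prev = None
    for ts, row in zip(timestamps, rows):
        if prev is not None:
            gap = (ts - prev) / speed
            if gap > 0:
                sleep(gap)
        prev = ts
        yield ts, {name: row[name] for name in sensors}
```

Passing `sleep` as a parameter lets a test substitute a recorder for `time.sleep`, so the pacing can be checked without waiting.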

Elasticsearch & Kibana (optional)

To restream to a locally running Elasticsearch instance and generate a Kibana dashboard, use the --es-uri and --kibana-uri arguments.

dsio --es-uri http://localhost:9200/ --kibana-uri http://localhost:5601/app/kibana data/cardata_sample.csv

If you are using localhost with the default Elasticsearch and Kibana ports, you can use the shorthand:

dsio --es data/cardata_sample.csv
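Restreaming to Elasticsearch ultimately means indexing one document per sample. The README does not show how dsio does this. The sketch below therefore only illustrates the newline-delimited body that Elasticsearch's `_bulk` endpoint accepts. The helper name `to_bulk_payload` and the index name `dsio` are hypothetical. Against a local instance, such a body would be POSTed to http://localhost:9200/_bulk with the `Content-Type: application/x-ndjson` header.

```python
import json

def to_bulk_payload(index, docs, doc_type=None):
    """Build an NDJSON body for Elasticsearch's _bulk endpoint.

    Each document is preceded by an "index" action line, and the body must
    end with a newline. Elasticsearch 5.x/6.x also expect a mapping type,
    passed here as `doc_type` (an assumption about your cluster version).
    """
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type
        lines.append(json.dumps({"index": meta}))  # action/metadata line
        lines.append(json.dumps(doc))              # document source line
    return "\n".join(lines) + "\n"
```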


If you don't have access to Elasticsearch and Kibana 5.x instances, you can easily start them up on your machine using the docker-compose.yaml file in the examples directory. Docker and docker-compose need to be installed for this to work.

docker-compose up -d

Check that Elasticsearch and Kibana are up.

docker-compose ps

Once you're done, you can bring them down.

docker-compose down

Keep in mind that docker-compose commands need to be run in the directory where the docker-compose.yaml file resides (e.g. dsio-env/src/dsio/examples).

Defining your own anomaly detectors

You can use dsio with your own hand-coded anomaly detectors. These should inherit from the AnomalyDetector abstract base class and implement at least the train, update and score methods. You can find an example 99th percentile anomaly detector in the examples directory. Load the Python modules that contain your detectors using the --modules argument, and select the target detector by passing its class name to the --detector argument (case-insensitive).

dsio --modules detector.py --detector GreaterThanMaxRolling data/cardata_sample.csv
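The sketch below shows the general shape of such a detector. The method names train, update and score come from the README. Everything else is an assumption: the local `AnomalyDetector` stand-in, the batch-of-values signatures, and the class `GreaterThanMaxRollingSketch`, which is not the GreaterThanMaxRolling class shipped in the examples. A module meant for --modules would inherit from dsio's own base class instead of this stand-in.

```python
from abc import ABC, abstractmethod
from collections import deque

class AnomalyDetector(ABC):
    """Local stand-in for dsio's abstract base class. Method names follow the
    README; the argument conventions here are assumptions."""

    @abstractmethod
    def train(self, x): ...

    @abstractmethod
    def update(self, x): ...

    @abstractmethod
    def score(self, x): ...

class GreaterThanMaxRollingSketch(AnomalyDetector):
    """Hypothetical detector: scores 1.0 for values above the maximum of a
    rolling window of recent observations, 0.0 otherwise."""

    def __init__(self, window=100):
        self.recent = deque(maxlen=window)  # oldest values drop out automatically

    def train(self, x):
        # Reset and fill the window from a batch of historical values
        self.recent.clear()
        self.recent.extend(x)

    def update(self, x):
        # Push a new batch of streamed values into the window
        self.recent.extend(x)

    def score(self, x):
        if not self.recent:
            return [0.0 for _ in x]
        current_max = max(self.recent)
        return [1.0 if v > current_max else 0.0 for v in x]
```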

Integration with scikit-learn

Naturally, we encourage people to use dsio in combination with sklearn: we have no wish to reinvent the wheel! However, at the time of writing, sklearn supports regression, classification and clustering interfaces, but not anomaly detection as a standalone category. We are trying to correct that by introducing the AnomalyMixin: an interface for anomaly detection that follows sklearn design patterns. When you import an sklearn object, you can therefore simply define or override certain methods to make it compatible with dsio. We have provided an example here:

./datastream.io/examples/lof_anomaly_detector.py
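To illustrate the idea of a mixin that follows sklearn design patterns, here is a small sketch that runs without scikit-learn installed. The names `AnomalyMixinSketch`, `score_anomaly`, `flag_anomaly` and `MaxDistanceDetector` are hypothetical. They are not dsio's actual AnomalyMixin API, so refer to the example file above for the real interface.

```python
class AnomalyMixinSketch:
    """Hypothetical mixin: assumes the host estimator implements
    `score_anomaly(X)` and derives boolean flags from it via a threshold."""

    threshold = 0.99

    def flag_anomaly(self, X):
        return [s > self.threshold for s in self.score_anomaly(X)]

class MaxDistanceDetector(AnomalyMixinSketch):
    """Toy sklearn-style estimator: __init__ only stores parameters, fit()
    returns self, and learned state carries a trailing underscore."""

    def __init__(self, threshold=0.99):
        self.threshold = threshold

    def fit(self, X, y=None):
        self.center_ = sum(X) / len(X)
        # Widest deviation seen in training; fall back to 1.0 for constant data
        self.scale_ = max(abs(x - self.center_) for x in X) or 1.0
        return self

    def score_anomaly(self, X):
        # Distance from the training mean relative to the widest training
        # deviation, clipped into [0, 1]
        return [min(abs(x - self.center_) / self.scale_, 1.0) for x in X]
```

Because the flagging logic lives in the mixin, the same `flag_anomaly` method could be attached to any estimator that provides a `score_anomaly` method.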