binli826 / Lstm Autoencoders

Anomaly detection for streaming data using autoencoders

This project is my master's thesis. The goal is to maintain an adaptive autoencoder-based anomaly detection framework that not only detects contextual anomalies in streaming data, but also updates itself to track the latest data characteristics.

Introduction

The high-volume, high-velocity data streams generated by devices and applications across many domains grow steadily and are valuable for big data research. One of the most important topics is anomaly detection for streaming data, which has attracted attention and investigation in many areas, e.g. sensor data anomaly detection, predictive maintenance, and event detection. Such efforts can potentially avoid substantial financial losses in manufacturing.

However, unlike traditional anomaly detection tasks, anomaly detection in streaming data is especially difficult because data arrives over time with latent distribution changes, so a single stationary model does not fit the stream at all times. An anomaly may even become normal as the data evolves, so a dynamic system is needed to adapt to these changes. In this work, we propose an LSTM-Autoencoder anomaly detection model for streaming data, based on mini-batch stream processing. We experimented with streams containing different kinds of anomalies as well as concept drifts; the results suggest that our model can reliably detect anomalies in a data stream and update itself in time to fit the latest data properties.

Model

LSTM-Autoencoder

The LSTM-Autoencoder is based on the work of Malhotra et al. It consists of two LSTM units, one as the encoder and the other as the decoder. The model is trained only on normal data, so reconstructing anomalies is expected to yield a higher reconstruction error.
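The detection step can be sketched independently of the network itself: score each window by its mean reconstruction error and flag windows above a threshold. The sketch below is illustrative, assuming only some `reconstruct` callable standing in for a trained LSTM-Autoencoder; the toy data and threshold are not from the project.

```python
import numpy as np

def reconstruction_errors(windows, reconstruct):
    """Mean squared reconstruction error per window.

    windows: array of shape (batch, time_steps, dims)
    reconstruct: callable mapping windows to reconstructions of the same shape
    """
    recon = reconstruct(windows)
    return ((windows - recon) ** 2).mean(axis=(1, 2))

def flag_anomalies(errors, threshold):
    # A window is flagged as anomalous when its error exceeds the threshold.
    return errors > threshold

# Toy stand-in for a trained autoencoder: it reconstructs every window as the
# all-zero signal, so windows far from zero receive a high error.
windows = np.stack([np.zeros((4, 2)), np.ones((4, 2)) * 3.0])
errors = reconstruction_errors(windows, lambda w: np.zeros_like(w))
print(flag_anomalies(errors, threshold=1.0))  # [False  True]
```

In the real model the threshold is derived from the error distribution on normal training data rather than chosen by hand.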

[Figure: LSTM-Autoencoder architecture]

Input/Output format

< Batch size, Time steps, Data dimensions >
  • Batch size: number of windows contained in a single batch
  • Time steps: number of instances within a window (T)
  • Data dimensions: size of the feature space
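Turning a raw stream into this `<batch size, time steps, data dimensions>` layout can be sketched as below; `make_windows` is a hypothetical helper for illustration, not part of the project's code, and it simply drops any incomplete trailing window or batch.

```python
import numpy as np

def make_windows(stream, time_steps, batch_size):
    """Slice a (n_instances, dims) stream into batches of shape
    (batch_size, time_steps, dims), dropping the incomplete remainder."""
    n, dims = stream.shape
    n_windows = n // time_steps
    windows = stream[: n_windows * time_steps].reshape(n_windows, time_steps, dims)
    for start in range(0, n_windows - n_windows % batch_size, batch_size):
        yield windows[start : start + batch_size]

stream = np.arange(48, dtype=float).reshape(24, 2)  # 24 instances, 2 features
batches = list(make_windows(stream, time_steps=4, batch_size=3))
print(batches[0].shape)  # (3, 4, 2)
```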

Online framework

Once the LSTM-Autoencoder has been initialized with a subset of the respective data stream, it is used for online anomaly detection. For each accumulated batch of streaming data, the model predicts each window as normal or anomalous. Afterwards, experts label the windows and the performance is evaluated. Hard windows (those the model mispredicted) are appended to the updating buffers. Once the normal buffer is full, the LSTM-Autoencoder continues training only on the hard windows in the buffers.
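The loop above can be sketched as follows. The `model` interface (`predict`/`retrain`), the expert-labelling step, and `buffer_capacity` are illustrative stand-ins, not the project's actual API.

```python
class OnlineDetector:
    """Sketch of the mini-batch online loop: predict, collect hard
    (mispredicted) windows into buffers, retrain when the normal buffer fills."""

    def __init__(self, model, buffer_capacity=32):
        self.model = model                  # stand-in: exposes predict()/retrain()
        self.normal_buffer = []             # hard windows the expert labelled normal
        self.anomaly_buffer = []            # hard windows the expert labelled anomalous
        self.buffer_capacity = buffer_capacity

    def process_batch(self, windows, expert_labels):
        predictions = self.model.predict(windows)
        for window, pred, truth in zip(windows, predictions, expert_labels):
            if pred != truth:               # "hard" window: the model got it wrong
                buf = self.anomaly_buffer if truth else self.normal_buffer
                buf.append(window)
        if len(self.normal_buffer) >= self.buffer_capacity:
            # Continue training only on the buffered hard windows, then reset.
            self.model.retrain(self.normal_buffer + self.anomaly_buffer)
            self.normal_buffer.clear()
            self.anomaly_buffer.clear()
        return predictions
```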

[Figure: Online framework]

Datasets

The model is evaluated on 5 datasets. The PowerDemand dataset records power demand over one year; abnormal demand on special days (e.g. festivals, Christmas) is labeled as anomalous. SMTP and HTTP are extracted from the KDDCup99 dataset. SMTP+HTTP is a direct concatenation of SMTP and HTTP, intended to simulate a concept drift at the join; here, network attacks are treated as anomalies. The FOREST dataset records statistics of 7 different forest cover types; following the same setting as Dong et al., we take the smallest class, Cottonwood/Willow, as the anomaly. The following table shows statistics for each dataset (only numerical features are taken into consideration).

Dataset      Dimensionality  #Instances  Anomaly proportion (%)
PowerDemand  1               35,040      2.20
SMTP         34              96,554      1.22
HTTP         34              623,091     0.65
SMTP+HTTP    34              719,645     0.72
FOREST       7               581,012     0.47

Results

Here is a reconstruction example of a normal window and an anomalous window from the PowerDemand data. [Figure: Reconstruction example]

Using AUC as the evaluation metric, we obtained the following performance for data stream anomaly detection.

Dataset      AUC without updating  AUC with updating  #Updates
PowerDemand  0.91                  0.97               2
SMTP         0.94                  0.98               2
HTTP         0.76                  0.86               2
SMTP+HTTP    0.64                  0.85               3
FOREST       0.74                  0.82               8
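AUC here can be read as the probability that a randomly chosen anomalous window receives a higher anomaly score than a randomly chosen normal one. A minimal pure-Python version of this rank-based formulation (not the project's evaluation code):

```python
def auc(scores, labels):
    """AUC as the normalized Mann-Whitney U statistic: the fraction of
    (anomaly, normal) pairs where the anomaly scores higher; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y]        # anomalous windows
    neg = [s for s, y in zip(scores, labels) if not y]    # normal windows
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Anomalies (label True) should receive higher reconstruction errors.
print(auc([0.1, 0.2, 0.9, 0.8], [False, False, True, True]))  # 1.0
```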

Usage

Data preparation

Once the datasets are available, convert the raw data into a uniform format using dataPreparation.py.

python /src/Initialization/dataPreparation.py dataset inputpath outputpath --powerlabel --kddcol
# Example
python /src/Initialization/dataPreparation.py kdd /mypath/kddcup.data.corrected /mypath --kddcol /mypath/columns.txt

Initialization

With the processed dataset, model initialization can be carried out with the following command, specifying the dataset to use, the data path, and a folder path for saving the trained model.

python /src/Initialization/initialization.py dataset  dataPath  modelSavePath
# Example
python /src/Initialization/initialization.py smtp  /mypath/smtp.csv    /mypath/models/

Online prediction

Once the data are prepared and the model is initialized and saved locally, online prediction can be executed as follows:

python /src/OnlinePrediction/OnlinePrediction.py datasetname  dataPath  modelPath
# Example
python /src/OnlinePrediction/OnlinePrediction.py  smtp  /mypath/smtp.csv    /mypath/model_smtp/

About hyper-parameters

Hyper-parameters are learned by grid search for each dataset and can be modified in conf_init.py and conf_online.py.
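Grid search itself is simple to sketch: evaluate every combination of candidate values and keep the best. The parameter names and the `evaluate` callback below are hypothetical stand-ins, not the actual keys in conf_init.py or conf_online.py.

```python
from itertools import product

# Hypothetical hyper-parameter grid for illustration only.
grid = {
    "hidden_units": [32, 64],
    "learning_rate": [1e-3, 1e-4],
    "time_steps": [20, 40],
}

def grid_search(evaluate, grid):
    """Return the parameter combination with the highest validation score."""
    best, best_score = None, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = evaluate(params)  # e.g. validation AUC for this combination
        if score > best_score:
            best, best_score = params, score
    return best, best_score
```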

Versions

This project works with

  • Python 3.6
  • TensorFlow 1.4.0
  • Numpy 1.13.3