johnpaparrizos / Kshape

License: MIT
Python implementation of k-Shape

k-Shape

Python implementation of k-Shape, a fast and accurate unsupervised time-series clustering algorithm. See also the Relevant Articles section.

We used this implementation for our paper: Sieve: Actionable Insights from Monitored Metrics in Distributed Systems

Installation

kshape is available on PyPI: https://pypi.python.org/pypi/kshape

$ pip install kshape

Install from source

If you are using a virtualenv, activate it first. Otherwise, the package is installed into the system Python.

$ python setup.py install

Usage

from kshape.core import kshape, zscore

time_series = [[1,2,3,4], [0,1,2,3], [0,1,2,3], [1,2,2,3]]
cluster_num = 2
clusters = kshape(zscore(time_series, axis=1), cluster_num)
#=> [(array([-1.161895  , -0.38729833,  0.38729833,  1.161895  ]), [0, 1, 2]),
#    (array([-1.22474487,  0.        ,  0.        ,  1.22474487]), [3])]

kshape returns a list of tuples, one per cluster found. The first value of each tuple is the z-score-normalized centroid. The second value is the list of indices of the series assigned to that cluster. The results can be examined by plotting the z-score-normalized series of a cluster together with the corresponding centroid.
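
The return structure described above can be flattened into one cluster label per input series, which is often easier to work with downstream. The following is a minimal sketch: `labels_from_clusters` is a hypothetical helper, not part of the kshape package, and the `clusters` value is copied from the usage example with the centroids written as plain lists.

```python
# Hypothetical helper, not part of the kshape package.
def labels_from_clusters(clusters, n_series):
    """Return a list where entry i is the cluster position of series i."""
    labels = [None] * n_series
    for cluster_id, (_centroid, members) in enumerate(clusters):
        for series_idx in members:
            labels[series_idx] = cluster_id
    return labels


# Result copied from the usage example (centroids as plain lists).
clusters = [
    ([-1.161895, -0.38729833, 0.38729833, 1.161895], [0, 1, 2]),
    ([-1.22474487, 0.0, 0.0, 1.22474487], [3]),
]

labels = labels_from_clusters(clusters, n_series=4)
print(labels)  # [0, 0, 0, 1]
```

A series that appears in no cluster would keep the label `None`, which makes such cases easy to spot.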

Gotchas when working with real-world time series

  • If the data comes from different sources with the same frequency but sampled at different points in time, it needs to be aligned.
  • The following assumes a tab-separated file in which each column is a different observation; gaps in a column occur when only some of the values were obtained at a given point in time.
import pandas as pd
# Assuming the time series are stored in a tab-separated file, where `time` is
# the name of the column containing the timestamp and `filename` is the path to that file.
df = pd.read_csv(filename, sep="\t", index_col='time', parse_dates=True)
# Choose a resampling interval that suits the frequency of your time series:
# a finer interval is more accurate, but if the series get too long, the calculation becomes CPU and memory intensive.
# Keeping the length below 2000 values is usually a good idea.
df = df.resample("500ms").mean()
# Fill gaps by time-weighted interpolation, then back-fill anything still missing.
df.interpolate(method="time", limit_direction="both", inplace=True)
df.bfill(inplace=True)
  • kshape also expects that no time series is constant or contains 'n/a' values, so filter such series out:
time_series = []
for f in df.columns:
  if not df[f].isnull().any() and df[f].var() != 0:
    time_series.append(df[f])
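
The alignment and filtering steps above can be tried on synthetic data before applying them to a real file. The sketch below is illustrative only: the series names `a`, `b`, and `flat`, the timestamps, and the 200 ms offset are made-up assumptions, and it builds the frame in memory instead of reading a tab-separated file.

```python
import pandas as pd

# Two hypothetical sources sampled every 500 ms, offset from each other by 200 ms.
idx_a = pd.date_range("2024-01-01 00:00:00.000", periods=4, freq="500ms")
idx_b = pd.date_range("2024-01-01 00:00:00.200", periods=4, freq="500ms")
a = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx_a, name="a")
b = pd.Series([0.0, 1.0, 2.0, 3.0], index=idx_b, name="b")
# A constant series, which should be filtered out before clustering.
flat = pd.Series(5.0, index=idx_a, name="flat")

# Combining on the union of timestamps leaves NaN gaps in every column.
raw = pd.concat([a, b, flat], axis=1).sort_index()

# Resampling puts all columns on a shared 500 ms grid.
aligned = raw.resample("500ms").mean()

# Keep only columns without missing values and with non-zero variance.
kept = [
    name for name in aligned.columns
    if not aligned[name].isnull().any() and aligned[name].var() != 0
]
time_series = [aligned[name] for name in kept]
print(kept)  # ['a', 'b']
```

Here resampling alone removes the gaps because every 500 ms bin contains one sample per source; with sparser data, the interpolation and back-fill steps shown earlier would still be needed.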

Relevant Articles

Original paper

Paparrizos J and Gravano L (2015).
k-Shape: Efficient and Accurate Clustering of Time Series.
In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, series SIGMOD '15,
pp. 1855-1870. ISBN 978-1-4503-2758-9, http://doi.org/10.1145/2723372.2737793.

Our paper where we used the python implementation

@inproceedings{sieve-middleware-2017,
  author       = {J{\"o}rg Thalheim and Antonio Rodrigues and Istemi Ekin Akkus and Pramod Bhatotia and Ruichuan Chen and Bimal Viswanath and Lei Jiao and Christof Fetzer},
  title        = {Sieve: Actionable Insights from Monitored Metrics in Distributed Systems},
  booktitle    = {Proceedings of Middleware Conference (Middleware)},
  year         = {2017},
}