All Projects → blue-yonder → Tsfresh

blue-yonder / Tsfresh

Licence: mit
Automatic extraction of relevant features from time series:

Programming Languages

Jupyter Notebook
11667 projects
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Tsfresh

Deltapy
DeltaPy - Tabular Data Augmentation (by @firmai)
Stars: ✭ 344 (-94.34%)
Mutual labels:  jupyter-notebook, data-science, time-series, feature-extraction
Tsfel
An intuitive library to extract features from time series
Stars: ✭ 202 (-96.68%)
Mutual labels:  data-science, time-series, feature-extraction
H1st
The AI Application Platform We All Need. Human AND Machine Intelligence. Based on experience building AI solutions at Panasonic: robotics predictive maintenance, cold-chain energy optimization, Gigafactory battery mfg, avionics, automotive cybersecurity, and more.
Stars: ✭ 697 (-88.53%)
Mutual labels:  jupyter-notebook, data-science, time-series
My Journey In The Data Science World
📢 Ready to learn or review your knowledge!
Stars: ✭ 1,175 (-80.66%)
Mutual labels:  jupyter-notebook, data-science, feature-extraction
Kaggle Competitions
There are plenty of courses and tutorials that can help you learn machine learning from scratch but here in GitHub, I want to solve some Kaggle competitions as a comprehensive workflow with python packages. After reading, you can use this workflow to solve other real problems and use it as a template.
Stars: ✭ 86 (-98.58%)
Mutual labels:  jupyter-notebook, data-science, feature-extraction
Deep Learning Machine Learning Stock
Stock for Deep Learning and Machine Learning
Stars: ✭ 240 (-96.05%)
Mutual labels:  jupyter-notebook, data-science, feature-extraction
Mckinsey Smartcities Traffic Prediction
Adventure into using multi attention recurrent neural networks for time-series (city traffic) for the 2017-11-18 McKinsey IronMan (24h non-stop) prediction challenge
Stars: ✭ 49 (-99.19%)
Mutual labels:  jupyter-notebook, data-science, time-series
Scipy con 2019
Tutorial Sessions for SciPy Con 2019
Stars: ✭ 142 (-97.66%)
Mutual labels:  jupyter-notebook, data-science, time-series
Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Stars: ✭ 218 (-96.41%)
Mutual labels:  jupyter-notebook, data-science, feature-extraction
Pycaret
An open-source, low-code machine learning library in Python
Stars: ✭ 4,594 (-24.4%)
Mutual labels:  jupyter-notebook, data-science, time-series
Data Science
Collection of useful data science topics along with code and articles
Stars: ✭ 315 (-94.82%)
Mutual labels:  jupyter-notebook, data-science, time-series
Feature Selection
Features selector based on the self selected-algorithm, loss function and validation method
Stars: ✭ 534 (-91.21%)
Mutual labels:  data-science, feature-extraction
Intro To Python
An intro to Python & programming for wanna-be data scientists
Stars: ✭ 536 (-91.18%)
Mutual labels:  jupyter-notebook, data-science
Cookbook 2nd Code
Code of the IPython Cookbook, Second Edition, by Cyrille Rossant, Packt Publishing 2018 [read-only repository]
Stars: ✭ 541 (-91.1%)
Mutual labels:  jupyter-notebook, data-science
Data Science Portfolio
Portfolio of data science projects completed by me for academic, self learning, and hobby purposes.
Stars: ✭ 559 (-90.8%)
Mutual labels:  jupyter-notebook, data-science
Data Science Your Way
Ways of doing Data Science Engineering and Machine Learning in R and Python
Stars: ✭ 530 (-91.28%)
Mutual labels:  jupyter-notebook, data-science
Probabilistic Programming And Bayesian Methods For Hackers
aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;)
Stars: ✭ 23,912 (+293.48%)
Mutual labels:  jupyter-notebook, data-science
Data Analysis And Machine Learning Projects
Repository of teaching materials, code, and data for my data analysis and machine learning projects.
Stars: ✭ 5,166 (-14.99%)
Mutual labels:  jupyter-notebook, data-science
Datasets For Recommender Systems
This is a repository of a topic-centric public data sources in high quality for Recommender Systems (RS)
Stars: ✭ 564 (-90.72%)
Mutual labels:  jupyter-notebook, data-science
Telemanom
A framework for using LSTMs to detect anomalies in multivariate time series data. Includes spacecraft anomaly data and experiments from the Mars Science Laboratory and SMAP missions.
Stars: ✭ 589 (-90.31%)
Mutual labels:  jupyter-notebook, time-series

tsfresh

Documentation Status Build Status codecov license py36 status Binder Downloads

This repository contains the TSFRESH python package. The abbreviation stands for

"Time Series Feature extraction based on scalable hypothesis tests".

The package provides systematic time-series feature extraction by combining established algorithms from statistics, time-series analysis, signal processing, and nonlinear dynamics with a robust feature selection algorithm. In this context, the term time-series is interpreted in the broadest possible sense, such that any types of sampled data or even event sequences can be characterised.

Spend less time on feature engineering

Data Scientists often spend most of their time either cleaning data or building features. While we cannot change the first thing, the second can be automated. TSFRESH frees your time spent on building features by extracting them automatically. Hence, you have more time to study the newest deep learning paper, read hacker news or build better models.

Automatic extraction of 100s of features

TSFRESH automatically extracts 100s of features from time series. Those features describe basic characteristics of the time series such as the number of peaks, the average or maximal value or more complex features such as the time reversal symmetry statistic.

The features extracted from a exemplary time series

The set of features can then be used to construct statistical or machine learning models on the time series to be used for example in regression or classification tasks.

Forget irrelevant features

Time series often contain noise, redundancies or irrelevant information. As a result most of the extracted features will not be useful for the machine learning task at hand.

To avoid extracting irrelevant features, the TSFRESH package has a built-in filtering procedure. This filtering procedure evaluates the explaining power and importance of each characteristic for the regression or classification tasks at hand.

It is based on the well developed theory of hypothesis testing and uses a multiple test procedure. As a result the filtering process mathematically controls the percentage of irrelevant extracted features.

The TSFRESH package is described in the following open access paper:

  • Christ, M., Braun, N., Neuffer, J., and Kempa-Liehr A.W. (2018). Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh -- A Python package). Neurocomputing 307 (2018) 72-77, doi: 10.1016/j.neucom.2018.03.067.

The FRESH algorithm is described in the following whitepaper:

  • Christ, M., Kempa-Liehr, A.W., and Feindt, M. (2017). Distributed and parallel time series feature extraction for industrial big data applications. ArXiv e-print 1610.07717, https://arxiv.org/abs/1610.07717.

Due to the fact that tsfresh basically provides time-series feature extraction for free, you can now concentrate on engineering new time-series, like e.g. differences of signals from synchronous measurements, which provide even better time-series features:

  • Kempa-Liehr, A.W., Oram, J., Wong, A., Finch, M., Besier, T. (2020). Feature engineering workflow for activity recognition from synchronized inertial measurement units. In: Pattern Recognition. ACPR 2019. Ed. by M. Cree et al. Vol. 1180. Communications in Computer and Information Science (CCIS). Singapore: Springer 2020, 223–231. doi: 10.1007/978-981-15-3651-9_20.

Systematic time-series features engineering allows to work with time-series samples of different lengths, because every time-series is projected into a well-defined feature space. This approach allows the design of robust machine learning algorithms in applications with missing data.

  • Kennedy, A., Gemma, N., Rattenbury, N., Kempa-Liehr, A.W. (2021). Modelling the projected separation of microlensing events using systematic time-series feature engineering. Astronomy and Computing 35.100460 (2021), 1–14, doi: 10.1016/j.ascom.2021.100460

Natural language processing of written texts is an example of applying systematic time-series feature engineering to event sequences, which is described in the following open access paper:

  • Tang, Y., Blincoe, K., Kempa-Liehr, A.W. (2020). Enriching Feature Engineering for Short Text Samples by Language Time Series Analysis. EPJ Data Science 9.26 (2020), 1–59. doi: 10.1140/epjds/s13688-020-00244-9

Advantages of tsfresh

TSFRESH has several selling points, for example

  1. it is field tested
  2. it is unit tested
  3. the filtering process is statistically/mathematically correct
  4. it has a comprehensive documentation
  5. it is compatible with sklearn, pandas and numpy
  6. it allows anyone to easily add their favorite features
  7. it both runs on your local machine or even on a cluster

Next steps

If you are interested in the technical workings, go to see our comprehensive Read-The-Docs documentation at http://tsfresh.readthedocs.io.

The algorithm, especially the filtering part are also described in the paper mentioned above.

If you have some questions or feedback you can find the developers in the gitter chatroom.

We appreciate any contributions, if you are interested in helping us to make TSFRESH the biggest archive of feature extraction methods in python, just head over to our How-To-Contribute instructions.

If you want to try out tsfresh quickly or if you want to integrate it into your workflow, we also have a docker image available:

docker pull nbraun/tsfresh

Acknowledgements

The research and development of TSFRESH was funded in part by the German Federal Ministry of Education and Research under grant number 01IS14004 (project iPRODICT).

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].