Licence: bsd-3-clause
PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.


PyAF (Python Automatic Forecasting)

PyAF is an Open Source Python library for Automatic Forecasting built on top of popular data science Python modules: NumPy, SciPy, pandas and scikit-learn.

PyAF works as an automated process for predicting future values of a signal using a machine learning approach. It provides a set of features that is comparable to some popular commercial automatic forecasting products.

PyAF has been developed, tested and benchmarked using Python 3.x.

PyAF is distributed under the 3-Clause BSD license.

Demo

Also available as a Jupyter notebook.

import numpy as np
import pandas as pd
import pyaf.ForecastEngine as autof

# generate a daily signal in a pandas dataframe: 360 observations starting on 2016-01-25
N = 360
df_train = pd.DataFrame({"Date" : pd.date_range(start="2016-01-25", periods=N, freq='D'),
                         "Signal" : (np.arange(N)//40 + np.arange(N) % 21 + np.random.randn(N))})

# create a forecast engine. This is the main object handling all the operations
lEngine = autof.cForecastEngine()

# get the best time series model for predicting one week
lEngine.train(iInputDS = df_train, iTime = 'Date', iSignal = 'Signal', iHorizon = 7)
# the values shown in the comments below come from one sample run;
# they vary from run to run because the signal contains random noise
lEngine.getModelInfo() # => relative error 7% (MAPE)

# predict one week
df_forecast = lEngine.forecast(iInputDS = df_train, iHorizon = 7)
# list the columns of the forecast dataset
print(df_forecast.columns)

# print the real forecasts
# Future dates : ['2017-01-19T00:00:00.000000000' '2017-01-20T00:00:00.000000000' '2017-01-21T00:00:00.000000000' '2017-01-22T00:00:00.000000000' '2017-01-23T00:00:00.000000000' '2017-01-24T00:00:00.000000000' '2017-01-25T00:00:00.000000000']
print(df_forecast['Date'].tail(7).values)

# signal forecast : [ 9.74934646  10.04419761  12.15136455  12.20369717  14.09607727 15.68086323  16.22296559]
print(df_forecast['Signal_Forecast'].tail(7).values)

Features

PyAF forecasts future values of a time series (or a signal) in a fully automated way. To build forecasts, PyAF uses time information (by identifying long-term evolution and periodic patterns), analyzes the past of the signal, and exploits exogenous data (user-provided time series that may be correlated with the signal) as well as the hierarchical structure of the signal (by aggregating the forecasts of spatial components, for example).

PyAF uses pandas as a data access layer. It consumes data coming from a pandas data-frame (with time and signal columns), builds a time series model, and outputs the forecasts in a pandas data-frame. Pandas is an excellent data access layer: it allows reading/writing a huge set of file formats and accessing various data sources (databases), and it has an extensive set of algorithms to handle data-frames (aggregation, statistics, linear algebra, plotting etc).
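As a minimal sketch of this data access step, the snippet below loads a small CSV into a data-frame with a time column and a signal column, which is the shape of input described above. The CSV content is made up for illustration and the column names `Date` and `Signal` simply follow the demo; the snippet only uses pandas and does not call PyAF.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this could be a file path or a URL.
csv_text = """Date,Signal
2016-01-25,3.1
2016-01-26,4.0
2016-01-27,5.2
2016-01-28,5.9
"""

# Parse the time column as datetimes so that it can serve as the time axis.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Sort by time and drop rows with a missing signal before any modeling.
df = df.sort_values("Date").dropna(subset=["Signal"]).reset_index(drop=True)

print(df.dtypes)
```

A data-frame prepared this way has the same layout as `df_train` in the demo.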

PyAF statistical time series models are built/estimated/trained using the scikit-learn library.
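To give a feel for what estimating such a model involves, here is a rough sketch of an autoregressive (AR) model fitted by ordinary least squares, using the 80%/20% estimation/validation split mentioned in the feature list. It is not PyAF's code: NumPy's `lstsq` stands in for a scikit-learn linear regression to keep the example dependency-light, and the synthetic signal, the lag count `p` and the helper `lag_matrix` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
# Synthetic signal: linear trend + weekly cycle + noise (illustrative only).
signal = 10.0 + 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 7) + 0.1 * rng.standard_normal(n)


def lag_matrix(y, p):
    """Build rows [1, y[t-1], ..., y[t-p]] and targets y[t] for t = p .. len(y)-1."""
    cols = [np.ones(len(y) - p)]
    cols += [y[p - k : len(y) - k] for k in range(1, p + 1)]
    return np.column_stack(cols), y[p:]


p = 7  # number of past values used as regressors (hypothetical choice)
X, target = lag_matrix(signal, p)

# First 80% of the rows for estimation, last 20% for validation.
split = int(0.8 * len(target))
X_est, y_est = X[:split], target[:split]
X_val, y_val = X[split:], target[split:]

# Ordinary least squares fit of the AR coefficients on the estimation part.
coef, *_ = np.linalg.lstsq(X_est, y_est, rcond=None)

# One-step-ahead predictions on the part not used for estimation.
pred = X_val @ coef
mape = float(np.mean(np.abs((y_val - pred) / y_val)))
print(f"validation MAPE: {mape:.3f}")
```

Measuring the error only on the held-out 20% is what makes the comparison between candidate models fair, since none of them has seen those values during estimation.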

The following features are available:

  1. Training a model to forecast a time series (given in a pandas data-frame with time and signal columns).
    • PyAF uses a machine learning approach: the signal is cut into estimation and validation parts (80% and 20% of the signal, respectively).
    • A time-series cross-validation can also be used.
  2. Forecasting a time series on a given horizon (the forecast result is also a pandas data-frame) and providing prediction/confidence intervals for the forecasts.
  3. Generic training features
    • Signal decomposition as the sum of a trend, a periodic component and an AR component.
    • PyAF works as a competition between a comprehensive set of possible signal transformations and linear decompositions. For each transformed signal, a set of possible trends, periodic components and AR models is generated and all the possible combinations are estimated. The best decomposition in terms of performance is kept to forecast the signal (the performance is computed on a part of the signal that was not used for the estimation).
    • Signal transformation is supported before signal decomposition. Four transformations are supported by default. Other transformations are available (Box-Cox etc).
    • All models are estimated using standard procedures and state-of-the-art time series modeling. For example, trend regressions and AR/ARX models are estimated using scikit-learn linear regression models.
    • Standard performance measures are used (L1, RMSE, MAPE, etc)
  4. PyAF analyzes the time variable and infers the frequency from the data.
    • Natural time frequencies are supported: Minute, Hour, Day, Week, Month.
    • Strange frequencies like every 3.2 days or every 17 minutes are supported if data are recorded accordingly (every other Monday => two weeks frequency).
    • The frequency is computed as the mean duration between consecutive observations by default (as a pandas DateOffset).
    • The frequency is used to generate values for future dates automatically.
    • PyAF does its best when dates are not regularly observed. The time frequency is approximate in this case.
    • Real/Integer valued (fake) dates are also supported and handled in a similar way.
  5. Exogenous Data Support
    • Exogenous data can be provided to improve the forecasts. These are expected to be stored in an external data-frame (this data-frame will be merged with the training data-frame).
    • Exogenous data are integrated in the modeling process through their past values (ARX models).
    • Exogenous variables can be of any type (numeric, string, date, or object).
    • Exogenous variables are dummified for the non-numeric types, and standardized for the numeric types.
  6. PyAF implements Hierarchical Forecasting. It follows the excellent approach used in Rob J Hyndman and George Athanasopoulos's book. Thanks @robjhyndman
    • Hierarchies and grouped time series are supported.
    • Bottom-Up, Top-Down (using proportions), Middle-Out and Optimal Combinations are implemented.
  7. The modeling process is customizable and has a huge set of options. The default values of these options should however be OK to produce a reasonable quality model in a limited amount of time (a few minutes).
    • These options give access to a full set of signal transformations and AR-like models that are not enabled by default.
    • These include Logit and Fisher transformations as well as XGBoost, Support Vector Regressions and Croston intermittent models, among others.
    • By default, PyAF uses a fast mode that activates many popular models. It is also possible to activate a slow mode, in which PyAF explores all possible models.
    • Specific models and features can be customized.
  8. A benchmarking process is in place (using M1, M2, M3 competitions, NN3, NN5 forecasting competitions).
    • This process will be used to control the quality of modeling changes introduced in future versions of PyAF. A related GitHub issue has been created.
    • Benchmarks data/reports are saved in a separate github repository.
    • Sample benchmark report with 1001 datasets from the M1 Forecasting Competition.
  9. Basic plotting functions using matplotlib with standard time series and forecasts plots.
  10. Software Quality Highlights
    • An object-oriented approach is used for the system design. Separation of concerns is the key factor here.
    • Fully written in python with numpy, scipy, pandas and scikit-learn objects. Tries to be column-based everywhere for performance reasons (respecting some modeling time and memory constraints).
    • Internally uses a fit/predict pattern, inspired by scikit-learn, to estimate/forecast the different signal components (trends, cycles and AR models).
    • A test-driven approach (TDD) is used. Test scripts are available in the tests directory, one directory for each feature.
    • TDD implies that even the most recent features have some sample scripts in this directory. Want to know how to use cross-validation with PyAF? Here are some scripts.
    • Some jupyter notebooks are available for demo purposes with standard time series and forecasts plots.
    • Very simple API for training and forecasting.
  11. A basic RESTful Web Service (Flask) is available.
    • This service allows building a time series model, forecasting future data and some standard plots by providing a minimal specification of the signal in the JSON request body (at least a link to a csv file containing the data).
    • See this doc and the related github issue for more details.
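The frequency inference described in item 4 can be sketched with plain pandas. The snippet computes the mean duration between consecutive observations and uses it to generate future dates. It is an illustration, not PyAF's internal code: the dates are made up, `horizon` is a hypothetical name, and a `Timedelta` is used where the text above mentions a pandas `DateOffset`.

```python
import pandas as pd

# Observations recorded every other Monday (hypothetical dates).
dates = pd.Series(pd.to_datetime(
    ["2016-01-04", "2016-01-18", "2016-02-01", "2016-02-15", "2016-02-29"]
))

# Mean duration between consecutive observations, here a two-week step.
step = dates.diff().dropna().mean()

# Generate the dates of the next `horizon` observations from the last known date.
horizon = 3
last = dates.iloc[-1]
future = [last + k * step for k in range(1, horizon + 1)]

print(step)
print(future)
```

When the observations are irregular, the same mean still yields a usable step, but the generated dates are then only approximate, as noted in the list above.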

PyAF is a work in progress. The set of features is evolving. Your feature requests, comments, help, hints are very welcome.

Installation

PyAF has been developed, tested and used with Python 3.x.

It can be installed from PyPI for the latest official release:

pip install pyaf

The development version is also available by executing:

pip install scipy pandas scikit-learn matplotlib pydot dill sqlalchemy xgboost
pip install --upgrade git+https://github.com/antoinecarme/pyaf.git
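After installing, a quick check that the packages are importable can save some debugging time. The snippet below is a small convenience sketch using only the Python standard library; the module list is an assumption derived from the dependencies named in the pip command above, written with their import names.

```python
import importlib.util

# Import names (not PyPI names): scikit-learn is imported as "sklearn".
modules = ["numpy", "scipy", "pandas", "sklearn", "matplotlib",
           "pydot", "dill", "sqlalchemy", "xgboost", "pyaf"]

# find_spec returns None when a top-level module cannot be located.
status = {name: importlib.util.find_spec(name) is not None for name in modules}

for name, ok in status.items():
    print(f"{name:12s} {'found' if ok else 'MISSING'}")
```

Any module reported as missing can be installed with pip before running the demo.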

Development

Code contributions are welcome. Bug reports, requests for new features, documentation and tests are also welcome. Please use the GitHub platform for these tasks.

You can check out the latest sources of PyAF from GitHub with the command:

git clone https://github.com/antoinecarme/pyaf.git

Project history

This project was started in summer 2016 as a POC to check the feasibility of an automatic forecasting tool based only on data science software available in Python (numpy, scipy, pandas, scikit-learn etc).

See the AUTHORS.rst file for a complete list of contributors.

Help and Support

PyAF is currently maintained by the original developer. Support will be provided when possible. Even if you are not creating an issue, you are encouraged to follow these guidelines.

Bug reports, Improvement requests, Documentation, Hints and Test scripts are welcome. Please use the GitHub platform for these tasks.

For your commercial forecasting projects, please consider using the services of a forecasting expert near you (be it an R or a Python expert).

Documentation

An introductory notebook on time series forecasting with PyAF is available here. It contains some real-world examples and use cases.

A specific notebook describing the use of exogenous data is available here.

Notebooks describing examples of hierarchical forecasting models are available for Signal Hierarchies and for Grouped Signals.

The python code is not yet fully documented. This is a top priority (TODO).

Communication

Comments, appreciations, remarks, etc. are welcome. Your feedback is especially welcome if you use this library in a project or a publication.