
winedarksea / AutoTS

License: MIT
Automated Time Series Forecasting

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to AutoTS

Forecasting
Time Series Forecasting Best Practices & Examples
Stars: ✭ 2,123 (+219.25%)
Mutual labels:  time-series, forecasting, automl
Luminaire
Luminaire is a python package that provides ML driven solutions for monitoring time series data.
Stars: ✭ 316 (-52.48%)
Mutual labels:  time-series, forecasting, automl
Merlion
Merlion: A Machine Learning Framework for Time Series Intelligence
Stars: ✭ 2,368 (+256.09%)
Mutual labels:  time-series, forecasting, automl
Statespacemodels.jl
StateSpaceModels.jl is a Julia package for time-series analysis using state-space models.
Stars: ✭ 116 (-82.56%)
Mutual labels:  time-series, forecasting
Neural prophet
NeuralProphet - A simple forecasting model based on Neural Networks in PyTorch
Stars: ✭ 1,125 (+69.17%)
Mutual labels:  time-series, forecasting
Anticipy
A Python library for time series forecasting
Stars: ✭ 71 (-89.32%)
Mutual labels:  time-series, forecasting
Gluon Ts
Probabilistic time series modeling in Python
Stars: ✭ 2,373 (+256.84%)
Mutual labels:  time-series, forecasting
ForestCoverChange
Detecting and Predicting Forest Cover Change in Pakistani Areas Using Remote Sensing Imagery
Stars: ✭ 23 (-96.54%)
Mutual labels:  time-series, forecasting
Pyfts
An open source library for Fuzzy Time Series in Python
Stars: ✭ 154 (-76.84%)
Mutual labels:  time-series, forecasting
Modeltime
Modeltime unlocks time series forecast models and machine learning in one framework
Stars: ✭ 189 (-71.58%)
Mutual labels:  time-series, forecasting
Introduction To Time Series Forecasting Python
Introduction to time series preprocessing and forecasting in Python using AR, MA, ARMA, ARIMA, SARIMA and Prophet model with forecast evaluation.
Stars: ✭ 173 (-73.98%)
Mutual labels:  time-series, forecasting
Auto ts
Automatically build ARIMA, SARIMAX, VAR, FB Prophet and XGBoost Models on Time Series data sets with a Single Line of Code. Now updated with Dask to handle millions of rows.
Stars: ✭ 195 (-70.68%)
Mutual labels:  time-series, automl
Msgarch
MSGARCH R Package
Stars: ✭ 51 (-92.33%)
Mutual labels:  time-series, forecasting
Pmdarima
A statistical library designed to fill the void in Python's time series analysis capabilities, including the equivalent of R's auto.arima function.
Stars: ✭ 838 (+26.02%)
Mutual labels:  time-series, forecasting
Forecastml
An R package with Python support for multi-step-ahead forecasting with machine learning and deep learning algorithms
Stars: ✭ 101 (-84.81%)
Mutual labels:  time-series, forecasting
Informer2020
The GitHub repository for the paper "Informer" accepted by AAAI 2021.
Stars: ✭ 771 (+15.94%)
Mutual labels:  time-series, forecasting
Darts
A python library for easy manipulation and forecasting of time series.
Stars: ✭ 760 (+14.29%)
Mutual labels:  time-series, forecasting
dbnR
Gaussian dynamic Bayesian networks structure learning and inference based on the bnlearn package
Stars: ✭ 33 (-95.04%)
Mutual labels:  time-series, forecasting
Arch
ARCH models in Python
Stars: ✭ 660 (-0.75%)
Mutual labels:  time-series, forecasting
H1st
The AI Application Platform We All Need. Human AND Machine Intelligence. Based on experience building AI solutions at Panasonic: robotics predictive maintenance, cold-chain energy optimization, Gigafactory battery mfg, avionics, automotive cybersecurity, and more.
Stars: ✭ 697 (+4.81%)
Mutual labels:  time-series, automl

AutoTS

AutoTS is a time series package for Python designed for rapidly deploying high-accuracy forecasts at scale.

There are dozens of forecasting models usable in the sklearn style of .fit() and .predict(). These include naive, statistical, machine learning, and deep learning models. Additionally, there are over 30 time-series-specific transformers usable in the sklearn style of .fit(), .transform(), and .inverse_transform(). All of these function directly on pandas DataFrames, without the need for conversion to proprietary objects.

All models support forecasting multivariate (multiple time series) outputs and also support probabilistic (upper/lower bound) forecasts. Most models can readily scale to tens and even hundreds of thousands of input series. Many models also support passing in user-defined exogenous regressors.
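
For example, exogenous regressors are supplied both at fit time and again for the forecast horizon. A minimal sketch follows; the future_regressor argument matches the AutoTS documentation, but treat the exact signature as an assumption, and the weekend indicator is purely illustrative.

import pandas as pd
from autots import AutoTS, load_daily

df = load_daily(long=False)  # wide format: DatetimeIndex, one column per series

# illustrative regressor: a weekend indicator aligned to the training index
regressor_train = pd.Series((df.index.dayofweek >= 5).astype(int), index=df.index)

model = AutoTS(forecast_length=21, frequency='infer', model_list="fast", max_generations=2)
model = model.fit(df, future_regressor=regressor_train)

# the regressor must also be provided for the 21 future dates being forecast
future_index = pd.date_range(df.index[-1], periods=22, freq="D")[1:]
regressor_forecast = pd.Series((future_index.dayofweek >= 5).astype(int), index=future_index)

prediction = model.predict(future_regressor=regressor_forecast)
upper, lower = prediction.upper_forecast, prediction.lower_forecast  # probabilistic bounds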

These models are all designed for integration in an AutoML feature search which automatically finds the best models, preprocessing, and ensembling for a given dataset through genetic algorithms.

Horizontal and mosaic style ensembles are the flagship ensembling types, allowing each series to receive the most accurate possible models while still maintaining scalability.

A combination of metrics and cross-validation options, the ability to apply subsets and weighting, regressor generation tools, simulation forecasting, event risk forecasting, live datasets, template import and export, plotting, and a collection of data-shaping parameters round out the available feature set.


Installation

pip install autots

This includes dependencies for basic models, but additional packages are required for some models and methods.

Basic Use

Input data for AutoTS is expected to come in either a long or a wide format:

  • The wide format is a pandas.DataFrame with a pandas.DatetimeIndex and each column a distinct series.
  • The long format has three columns:
    • Date (ideally already in pandas-recognized datetime format)
    • Series ID. For a single time series, id_col can simply be passed as None.
    • Value
  • For long data, the column names for each of these are passed to .fit() as date_col, id_col, and value_col. No parameters are needed for wide data.

Lower-level functions are only designed for wide-style data.
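
For instance, wide-format input is just an ordinary DataFrame; the series names and values below are purely illustrative.

import numpy as np
import pandas as pd

# wide format: a DatetimeIndex with one column per series
index = pd.date_range("2023-01-01", periods=100, freq="D")
wide_df = pd.DataFrame(
    {
        "store_1_sales": np.random.default_rng(0).normal(100, 10, 100),
        "store_2_sales": np.random.default_rng(1).normal(80, 8, 100),
    },
    index=index,
)

# the equivalent long format stacks the same data into date/id/value columns
long_df = (
    wide_df.reset_index()
    .melt(id_vars="index", var_name="series_id", value_name="value")
    .rename(columns={"index": "datetime"})
)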

# also load: _hourly, _monthly, _weekly, _yearly, or _live_daily
from autots import AutoTS, load_daily

# sample datasets can be used in either of the long or wide import shapes
long = False
df = load_daily(long=long)

model = AutoTS(
    forecast_length=21,
    frequency='infer',
    prediction_interval=0.9,
    ensemble=None,
    model_list="fast",  # "superfast", "default", "fast_parallel"
    transformer_list="fast",  # "superfast",
    drop_most_recent=1,
    max_generations=4,
    num_validations=2,
    validation_method="backwards"
)
model = model.fit(
    df,
    date_col='datetime' if long else None,
    value_col='value' if long else None,
    id_col='series_id' if long else None,
)

prediction = model.predict()
# plot a sample
prediction.plot(model.df_wide_numeric,
                series=model.df_wide_numeric.columns[0],
                start_date="2019-01-01")
# Print the details of the best model
print(model)

# point forecasts dataframe
forecasts_df = prediction.forecast
# upper and lower forecasts
forecasts_up, forecasts_low = prediction.upper_forecast, prediction.lower_forecast

# accuracy of all tried model results
model_results = model.results()
# and aggregated from cross validation
validation_results = model.results("validation")

The lower-level API, in particular the large section of time series transformers in the scikit-learn style, can also be utilized independently from the AutoML framework.
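
A minimal sketch of standalone transformer use is below. The import path and parameter layout are assumptions drawn from exported templates (GeneralTransformer with fillna, transformations, and transformation_params), so check the extended tutorial for the exact current API.

from autots import load_daily
from autots.tools.transform import GeneralTransformer  # assumed import path

df = load_daily(long=False)

# assumed parameter layout: a fill method plus an ordered set of transformations
transformer = GeneralTransformer(
    fillna="ffill",
    transformations={"0": "DifferencedTransformer", "1": "MinMaxScaler"},
    transformation_params={"0": {}, "1": {}},
)
transformer.fit(df)
transformed = transformer.transform(df)
# ... fit any model on the transformed data, then map results back to the original scale
restored = transformer.inverse_transform(transformed)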

Check out extended_tutorial.md for a more detailed guide to features.

Also take a look at production_example.py.
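
The template workflow that production_example.py revolves around looks roughly like the sketch below, continuing from the Basic Use example above where model has been fit on df. The export_template and import_template methods are part of the AutoTS API, but treat the exact arguments shown here as assumptions.

from autots import AutoTS  # continuing from the Basic Use example above

# after a completed search, save the best models as a reusable template
model.export_template("autots_template.csv", models="best", n=15, max_per_model_class=3)

# on a later run (or another batch of data), start the search from that template
new_model = AutoTS(forecast_length=21, frequency='infer', max_generations=4)
new_model = new_model.import_template("autots_template.csv", method="only")
new_model = new_model.fit(df)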

Tips for Speed and Large Data:

  • Use appropriate model lists, especially the predefined lists:
    • superfast (simple naive models) and fast (more complex but still faster models, optimized for many series)
    • fast_parallel (a combination of fast and parallel) or parallel, given many CPU cores are available
      • n_jobs='auto' usually gets close to the right value, but adjust as necessary for the environment
    • see a dict of predefined lists (some defined for internal use) with from autots.models.model_list import model_lists
  • Use the subset parameter when there are many similar series; subset=100 will often generalize well for tens of thousands of similar series.
    • If using subset, passing weights for series will bias subset selection towards higher-priority series.
    • If limited by RAM, the search can be distributed by running multiple instances of AutoTS on different batches of data, each first importing a pretrained template as a starting point.
  • Set model_interrupt=True, which skips over the current model when a KeyboardInterrupt (i.e. Ctrl+C) is pressed (although if the interrupt falls between generations it will stop the entire training).
  • Use the result_file argument of .fit(), which saves progress after each generation; this is helpful for long training runs. Use import_results to recover.
  • While transformations are fairly fast, setting transformer_max_depth to a lower number (say, 2) will increase speed. Also utilize transformer_list='fast' or 'superfast'.
  • Check out this example of using AutoTS with pandas UDF.
  • Ensembles are naturally slower to predict because they run many models; 'distance' ensembles are roughly 2x slower and 'simple' ensembles 3x-5x slower.
    • ensemble='horizontal-max' with model_list='no_shared_fast' can scale relatively well given many CPU cores, because each model is only run on the series it is needed for.
  • Reducing num_validations and models_to_validate will decrease runtime but may lead to poorer model selections.
  • For datasets with many records, aggregating to a coarser frequency (for example, from daily data to monthly forecasts) can reduce training time if appropriate.
    • This can be done by adjusting frequency and aggfunc, but it is probably best done before passing data into AutoTS.
  • It will be faster if NaNs are already filled. If a search for the optimal NaN-fill method is not required, fill any NaNs with a satisfactory method before passing data to the class.
  • Set runtime_weighting in metric_weighting to a higher value. This will guide the search towards faster models, although it may come at the expense of accuracy. A configuration sketch pulling several of these tips together follows this list.
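
As referenced above, here is a sketch pulling several of these speed levers together. All parameters shown are mentioned in this document, but the metric_weighting keys other than runtime_weighting are illustrative assumptions.

from autots import AutoTS
from autots.models.model_list import model_lists  # dict of the predefined model lists

print(list(model_lists.keys()))  # e.g. 'superfast', 'fast', 'fast_parallel', ...

# illustrative weighting: a nonzero runtime_weighting nudges the search toward faster models
metric_weighting = {
    "smape_weighting": 5,
    "mae_weighting": 2,
    "rmse_weighting": 2,
    "runtime_weighting": 0.5,
}

model = AutoTS(
    forecast_length=21,
    frequency='infer',
    model_list="superfast",       # naive models only, for a quick first pass
    transformer_list="superfast",
    transformer_max_depth=2,      # fewer chained transformations
    ensemble=None,                # skip slower ensembling
    max_generations=4,
    num_validations=1,            # fewer validations = faster, but riskier model selection
    subset=100,                   # evaluate on a subset when there are many similar series
    model_interrupt=True,         # Ctrl+C skips the current model instead of aborting
    n_jobs='auto',
    metric_weighting=metric_weighting,
)
# result_file saves progress after each generation; import_results can recover it
# model = model.fit(df, result_file="autots_progress.pickle")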

How to Contribute:

  • Give feedback on where you find the documentation confusing
  • Use AutoTS and...
    • Report errors and request features by adding Issues on GitHub
    • Post the top model templates for your data (to help improve the starting templates)
    • Feel free to recommend different search grid parameters for your favorite models
  • And, of course, contribute to the codebase directly on GitHub.

Also known as Project CATS (Catlin's Automated Time Series) hence the logo.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].