ceshine / Favorita_sales_forecasting

Solution to Corporación Favorita Grocery Sales Forecasting Competition

(Simplified) Solution to Favorita Competition

Sorry, there is no CPU-only mode. You need an NVIDIA GPU to train the models.

Test environment:

  1. GTX 1070
  2. 16 GB RAM + 8 GB Swap
  3. At least 30 GB free disk space
     • (it can be less if you turn off some of the joblib disk caching)
  4. Docker 17.12.0-ce
  5. Nvidia-docker 2.0
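
As a rough pre-flight check against the requirements above, the sketch below looks for `nvidia-smi` on the PATH and measures free disk space. The function name `preflight` and its parameters are made up for illustration and are not part of this repo. Finding `nvidia-smi` only suggests that an NVIDIA driver is installed; it does not prove that training will work.

```python
import shutil
from pathlib import Path


def preflight(path=".", min_free_gb=30.0):
    """Return a dict of rough environment checks (heuristics, not guarantees)."""
    free_gb = shutil.disk_usage(Path(path)).free / 1024 ** 3
    return {
        # nvidia-smi on PATH only hints that an NVIDIA driver is installed.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        "free_disk_gb": round(free_gb, 1),
        # 30 GB is the figure from the test environment list above.
        "enough_disk": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, value in preflight().items():
        print(f"{name}: {value}")
```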

Acknowledgement

  1. The Transformer model comes from Yu-Hsiang Huang's implementation. His repo is included in the "attention-is-all-you-need-pytorch" folder via git subtree.
  2. The LSTNet model is largely inspired by GUOKUN LAI's implementation.
  3. The model structure is inspired by the work of Sean Vasquez and Arthur Suilin.

Docker Usage

First build the image. Example command: docker build -t favorita .

Then spin up a docker container:

docker run --runtime=nvidia --rm -ti \
    -v /mnt/Data/favorita_cache:/home/docker/labs/cache \
    -v /mnt/Data/favorita_data:/home/docker/labs/data \
    -p 6006:6006 favorita bash
  • It is recommended to manually mount the data and cache folders
  • Port 6006 is for running TensorBoard inside the container

Where to put the data

Download the data files from Kaggle and extract them into the data folder.
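
To confirm the extraction worked, a small check like the one below can help. It is only a sketch: the file list is an assumption based on the CSV names on the Kaggle competition page, and the preprocessing script may need only some of them. The helper name `missing_data_files` is made up for illustration.

```python
from pathlib import Path

# Assumed file list: CSV names from the Kaggle competition page.
# The preprocessing script may need only a subset of these.
EXPECTED_FILES = (
    "train.csv", "test.csv", "stores.csv", "items.csv",
    "transactions.csv", "oil.csv", "holidays_events.csv",
)


def missing_data_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    print("All files found." if not missing else f"Missing: {', '.join(missing)}")
```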

We're going to assume you're using the BASH prompt inside the container in the rest of this README.

Model Training

Preprocessing

python prepare_seq_data.py

Train Model

For now, two types of models are ready to be trained:

  1. Transformer (fit_transformer.py)
  2. LSTNet (fit_lstnet.py)

The training scripts use Sacred to manage experiments. It is recommended to set a seed explicitly via CLI:

python fit_transformer.py with seed=93102

You can also use Mongo to save experiment results and hyper-parameters for each run. Please refer to the Sacred documentation for more details.

Prediction for Validation and Testing Dataset

The CSV outputs for the validation and test sets are saved in cache/preds/val/ and cache/preds/test/ respectively.
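
To see which prediction files a run produced, a helper along these lines can list them. This is a sketch only: the function name `list_prediction_files` is hypothetical, and it assumes nothing about the CSV file names or columns beyond the two folders named above.

```python
from pathlib import Path


def list_prediction_files(cache_dir="cache"):
    """Map each split to the sorted CSV paths found under <cache_dir>/preds/<split>/."""
    preds_root = Path(cache_dir) / "preds"
    return {
        split: sorted((preds_root / split).glob("*.csv"))
        for split in ("val", "test")
    }


if __name__ == "__main__":
    for split, paths in list_prediction_files().items():
        print(f"{split}: {len(paths)} file(s)")
        for path in paths:
            print(f"  {path}")
```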

Tensorboard

Training and validation loss curves, along with some of the embeddings, are logged in TensorBoard format. Launch TensorBoard via:

tensorboard --logdir runs

Then visit http://localhost:6006 for the web interface.

TODO (For now you need to figure them out yourself)

  1. Ensembling script: I made some changes to the outputs of the model training scripts to make them more readable, which means the ensembling script needs to be updated as well. (For those who want to try: the ground truth for the validation set is stored in cache/yval_seq.npy.)
  2. Encoder/Decoder and Encoder/MLP models with LSTM, GRU, QRNN, SRU units: I tried a lot of different things for this competition, but I feel the code could use some refactoring, so these models are removed for now.
  3. Tabular data preparation and models: My GBM models are mediocre at best, so they are not really worth sharing here. But as I mentioned in the blog post, for the store/item combinations that were removed by the 56-day nonzero filter, using a GBM model to predict their values will give you a better score than predicting zeros.
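
The split described in item 3 can be sketched as follows. This is an illustration, not the code used in the competition: it assumes the 56-day nonzero filter keeps a store/item series only if at least one of its last 56 daily values is nonzero, and the function name and toy data are made up. The removed series are the ones a GBM model would predict instead of filling with zeros; that model is not shown.

```python
def split_by_recent_sales(series_by_key, window=56):
    """Split series into (kept, removed) by whether the last `window` days have any sales.

    Assumption: a store/item series is kept only if at least one of its most
    recent `window` daily values is nonzero.
    """
    kept, removed = {}, {}
    for key, sales in series_by_key.items():
        target = kept if any(v != 0 for v in sales[-window:]) else removed
        target[key] = sales
    return kept, removed


# Toy data keyed by hypothetical (store, item) pairs.
toy = {
    (1, 101): [0] * 60 + [3, 0, 1],  # sales in the last 56 days: kept
    (1, 102): [5, 2] + [0] * 56,     # last 56 days all zero: removed
}
kept, removed = split_by_recent_sales(toy)
print("kept:", sorted(kept))
print("removed:", sorted(removed))
```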