
louis925 / Kaggle Web Traffic Time Series Forecasting

Solution to Kaggle - Web Traffic Time Series Forecasting

Projects that are alternatives of or similar to Kaggle Web Traffic Time Series Forecasting

Kaggle Web Traffic
1st place solution
Stars: ✭ 1,641 (+5558.62%)
Mutual labels:  kaggle, jupyter-notebook, time-series, timeseries
Tsai
Time series Timeseries Deep Learning Pytorch fastai - State-of-the-art Deep Learning with Time Series and Sequences in Pytorch / fastai
Stars: ✭ 407 (+1303.45%)
Mutual labels:  jupyter-notebook, time-series, timeseries, cnn
Kaggle Competition Favorita
5th place solution for Kaggle competition Favorita Grocery Sales Forecasting
Stars: ✭ 169 (+482.76%)
Mutual labels:  kaggle, time-series, cnn
My Journey In The Data Science World
📢 Ready to learn or review your knowledge!
Stars: ✭ 1,175 (+3951.72%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook
Deep Learning Boot Camp
A community run, 5-day PyTorch Deep Learning Bootcamp
Stars: ✭ 1,270 (+4279.31%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook
Tcdf
Temporal Causal Discovery Framework (PyTorch): discovering causal relationships between time series
Stars: ✭ 217 (+648.28%)
Mutual labels:  jupyter-notebook, timeseries, cnn
Lung Diseases Classifier
Diseases Detection from NIH Chest X-ray data
Stars: ✭ 52 (+79.31%)
Mutual labels:  kaggle, jupyter-notebook, cnn
Kaggle Competitions
There are plenty of courses and tutorials that can help you learn machine learning from scratch but here in GitHub, I want to solve some Kaggle competitions as a comprehensive workflow with python packages. After reading, you can use this workflow to solve other real problems and use it as a template.
Stars: ✭ 86 (+196.55%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook
Simplestockanalysispython
Stock Analysis Tutorial in Python
Stars: ✭ 126 (+334.48%)
Mutual labels:  jupyter-notebook, time-series, timeseries
histopathologic cancer detector
CNN histopathologic tumor identifier.
Stars: ✭ 26 (-10.34%)
Mutual labels:  kaggle, kaggle-competition, cnn-keras
digit recognizer
CNN digit recognizer implemented in Keras Notebook, Kaggle/MNIST (0.995).
Stars: ✭ 27 (-6.9%)
Mutual labels:  kaggle, kaggle-competition, cnn-keras
Pytorch Kaggle Starter
Pytorch starter kit for Kaggle competitions
Stars: ✭ 268 (+824.14%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook
Screenshot To Code
A neural network that transforms a design mock-up into a static website.
Stars: ✭ 13,561 (+46662.07%)
Mutual labels:  jupyter-notebook, cnn, cnn-keras
Timesynth
A Multipurpose Library for Synthetic Time Series Generation in Python
Stars: ✭ 170 (+486.21%)
Mutual labels:  jupyter-notebook, time-series, timeseries
Image classifier
CNN image classifier implemented in Keras Notebook 🖼️.
Stars: ✭ 139 (+379.31%)
Mutual labels:  jupyter-notebook, cnn, cnn-keras
Kaggle Notebooks
Sample notebooks for Kaggle competitions
Stars: ✭ 77 (+165.52%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook
Tsmoothie
A python library for time-series smoothing and outlier detection in a vectorized way.
Stars: ✭ 109 (+275.86%)
Mutual labels:  jupyter-notebook, time-series, timeseries
Keras transfer cifar10
Object classification with CIFAR-10 using transfer learning
Stars: ✭ 120 (+313.79%)
Mutual labels:  jupyter-notebook, cnn, cnn-keras
Machine Learning Workflow With Python
This is a comprehensive ML techniques with python: Define the Problem- Specify Inputs & Outputs- Data Collection- Exploratory data analysis -Data Preprocessing- Model Design- Training- Evaluation
Stars: ✭ 157 (+441.38%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook
Amazon Forest Computer Vision
Amazon Forest Computer Vision: Satellite Image tagging code using PyTorch / Keras with lots of PyTorch tricks
Stars: ✭ 346 (+1093.1%)
Mutual labels:  kaggle-competition, kaggle, jupyter-notebook

Kaggle - Web Traffic Time Series Forecasting

This repository contains the code to solve the Kaggle competition "Web Traffic Time Series Forecasting" (https://www.kaggle.com/c/web-traffic-time-series-forecasting).

Contributors

Louis Yang (github.com/louis925)

Chen-Hsi (Sky) Huang (github.com/skyhuang1208)

Achievement

11th place out of 1095 teams (gold medal, top 2%) on the stage 2 private leaderboard, with an SMAPE score of 37.919.

How to Use

  1. Download this repository.
  2. Download and unzip the training data train_2.csv.zip and key_2.csv.zip from the Kaggle website (https://www.kaggle.com/c/web-traffic-time-series-forecasting/data). Place them in the data directory, and rename train_2.csv to train_3.csv.
  3. Run main.ipynb in the codes directory to generate the predictions in the results directory.

How It Works

We use a Fibonacci-like series of window sizes to compute a series of NaN-excluded medians for each sample (page), then take the median of that series, again excluding NaN. This median of medians (which we call the Fibonacci median) serves as the center of each sample. The window sizes we use are [11, 18, 30, 48, 78, 126, 203, 329], as suggested by Ehsan (http://www.kaggle.com/safavieh/median-estimation-by-fibonacci-et-al-lb-44-9?scriptVersionId=1466647), whose original kernel scored 44.9 on the stage 1 Kaggle leaderboard.
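
A minimal NumPy sketch of this median-of-medians baseline (the function name and the all-NaN fallback are our assumptions, not taken from the repository):

```python
import numpy as np

FIB_WINDOWS = [11, 18, 30, 48, 78, 126, 203, 329]

def fibonacci_median(visits, windows=FIB_WINDOWS):
    """Median of the NaN-excluded trailing-window medians of one page."""
    window_medians = []
    for w in windows:
        tail = visits[-w:]                      # last w days of the series
        if not np.all(np.isnan(tail)):
            window_medians.append(np.nanmedian(tail))
    if not window_medians:
        return 0.0                              # assumed fallback: no data
    return np.nanmedian(window_medians)
```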

To better train the neural network across the widely varying scales of daily visits, we first apply a log1p-style transformation,

X_log = log10(X_ori + 1),

to bring all pages onto a similar order of magnitude. Then we standardize the data sample-wise (page-wise), using the Fibonacci median (fib_med) instead of the regular mean as the center baseline and the usual standard deviation (stdev) as the scale; NaN values are treated as 0.
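
A sketch of this per-page normalization (we are assuming, as details not spelled out above, that NaNs are zeroed before the log transform and that a zero standard deviation falls back to 1):

```python
import numpy as np

def normalize_page(visits, fib_med):
    """Log-transform, center by the transformed Fibonacci median,
    and scale by the per-page standard deviation."""
    x_log = np.log10(np.nan_to_num(visits) + 1.0)  # NaN treated as 0 visits
    center = np.log10(fib_med + 1.0)
    scale = x_log.std()
    if scale == 0.0:
        scale = 1.0                                # assumed guard: flat page
    return (x_log - center) / scale, center, scale
```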

Based on the Fibonacci median (fib_med), we split the data (pages) into groups and train an individual neural network model for each group. The split is determined by the thresholds

log10(fib_med + 1) < 1.0, 2.0, 4.0, or above,

so there are 4 groups in total.
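
This thresholding maps directly onto np.digitize (the helper name here is ours):

```python
import numpy as np

GROUP_EDGES = [1.0, 2.0, 4.0]  # thresholds on log10(fib_med + 1)

def assign_group(fib_med):
    """Return group 0-3 for one page."""
    return int(np.digitize(np.log10(fib_med + 1.0), GROUP_EDGES))
```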

The first group (group 0) uses the result of the Fibonacci median model directly, since these low-traffic pages are difficult for our neural network to learn. For the remaining groups (groups 1-3), we use the results from the convolutional neural network (CNN).

The neural network takes 64 days (x_length) of data and predicts the following 64 days (y_length). For the network structure, we use a single 1D convolutional layer with 140 filters, kernel size 3, and ReLU activation, followed by average pooling with pool size 2. After flattening the convolutional output, we feed in the Fibonacci median and stdev (after the log transform) of the sample as additional inputs via concatenation. Finally, the concatenated vector passes through 3 fully connected layers (130, 120, and 64 neurons), with two ReLU activations and one linear activation, to perform the regression.

The CNN structure is:

[X > Conv1D(140, kernel=3) > AvgPool(2) > Flat + Additional input, A, (median, stdev)] > Concat > 
FC(130, relu) > FC(120, relu) > FC(64, linear) > Y
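
A Keras sketch matching this diagram (the layer widths and activations are from the text above; the optimizer, loss, and exact input shapes are our assumptions):

```python
from tensorflow.keras import layers, models

X_LENGTH, Y_LENGTH = 64, 64

def build_cnn():
    x_in = layers.Input(shape=(X_LENGTH, 1), name="series")   # 64 days in
    a_in = layers.Input(shape=(2,), name="med_stdev")         # fib_med, stdev

    h = layers.Conv1D(140, kernel_size=3, activation="relu")(x_in)
    h = layers.AveragePooling1D(pool_size=2)(h)
    h = layers.Flatten()(h)

    h = layers.Concatenate()([h, a_in])
    h = layers.Dense(130, activation="relu")(h)
    h = layers.Dense(120, activation="relu")(h)
    y_out = layers.Dense(Y_LENGTH, activation="linear")(h)    # 64 days out

    model = models.Model([x_in, a_in], y_out)
    model.compile(optimizer="adam", loss="mae")  # assumed; metric is SMAPE
    return model
```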

Ensemble learning: We train the same neural network 5 times, each run on only 4/5 of the data, and take the median of the predictions across the 5 runs.
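
A sketch of this 5-run median ensemble, assuming a KFold-style 4/5 split (the training hyperparameters here are placeholders, not the repository's values):

```python
import numpy as np
from sklearn.model_selection import KFold

def ensemble_predict(X, A, Y, X_test, A_test, n_runs=5):
    """Train on 4/5 of the data per run; return the element-wise
    median of the n_runs predictions."""
    preds = []
    for train_idx, _ in KFold(n_splits=n_runs, shuffle=True).split(X):
        model = build_cnn()  # from the sketch above
        model.fit([X[train_idx], A[train_idx]], Y[train_idx],
                  epochs=10, batch_size=128, verbose=0)  # placeholder values
        preds.append(model.predict([X_test, A_test]))
    return np.median(preds, axis=0)
```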

Group model optimization: We also evaluate each group's model (the neural network trained on its own group) on the other groups' data, and assign models to groups based on their performance in test (validation) mode. Since, in validation, the model trained on group 2 does better on group 3's data than the model trained on group 3, we use the group 2 model to predict group 3's results.
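
A sketch of this model-to-group reassignment (smape here is a hypothetical scoring helper, and group_models / group_val_data are assumed dict layouts, not the repository's):

```python
def best_model_per_group(group_models, group_val_data, smape):
    """For each group, pick the group model with the lowest SMAPE
    on that group's validation data."""
    assignment = {}
    for g, (x_val, a_val, y_val) in group_val_data.items():
        scores = {m: smape(y_val, model.predict([x_val, a_val]))
                  for m, model in group_models.items()}
        assignment[g] = min(scores, key=scores.get)
    return assignment  # e.g. {3: 2}: group 2's model predicts group 3
```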
