
pudae / Kaggle Humpback

Licence: bsd-2-clause
Code for 3rd place solution in Kaggle Humpback Whale Identification Challenge.

Programming Languages

python

Projects that are alternatives to or similar to Kaggle Humpback

Spark python ml examples
Spark 2.0 Python Machine Learning examples
Stars: ✭ 87 (-35.56%)
Mutual labels:  kaggle
Kaggle Freesound Audio Tagging
8th place solution (on Kaggle) to the Freesound General-Purpose Audio Tagging Challenge (DCASE 2018 - Task 2)
Stars: ✭ 111 (-17.78%)
Mutual labels:  kaggle
Kaggle Airbnb Recruiting New User Bookings
2nd Place Solution in Kaggle Airbnb New User Bookings competition
Stars: ✭ 118 (-12.59%)
Mutual labels:  kaggle
D2l En
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 300 universities from 55 countries including Stanford, MIT, Harvard, and Cambridge.
Stars: ✭ 11,837 (+8668.15%)
Mutual labels:  kaggle
Kaggle Dogs Vs Cats Caffe
Kaggle dogs vs cats solution in Caffe
Stars: ✭ 105 (-22.22%)
Mutual labels:  kaggle
Kaggle Houseprices
Kaggle Kernel for House Prices competition https://www.kaggle.com/massquantity/all-you-need-is-pca-lb-0-11421-top-4
Stars: ✭ 113 (-16.3%)
Mutual labels:  kaggle
Kaggle Competitions
There are plenty of courses and tutorials that can help you learn machine learning from scratch, but here on GitHub I want to solve some Kaggle competitions as a comprehensive workflow with Python packages. After reading, you can use this workflow to solve other real problems and use it as a template.
Stars: ✭ 86 (-36.3%)
Mutual labels:  kaggle
Pytorch Speech Commands
Speech commands recognition with PyTorch
Stars: ✭ 128 (-5.19%)
Mutual labels:  kaggle
Dog Breeds Classification
Set of scripts and data for reproducing dog breed classification model training, analysis, and inference.
Stars: ✭ 105 (-22.22%)
Mutual labels:  kaggle
Ml Fraud Detection
Credit card fraud detection through logistic regression, k-means, and deep learning.
Stars: ✭ 117 (-13.33%)
Mutual labels:  kaggle
Kaggle Past Solutions
A searchable compilation of Kaggle past solutions
Stars: ✭ 1,372 (+916.3%)
Mutual labels:  kaggle
Dataminingnotesandpractice
Notes I keep while learning data mining, plus clever tricks I've come across; continuously updated~
Stars: ✭ 103 (-23.7%)
Mutual labels:  kaggle
Ds bowl 2018
Kaggle Data Science Bowl 2018
Stars: ✭ 116 (-14.07%)
Mutual labels:  kaggle
Kaggle Global Wheat Detection
9th Place Solution of Kaggle Global Wheat Detection
Stars: ✭ 91 (-32.59%)
Mutual labels:  kaggle
Ml Dl Scripts
The repository provides useful Python scripts for ML and data analysis
Stars: ✭ 119 (-11.85%)
Mutual labels:  kaggle
Deep Learning Boot Camp
A community run, 5-day PyTorch Deep Learning Bootcamp
Stars: ✭ 1,270 (+840.74%)
Mutual labels:  kaggle
Crypto
Cryptocurrency Historical Market Data R Package
Stars: ✭ 112 (-17.04%)
Mutual labels:  kaggle
Kaggle
Code for Kaggle Competitions
Stars: ✭ 128 (-5.19%)
Mutual labels:  kaggle
Kaggle Web Traffic
1st place solution
Stars: ✭ 1,641 (+1115.56%)
Mutual labels:  kaggle
Dogbreed gluon
kaggle Dog Breed Identification
Stars: ✭ 116 (-14.07%)
Mutual labels:  kaggle

kaggle-humpback-submission

Code for the 3rd place solution in the Kaggle Humpback Whale Identification Challenge.

For the detailed solution, please refer to the Kaggle post

Hardware

The following specs were used to create the original solution.

  • Ubuntu 16.04 LTS
  • Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
  • 2x NVIDIA 1080 Ti

Reproducing Submission

To reproduce my submission without retraining, follow these steps:

  1. Installation
  2. Download Dataset
  3. Download Pretrained models
  4. Inference
  5. Make Submission

Installation

All requirements should be detailed in requirements.txt. Using Anaconda is strongly recommended.

conda create -n humpback python=3.6
source activate humpback
pip install -r requirements.txt

Download dataset

Download and extract train.zip and test.zip to the data directory. If the Kaggle API is installed, run the following commands.

$ kaggle competitions download -c humpback-whale-identification -f train.zip
$ kaggle competitions download -c humpback-whale-identification -f test.zip
$ unzip train.zip -d data/train
$ unzip test.zip -d data/test
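After extraction, you can do a quick sanity check on the layout with a few lines of Python. This is only a convenience sketch; the directory names come from the commands above, and no particular image count is assumed.

from pathlib import Path

# Count the extracted images in each split directory.
for split in ("train", "test"):
    images = list(Path("data", split).glob("*.jpg"))
    print(f"data/{split}: {len(images)} images")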

Generate CSV files

You can skip this step; all CSV files are already prepared in the data directory.

List of CSV files

filename                     description
landmark.{split}.{fold}.csv  predicted landmarks for the train and test sets
duplicate_ids.csv            list of duplicate identities
leaks.csv                    leaks from the discussion post
split.keypoint.{fold}.csv    labels for training the bounding box and landmark detector
train.v2.csv                 label file in which duplicate ids are merged into a single identity and several new whales are also grouped
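If you want to inspect these files before training, they can be loaded with pandas. This is only an illustrative sketch; it assumes the CSV files sit directly under the data directory, and the column layouts are whatever the files define.

import pandas as pd

# Peek at a few of the prepared CSV files.
for path in ["data/train.v2.csv", "data/duplicate_ids.csv", "data/leaks.csv"]:
    df = pd.read_csv(path)
    print(path, df.shape, list(df.columns))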

Landmark

To infer landmarks, run the following command:

$ sh inference_landmarks.sh

Training

In the configs directory, you can find configurations I used to train my final models.

Train models

To train models, run the following command:

$ python train.py --config={config_path}

The expected training times are:

Model        GPUs         Image size   Training epochs   Training time
densenet121  1x 1080 Ti   320          300               60 hours

Average weights

To average weights, run the following command:

$ python swa.py --config={config_path}

The averaged weights will be located in train_logs/{train_dir}/checkpoint.
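The script itself is not reproduced here, but conceptually, averaging checkpoint weights (as in stochastic weight averaging) amounts to taking the element-wise mean of the saved parameter tensors. A minimal PyTorch sketch, assuming each checkpoint is a plain state_dict:

import torch

def average_checkpoints(paths):
    # Element-wise average of parameters from several PyTorch checkpoints.
    # Non-float buffers are cast to float for simplicity in this sketch.
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: v / len(paths) for k, v in avg.items()}

# Hypothetical usage with the last few epoch checkpoints of one model:
# averaged = average_checkpoints(["epoch_0098.pth", "epoch_0099.pth", "epoch_0100.pth"])
# torch.save(averaged, "train_logs/densenet121.1st/checkpoint/swa.pth")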

Pretrained models

You can download the pretrained models used for my submission from the link, or run the following commands.

$ wget https://www.dropbox.com/s/fdnh29pjk8rpxgs/train_logs.zip
$ unzip train_logs.zip

Unzip the archive into train_logs; you should then see the following structure:

results
  +- densenet121.1st
  |  +- checkpoint
  +- densenet121.2nd
  |  +- checkpoint
  +- densenet121.3rd
  |  +- checkpoint
  +- landmark.0
  |  +- checkpoint
  +- landmark.1
  |  +- checkpoint
  +- landmark.2
  |  +- checkpoint
  +- landmark.3
  |  +- checkpoint
  +- landmark.4
  |  +- checkpoint

Inference

Once trained weights are prepared, you can create files containing the cosine similarities between test images and the target whales.

$ python inference.py \
  --config={config_filepath} \
  --tta_landmark={0 or 1} \
  --tta_flip={0 or 1} \
  --output={output_filepath}
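The exact contents of the output file are produced by inference.py; as a rough illustration only, cosine similarity between L2-normalized embeddings can be computed like this (the embedding extraction, shapes, and array layout below are assumptions, not the script's actual format):

import numpy as np

def cosine_similarity_matrix(test_embeddings, whale_embeddings):
    # Cosine similarity between test-image embeddings and per-whale embeddings.
    test = test_embeddings / np.linalg.norm(test_embeddings, axis=1, keepdims=True)
    whales = whale_embeddings / np.linalg.norm(whale_embeddings, axis=1, keepdims=True)
    return test @ whales.T  # shape: (num_test_images, num_whale_ids)

# Hypothetical shapes: 10 test images, 100 known whale ids, 512-d embeddings.
sims = cosine_similarity_matrix(np.random.randn(10, 512), np.random.randn(100, 512))
print(sims.shape)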

To make a submission, you must run inference on the test and test_val splits, then combine the resulting similarity files. For example:

$ python make_submission.py \
  --input_path={comma-separated list of similarity file paths} \
  --output_path={submission_file_path}

To run inference with all models and make a submission using the pretrained models, simply run sh inference.sh
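make_submission.py's actual logic is not shown here. As a rough sketch of the combination step, assuming each similarity file stores a NumPy array of shape (num_test_images, num_known_ids) and using the competition's five-predictions-per-image format, averaging the files and writing a submission could look like this (the threshold and new_whale handling are assumptions):

import numpy as np

def write_submission(similarity_paths, image_names, whale_ids, output_path, threshold=0.4):
    # Average per-model similarity matrices and keep the top-5 ids per test image.
    sims = np.mean([np.load(p) for p in similarity_paths], axis=0)
    with open(output_path, "w") as f:
        f.write("Image,Id\n")
        for name, row in zip(image_names, sims):
            preds = []
            for idx in np.argsort(row)[::-1]:
                # Insert new_whale once the remaining scores fall below the threshold.
                if row[idx] < threshold and "new_whale" not in preds:
                    preds.append("new_whale")
                preds.append(whale_ids[idx])
                if len(preds) >= 5:
                    break
            f.write(f"{name},{' '.join(preds[:5])}\n")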

Post Processing

As you may know, there are some duplicate whale ids. For the duplicate ids, the following process is applied.

Assume that identity A and identity B are duplicates.

  1. If the top-1 prediction is identity A, then I set identity B as the top-2 prediction.
  2. If the size of the test image is equal to that of one of the images in identity A and is not equal to that of any image in identity B, then I set identity A as the top-1 prediction.
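A simplified sketch of these two rules is shown below; the function name, data structures, and the way image sizes are looked up are illustrative assumptions, not the repository's actual implementation.

def apply_duplicate_rules(preds, dup_of, test_size, sizes_by_id):
    # preds: ranked list of identity strings, best first.
    # dup_of: dict mapping an identity to its duplicate identity (in both directions).
    # test_size: (width, height) of the test image.
    # sizes_by_id: dict mapping identity -> set of (width, height) of its training images.
    preds = list(preds)
    a = preds[0]
    if a in dup_of:
        b = dup_of[a]
        # Rule 2: if the test image size matches one of B's images and none of A's, promote B to top-1.
        if test_size in sizes_by_id.get(b, set()) and test_size not in sizes_by_id.get(a, set()):
            a, b = b, a
        # Rule 1: whichever of the pair is top-1, force its duplicate into the top-2 slot.
        preds = [a, b] + [p for p in preds if p not in (a, b)]
    return preds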