
minerva-ml / Open Solution Toxic Comments

License: MIT
Open solution to the Toxic Comment Classification Challenge

Programming Languages

python
139335 projects - #7 most used programming language
python3
1442 projects

Projects that are alternatives of or similar to Open Solution Toxic Comments

Mlbox
MLBox is a powerful Automated Machine Learning python library.
Stars: ✭ 1,199 (+678.57%)
Mutual labels:  kaggle, data-science, pipeline, prediction
Open Solution Mapping Challenge
Open solution to the Mapping Challenge 🌎
Stars: ✭ 291 (+88.96%)
Mutual labels:  competition, kaggle, data-science, pipeline
Data-Science-Hackathon-And-Competition
Grandmaster in MachineHack (3rd Rank Best) | Top 70 in AnalyticsVidya & Zindi | Expert at Kaggle | Hack AI
Stars: ✭ 165 (+7.14%)
Mutual labels:  competition, kaggle, kaggle-competition
Deep Learning Boot Camp
A community run, 5-day PyTorch Deep Learning Bootcamp
Stars: ✭ 1,270 (+724.68%)
Mutual labels:  kaggle-competition, kaggle, data-science
Kaggle Competitions
There are plenty of courses and tutorials that can help you learn machine learning from scratch but here in GitHub, I want to solve some Kaggle competitions as a comprehensive workflow with python packages. After reading, you can use this workflow to solve other real problems and use it as a template.
Stars: ✭ 86 (-44.16%)
Mutual labels:  kaggle-competition, kaggle, data-science
Data Science Competitions
Goal of this repo is to provide the solutions of all Data Science Competitions(Kaggle, Data Hack, Machine Hack, Driven Data etc...).
Stars: ✭ 572 (+271.43%)
Mutual labels:  kaggle-competition, kaggle, data-science
Lightautoml
LAMA - automatic model creation framework
Stars: ✭ 196 (+27.27%)
Mutual labels:  kaggle, data-science, pipeline
Open Solution Home Credit
Open solution to the Home Credit Default Risk challenge 🏡
Stars: ✭ 397 (+157.79%)
Mutual labels:  competition, kaggle, pipeline
My Journey In The Data Science World
📒 Ready to learn or review your knowledge!
Stars: ✭ 1,175 (+662.99%)
Mutual labels:  kaggle-competition, kaggle, data-science
Datascicomp
A collection of popular Data Science Challenges/Competitions || Countdown timers to keep track of the entry deadlines.
Stars: ✭ 1,636 (+962.34%)
Mutual labels:  competition, data-science, challenge
Blurr
Data transformations for the ML era
Stars: ✭ 96 (-37.66%)
Mutual labels:  data-science, pipeline
Drake
An R-focused pipeline toolkit for reproducibility and high-performance computing
Stars: ✭ 1,301 (+744.81%)
Mutual labels:  data-science, pipeline
Ml
A high-level machine learning and deep learning library for the PHP language.
Stars: ✭ 1,270 (+724.68%)
Mutual labels:  data-science, prediction
Kaggle Past Solutions
A searchable compilation of Kaggle past solutions
Stars: ✭ 1,372 (+790.91%)
Mutual labels:  kaggle, data-science
D2l En
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 300 universities from 55 countries including Stanford, MIT, Harvard, and Cambridge.
Stars: ✭ 11,837 (+7586.36%)
Mutual labels:  kaggle, data-science
Segmentation
Tensorflow implementation : U-net and FCN with global convolution
Stars: ✭ 101 (-34.42%)
Mutual labels:  kaggle-competition, kaggle
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-30.52%)
Mutual labels:  data-science, prediction
Kaggle Houseprices
Kaggle Kernel for House Prices competition https://www.kaggle.com/massquantity/all-you-need-is-pca-lb-0-11421-top-4
Stars: ✭ 113 (-26.62%)
Mutual labels:  kaggle, data-science
Skpro
Supervised domain-agnostic prediction framework for probabilistic modelling
Stars: ✭ 107 (-30.52%)
Mutual labels:  data-science, prediction
Chain.jl
A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.
Stars: ✭ 118 (-23.38%)
Mutual labels:  data-science, pipeline

Starter code: Kaggle Toxic Comment Classification Challenge

More competitions 🎇

Check out the collection of public projects 🎁, where you can find multiple Kaggle competitions with code, experiments and outputs.

Here at Neptune we enjoy participating in Kaggle competitions. The Toxic Comment Classification Challenge is especially interesting because it touches on the important issue of online harassment.

Ensemble our predictions in the cloud!

You need to be registered with neptune.ml to be able to use our predictions in your ensemble models.

  • Click Start notebook.
  • Click the Browse button.
  • Select the neptune_ensembling.ipynb file from this repository.
  • Choose the worker type: gcp-large is the recommended one.
  • Run the first few cells to load our predictions on the held-out validation set along with the labels.
  • Grid search over many possible parameter options; the more runs you choose, the longer it will take.
  • Train your second-level ensemble model (it should take less than an hour once you have the parameters).
  • Load our predictions on the test set.
  • Feed our test-set predictions to your ensemble model and get the final predictions.
  • Save your submission file.
  • Click Browse files and find your submission file to download it.

Running the notebook as-is scored 0.986+ on the leaderboard.
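For intuition, below is a minimal sketch of the kind of second-level (stacking) model the notebook trains: a per-class logistic regression fitted on the first-level validation predictions, with a small grid search over the regularization strength. The file names, column layout and parameter grid are illustrative assumptions, not the notebook's exact code.

# Hypothetical stacking sketch -- assumes first-level predictions and labels
# are stored as CSVs with an "id" column plus one column per model/class.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

CLASSES = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

valid_preds = pd.read_csv("first_level_valid_predictions.csv")   # assumed file name
valid_labels = pd.read_csv("valid_labels.csv")                   # assumed file name
test_preds = pd.read_csv("first_level_test_predictions.csv")     # assumed file name

features_valid = valid_preds.drop(columns=["id"]).values
features_test = test_preds.drop(columns=["id"]).values
submission = pd.DataFrame({"id": test_preds["id"]})

for cls in CLASSES:
    # Small grid over regularization strength; more candidates take longer to run.
    search = GridSearchCV(LogisticRegression(solver="liblinear"),
                          param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                          scoring="roc_auc", cv=5)
    search.fit(features_valid, valid_labels[cls].values)
    submission[cls] = search.predict_proba(features_test)[:, 1]

submission.to_csv("ensemble_submission.csv", index=False)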

Disclaimer

In this open source solution you will find references to neptune.ml. It is a free platform for community users, which we use daily to keep track of our experiments. Please note that using neptune.ml is not necessary to proceed with this solution. You may run it as a plain Python script 😉.

The idea

We are contributing starter code that is easy to use and extend. We did it before with Cdiscount's Image Classification Challenge, and we believe this is the right way to open data science to the wider community and encourage more people to participate in challenges. This starter is a ready-to-use, end-to-end solution. Since all computations are organized in separate steps, it is also easy to extend. Check devbook.ipynb for more information about the different pipelines.
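As a rough illustration of what "organized in separate steps" means (a simplified sketch only; the repository defines its own step abstraction, and the names and signatures below are invented for this example), each step wraps a transformer, runs its upstream steps first, and passes their outputs to its own transformer:

# Simplified illustration of a step-based pipeline; not the repository's actual API.
class Step:
    def __init__(self, name, transformer, inputs=()):
        self.name = name
        self.transformer = transformer   # any object with a fit_transform method
        self.inputs = list(inputs)       # upstream Step objects

    def fit_transform(self, data):
        # Run upstream steps first, then this step's transformer on their outputs.
        upstream = [step.fit_transform(data) for step in self.inputs] or [data]
        return self.transformer.fit_transform(*upstream)

Because each step only talks to its inputs, swapping in a new model or preprocessing stage means adding one step rather than rewriting the whole script.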

Now we want to go one step further and invite you to participate in the development of this analysis pipeline. At a later stage of the competition (early February) we will invite top contributors to join our team on Kaggle.

Contributing

You are welcome to extend this pipeline and contribute your own models or procedures. Please refer to CONTRIBUTING for more details.

Installation

option 1: Neptune cloud

on the neptune site

  • Log in: neptune account login.
  • Create a new project named toxic: follow the Projects link (top bar, left side), then click the New project button. This will generate the project key TOX, which is already listed in neptune.yaml.

run setup commands

$ git clone https://github.com/neptune-ml/kaggle-toxic-starter.git
$ pip3 install neptune-cli
$ neptune login

start experiment

$ neptune send --environment keras-2.0-gpu-py3 --worker gcp-gpu-medium --config best_configs/fasttext_gru.yaml -- train_evaluate_predict_cv_pipeline --pipeline_name fasttext_gru --model_level first

This should get you to 0.9852. Happy training :)
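For orientation, the pipeline name suggests pretrained fastText word embeddings feeding a GRU-based classifier. Below is a minimal Keras sketch of that family of models; the layer sizes, the bidirectional wrapper and the pooling choice are illustrative assumptions, not the configuration stored in best_configs/fasttext_gru.yaml.

# Illustrative fastText-embeddings + GRU classifier (Keras 2.x); hyperparameters
# are assumptions, not the values from best_configs/fasttext_gru.yaml.
from keras.models import Model
from keras.layers import Input, Embedding, Bidirectional, GRU, GlobalMaxPooling1D, Dense

MAX_LEN = 200        # assumed maximum sequence length
VOCAB_SIZE = 100000  # assumed vocabulary size
EMBED_DIM = 300      # fastText vectors are 300-dimensional

def build_model(embedding_matrix):
    tokens = Input(shape=(MAX_LEN,), dtype="int32")
    x = Embedding(VOCAB_SIZE, EMBED_DIM, weights=[embedding_matrix],
                  trainable=False)(tokens)            # frozen pretrained embeddings
    x = Bidirectional(GRU(64, return_sequences=True))(x)
    x = GlobalMaxPooling1D()(x)
    outputs = Dense(6, activation="sigmoid")(x)       # six toxicity classes
    model = Model(inputs=tokens, outputs=outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model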

Refer to Neptune documentation and Getting started: Neptune Cloud for more.

option 2: local install

Please refer to Getting started: local instance for the installation procedure.

Solution visualization

The end-to-end pipeline is visualized below (pipeline_001 diagram). You can run exactly this one!

We have also prepared something simpler to just get you started:

(pipeline_002 diagram)

User support

There are several ways to seek help:

  1. Read the project's Wiki, where we publish descriptions of the code, pipelines and Neptune.
  2. The Kaggle discussion forum is our primary way of communication.
  3. You can submit an issue directly in this repo.