PavelOstyakov / Toxic

Licence: MIT
Toxic Comment Classification Challenge

Toxic Comment Classification Challenge

Code for the Kaggle competition https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge

This script scores 0.057 on the competition leaderboard (LB).

Run script

First, install the required libraries:

pip install nltk keras tqdm scikit-learn

Download the pretrained word embeddings. I used the fastText crawl-300d-2M.vec vectors, which can be found here: https://github.com/facebookresearch/fastText/blob/master/docs/english-vectors.md

Download the competition data. The links are here: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data

Don't forget to extract the files from the archives.
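For orientation, a fastText .vec file is plain text: a header line with the vocabulary size and the vector dimension, then one line per word with its space-separated values. The sketch below parses that layout. The function name read_vec and its limit parameter are made up for illustration, and this is not necessarily how fit_predict.py loads the file.

```python
import io
from typing import Dict, Iterable, List, Optional


def read_vec(lines: Iterable[str], limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Parse fastText .vec text: a 'count dim' header, then 'word v1 ... vd' rows.

    read_vec and limit are illustrative names, not part of this repository.
    """
    it = iter(lines)
    header = next(it).split()
    dim = int(header[1])  # second header field is the vector dimension
    vectors: Dict[str, List[float]] = {}
    for line in it:
        if limit is not None and len(vectors) >= limit:
            break  # stop early to keep memory use down
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip rows that do not match the declared dimension
        vectors[word] = [float(v) for v in values]
    return vectors


# Tiny in-memory sample standing in for crawl-300d-2M.vec.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
embeddings = read_vec(sample)
```

With the real file, you would pass an open file handle instead of the sample, for example open("crawl-300d-2M.vec", encoding="utf-8"), and probably set limit, since holding all two million vectors as Python lists needs a lot of memory.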
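If you prefer to script the extraction step, here is a minimal sketch using Python's standard zipfile module. It assumes the downloads are .zip archives sitting in one directory; the archive name train.csv.zip and the helper extract_all are placeholders for illustration.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def extract_all(archive_dir: Path, out_dir: Path) -> List[str]:
    """Extract every .zip file in archive_dir into out_dir; return the member names."""
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted: List[str] = []
    for archive in sorted(archive_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)
            extracted.extend(zf.namelist())
    return extracted


# Demo on a throwaway archive (the file name is an assumption, not the real download).
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    with zipfile.ZipFile(tmp_path / "train.csv.zip", "w") as zf:
        zf.writestr("train.csv", "id,comment_text\n")
    names = extract_all(tmp_path, tmp_path / "data")
    train_exists = (tmp_path / "data" / "train.csv").exists()
```

On real downloads you would call extract_all with the directory that holds the archives and the directory where train.csv and test.csv should end up.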

Next, run:

python fit_predict.py train.csv test.csv crawl-300d-2M.vec

Training the model takes a while: about 3-4 hours on a GTX 1080 Ti. When it finishes, you will find the file toxic_results/submit, which you can submit on Kaggle.
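Before uploading, it can be worth sanity-checking the output. The sketch below assumes toxic_results/submit is a CSV in the competition's submission layout (an id column followed by the six label columns, each holding a probability); check_submission is a hypothetical helper, not part of this repository.

```python
import csv
import io

# Column layout of the competition's submission file.
EXPECTED_COLUMNS = [
    "id", "toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate",
]


def check_submission(handle) -> int:
    """Validate header and probability ranges; return the number of data rows."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = 0
    for row in reader:
        for label in EXPECTED_COLUMNS[1:]:
            prob = float(row[label])
            if not 0.0 <= prob <= 1.0:
                raise ValueError(f"{label} out of range for id {row['id']}: {prob}")
        rows += 1
    return rows


# In-memory sample standing in for toxic_results/submit.
sample = io.StringIO(
    "id,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "abc,0.9,0.1,0.8,0.0,0.7,0.2\n"
)
n_rows = check_submission(sample)
```

For the real file, open it with open("toxic_results/submit", newline="") and pass the handle to check_submission.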
