EliasCai / Bert Toxicity Classification

BERT on Jigsaw Unintended Bias in Toxicity Classification

BERT-toxicity-classification

This repo shows how to train a BERT model on the Jigsaw Unintended Bias in Toxicity Classification task.
Star the repo and I will keep updating the code.
This repo is modified from Google's open-source BERT code. Thanks to Jon Mischo for his advice.

LB Score

  1. 2019-04-06: 0.91216
  2. 2019-04-07: 0.91455 (added a text-cleaning method adapted from an external reference)

How to generate predictions on the test data by fine-tuning the BERT model

prepare

  1. download the pretrained model
  2. download the data and unzip it into the input folder
  3. split the data into train and dev sets (for convenience I just typed the command below; it is not the recommended way)
cat train.csv | tail -n 1000 > dev_1000.csv
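
The `tail` command above keeps the last 1000 lines, so `dev_1000.csv` has no header row, the same rows remain in `train.csv`, and a record could be cut if a comment contains a line break. A minimal sketch of a random, header-preserving split follows. It uses only the standard library; the function name `split_train_dev` and the sample column names are assumptions for illustration, not part of the repo.

```python
import csv
import io
import random


def split_train_dev(csv_text, dev_size, seed=0):
    """Split CSV text into (train_text, dev_text), each with the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    if dev_size > len(rows):
        raise ValueError("dev_size exceeds the number of data rows")

    # Shuffle row indices so the dev set is a random sample, not the file tail.
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    dev_idx = set(indices[:dev_size])

    def dump(selected):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(selected)
        return out.getvalue()

    # Rows go to exactly one of the two sets, so train and dev do not overlap.
    train = [r for i, r in enumerate(rows) if i not in dev_idx]
    dev = [r for i, r in enumerate(rows) if i in dev_idx]
    return dump(train), dump(dev)


if __name__ == "__main__":
    sample = "id,target,comment_text\n" + "".join(
        f'{i},0.0,"comment {i}, with a comma"\n' for i in range(10)
    )
    train_text, dev_text = split_train_dev(sample, dev_size=3)
    print(len(train_text.splitlines()), len(dev_text.splitlines()))
```

To use it on the real data, read `input/train.csv` into a string, call the function with `dev_size=1000`, and write the two results to separate files.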

train model

  1. run run_classifier.py
python run_classifier.py \
  --data_dir=input/ --vocab_file=uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=uncased_L-12_H-768_A-12/bert_model.ckpt \
  --task_name=toxic \
  --do_train=True \
  --do_eval=True \
  --do_predict=True \
  --output_dir=model_output/
  2. the model trains for 10 epochs, but you can stop it earlier depending on how much time you have
  3. checkpoints are saved in model_output, along with the predictions on the test data (see model_output/test_result.tsv)

generate the submission

  1. run encode.py
  2. upload the output/sub.csv to kaggle
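
The contents of encode.py are not shown here, so the following is only a sketch of what such a conversion step could look like. It assumes that each line of test_result.tsv holds tab-separated class probabilities with the toxic class in the last column, that the test CSV has an `id` column, and that the submission file uses the columns `id,prediction`. The function name `build_submission` is hypothetical.

```python
import csv
import io


def build_submission(test_csv_text, result_tsv_text):
    """Pair each test id with the probability in the last TSV column."""
    ids = [row["id"] for row in csv.DictReader(io.StringIO(test_csv_text))]
    probs = [
        float(line.split("\t")[-1])  # assumption: last column = toxic class
        for line in result_tsv_text.splitlines()
        if line.strip()
    ]
    if len(ids) != len(probs):
        raise ValueError(f"{len(ids)} ids but {len(probs)} predictions")

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "prediction"])
    for example_id, prob in zip(ids, probs):
        writer.writerow([example_id, f"{prob:.6f}"])
    return out.getvalue()


if __name__ == "__main__":
    test_csv = 'id,comment_text\n7000000,hello\n7000001,"bad, text"\n'
    result_tsv = "0.9\t0.1\n0.2\t0.8\n"
    print(build_submission(test_csv, result_tsv))
```

The length check is there because a mismatch between the number of ids and the number of prediction rows would otherwise silently misalign the submission.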

What is different from the official code

  1. added a CSV handler (line 243 in run_classifier.py)
  2. added ToxicProcessor (line 264 in run_classifier.py)
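
For readers unfamiliar with the official run_classifier.py, each task there is handled by a processor class that turns rows of the data file into `InputExample` objects and declares the label set. The sketch below illustrates that pattern for CSV input. It is not the repo's actual ToxicProcessor: the class name `ToxicProcessorSketch`, the `create_examples` method, and the stand-in `InputExample` dataclass are assumptions, as is binarizing the `target` column at 0.5.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputExample:
    """Stand-in for the InputExample class in BERT's run_classifier.py."""
    guid: str
    text_a: str
    text_b: Optional[str] = None
    label: Optional[str] = None


class ToxicProcessorSketch:
    """Hypothetical processor; the repo's real ToxicProcessor may differ."""

    def get_labels(self) -> List[str]:
        return ["0", "1"]

    def create_examples(self, csv_text: str, set_type: str) -> List[InputExample]:
        examples = []
        # csv.DictReader handles quoted commas inside comment_text.
        for row in csv.DictReader(io.StringIO(csv_text)):
            if set_type == "test":
                label = "0"  # placeholder: test rows carry no target
            else:
                # assumption: a target of 0.5 or more counts as toxic
                label = "1" if float(row["target"]) >= 0.5 else "0"
            examples.append(
                InputExample(
                    guid=f"{set_type}-{row['id']}",
                    text_a=row["comment_text"],
                    label=label,
                )
            )
        return examples


if __name__ == "__main__":
    train_csv = 'id,target,comment_text\n1,0.7,"you are, bad"\n2,0.1,nice\n'
    for example in ToxicProcessorSketch().create_examples(train_csv, "train"):
        print(example.guid, example.label, example.text_a)
```

A CSV reader is needed in place of the tab-separated reader of the official code because the comment text can contain commas and quotes.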

To do

  1. text cleaning and out-of-vocabulary (OOV) handling
  2. cross-validation (CV)
  3. average the predictions of different checkpoints
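
The third item is not implemented in the repo yet. As a rough illustration of the idea only, the sketch below averages per-example probabilities from several prediction runs. The function name `average_predictions` and the plain-list input format are assumptions.

```python
def average_predictions(runs):
    """Average per-example probabilities across several prediction lists."""
    if not runs:
        raise ValueError("need at least one prediction list")
    length = len(runs[0])
    if any(len(run) != length for run in runs):
        raise ValueError("all prediction lists must have the same length")
    # zip(*runs) groups the predictions that belong to the same example.
    return [sum(values) / len(runs) for values in zip(*runs)]


if __name__ == "__main__":
    checkpoint_a = [0.25, 0.75, 1.0]
    checkpoint_b = [0.75, 0.25, 0.0]
    print(average_predictions([checkpoint_a, checkpoint_b]))
```

Each input list would hold the toxic-class probabilities produced by one checkpoint, in the same example order.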

Like this repo? You can buy me a cup of coffee.