
xuyige / Bert4doc Classification

Licence: apache-2.0
Code and source for the paper "How to Fine-Tune BERT for Text Classification?"

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Bert4doc Classification

Nlp Tutorial
A list of NLP(Natural Language Processing) tutorials
Stars: ✭ 1,188 (+440%)
Mutual labels:  natural-language-processing, text-classification
Texting
[ACL 2020] Tensorflow implementation for "Every Document Owns Its Structure: Inductive Text Classification via Graph Neural Networks"
Stars: ✭ 103 (-53.18%)
Mutual labels:  natural-language-processing, text-classification
Monkeylearn Ruby
Official Ruby client for the MonkeyLearn API. Build and consume machine learning models for language processing from your Ruby apps.
Stars: ✭ 76 (-65.45%)
Mutual labels:  natural-language-processing, text-classification
Scdv
Text classification with Sparse Composite Document Vectors.
Stars: ✭ 54 (-75.45%)
Mutual labels:  natural-language-processing, text-classification
Textvec
Text vectorization tool to outperform TFIDF for classification tasks
Stars: ✭ 167 (-24.09%)
Mutual labels:  natural-language-processing, text-classification
Textblob Ar
Arabic support for textblob
Stars: ✭ 60 (-72.73%)
Mutual labels:  natural-language-processing, text-classification
Neuronblocks
NLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego
Stars: ✭ 1,356 (+516.36%)
Mutual labels:  natural-language-processing, text-classification
Wikipedia2vec
A tool for learning vector representations of words and entities from Wikipedia
Stars: ✭ 655 (+197.73%)
Mutual labels:  natural-language-processing, text-classification
Monkeylearn Python
Official Python client for the MonkeyLearn API. Build and consume machine learning models for language processing from your Python apps.
Stars: ✭ 143 (-35%)
Mutual labels:  natural-language-processing, text-classification
Nlp Pretrained Model
A collection of Natural language processing pre-trained models.
Stars: ✭ 122 (-44.55%)
Mutual labels:  natural-language-processing, text-classification
Ml Classify Text Js
Machine learning based text classification in JavaScript using n-grams and cosine similarity
Stars: ✭ 38 (-82.73%)
Mutual labels:  natural-language-processing, text-classification
Fastnlp
fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
Stars: ✭ 2,441 (+1009.55%)
Mutual labels:  natural-language-processing, text-classification
Easy Deep Learning With Allennlp
🔮Deep Learning for text made easy with AllenNLP
Stars: ✭ 32 (-85.45%)
Mutual labels:  natural-language-processing, text-classification
Text Analytics With Python
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.
Stars: ✭ 1,132 (+414.55%)
Mutual labels:  natural-language-processing, text-classification
Nlp In Practice
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790 (+259.09%)
Mutual labels:  natural-language-processing, text-classification
Bible text gcn
Pytorch implementation of "Graph Convolutional Networks for Text Classification"
Stars: ✭ 90 (-59.09%)
Mutual labels:  natural-language-processing, text-classification
Pythoncode Tutorials
The Python Code Tutorials
Stars: ✭ 544 (+147.27%)
Mutual labels:  natural-language-processing, text-classification
Nlp Recipes
Natural Language Processing Best Practices & Examples
Stars: ✭ 5,783 (+2528.64%)
Mutual labels:  natural-language-processing, text-classification
Kadot
Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (-50.91%)
Mutual labels:  natural-language-processing, text-classification
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+1044.55%)
Mutual labels:  natural-language-processing, text-classification

How to Fine-Tune BERT for Text Classification?

This is the code and source for the paper How to Fine-Tune BERT for Text Classification?

In this paper, we conduct exhaustive experiments to investigate different fine-tuning methods of BERT on the text classification task and provide a general solution for BERT fine-tuning.

*********** Update (Mar 14, 2020) *************

Our checkpoint can be loaded into BertEmbedding from the latest fastNLP package.

Link to fastNLP.embeddings.BertEmbedding
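
For illustration, here is a minimal sketch (not part of this repository) of loading a further pre-trained checkpoint into fastNLP's BertEmbedding; the checkpoint directory name is a placeholder.

from fastNLP import Vocabulary
from fastNLP.embeddings import BertEmbedding

# Build a vocabulary over your own data; the words below are placeholders.
vocab = Vocabulary()
vocab.add_word_lst("this is a placeholder sentence".split())

# model_dir_or_name may point to a local directory containing the checkpoint,
# config, and vocab files (here: an assumed unpacked checkpoint directory).
embed = BertEmbedding(vocab, model_dir_or_name='./uncased_L-12_H-768_A-12_IMDB_pretrain')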

Requirements

For further pre-training, we borrow some code from Google BERT. Thus, we need:

  • tensorflow==1.1x
  • spacy
  • pandas
  • numpy

For fine-tuning, we borrow some code from the pytorch-pretrained-bert package (now well known as transformers). Thus, we need:

  • torch>=0.4.1,<=1.2.0

Run the code

1) Prepare the data sets:

Sogou News

We determine the category of each news article from its URL, e.g., “sports” corresponds to “http://sports.sohu.com”. We choose 6 categories: “sports”, “house”, “business”, “entertainment”, “women”, and “technology”. For each class we select 9,000 training samples and 1,000 test samples.

Data is available here.
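
For illustration, a minimal sketch (not the script we used) of the URL-based labeling described above; apart from "sports", the subdomain-to-category mapping is an assumption for the example.

# Hypothetical URL prefixes for the 6 chosen categories.
URL_TO_CATEGORY = {
    "http://sports.sohu.com": "sports",
    "http://house.sohu.com": "house",
    "http://business.sohu.com": "business",
    "http://yule.sohu.com": "entertainment",
    "http://women.sohu.com": "women",
    "http://it.sohu.com": "technology",
}

def label_from_url(url):
    """Return the category whose URL prefix matches, or None to drop the article."""
    for prefix, category in URL_TO_CATEGORY.items():
        if url.startswith(prefix):
            return category
    return None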

The remaining data sets

The remaining data sets were built by Zhang et al. (2015). We download them from the URL created by Xiang Zhang.

2) Prepare Google BERT:

BERT-Base, Uncased

BERT-Base, Chinese

3) Further Pre-Training:

Generate Further Pre-Training Corpus

Here we use AG's News as an example:

python generate_corpus_agnews.py

The file agnews_corpus_test.txt can be found in the directory ./data.
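
For reference, Google's create_pretraining_data.py expects a plain-text corpus with one sentence per line and a blank line between documents. A minimal sketch of writing such a file (the document source and the spaCy model name are assumptions):

import spacy

nlp = spacy.load("en_core_web_sm")  # used only for sentence splitting

documents = ["First news article text ...", "Second news article text ..."]

with open("AGnews_corpus.txt", "w", encoding="utf-8") as f:
    for doc_text in documents:
        for sent in nlp(doc_text).sents:
            f.write(sent.text.strip() + "\n")
        f.write("\n")  # a blank line marks the document boundary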

Run Further Pre-Training

python create_pretraining_data.py \
  --input_file=./AGnews_corpus.txt \
  --output_file=tmp/tf_AGnews.tfrecord \
  --vocab_file=./uncased_L-12_H-768_A-12/vocab.txt \
  --do_lower_case=True \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --masked_lm_prob=0.15 \
  --random_seed=12345 \
  --dupe_factor=5
  
python run_pretraining.py \
  --input_file=./tmp/tf_AGnews.tfrecord \
  --output_dir=./uncased_L-12_H-768_A-12_AGnews_pretrain \
  --do_train=True \
  --do_eval=True \
  --bert_config_file=./uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=./uncased_L-12_H-768_A-12/bert_model.ckpt \
  --train_batch_size=32 \
  --max_seq_length=128 \
  --max_predictions_per_seq=20 \
  --num_train_steps=100000 \
  --num_warmup_steps=10000 \
  --save_checkpoints_steps=10000 \
  --learning_rate=5e-5

4) Fine-Tuning

Convert Tensorflow checkpoint to PyTorch checkpoint

python convert_tf_checkpoint_to_pytorch.py \
  --tf_checkpoint_path ./uncased_L-12_H-768_A-12_AGnews_pretrain/model.ckpt-100000 \
  --bert_config_file ./uncased_L-12_H-768_A-12_AGnews_pretrain/bert_config.json \
  --pytorch_dump_path ./uncased_L-12_H-768_A-12_AGnews_pretrain/pytorch_model.bin

Fine-Tuning on downstream tasks

While fine-tuning on downstream tasks, we notice that different GPUs (e.g., 1080Ti vs. Titan Xp) may cause slight differences in experimental results even though we fix the initial random seed. Here we use 4 x 1080Ti GPUs as an example.

Take Exp-I (see Section 5.3) as an example:

export CUDA_VISIBLE_DEVICES=0,1,2,3
python run_classifier_single_layer.py \
  --task_name imdb \
  --do_train \
  --do_eval \
  --do_lower_case \
  --data_dir ./IMDB_data/ \
  --vocab_file ./uncased_L-12_H-768_A-12_IMDB_pretrain/vocab.txt \
  --bert_config_file ./uncased_L-12_H-768_A-12_IMDB_pretrain/bert_config.json \
  --init_checkpoint ./uncased_L-12_H-768_A-12_IMDB_pretrain/pytorch_model.bin \
  --max_seq_length 512 \
  --train_batch_size 24 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --output_dir ./imdb \
  --seed 42 \
  --layers 11 10 \
  --trunc_medium -1

where num_train_epochs can be 3.0, 4.0, or 6.0.

layers indicates the list of layers whose hidden states are taken as features for classification. -2 means using the pooled output, -1 means concatenating all layers; the command above concatenates layer-10 and layer-11 (the last two layers).
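
For illustration, a minimal PyTorch sketch (not the repo's exact code) of the --layers 11 10 setting: concatenate the [CLS] hidden states of the chosen encoder layers as the classification feature. all_encoder_layers is assumed to be the list of per-layer hidden states, each of shape (batch, seq_len, hidden).

import torch

def build_features(all_encoder_layers, layer_indexes=(11, 10)):
    cls_states = [all_encoder_layers[i][:, 0] for i in layer_indexes]  # [CLS] state per chosen layer
    return torch.cat(cls_states, dim=-1)  # (batch, hidden * len(layer_indexes))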

trunc_medium indicates how to deal with long texts: -2 means head-only, -1 means tail-only, 0 means head-half + tail-half (e.g., head-256 + tail-256), and any other natural number k means head-k + tail-rest (i.e., head-k + tail-(512-k)).
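
A minimal sketch (not the repo's exact code) of these truncation options, applied to an already-tokenized sequence with a 512-token budget (special tokens ignored for simplicity):

def truncate(tokens, trunc_medium, max_len=512):
    if len(tokens) <= max_len:
        return tokens
    if trunc_medium == -2:            # head-only
        return tokens[:max_len]
    if trunc_medium == -1:            # tail-only
        return tokens[-max_len:]
    if trunc_medium == 0:             # head-half + tail-half
        half = max_len // 2
        return tokens[:half] + tokens[-half:]
    k = trunc_medium                  # head-k + tail-(max_len - k)
    return tokens[:k] + tokens[-(max_len - k):]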

There are also other arguments for fine-tuning:

pooling_type indicates which feature will be used for classification: mean means mean-pooling over the hidden states of the whole sequence, max means max-pooling, and default means taking the hidden state of the [CLS] token as the feature.
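
A minimal PyTorch sketch (not the repo's exact code) of the three pooling_type options; hidden is the last-layer hidden states of shape (batch, seq_len, hidden) and mask marks non-padding positions with 1:

import torch

def pool(hidden, mask, pooling_type="default"):
    if pooling_type == "mean":        # mean over non-padding tokens
        mask = mask.unsqueeze(-1).float()
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
    if pooling_type == "max":         # max over non-padding tokens
        hidden = hidden.masked_fill(mask.unsqueeze(-1) == 0, float("-inf"))
        return hidden.max(dim=1).values
    return hidden[:, 0]               # default: hidden state of the [CLS] token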

layer_learning_rate and layer_learning_rate_decay in run_classifier_discriminative.py indicate the layer-wise decreasing learning rate (see Section 5.3.4).
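
A minimal sketch (not the repo's exact code) of building per-layer parameter groups with a decreasing learning rate: the top encoder layer keeps the base rate and each lower layer is scaled by the decay factor. The parameter-name matching below is an assumption for illustration.

def layerwise_lr_groups(model, base_lr=2e-5, decay=0.95, num_layers=12):
    groups = []
    for name, param in model.named_parameters():
        lr = base_lr
        for layer in range(num_layers):
            if f"encoder.layer.{layer}." in name:
                lr = base_lr * (decay ** (num_layers - 1 - layer))
                break
        groups.append({"params": [param], "lr": lr})
    return groups  # pass these groups to the optimizer, e.g. torch.optim.Adam(groups)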

Further Pre-Trained Checkpoints

We upload the IMDb-based further pre-trained checkpoints here.

For other checkpoints, please contact us by e-mail.

How to cite our paper

@inproceedings{sun2019fine,
  title={How to fine-tune {BERT} for text classification?},
  author={Sun, Chi and Qiu, Xipeng and Xu, Yige and Huang, Xuanjing},
  booktitle={China National Conference on Chinese Computational Linguistics},
  pages={194--206},
  year={2019},
  organization={Springer}
}