yuanxiaosc / Bert For Sequence Labeling And Text Classification

Licence: apache-2.0

This is template code for using BERT for sequence labeling and text classification, intended to make it easier to apply BERT to more tasks. The template currently covers CoNLL-2003 named entity recognition and Snips slot filling and intent prediction.

Programming Languages

python

Labels

text-classification sequence-labeling

Projects that are alternatives of or similar to Bert For Sequence Labeling And Text Classification

Nlp Projects

word2vec, sentence2vec, machine reading comprehension, dialog system, text classification, pretrained language model (i.e., XLNet, BERT, ELMo, GPT), sequence labeling, information retrieval, information extraction (i.e., entity, relation and event extraction), knowledge graph, text generation, network embedding

Stars: ✭ 360 (+22.87%)

Mutual labels: text-classification, sequence-labeling

Kashgari

Kashgari is a production-level NLP Transfer learning framework built on top of tf.keras for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.

Stars: ✭ 2,235 (+662.8%)

Mutual labels: text-classification, sequence-labeling

Neuronblocks

NLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego

Stars: ✭ 1,356 (+362.8%)

Mutual labels: text-classification, sequence-labeling

Pytorch-NLU

Pytorch-NLU, a Chinese text classification and sequence labeling toolkit. It supports multi-class and multi-label classification of long and short Chinese texts, and sequence labeling tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.

Stars: ✭ 151 (-48.46%)

Mutual labels: text-classification, sequence-labeling

Macadam

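Macadam is a natural language processing toolkit built on Tensorflow (Keras) and bert4keras that focuses on text classification, sequence labeling, and relation extraction. It supports embeddings such as RANDOM, WORD2VEC, FASTTEXT, BERT, ALBERT, ROBERTA, NEZHA, XLNET, ELECTRA, and GPT-2; text classification algorithms such as FineTune, FastText, TextCNN, CharCNN, BiRNN, RCNN, DCNN, CRNN, DeepMoji, SelfAttention, HAN, and Capsule; and sequence labeling algorithms such as CRF, Bi-LSTM-CRF, CNN-LSTM, DGCNN, Bi-LSTM-LAN, Lattice-LSTM-Batch, and MRC.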

Stars: ✭ 149 (-49.15%)

Mutual labels: text-classification, sequence-labeling

Delft

a Deep Learning Framework for Text

Stars: ✭ 289 (-1.37%)

Mutual labels: text-classification, sequence-labeling

WeSTClass

[CIKM 2018] Weakly-Supervised Neural Text Classification

Stars: ✭ 67 (-77.13%)

Mutual labels: text-classification

Ner Pytorch

LSTM+CRF NER

Stars: ✭ 260 (-11.26%)

Mutual labels: sequence-labeling

PIE

Fast + Non-Autoregressive Grammatical Error Correction using BERT. Code and Pre-trained models for paper "Parallel Iterative Edit Models for Local Sequence Transduction": www.aclweb.org/anthology/D19-1435.pdf (EMNLP-IJCNLP 2019)

Stars: ✭ 164 (-44.03%)

Mutual labels: sequence-labeling

Kaggle-Twitter-Sentiment-Analysis

Kaggle Twitter Sentiment Analysis Competition

Stars: ✭ 18 (-93.86%)

Mutual labels: text-classification

Gector

Official implementation of the paper “GECToR – Grammatical Error Correction: Tag, Not Rewrite” // Published on BEA15 Workshop (co-located with ACL 2020) https://www.aclweb.org/anthology/2020.bea-1.16.pdf

Stars: ✭ 287 (-2.05%)

Mutual labels: sequence-labeling

Hscrf Pytorch

ACL 2018: Hybrid semi-Markov CRF for Neural Sequence Labeling (http://aclweb.org/anthology/P18-2038)

Stars: ✭ 284 (-3.07%)

Mutual labels: sequence-labeling

A Pytorch Tutorial To Sequence Labeling

Empower Sequence Labeling with Task-Aware Neural Language Model | a PyTorch Tutorial to Sequence Labeling

Stars: ✭ 257 (-12.29%)

Mutual labels: sequence-labeling

TextUnderstandingTsetlinMachine

Using the Tsetlin Machine to learn human-interpretable rules for high-accuracy text categorization with medical applications

Stars: ✭ 48 (-83.62%)

Mutual labels: text-classification

Nagisa

A Japanese tokenizer based on recurrent neural networks

Stars: ✭ 260 (-11.26%)

Mutual labels: sequence-labeling

HiLAP

Code for paper "Hierarchical Text Classification with Reinforced Label Assignment" EMNLP 2019

Stars: ✭ 116 (-60.41%)

Mutual labels: text-classification

Bertweet

BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)

Stars: ✭ 282 (-3.75%)

Mutual labels: text-classification

node-fasttext

Nodejs binding for fasttext representation and classification.

Stars: ✭ 39 (-86.69%)

Mutual labels: text-classification

detecting-offensive-language-in-tweets

Detecting cyberbullying in tweets using Machine Learning

Stars: ✭ 19 (-93.52%)

Mutual labels: text-classification

Rnn For Joint Nlu

Tensorflow implementation of "Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling" (https://arxiv.org/abs/1609.01454)

Stars: ✭ 281 (-4.1%)

Mutual labels: sequence-labeling

Template Code: BERT-for-Sequence-Labeling-and-Text-Classification

This repository provides template code for using BERT for sequence labeling and text classification, to make it easier to apply BERT to more tasks. The code has been tested on the SNIPS (intent recognition and slot filling), ATIS (intent recognition and slot filling), and CoNLL-2003 (named entity recognition) datasets. You are welcome to use this template for other NLP tasks and to share your results and code here.

Task and Dataset

The data is already included in the repository. Contributions of new datasets are welcome.

task name	dataset name	data source
CoNLL-2003 named entity recognition	conll2003ner	https://www.clips.uantwerpen.be/conll2003/ner/
Atis Joint Slot Filling and Intent Prediction	atis	https://github.com/MiuLab/SlotGated-SLU/tree/master/data/atis
Snips Joint Slot Filling and Intent Prediction	snips	https://github.com/MiuLab/SlotGated-SLU/tree/master/data/snips
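
As a rough illustration of what a joint slot-filling and intent example looks like, the sketch below pairs tokens, slot tags, and an intent label. It assumes the three line-aligned files (seq.in, seq.out, label) used by the MiuLab SlotGated-SLU source linked above; the sample sentence and the read_joint_examples helper are invented for illustration and are not part of this repository.

```python
import io


def read_joint_examples(seq_in, seq_out, label):
    """Yield (tokens, slot_tags, intent) from three line-aligned text streams."""
    for words, tags, intent in zip(seq_in, seq_out, label):
        tokens = words.split()
        slots = tags.split()
        # Every token must carry exactly one slot tag.
        if len(tokens) != len(slots):
            raise ValueError("token/tag length mismatch: %r" % words)
        yield tokens, slots, intent.strip()


# Hypothetical one-line sample in ATIS style (not copied from the dataset).
seq_in = io.StringIO("show flights from boston to denver\n")
seq_out = io.StringIO("O O O B-fromloc.city_name O B-toloc.city_name\n")
label = io.StringIO("atis_flight\n")

examples = list(read_joint_examples(seq_in, seq_out, label))
for tokens, slots, intent in examples:
    print(intent, list(zip(tokens, slots)))
```

Each sentence therefore yields one classification target (the intent) and one tag per token (the slots), which is why these datasets suit a joint model.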

Environment Requirements

Use pip install -r requirements.txt to install dependencies quickly.

python 3.6+
Tensorflow 1.12.0+
sklearn

Template Code Usage Method

Using a pre-trained and fine-tuned model directly

For example: ATIS Joint Slot Filling and Intent Prediction

Download the model weights atis_join_task_LSTM_epoch30_simple.zip from https://pan.baidu.com/s/1SZkQXP8NrOtZKVEMfDE4bw and unzip them into the store_fine_tuned_model folder;
Run the code! You can change task_name and output_dir.

python run_slot_intent_join_task_LSTM.py \
  --task_name=Atis \
  --do_predict=true \
  --data_dir=data/atis_Intent_Detection_and_Slot_Filling \
  --vocab_file=pretrained_model/uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=pretrained_model/uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=store_fine_tuned_model/atis_join_task_LSTM_epoch30_simple/model.ckpt-4198 \
  --max_seq_length=128 \
  --output_dir=./output_model_predict/atis_join_task_LSTM_epoch30_simple_ckpt4198

You can find the model's predictions and their scores in output_dir (the scores are described later in this document).

Quick start (model training and prediction)

See predefined_task_usage.md for more predefined task usage codes.

Move Google's BERT code into the bert folder (a copy is already included);
Download Google's pretrained BERT model from https://github.com/google-research/bert and unzip it into the pretrained_model folder;
Run the code! You can change task_name and output_dir.
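
Before launching a run, it can help to confirm that the unzipped model folder contains what the command-line flags point at. The following is a minimal sketch, assuming the file names used in Google's released BERT archives (vocab.txt, bert_config.json, and checkpoint files sharing the bert_model.ckpt prefix); the missing_bert_files helper is hypothetical and not part of this repository.

```python
import os

# File names follow Google's released BERT archives; the checkpoint itself is
# split into several files that share the "bert_model.ckpt" prefix.
REQUIRED_FILES = ("vocab.txt", "bert_config.json")
CHECKPOINT_PREFIX = "bert_model.ckpt"


def missing_bert_files(model_dir):
    """Return the expected files that are absent from model_dir."""
    present = set(os.listdir(model_dir)) if os.path.isdir(model_dir) else set()
    missing = [name for name in REQUIRED_FILES if name not in present]
    if not any(name.startswith(CHECKPOINT_PREFIX + ".") for name in present):
        missing.append(CHECKPOINT_PREFIX + ".*")
    return missing


model_dir = "pretrained_model/uncased_L-12_H-768_A-12"
problems = missing_bert_files(model_dir)
print("OK" if not problems else "Missing: %s" % ", ".join(problems))
```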

model training

python run_sequence_labeling_and_text_classification.py \
  --task_name=snips \
  --do_train=true \
  --do_eval=true \
  --data_dir=data/snips_Intent_Detection_and_Slot_Filling \
  --vocab_file=pretrained_model/uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=pretrained_model/uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=pretrained_model/uncased_L-12_H-768_A-12/bert_model.ckpt \
  --num_train_epochs=3.0 \
  --output_dir=./store_fine_tuned_model/snips_join_task_epoch3/

The fine-tuned model is then saved in the output_dir folder, ./store_fine_tuned_model/snips_join_task_epoch3/.
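
The step number in a checkpoint name depends on how long training ran, so it may differ from the one shown in example commands. The sketch below finds the highest-step checkpoint prefix in an output folder. It assumes the usual TensorFlow 1.x naming (model.ckpt-<step>.index alongside the .meta and .data files); latest_checkpoint_prefix is a hypothetical helper, and TensorFlow's own tf.train.latest_checkpoint serves the same purpose when TensorFlow is importable.

```python
import os
import re

# Matches index files such as "model.ckpt-1000.index".
_CKPT_INDEX = re.compile(r"^(model\.ckpt-(\d+))\.index$")


def latest_checkpoint_prefix(output_dir):
    """Return the path prefix of the highest-step checkpoint, or None."""
    best_step, best_prefix = -1, None
    for name in os.listdir(output_dir):
        match = _CKPT_INDEX.match(name)
        # Compare steps as integers so that 1000 ranks above 500.
        if match and int(match.group(2)) > best_step:
            best_step = int(match.group(2))
            best_prefix = os.path.join(output_dir, match.group(1))
    return best_prefix


output_dir = "./store_fine_tuned_model/snips_join_task_epoch3/"
if os.path.isdir(output_dir):
    print(latest_checkpoint_prefix(output_dir))
```

The returned prefix is the value to pass as --init_checkpoint when predicting.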

model prediction

python run_sequence_labeling_and_text_classification.py \
  --task_name=Snips \
  --do_predict=true \
  --data_dir=data/snips_Intent_Detection_and_Slot_Filling \
  --vocab_file=pretrained_model/uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=pretrained_model/uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=store_fine_tuned_model/snips_join_task_epoch3/model.ckpt-1000 \
  --max_seq_length=128 \
  --output_dir=./output_model_prediction/snips_join_task_epoch3_ckpt1000

The model's predictions and test results (accuracy, recall, F1, etc.) are then written to the output_dir folder, ./output_model_prediction/snips_join_task_epoch3_ckpt1000.

File Structure

name	function
bert	stores Google's BERT code
data	stores the raw task datasets
output_model_prediction	stores model predictions
store_fine_tuned_model	stores fine-tuned models
calculating_model_score
pretrained_model	stores the pretrained BERT model
run_sequence_labeling.py	for sequence labeling tasks
run_text_classification.py	for text classification tasks
run_sequence_labeling_and_text_classification.py	for joint tasks
calculate_model_score.py	for model evaluation

Model Scores

The following scores were obtained without careful hyperparameter tuning, so they can likely be improved.

CoNLL-2003 named entity recognition

eval_f = 0.926 eval_precision = 0.925 eval_recall = 0.928
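
As a quick consistency check, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported (already rounded) precision and recall; the f1_score function is written here for illustration and is not taken from the repository.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported CoNLL-2003 values, rounded to three decimals in the source.
reported_precision, reported_recall = 0.925, 0.928
print(round(f1_score(reported_precision, reported_recall), 3))  # prints 0.926
```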

Atis Joint Slot Filling and Intent Prediction

Intent Prediction: Correct rate: 0.976, Accuracy: 0.976, Recall rate: 0.976, F1-score: 0.976

Slot Filling: Correct rate: 0.955, Accuracy: 0.955, Recall rate: 0.955, F1-score: 0.955
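
The four numbers for each task are identical, which is what micro-averaging produces on single-label data: every wrong prediction counts once as a false positive and once as a false negative, so precision, recall, F1, and accuracy coincide. Whether calculate_model_score.py averages this way is an assumption; the sketch below only demonstrates the arithmetic on made-up labels.

```python
def micro_scores(gold, predicted):
    """Return (accuracy, precision, recall, f1), micro-averaged over labels."""
    tp = fp = fn = 0
    for label in set(gold) | set(predicted):
        for g, p in zip(gold, predicted):
            if p == label and g == label:
                tp += 1
            elif p == label:
                fp += 1  # predicted this label, but gold differs
            elif g == label:
                fn += 1  # gold is this label, but prediction differs
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
    return accuracy, precision, recall, f1


# Hypothetical intent labels, three of five predicted correctly.
gold = ["flight", "airfare", "flight", "ground_service", "flight"]
pred = ["flight", "flight", "flight", "ground_service", "airfare"]
print(micro_scores(gold, pred))
```

With macro-averaging, by contrast, the four values would generally differ.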

How to add a new task

Just write a small piece of code according to the existing template!

Data

For example, suppose you have a new classification task, QQP.

Before running this example you must download the GLUE data by running this script.

Code

Now, write code!

class QqpProcessor(DataProcessor):
    """Processor for the QQP data set."""

    def get_train_examples(self, data_dir):
        """See base class."""
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

    def get_dev_examples(self, data_dir):
        """See base class."""
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

    def get_test_examples(self, data_dir):
        """See base class."""
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "test.tsv")), "test")

    def get_labels(self):
        """See base class."""
        return ["0", "1"]

    def _create_examples(self, lines, set_type):
        """Creates examples for the training, dev, and test sets."""
        examples = []
        for (i, line) in enumerate(lines):
            # Skip the header row and any row that does not have six columns.
            if i == 0 or len(line) != 6:
                continue
            guid = "%s-%s" % (set_type, i)
            # Columns 3 and 4 hold the two questions; column 5 holds the label.
            text_a = tokenization.convert_to_unicode(line[3])
            text_b = tokenization.convert_to_unicode(line[4])
            if set_type == "test":
                # Test rows get a placeholder label.
                label = "1"
            else:
                label = tokenization.convert_to_unicode(line[5])
            examples.append(
                InputExample(guid=guid, text_a=text_a, text_b=text_b, label=label))
        return examples
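
The processor relies on a _read_tsv helper from the base class, which in Google's BERT code reads the file with csv.reader and a tab delimiter. The sketch below mimics that on an in-memory string so the column indices used above (3, 4, and 5) are easier to follow. The sample rows are invented, and read_tsv is a stand-in rather than the repository's function.

```python
import csv
import io


def read_tsv(handle):
    """Read tab-separated rows without any special handling of quotes."""
    return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


# Invented rows in the QQP train/dev layout:
# id, qid1, qid2, question1, question2, is_duplicate
sample = io.StringIO(
    "id\tqid1\tqid2\tquestion1\tquestion2\tis_duplicate\n"
    "0\t1\t2\tHow do I learn Python?\tWhat is the best way to learn Python?\t1\n"
    "1\t3\t4\tWhat is BERT?\tHow tall is Mount Everest?\t0\n"
)

rows = read_tsv(sample)
pairs = [
    (row[3], row[4], row[5])      # question1, question2, label
    for i, row in enumerate(rows)
    if i != 0 and len(row) == 6   # skip the header and malformed rows
]
for text_a, text_b, label in pairs:
    print(label, "|", text_a, "|", text_b)
```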

Register the task

def main(_):
    tf.logging.set_verbosity(tf.logging.INFO)
    # Map the lower-case task name to its processor class.
    processors = {
        "qqp": QqpProcessor,
    }
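
In Google's run_classifier.py, main() lower-cases the --task_name flag and looks it up in this dictionary. Assuming this repository keeps that behaviour, the lookup can be sketched as follows; the two classes are minimal stand-ins for the real ones, and build_processor is a hypothetical name.

```python
class DataProcessor:
    """Stand-in for the base class defined in the BERT scripts."""

    def get_labels(self):
        raise NotImplementedError


class QqpProcessor(DataProcessor):
    """Stand-in exposing only the label list."""

    def get_labels(self):
        return ["0", "1"]


processors = {"qqp": QqpProcessor}


def build_processor(task_name):
    """Look up a processor class by lower-cased task name and instantiate it."""
    key = task_name.lower()
    if key not in processors:
        raise ValueError("Task not found: %s" % key)
    return processors[key]()


processor = build_processor("QQP")
print(processor.get_labels())  # prints ['0', '1']
```

Under that assumption, task names are case-insensitive on the command line, and an unregistered name fails fast with a ValueError.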

Run

python run_text_classification.py \
  --task_name=qqp \
  --do_train=true \
  --do_eval=true \
  --data_dir=data/snips_Intent_Detection_and_Slot_Filling \
  --vocab_file=pretrained_model/uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=pretrained_model/uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=pretrained_model/uncased_L-12_H-768_A-12/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=./output/qqp_Intent_Detection/
