
lonePatient / ERNIE-text-classification-pytorch

Licence: other
This repo contains a PyTorch implementation of a pretrained ERNIE model for text classification.

Programming Languages

python

Projects that are alternatives to or similar to ERNIE-text-classification-pytorch

BERT-chinese-text-classification-pytorch
This repo contains a PyTorch implementation of a pretrained BERT model for text classification.
Stars: ✭ 92 (+87.76%)
Mutual labels:  text-classification, bert, chinese-text-classification
Pytorch-NLU
Pytorch-NLU, a Chinese text classification and sequence annotation toolkit. It supports multi-class and multi-label classification of Chinese long and short texts, and sequence annotation tasks such as Chinese named entity recognition, part-of-speech tagging, and word segmentation.
Stars: ✭ 151 (+208.16%)
Mutual labels:  text-classification, bert, chinese-text-classification
TorchBlocks
A PyTorch-based toolkit for natural language processing
Stars: ✭ 85 (+73.47%)
Mutual labels:  text-classification, bert
NSP-BERT
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"
Stars: ✭ 166 (+238.78%)
Mutual labels:  text-classification, bert
Filipino-Text-Benchmarks
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-55.1%)
Mutual labels:  text-classification, bert
ganbert-pytorch
Enhancing the BERT training with Semi-supervised Generative Adversarial Networks in Pytorch/HuggingFace
Stars: ✭ 60 (+22.45%)
Mutual labels:  text-classification, bert
WSDM-Cup-2019
[ACM-WSDM] 3rd place solution at WSDM Cup 2019, Fake News Classification on Kaggle.
Stars: ✭ 62 (+26.53%)
Mutual labels:  text-classification, bert
text2class
Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT
Stars: ✭ 15 (-69.39%)
Mutual labels:  text-classification, bert
classifier multi label
multi-label, classifier, text classification, multi-label text classification, BERT, ALBERT, multi-label-classification
Stars: ✭ 127 (+159.18%)
Mutual labels:  text-classification, bert
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+5038.78%)
Mutual labels:  text-classification, bert
kwx
BERT, LDA, and TFIDF based keyword extraction in Python
Stars: ✭ 33 (-32.65%)
Mutual labels:  text-classification, bert
Nlp chinese corpus
Large-scale Chinese corpus for NLP
Stars: ✭ 6,656 (+13483.67%)
Mutual labels:  text-classification, bert
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-51.02%)
Mutual labels:  text-classification, bert
textgo
Text preprocessing, representation, similarity calculation, text search and classification. Let's go and play with text!
Stars: ✭ 33 (-32.65%)
Mutual labels:  text-classification, bert
trove
Weakly supervised medical named entity classification
Stars: ✭ 55 (+12.24%)
Mutual labels:  text-classification, bert
Text and Audio classification with Bert
Text Classification in Turkish Texts with Bert
Stars: ✭ 34 (-30.61%)
Mutual labels:  text-classification, bert
protonet-bert-text-classification
finetune bert for small dataset text classification in a few-shot learning manner using ProtoNet
Stars: ✭ 28 (-42.86%)
Mutual labels:  text-classification, bert
Kevinpro-NLP-demo
All the NLP you need, here. Personal implementations of some fun NLP demos, currently including PyTorch implementations of 13 NLP applications.
Stars: ✭ 117 (+138.78%)
Mutual labels:  text-classification, bert
classifier multi label seq2seq attention
multi-label, classifier, text classification, multi-label text classification, BERT, ALBERT, multi-label-classification, seq2seq, attention, beam search
Stars: ✭ 26 (-46.94%)
Mutual labels:  text-classification, bert
policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-55.1%)
Mutual labels:  text-classification, bert

ERNIE text classification with PyTorch

This repo contains a PyTorch implementation of a pretrained ERNIE model for text classification.

arXiv: https://arxiv.org/abs/1904.09223v1

Structure of the code

At the root of the project, you will see:

├── pyernie
|  └── callback
|  |  └── lrscheduler.py  
|  |  └── trainingmonitor.py 
|  |  └── ...
|  └── config
|  |  └── basic_config.py # configuration file for storing model parameters
|  └── dataset   
|  └── io    
|  |  └── dataset.py  
|  |  └── data_transformer.py  
|  └── model
|  |  └── nn 
|  |  └── pretrain 
|  └── output # saved model outputs
|  └── preprocessing # text preprocessing
|  └── train # code for training a model
|  |  └── trainer.py 
|  |  └── ...
|  └── utils # a set of utility functions
├── convert_ernie_to_pytorch.py
├── fine_tune_ernie.py
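
For orientation, here is a hypothetical sketch of the kind of settings pyernie/config/basic_config.py stores (step 3 of the usage instructions below asks you to edit this file). The keys, paths, and values shown are illustrative; the repo's actual configuration may differ.

# Hypothetical contents; the repo's real basic_config.py may use
# different keys, paths, and defaults.
BASE_DIR = "pyernie"

config = {
    "raw_data_path": BASE_DIR + "/dataset/raw/news.csv",  # your training data
    "pretrain_dir": BASE_DIR + "/model/pretrain",         # downloaded ERNIE weights
    "output_dir": BASE_DIR + "/output",                   # checkpoints, logs, figures
    "max_seq_len": 128,       # safe raw-text length; see Tips below
    "batch_size": 32,
    "learning_rate": 2e-5,
    "num_train_epochs": 4,
    "num_labels": 10,         # ten news categories in the example data
}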

Dependencies

  • csv
  • tqdm
  • numpy
  • pickle
  • scikit-learn
  • PyTorch 1.0
  • matplotlib
  • tensorboardX
  • TensorFlow (needed to run tensorboardX)

How to use the code

To fine-tune the model, follow these steps:

  1. Download the pretrained ERNIE model from Baidu Pan {password: uwds} and place it in the /pyernie/model/pretrain directory.

  2. Prepare your Chinese raw data (for example, news data); you can modify pyernie/io/data_transformer.py to adapt it to your own data.

  3. Modify the configuration in pyernie/config/basic_config.py (data paths, etc.).

  4. Run fine_tune_ernie.py (a rough sketch of the fine-tuning flow follows this list).
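
For illustration only, here is a minimal sketch of the overall fine-tuning flow. ErnieClassifier and ToyEncoder are hypothetical names, not the repo's API; the toy encoder stands in for the pretrained ERNIE encoder that the real script loads from pyernie/model/pretrain, and the hyperparameters follow the values recommended in the Tips section below.

import torch
import torch.nn as nn

# Toy encoder standing in for the pretrained ERNIE encoder; it only
# mean-pools token embeddings so that the sketch runs end to end.
class ToyEncoder(nn.Module):
    def __init__(self, vocab_size=18000, hidden=768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)

    def forward(self, input_ids):                 # (batch, seq_len)
        return self.embed(input_ids).mean(dim=1)  # (batch, hidden)

# Encoder plus a linear classification head over the pooled representation.
class ErnieClassifier(nn.Module):
    def __init__(self, encoder, hidden=768, num_labels=10):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden, num_labels)

    def forward(self, input_ids):
        return self.head(self.encoder(input_ids))  # (batch, num_labels)

model = ErnieClassifier(ToyEncoder())
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
criterion = nn.CrossEntropyLoss()

input_ids = torch.randint(0, 18000, (32, 128))  # one batch of token ids
labels = torch.randint(0, 10, (32,))            # ten news categories
for epoch in range(4):
    optimizer.zero_grad()
    loss = criterion(model(input_ids), labels)
    loss.backward()
    optimizer.step()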

Fine-tuning result

Training

Epoch: 4 - loss: 0.0136 - f1: 0.9967 - valid_loss: 0.0761 - valid_f1: 0.9798

Training classification report

label                precision  recall  f1-score  support
财经 (Finance)            0.99    0.99      0.99     3500
体育 (Sports)             1.00    1.00      1.00     3500
娱乐 (Entertainment)      1.00    1.00      1.00     3500
家居 (Home)               1.00    1.00      1.00     3500
房产 (Real estate)        0.99    0.99      0.99     3500
教育 (Education)          1.00    0.99      1.00     3500
时尚 (Fashion)            1.00    1.00      1.00     3500
时政 (Politics)           1.00    1.00      1.00     3500
游戏 (Games)              1.00    1.00      1.00     3500
科技 (Technology)         0.99    1.00      1.00     3500
avg / total              1.00    1.00      1.00    35000

Validation classification report

label                precision  recall  f1-score  support
财经 (Finance)            0.97    0.96      0.96     1500
体育 (Sports)             1.00    1.00      1.00     1500
娱乐 (Entertainment)      0.99    0.99      0.99     1500
家居 (Home)               0.99    0.99      0.99     1500
房产 (Real estate)        0.96    0.96      0.96     1500
教育 (Education)          0.98    0.98      0.98     1500
时尚 (Fashion)            0.99    0.99      0.99     1500
时政 (Politics)           0.97    0.98      0.98     1500
游戏 (Games)              0.99    0.99      0.99     1500
科技 (Technology)         0.97    0.97      0.97     1500
avg / total              0.98    0.98      0.98    15000
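
These reports appear to follow the output format of scikit-learn's classification_report. A minimal, self-contained example of producing one (the ground truth, predictions, and English label names here are illustrative):

from sklearn.metrics import classification_report

# Toy ground truth and predictions; in the real run y_true and y_pred
# come from the training or validation loop.
target_names = ["Finance", "Sports", "Entertainment"]
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 2, 0, 0, 2]
print(classification_report(y_true, y_pred, target_names=target_names))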

Training figure

Tips

  • When converting the TensorFlow checkpoint to PyTorch, be sure to choose "bert_model.ckpt" (not "bert_model.ckpt.index") as the input file. Otherwise the model learns nothing and gives nearly the same random output for any input, which means the true checkpoint was never loaded.
  • When using multiple GPUs, non-tensor calculations such as accuracy and f1_score are not supported by a DataParallel instance.
  • As recommended by Jacob Devlin et al. in the BERT paper (https://arxiv.org/pdf/1810.04805.pdf), the fine-tuning hyperparameters should be set as follows: batch size 16 or 32; learning rate 5e-5, 3e-5, or 2e-5; number of training epochs 3 or 4.
  • The pretrained model limits input length to 512 tokens, the maximum position embedding dimension. Data flows into the model as: raw_data -> WordPieces -> model. Since the WordPiece sequence is generally longer than the raw text, a safe maximum raw-text length is roughly 128-256 characters (see the truncation sketch after these tips).
  • In our tests, fine-tuning all layers gave much better results than fine-tuning only the last classifier layer; the latter is effectively a feature-based approach.
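
A minimal sketch of enforcing the 512-token limit, assuming the input is a plain Python list of WordPiece tokens (truncate_for_model is a hypothetical helper, not part of this repo):

def truncate_for_model(wordpieces, max_seq_len=512):
    # Reserve two positions for the [CLS] and [SEP] special tokens that
    # BERT-style models wrap around the WordPiece sequence.
    return ["[CLS]"] + wordpieces[: max_seq_len - 2] + ["[SEP]"]

pieces = ["深", "度", "学", "习"] * 200        # 800 pieces, over the limit
print(len(truncate_for_model(pieces)))        # 512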