
zhanlaoban / Transformers_for_text_classification

Text classification based on Transformers

Programming Languages

Python: 139,335 projects - #7 most used programming language

Projects that are alternatives to or similar to Transformers_for_text_classification

Rcnn Text Classification
Tensorflow Implementation of "Recurrent Convolutional Neural Network for Text Classification" (AAAI 2015)
Stars: ✭ 127 (-19.62%)
Mutual labels:  text-classification
Parselawdocuments
Performs a series of analyses on collected legal documents, including rule-based automatic segmentation, case similarity computation, case clustering, and legal clause recommendation (experiments currently focus on marriage-related cases but can be extended to other domains).
Stars: ✭ 138 (-12.66%)
Mutual labels:  text-classification
Classify Text
"20 Newsgroups" text classification with python
Stars: ✭ 149 (-5.7%)
Mutual labels:  text-classification
Fasttext.js
FastText for Node.js
Stars: ✭ 127 (-19.62%)
Mutual labels:  text-classification
Bert serving
export bert model for serving
Stars: ✭ 138 (-12.66%)
Mutual labels:  text-classification
Monkeylearn Python
Official Python client for the MonkeyLearn API. Build and consume machine learning models for language processing from your Python apps.
Stars: ✭ 143 (-9.49%)
Mutual labels:  text-classification
Dan Jurafsky Chris Manning Nlp
My solutions to the Natural Language Processing course taught by Dan Jurafsky and Chris Manning in Winter 2012.
Stars: ✭ 124 (-21.52%)
Mutual labels:  text-classification
Nlp pytorch project
Embedding, NMT, Text_Classification, Text_Generation, NER etc.
Stars: ✭ 153 (-3.16%)
Mutual labels:  text-classification
Document Classifier Lstm
A bidirectional LSTM with attention for multiclass/multilabel text classification.
Stars: ✭ 136 (-13.92%)
Mutual labels:  text-classification
Text Classification Demos
Neural models for Text Classification in Tensorflow, such as cnn, dpcnn, fasttext, bert ...
Stars: ✭ 144 (-8.86%)
Mutual labels:  text-classification
Textclassify with bert
Text classification with a BERT model, aimed at industrial use.
Stars: ✭ 128 (-18.99%)
Mutual labels:  text-classification
Hierarchical Multi Label Text Classification
The code of CIKM'19 paper《Hierarchical Multi-label Text Classification: An Attention-based Recurrent Network Approach》
Stars: ✭ 133 (-15.82%)
Mutual labels:  text-classification
Uda pytorch
UDA (Unsupervised Data Augmentation) implemented in PyTorch
Stars: ✭ 143 (-9.49%)
Mutual labels:  text-classification
Ml Projects
ML based projects such as Spam Classification, Time Series Analysis, Text Classification using Random Forest, Deep Learning, Bayesian, Xgboost in Python
Stars: ✭ 127 (-19.62%)
Mutual labels:  text-classification
Macadam
Macadam is an NLP toolkit built on TensorFlow (Keras) and bert4keras, focused on text classification, sequence labeling, and relation extraction. It supports RANDOM, WORD2VEC, FASTTEXT, BERT, ALBERT, ROBERTA, NEZHA, XLNET, ELECTRA, GPT-2, and other embeddings; FineTune, FastText, TextCNN, CharCNN, BiRNN, RCNN, DCNN, CRNN, DeepMoji, SelfAttention, HAN, Capsule, and other text-classification algorithms; and CRF, Bi-LSTM-CRF, CNN-LSTM, DGCNN, Bi-LSTM-LAN, Lattice-LSTM-Batch, MRC, and other sequence-labeling algorithms.
Stars: ✭ 149 (-5.7%)
Mutual labels:  text-classification
Cluedatasetsearch
Search all Chinese NLP datasets, with commonly used English NLP datasets included.
Stars: ✭ 2,112 (+1236.71%)
Mutual labels:  text-classification
Onnxt5
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.
Stars: ✭ 143 (-9.49%)
Mutual labels:  text-classification
Vdcnn
Implementation of Very Deep Convolutional Neural Network for Text Classification
Stars: ✭ 158 (+0%)
Mutual labels:  text-classification
Multi Label classification
Transforms multi-label classification into a sentence-pair task, gaining more training data and information.
Stars: ✭ 151 (-4.43%)
Mutual labels:  text-classification
Browsecloud
A web app to create and browse text visualizations for automated customer listening.
Stars: ✭ 143 (-9.49%)
Mutual labels:  text-classification

Transformers_for_Text_Classification

Text classification based on Transformers

Refactored on top of the latest Hugging Face transformers v2.2.2 codebase. To ensure the code can be reproduced later without compatibility issues, transformers is vendored locally and called from there.

Highlights

  • Supports attaching various feature extractors after the transformer model (see the sketch after the Support list below)
  • Supports prediction on the test set
  • Trims the original transformers code to better fit text-classification tasks
  • Cleans up the logging output in the terminal so that it is more informative

Support

model_type:

  • [x] bert
  • [x] bert_cnn
  • [x] bert_lstm
  • [x] bert_gru
  • [x] xlnet
  • [ ] xlnet_cnn
  • [x] xlnet_lstm
  • [x] xlnet_gru
  • [ ] albert
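
The feature-extractor heads behind model types such as bert_cnn are not spelled out in this README, so the following is an illustration only, not the repo's actual implementation: a minimal PyTorch sketch of a BERT-plus-CNN classifier, with hypothetical class and parameter names (BertModel would come from the locally vendored transformers).

```python
import torch
import torch.nn as nn
from transformers import BertModel  # vendored locally in this repo


class BertCnnClassifier(nn.Module):
    """Illustrative bert_cnn-style head: BERT token outputs -> 1-D CNN -> FC."""

    def __init__(self, pretrained_path, num_labels,
                 num_filters=128, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained_path)
        hidden = self.bert.config.hidden_size
        # One 1-D convolution per kernel size over the token sequence.
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes)
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_labels)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        # transformers v2.x returns a tuple; [0] is the sequence output,
        # shape (batch, seq_len, hidden).
        sequence_output = self.bert(input_ids,
                                    attention_mask=attention_mask,
                                    token_type_ids=token_type_ids)[0]
        x = sequence_output.transpose(1, 2)            # (batch, hidden, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2)[0]    # global max-pool per filter
                  for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))
```

The bert_lstm and bert_gru variants would replace the convolutions with a recurrent layer over the same sequence output.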

Content

  • dataset: stores the datasets
  • pretrained_models: stores the pre-trained models
  • transformers: the locally vendored transformers library
  • results: stores the training results

Usage

1. Using different models

Set the model_type parameter in the shell script to choose a model.

For example, to put a fully connected (FC) layer on top of BERT, set model_type=bert; to put a CNN layer on top of BERT, set model_type=bert_cnn.

The Support section of this README lists the model_type values available for each pre-trained model.

Finally, run the shell script directly from the terminal, e.g.:

bash run_classifier.sh

The Chinese RoBERTa, ERNIE, and BERT_wwm pre-trained language models are all loaded with the BERT model_type.
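
This works because those checkpoints are published in BERT format, so the plain BERT classes can read them directly. A minimal sketch (the checkpoint directory is a hypothetical local path under pretrained_models):

```python
from transformers import BertModel, BertTokenizer  # vendored locally

# Hypothetical local directory holding a Chinese RoBERTa checkpoint;
# ERNIE and BERT_wwm load the same way.
ckpt = "pretrained_models/chinese_roberta_wwm_ext"
tokenizer = BertTokenizer.from_pretrained(ckpt)
model = BertModel.from_pretrained(ckpt)
```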

2. Using a custom dataset

  1. Put your custom dataset folder under the dataset folder, e.g. TestData.
  2. In utils.py in the project root, write your own class modeled on class THUNewsProcessor, e.g. class TestDataProcessor, and add the corresponding entries to the tasks_num_labels, processors, and output_modes dicts (a sketch follows this list).
  3. Finally, in the shell script you want to run, set TASK_NAME to your task name, e.g. TestData.
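
A minimal sketch of step 2, assuming the glue-style processor interface of transformers v2.2.2 (DataProcessor, InputExample, and the inherited _read_tsv helper). The file names, label set, and "testdata" task key are hypothetical; the three dicts at the bottom are the ones utils.py already defines.

```python
import os

from transformers import DataProcessor, InputExample  # vendored locally


class TestDataProcessor(DataProcessor):
    """Processor for a custom TestData dataset stored as TSV files."""

    def get_train_examples(self, data_dir):
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "train.tsv")), "train")

    def get_dev_examples(self, data_dir):
        return self._create_examples(
            self._read_tsv(os.path.join(data_dir, "dev.tsv")), "dev")

    def get_labels(self):
        # Replace with your own task's label set.
        return ["0", "1"]

    def _create_examples(self, lines, set_type):
        # Assumes each row is "label<TAB>text"; adjust to your file layout.
        examples = []
        for i, line in enumerate(lines):
            guid = "%s-%s" % (set_type, i)
            examples.append(InputExample(guid=guid, text_a=line[1],
                                         text_b=None, label=line[0]))
        return examples


# Register the task in utils.py so TASK_NAME=TestData can resolve it.
tasks_num_labels["testdata"] = 2
processors["testdata"] = TestDataProcessor
output_modes["testdata"] = "classification"
```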

Environment

  • one 2080Ti, 12GB RAM
  • Python: 3.6.5
  • PyTorch: 1.3.1
  • TensorFlow: 1.14.0 (needed only for TensorBoard support; not used otherwise)
  • Numpy: 1.14.6

Performance

Dataset: THUNews/5_5000

epochs: 1

train_steps: 5000

model               dev set best F1 / Acc   remark
bert_base           0.9309 / 0.9324         BERT + FC layer, batch_size 8, learning_rate 2e-5
bert_base+cnn       0.9136 / 0.9156         BERT + CNN layer, batch_size 8, learning_rate 2e-5
bert_base+lstm      0.9369 / 0.9372         BERT + LSTM layer, batch_size 8, learning_rate 2e-5
bert_base+gru       0.9380 / 0.9380         BERT + GRU layer, batch_size 8, learning_rate 2e-5
roberta_large       -                       RoBERTa + FC layer, batch_size 2, learning_rate 2e-5
xlnet_mid           0.9530 / 0.9540         XLNet + FC layer, batch_size 2, learning_rate 2e-5
xlnet_mid+lstm      0.9270 / 0.9304         XLNet + LSTM layer, batch_size 2, learning_rate 2e-5
xlnet_mid+gru       0.9495 / 0.9508         XLNet + GRU layer, batch_size 2, learning_rate 2e-5
albert_xlarge_183k  -                       -

Download Chinese Pre-trained Models

NPL_PEMDC
