yongzhuo / Keras Textclassification

Licence: mit

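Chinese long-text classification, short-sentence classification, multi-label classification, and two-sentence similarity (Chinese Text Classification of Keras NLP, multi-label classify, or sentence classify, long or short); base classes for building the character/word/sentence embedding layers (embeddings) and the network layers (graph); FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, Bert, Xlnet, Albert, Attention, DeepMoji, HAN, CapsuleNet, Transformer-encode, Seq2seq, SWEM, LEAM, TextGCN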

Programming Languages

python

Labels

nlp keras transformer text-classification embeddings fasttext crnn rcnn

Keras-TextClassification

Install

step1: pip install Keras-TextClassification

step2: download and unzip 'data.rar' from https://pan.baidu.com/s/1I3vydhmFEQ9nuPG2fDou8Q (extraction code: rket),
       then copy the data directory over the data directory of the installed package, e.g. '/anaconda/3.5.1/envs/tensorflow13/Lib/site-packages/keras_textclassification/data'
step3: go to Train & Usage and Predict & Usage
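
Step 2 requires knowing where the package was installed. The following is a minimal sketch for locating that directory, assuming the import name is `keras_textclassification` (as the example path in step 2 suggests). The helper name `package_data_dir` is hypothetical and not part of this project.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_data_dir(package: str, subdir: str = "data") -> Optional[Path]:
    """Return <package dir>/<subdir> for an installed package, or None if not found.

    Hypothetical helper: it only computes the path and does not check
    that the subdirectory exists.
    """
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    # Plain modules (non-packages) have no search locations.
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / subdir


if __name__ == "__main__":
    target = package_data_dir("keras_textclassification")
    print(target if target else "keras_textclassification is not installed")
```

The printed path is where the unzipped data directory from step 2 would be copied.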

keras_textclassification (main code, to be continued...)

- Electra-finetune (todo)
- Albert-finetune
- Xlnet-finetune
- Bert-finetune
- FastText
- TextCNN
- charCNN
- TextRNN
- TextRCNN
- TextDCNN
- TextDPCNN
- TextVDCNN
- TextCRNN
- DeepMoji
- SelfAttention
- HAN
- CapsuleNet
- Transformer-encode
- SWEM
- LEAM
- TextGCN (todo)

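run (using FastText as an example)

- 1. Enter the keras_textclassification/m01_FastText directory.
- 2. Train: run train.py, e.g. python train.py
- 3. Predict: run predict.py, e.g. python predict.py
- Note: the default is a random embedding without pre-training. The bundled training and validation corpora contain only 100 samples; the full corpus can be downloaded as described in the data section below.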

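run (multi-label classification / Embedding / test / sample examples)

- Examples for bert, word2vec, and random embeddings are in the test/ directory. Note that the word2vec (char or word), random-word, and bert (chinese_L-12_H-768_A-12) resources are not fully included in the repository and need to be downloaded.
- The multi_multi_class/ directory contains a multi-label classification example using text-cnn: labels are converted to multi-one-hot vectors, and at prediction time the classes whose scores pass a given threshold are selected.
- The sentence_similarity/ directory contains a bert example for computing the similarity of two sentences; the data format is shown in the data/sim_webank/ directory.
- predict_bert_text_cnn.py
- tet_char_bert_embedding.py
- tet_char_xlnet_embedding.py
- tet_char_random_embedding.py
- tet_char_word2vec_embedding.py
- tet_word_random_embedding.py
- tet_word_word2vec_embedding.py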
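
The multi-label scheme used in multi_multi_class/ (multi-one-hot labels, then a threshold on the predicted scores) can be sketched in plain Python. This is an illustration only, not the project's code: the function names, the three example labels, and the threshold of 0.5 are all assumptions.

```python
from typing import Dict, List, Sequence


def to_multi_hot(labels: Sequence[str], label_index: Dict[str, int]) -> List[int]:
    """Encode a sample's labels as a 0/1 vector with one slot per known label."""
    vec = [0] * len(label_index)
    for label in labels:
        vec[label_index[label]] = 1
    return vec


def decode_by_threshold(scores: Sequence[float],
                        index_label: Dict[int, str],
                        threshold: float = 0.5) -> List[str]:
    """Keep every label whose predicted score reaches the threshold."""
    return [index_label[i] for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    # Hypothetical label set, for illustration only.
    label_index = {"sports": 0, "finance": 1, "tech": 2}
    index_label = {i: name for name, i in label_index.items()}

    print(to_multi_hot(["finance", "tech"], label_index))       # [0, 1, 1]
    print(decode_by_threshold([0.1, 0.8, 0.55], index_label))   # ['finance', 'tech']
```

With this encoding the model typically ends in one sigmoid unit per label rather than a softmax, so several labels can pass the threshold at once.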

keras_textclassification/data

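- Data download
  ** Only part of the data is uploaded to the GitHub project; the rest is available at: https://pan.baidu.com/s/1I3vydhmFEQ9nuPG2fDou8Q (extraction code: rket)
- baidu_qa_2019 (Baidu QA corpus; only the title is used as the classification sample; 17 classes, one of which is the empty string ''; compressed and uploaded)
   - baike_qa_train.csv
   - baike_qa_valid.csv
- byte_multi_news (Toutiao 2018 news-title multi-label corpus, 1070 labels, crawled by fate233; source: [byte_multi_news](https://github.com/fate233/toutiao-multilevel-text-classfication-dataset))
   - labels.csv
   - train.csv
   - valid.csv
- embeddings
   - chinese_L-12_H-768_A-12/ (Google's pre-trained model, compressed and uploaded;
                              keras-bert can also load Baidu's ERNIE (requires conversion, [https://github.com/ArthurRizar/tensorflow_ernie](https://github.com/ArthurRizar/tensorflow_ernie))
                              and HIT's bert-wwm (TensorFlow, [https://github.com/ymcui/Chinese-BERT-wwm](https://github.com/ymcui/Chinese-BERT-wwm)))
   - albert_base_zh/ (ALBERT trained by brightmart, available at https://github.com/brightmart/albert_zh)
   - chinese_xlnet_mid_L-24_H-768_A-12/ (HIT's pre-trained Chinese XLNet model [https://github.com/ymcui/Chinese-PreTrained-XLNet], 24 layers)
   - term_char.txt (uploaded; complete in the project; character dictionary built from wiki; other sources such as the Xinhua Dictionary also work)
   - term_word.txt (not uploaded; only partially included in the project; can be derived from the word vectors)
   - w2v_model_merge_short.vec (not uploaded; only partially included in the project; word vectors; you can use your own)
   - w2v_model_wiki_char.vec (uploaded to Baidu Netdisk; only partially included in the project; self-trained Wikipedia character vectors; you can use your own)
- model
   - fast_text/ (storage location for pre-trained models)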

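Project notes

1. Base classes are provided for the network (graph) and for the embeddings (word, character, and sentence embedding); the concrete models inherit from them, which keeps the code simple.
2. keras_layers holds commonly used layers, conf holds the paths of project data and models, data holds the data and corpora, and data_preprocess is the data preprocessing module.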
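
The base-class design in point 1 can be illustrated with a small, framework-free sketch. It shows only the inheritance pattern for the embedding side: the class names, the `[PAD]`/`[UNK]` tokens, and the `encode` method are hypothetical and are not the project's actual API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseEmbedding(ABC):
    """Hypothetical base class: shared id lookup, truncation, and padding."""

    def __init__(self, vocab: Dict[str, int], max_len: int):
        self.vocab = vocab
        self.max_len = max_len

    @abstractmethod
    def tokenize(self, text: str) -> List[str]:
        """Subclasses decide how text is split (characters, words, ...)."""

    def encode(self, text: str) -> List[int]:
        unk, pad = self.vocab["[UNK]"], self.vocab["[PAD]"]
        ids = [self.vocab.get(tok, unk) for tok in self.tokenize(text)]
        ids = ids[: self.max_len]                      # truncate long inputs
        return ids + [pad] * (self.max_len - len(ids))  # pad short inputs


class CharEmbedding(BaseEmbedding):
    def tokenize(self, text: str) -> List[str]:
        return list(text)


class WordEmbedding(BaseEmbedding):
    def tokenize(self, text: str) -> List[str]:
        return text.split()


if __name__ == "__main__":
    vocab = {"[PAD]": 0, "[UNK]": 1, "a": 2, "b": 3}
    print(CharEmbedding(vocab, max_len=4).encode("abz"))    # [2, 3, 1, 0]
    print(WordEmbedding(vocab, max_len=2).encode("a b a"))  # [2, 3]
```

A graph base class would follow the same pattern: shared training and saving logic in the base, with each model overriding only the method that builds its layers.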

Models and paper titles

FastText: Bag of Tricks for Efficient Text Classification
TextCNN: Convolutional Neural Networks for Sentence Classification
charCNN-kim: Character-Aware Neural Language Models
charCNN-zhang: Character-level Convolutional Networks for Text Classification
TextRNN: Recurrent Neural Network for Text Classification with Multi-Task Learning
RCNN: Recurrent Convolutional Neural Networks for Text Classification
DCNN: A Convolutional Neural Network for Modelling Sentences
DPCNN: Deep Pyramid Convolutional Neural Networks for Text Categorization
VDCNN: Very Deep Convolutional Networks
CRNN: A C-LSTM Neural Network for Text Classification
DeepMoji: Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm
SelfAttention: Attention Is All You Need
HAN: Hierarchical Attention Networks for Document Classification
CapsuleNet: Dynamic Routing Between Capsules
Transformer (encode or decode): Attention Is All You Need
Bert: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Xlnet: XLNet: Generalized Autoregressive Pretraining for Language Understanding
Albert: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
RoBERTa: RoBERTa: A Robustly Optimized BERT Pretraining Approach
ELECTRA: ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
TextGCN: Graph Convolutional Networks for Text Classification

References / Acknowledgements

Text classification project: https://github.com/mosu027/TextClassification
Text classification (Zhihu Kanshan Cup): https://github.com/brightmart/text_classification
Kashgari project: https://github.com/BrikerMan/Kashgari
Text classification by lpty: https://github.com/lpty/classifier
Keras text classification: https://github.com/ShawnyXiao/TextClassification-Keras
Keras text classification: https://github.com/AlexYangLi/TextClassification
CapsuleNet model: https://github.com/bojone/Capsule
Transformer model: https://github.com/CyberZHG/keras-transformer
keras_albert_model: https://github.com/TinkerMob/keras_albert_model

Simple training call:

from keras_textclassification import train
train(graph='TextCNN', # required, algorithm name; one of "ALBERT","BERT","XLNET","FASTTEXT","TEXTCNN","CHARCNN",
                       # "TEXTRNN","RCNN","DCNN","DPCNN","VDCNN","CRNN","DEEPMOJI",
                       # "SELFATTENTION", "HAN","CAPSULE","TRANSFORMER"
     label=17,         # required, number of classes; must be the same for the training and test sets
     path_train_data=None, # required, training data file in csv format with a 'label,ques' header row; see keras_textclassification/data
     path_dev_data=None, # required, test data file in csv format with a 'label,ques' header row; see keras_textclassification/data
     rate=1,             # optional, fraction of the training data to use
     hyper_parameters=None) # optional, hyperparameters in json format; the default embedding is 'char','random'
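
The training and test files passed to `train` must be csv files whose header row is 'label,ques'. Below is a minimal sketch for checking a file against that format before training. The helper `read_label_ques` and the sample rows are hypothetical, and how the library itself parses the file is not shown in this README.

```python
import csv
import io
from typing import List, TextIO, Tuple

REQUIRED_HEADER = ["label", "ques"]


def read_label_ques(stream: TextIO) -> List[Tuple[str, str]]:
    """Read (label, ques) pairs and fail early if the header row is wrong."""
    reader = csv.reader(stream)
    header = next(reader, None)
    if header != REQUIRED_HEADER:
        raise ValueError(f"expected header {REQUIRED_HEADER}, got {header}")
    # Skip blank or malformed rows that lack both columns.
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = io.StringIO("label,ques\nedu,how to learn English\nhealth,what to do about a cold\n")
    rows = read_label_ques(sample)
    print(len(rows), "rows,", len({label for label, _ in rows}), "distinct labels")
```

The distinct-label count is the value to compare against the `label` argument, which must match for both the training and the test file.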

Reference

For citing this work, you can refer to the present GitHub project. For example, with BibTeX:

@misc{Keras-TextClassification,
    howpublished = {\url{https://github.com/yongzhuo/Keras-TextClassification}},
    title = {Keras-TextClassification},
    author = {Yongzhuo Mo},
    publisher = {GitHub},
    year = {2019}
}

*Hope this helps!
