
cjymz886 / Text_rnn_attention

Licence: mit

Chinese text classification with RNN + attention on top of embedded Word2vec word vectors


Labels

tensorflow text-classification word2vec


Text classification with RNN + attention and Word2vec

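This write-up follows my earlier blog post "text-cnn". It reports text classification results for an RNN+ATTENTION model with word-level embeddings, trained on the same dataset.

The main goal of this experiment is to compare how a CNN model and an RNN+attention model train on the same data. On the validation set, the CNN reached 96.5% accuracy and RNN+attention reached 96.8%.

If you are interested, you can also read my text-cnn project.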

1 Environment

python3
tensorflow 1.3 or later (CPU environment)
gensim
jieba
scipy
numpy
scikit-learn

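2 RNN (recurrent neural network) + attention mechanism

The parameters of the RNN+ATTENTION model are configured in text_model.py, as follows:

The rough structure of the RNN+ATTENTION model is: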
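
As a rough illustration of what an attention layer on top of an RNN usually does: each time step's hidden state gets a score, the scores are turned into weights with a softmax, and the weighted sum of hidden states becomes the document vector fed to the classifier. The sketch below shows only that pooling step in plain Python. The function name `attention_pool`, the tanh scoring form, and the parameters `W`, `b`, `u` are assumptions for illustration and are not taken from text_model.py.

```python
import math

def attention_pool(hidden_states, W, b, u):
    """Additive attention pooling over RNN outputs (illustrative sketch).

    hidden_states: list of T vectors of length H, one per time step.
    W: A x H projection matrix, b: bias of length A, u: context vector of length A.
    Returns (weights, pooled): weights sum to 1, pooled has length H.
    """
    scores = []
    for h in hidden_states:
        # v = tanh(W h + b); score = u . v
        v = [math.tanh(sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i)
             for row, b_i in zip(W, b)]
        scores.append(sum(u_i * v_i for u_i, v_i in zip(u, v)))

    # Numerically stable softmax over the time steps.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the hidden states.
    hidden_size = len(hidden_states[0])
    pooled = [sum(a * h[j] for a, h in zip(weights, hidden_states))
              for j in range(hidden_size)]
    return weights, pooled

# Toy input: 3 time steps, hidden size 2, attention size 2 (all values made up).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
u = [1.0, 1.0]
weights, pooled = attention_pool(states, W, b, u)
```

In a real model `W`, `b` and `u` would be trained together with the RNN; here they are fixed by hand so the weighting is easy to follow.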

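3 Dataset

As in the CNN experiment, this experiment trains and tests on a subset of THUCNews. Please download the dataset yourself from "THUCTC: an efficient Chinese text classification toolkit", and follow the data provider's open-source license.

The texts cover 10 categories: categories = ['体育', '财经', '房产', '家居', '教育', '科技', '时尚', '时政', '游戏', '娱乐'] (sports, finance, real estate, home, education, technology, fashion, politics, games, entertainment), with 6,500 samples per category.

cnews.train.txt: training set (5000*10)

cnews.val.txt: validation set (500*10)

cnews.test.txt: test set (1000*10)

The training data and the trained word vectors can be downloaded here: link: https://pan.baidu.com/s/1cBZZE6UTsNb5utkg4k6TOQ, password: 5y1a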
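
A minimal sketch of reading one of the split files and checking the per-category counts. It assumes each line has the form "label<TAB>text"; that layout is an assumption about the files, and the helper name `read_labeled_lines` is hypothetical, not a function from loader.py.

```python
from collections import Counter

def read_labeled_lines(lines):
    """Parse 'label<TAB>text' lines into (labels, texts).

    The tab-separated layout is an assumption about the cnews.*.txt files.
    """
    labels, texts = [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        label, sep, text = line.partition("\t")
        if not sep:
            continue  # skip lines without a tab separator
        labels.append(label)
        texts.append(text)
    return labels, texts

# Hand-written sample lines standing in for a real split file.
sample = ["体育\t比赛昨晚结束\n", "财经\t股市今日上涨\n", "体育\t球队赢得冠军\n", "\n"]
labels, texts = read_labeled_lines(sample)
counts = Counter(labels)
```

For a real file, the same function can be given an open file handle, for example `open("cnews.train.txt", encoding="utf-8")`. If the stated split sizes hold, `counts` should then show 5,000 samples for each of the 10 categories.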

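4 Preprocessing

The main preprocessing step is word segmentation of the training text, for two reasons: the word vectors are trained on segmented text, and the model takes its input in the form of word vectors.

In addition, punctuation is removed from the text, and stop words are filtered out using the ./data/stopwords.txt file.

All preprocessing code is in loader.py.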
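
The filtering step described above can be sketched as follows. The sketch starts from tokens that have already been segmented (for example by jieba) so that it runs with the standard library alone. The function names are hypothetical and this is not the code in loader.py.

```python
import unicodedata

def is_punctuation(token):
    """True if every character is Unicode punctuation (P*) or a symbol (S*)."""
    return all(unicodedata.category(ch)[0] in ("P", "S") for ch in token)

def clean_tokens(tokens, stopwords):
    """Drop empty or whitespace-only tokens, punctuation, and stop words."""
    return [t for t in tokens
            if t.strip() and not is_punctuation(t) and t not in stopwords]

# Tokens as a word segmenter might produce them (hand-written here).
tokens = ["我们", "的", "球队", "，", "赢得", "了", "冠军", "！"]
# A tiny stand-in for the contents of ./data/stopwords.txt.
stopwords = {"的", "了"}
cleaned = clean_tokens(tokens, stopwords)
```

Using Unicode categories instead of a fixed punctuation string means full-width Chinese punctuation such as "，" and "！" is caught without listing every character.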

5 Steps to run

python train_word2vec.py: segments the training data and trains word vectors with Word2vec (vector_word.txt)

python text_train.py: trains the model

python text_test.py: tests the model

python text_predict.py: runs predictions with the model
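
To inspect the trained vectors outside the training scripts, a small parser is enough if vector_word.txt is stored in the plain word2vec text format (a "count dimension" header line followed by one "word v1 v2 ..." line per word). Whether the file uses that format is an assumption, and `parse_word2vec_text` is a hypothetical helper, not part of the repository.

```python
def parse_word2vec_text(lines):
    """Parse word2vec text format into (count, dim, {word: vector}).

    Expects a 'count dim' header, then 'word v1 ... v_dim' on each line.
    """
    it = iter(lines)
    header = next(it).split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return count, dim, vectors

# Made-up 3-dimensional vectors standing in for the real file contents.
sample = ["2 3\n", "体育 0.1 0.2 0.3\n", "财经 -0.4 0.5 0.6\n"]
count, dim, vectors = parse_word2vec_text(sample)
```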

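6 Training results

Run: python text_train.py

Training stopped after 2 epochs, once the termination condition was met. The best validation accuracy, 96.8%, was reached at global_step=1500.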
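
The termination condition is not spelled out here. One common choice, assumed in the sketch below, is to stop once the validation accuracy has not improved for a fixed number of steps. The function name, the evaluation interval, and the patience value are all hypothetical and may differ from what text_train.py does.

```python
def best_step_with_early_stopping(val_accuracies, steps_per_eval, patience_steps):
    """Return (best_step, best_acc, stopped_at) for a run with early stopping.

    val_accuracies[i] is the validation accuracy at step (i + 1) * steps_per_eval.
    Training stops once patience_steps have passed without a new best accuracy.
    """
    best_acc, best_step = float("-inf"), 0
    step = 0
    for i, acc in enumerate(val_accuracies):
        step = (i + 1) * steps_per_eval
        if acc > best_acc:
            best_acc, best_step = acc, step
        elif step - best_step >= patience_steps:
            return best_step, best_acc, step
    return best_step, best_acc, step

# Made-up accuracies, evaluated every 100 steps with a 300-step patience.
history = [0.90, 0.94, 0.96, 0.955, 0.958, 0.959, 0.95]
best_step, best_acc, stopped_at = best_step_with_early_stopping(history, 100, 300)
```

With these made-up numbers the best accuracy is seen at step 300 and the loop stops at step 600, before the last value is evaluated.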

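7 Test results

Run: python text_test.py

On the test set, test_loss=0.14 and test_accuracy=95.8%. The "体育" (sports) category scored 100%, and overall precision = recall = F1 = 96%.
By comparison, the CNN model's test results were test_loss=0.13, test_accuracy=96.7%, and precision = recall = F1 = 97%.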
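
For reference, the metrics quoted above are defined per class from true positives, false positives, and false negatives. The sketch below computes them in plain Python on a made-up set of labels; it is not the evaluation code in text_test.py, and the function names are hypothetical.

```python
def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = (precision, recall, f1)
    return report

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up labels: one '财经' sample is misclassified as '科技'.
y_true = ["体育", "体育", "财经", "财经", "科技", "科技"]
y_pred = ["体育", "体育", "财经", "科技", "科技", "科技"]
report = per_class_prf(y_true, y_pred)
acc = accuracy(y_true, y_pred)
```

In this toy example the single mistake lowers recall for "财经" and precision for "科技", while "体育" stays at 100% on all three metrics.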

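8 Prediction results

Run: python text_predict.py

Five samples were picked at random from the test data. For each one, the script prints the original text, its true label, and the predicted label. All 5 samples in the figure below were predicted correctly.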

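9 Comparison and conclusions

Compared with the CNN model, the RNN+attention model's validation accuracy during training (96.8%) is slightly better than the CNN's, but on the test set it does not perform as well as the CNN. My guess at one reason is that the CNN processes texts of length 600, while RNN+ATTENTION processes texts of length 200 and cannot handle very long texts. The longer the text, the more feature information it contains, so overall I personally think the CNN model is better suited to classifying long texts.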
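
The effect of the two length limits can be seen with a small sketch that brings a document to a fixed length. Keeping the first tokens and padding at the end with id 0 are assumptions; the repository may truncate or pad on the other side. The function name `pad_or_truncate` is hypothetical.

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Keep at most max_len ids, then right-pad with pad_id up to max_len."""
    clipped = token_ids[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))

# A made-up 450-token document with ids 1..450 (0 is reserved for padding).
doc = list(range(1, 451))
as_rnn_input = pad_or_truncate(doc, 200)  # length used for RNN+ATTENTION
as_cnn_input = pad_or_truncate(doc, 600)  # length used for the CNN

kept_rnn = sum(1 for t in as_rnn_input if t != 0)
kept_cnn = sum(1 for t in as_cnn_input if t != 0)
```

For this 450-token document the 200-token limit keeps 200 tokens and discards the rest, while the 600-token limit keeps all 450, which is the information gap the paragraph above refers to.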

