ShaneTian / TextCNN
License: GPL-3.0
TextCNN implemented in TensorFlow 2.0.0 (mainly tf.keras).
Stars: ✭ 37
Projects that are alternatives to or similar to TextCNN
Nlp In Practice
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790 (+2035.14%)
Mutual labels: text-classification
Text classification
All kinds of text classification models, and more, with deep learning.
Stars: ✭ 7,179 (+19302.7%)
Mutual labels: text-classification
Nlp xiaojiang
Natural language processing (NLP) and the Xiaojiang robot (a retrieval-based chitchat chatbot): BERT sentence vectors and similarity, XLNet sentence vectors and similarity (text xlnet embedding), text classification, entity extraction (NER with BERT+BiLSTM+CRF), data augmentation (text augment, data enhance), synonym and paraphrase generation, sentence-trunk extraction (mainpart), Chinese short-text similarity, text feature engineering, and a Keras HTTP service.
Stars: ✭ 954 (+2478.38%)
Mutual labels: text-classification
Bert language understanding
Pre-training of Deep Bidirectional Transformers for Language Understanding: pre-train TextCNN
Stars: ✭ 933 (+2421.62%)
Mutual labels: text-classification
Keras Textclassification
Chinese long-text classification, short-sentence classification, multi-label classification, and two-sentence similarity with Keras NLP; base classes for building embedding and graph layers; includes FastText, TextCNN, CharCNN, TextRNN, RCNN, DCNN, DPCNN, VDCNN, CRNN, BERT, XLNet, ALBERT, Attention, DeepMoji, HAN, CapsuleNet, Transformer-encoder, Seq2seq, SWEM, LEAM, and TextGCN.
Stars: ✭ 914 (+2370.27%)
Mutual labels: text-classification
Tf Rnn Attention
Tensorflow implementation of attention mechanism for text classification tasks.
Stars: ✭ 735 (+1886.49%)
Mutual labels: text-classification
Tensorflow Sentiment Analysis On Amazon Reviews Data
Implementing different RNN models (LSTM, GRU) and convolution models (Conv1D, Conv2D) on a subset of Amazon Reviews data with TensorFlow on Python 3. A sentiment analysis project.
Stars: ✭ 34 (-8.11%)
Mutual labels: text-classification
Nlp tensorflow project
Uses TensorFlow to implement several NLP projects, e.g. classification, chatbot, NER, attention, QA, etc.
Stars: ✭ 27 (-27.03%)
Mutual labels: text-classification
Omnicat Bayes
Naive Bayes text classification implementation as an OmniCat classifier strategy. (#ruby #naivebayes)
Stars: ✭ 30 (-18.92%)
Mutual labels: text-classification
Concise Ipython Notebooks For Deep Learning
IPython notebooks for solving problems like classification, segmentation, and generation, using the latest deep learning algorithms on various publicly available text and image datasets.
Stars: ✭ 23 (-37.84%)
Mutual labels: text-classification
Text gcn
Graph Convolutional Networks for Text Classification. AAAI 2019
Stars: ✭ 945 (+2454.05%)
Mutual labels: text-classification
Chatbot cn
A chatbot for the finance and judicial domains (with chitchat capability). Its main modules include information extraction, NLU, NLG, and a knowledge graph; the front end is integrated via Django, and RESTful interfaces for the NLP and KG modules are already provided.
Stars: ✭ 791 (+2037.84%)
Mutual labels: text-classification
Few Shot Text Classification
Few-shot binary text classification with Induction Networks and Word2Vec weights initialization
Stars: ✭ 32 (-13.51%)
Mutual labels: text-classification
Text2gender
Predict the author's gender from their text.
Stars: ✭ 14 (-62.16%)
Mutual labels: text-classification
Nlp Experiments In Pytorch
PyTorch repository for text categorization and NER experiments in Turkish and English.
Stars: ✭ 35 (-5.41%)
Mutual labels: text-classification
Easy Deep Learning With Allennlp
🔮Deep Learning for text made easy with AllenNLP
Stars: ✭ 32 (-13.51%)
Mutual labels: text-classification
Cnn Question Classification Keras
Chinese Question Classifier (Keras Implementation) on BQuLD
Stars: ✭ 28 (-24.32%)
Mutual labels: text-classification
TextCNN
TextCNN implemented in TensorFlow 2.0.0 (mainly tf.keras).
Software environments
- tensorflow-gpu 2.0.0-alpha0
- python 3.6.7
- pandas 0.24.2
- numpy 1.16.2
Data
- Vocabulary size: 3407
- Number of classes: 18
- Train/Test split: 20351/2261
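The repository builds its dataset in data_helper.py, which is not shown here, but the numbers above imply a tokenize-then-pad step. A minimal sketch, assuming raw texts and labels are already loaded (the variable names are illustrative, not from the repo):

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Illustrative inputs; the repo loads these in data_helper.py.
texts = ["example sentence one", "another example"]
labels = [0, 1]

# Fit a vocabulary (the README reports 3407 tokens for this dataset).
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)

# Convert to integer sequences and pad/truncate to the padding size (128).
sequences = tokenizer.texts_to_sequences(texts)
x = pad_sequences(sequences, maxlen=128, padding="post")

# One-hot the 18 target classes for categorical cross-entropy training.
y = tf.keras.utils.to_categorical(labels, num_classes=18)
```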
Model architecture
```
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_data (InputLayer)         [(None, 128)]        0
__________________________________________________________________________________________________
embedding (Embedding)           (None, 128, 512)     1744384     input_data[0][0]
__________________________________________________________________________________________________
add_channel (Reshape)           (None, 128, 512, 1)  0           embedding[0][0]
__________________________________________________________________________________________________
convolution_3 (Conv2D)          (None, 126, 1, 128)  196736      add_channel[0][0]
__________________________________________________________________________________________________
convolution_4 (Conv2D)          (None, 125, 1, 128)  262272      add_channel[0][0]
__________________________________________________________________________________________________
convolution_5 (Conv2D)          (None, 124, 1, 128)  327808      add_channel[0][0]
__________________________________________________________________________________________________
max_pooling_3 (MaxPooling2D)    (None, 1, 1, 128)    0           convolution_3[0][0]
__________________________________________________________________________________________________
max_pooling_4 (MaxPooling2D)    (None, 1, 1, 128)    0           convolution_4[0][0]
__________________________________________________________________________________________________
max_pooling_5 (MaxPooling2D)    (None, 1, 1, 128)    0           convolution_5[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate)       (None, 1, 1, 384)    0           max_pooling_3[0][0]
                                                                 max_pooling_4[0][0]
                                                                 max_pooling_5[0][0]
__________________________________________________________________________________________________
flatten (Flatten)               (None, 384)          0           concatenate[0][0]
__________________________________________________________________________________________________
dropout (Dropout)               (None, 384)          0           flatten[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, 18)           6930        dropout[0][0]
==================================================================================================
Total params: 2,538,130
Trainable params: 2,538,130
Non-trainable params: 0
__________________________________________________________________________________________________
```
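The summary above can be reproduced with the Keras functional API. The following is a sketch rather than the repository's exact code; layer names and hyperparameters are taken from the summary and the parameter list below, and the parameter counts match (e.g. the embedding contributes 3407 × 512 = 1,744,384 weights):

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_textcnn(vocab_size=3407, padding_size=128, embed_size=512,
                  filter_sizes=(3, 4, 5), num_filters=128,
                  num_classes=18, dropout_rate=0.5, l2_lambda=0.01):
    inputs = layers.Input(shape=(padding_size,), name="input_data")
    x = layers.Embedding(vocab_size, embed_size, name="embedding")(inputs)
    # Add a channel dimension so Conv2D sees a (length, embed, 1) "image".
    x = layers.Reshape((padding_size, embed_size, 1), name="add_channel")(x)

    pooled = []
    for k in filter_sizes:
        # Each kernel spans the full embedding width, so the output width is 1.
        conv = layers.Conv2D(num_filters, kernel_size=(k, embed_size),
                             activation="relu", name=f"convolution_{k}")(x)
        # Max-over-time pooling collapses the remaining length dimension.
        pool = layers.MaxPooling2D(pool_size=(padding_size - k + 1, 1),
                                   name=f"max_pooling_{k}")(conv)
        pooled.append(pool)

    x = layers.Concatenate(name="concatenate")(pooled)  # (None, 1, 1, 384)
    x = layers.Flatten(name="flatten")(x)
    x = layers.Dropout(dropout_rate, name="dropout")(x)
    outputs = layers.Dense(num_classes, activation="softmax",
                           kernel_regularizer=regularizers.l2(l2_lambda),
                           name="dense")(x)
    return models.Model(inputs=inputs, outputs=outputs, name="model")
```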
Model parameters
- Padding size: 128
- Embedding size: 512
- Num channel: 1
- Filter size: [3, 4, 5]
- Num filters: 128
- Dropout rate: 0.5
- Regularizers lambda: 0.01
- Batch size: 64
- Epochs: 10
- Fraction validation: 0.05 (1018 samples)
- Total parameters: 2,538,130
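Wiring these parameters into training looks roughly like this. A hedged sketch: the optimizer is an assumption (the README does not state it), and `x_train`/`y_train` stand for the padded sequences and one-hot labels from the data step above:

```python
model = build_textcnn()  # from the sketch above

model.compile(optimizer="adam",  # assumption: the optimizer is not stated in the README
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Batch size 64, 10 epochs, 5% of the training set held out for validation
# (0.05 of 20351 samples is the 1018 validation samples reported above).
history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=10,
                    validation_split=0.05)
```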
Run
Train result
Trained on 20,351 samples for 10 epochs:

Loss | Accuracy | Val loss | Val accuracy
---|---|---|---
0.1609 | 0.9683 | 0.3648 | 0.9185
Test result
Evaluated on 2,261 test samples:

Accuracy | Macro-Precision | Macro-Recall | Macro-F1
---|---|---|---
0.9363 | 0.9428 | 0.9310 | 0.9360
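The macro-averaged metrics above can be recomputed from the model's predictions, for example with scikit-learn (a sketch; scikit-learn is not listed in the repo's environment, and `x_test`/`y_test` are illustrative names for the held-out arrays):

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Predicted and true class indices for the 2261 test samples.
y_pred = np.argmax(model.predict(x_test), axis=1)
y_true = np.argmax(y_test, axis=1)

accuracy = accuracy_score(y_true, y_pred)
# Macro averaging weights all 18 classes equally, regardless of support.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro")
print(f"Accuracy={accuracy:.4f} P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```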
Images

Training accuracy and loss curves and the test confusion matrix are saved as images in the results directory (not reproduced here).
Usage
Train:

```
usage: train.py [-h] [-t TEST_SAMPLE_PERCENTAGE] [-p PADDING_SIZE]
                [-e EMBED_SIZE] [-f FILTER_SIZES] [-n NUM_FILTERS]
                [-d DROPOUT_RATE] [-c NUM_CLASSES] [-l REGULARIZERS_LAMBDA]
                [-b BATCH_SIZE] [--epochs EPOCHS]
                [--fraction_validation FRACTION_VALIDATION]
                [--results_dir RESULTS_DIR]

This is the TextCNN train project.

optional arguments:
  -h, --help            show this help message and exit
  -t TEST_SAMPLE_PERCENTAGE, --test_sample_percentage TEST_SAMPLE_PERCENTAGE
                        The fraction of test data.(default=0.1)
  -p PADDING_SIZE, --padding_size PADDING_SIZE
                        Padding size of sentences.(default=128)
  -e EMBED_SIZE, --embed_size EMBED_SIZE
                        Word embedding size.(default=512)
  -f FILTER_SIZES, --filter_sizes FILTER_SIZES
                        Convolution kernel sizes.(default=3,4,5)
  -n NUM_FILTERS, --num_filters NUM_FILTERS
                        Number of each convolution kernel.(default=128)
  -d DROPOUT_RATE, --dropout_rate DROPOUT_RATE
                        Dropout rate in softmax layer.(default=0.5)
  -c NUM_CLASSES, --num_classes NUM_CLASSES
                        Number of target classes.(default=18)
  -l REGULARIZERS_LAMBDA, --regularizers_lambda REGULARIZERS_LAMBDA
                        L2 regulation parameter.(default=0.01)
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        Mini-Batch size.(default=64)
  --epochs EPOCHS       Number of epochs.(default=10)
  --fraction_validation FRACTION_VALIDATION
                        The fraction of validation.(default=0.05)
  --results_dir RESULTS_DIR
                        The results dir including log, model, vocabulary and
                        some images.(default=./results/)
```
Test:

```
usage: test.py [-h] [-p PADDING_SIZE] [-c NUM_CLASSES] results_dir

This is the TextCNN test project.

positional arguments:
  results_dir           The results dir including log, model, vocabulary and
                        some images.

optional arguments:
  -h, --help            show this help message and exit
  -p PADDING_SIZE, --padding_size PADDING_SIZE
                        Padding size of sentences.(default=128)
  -c NUM_CLASSES, --num_classes NUM_CLASSES
                        Number of target classes.(default=18)
```
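For example, training with the documented defaults spelled out and then evaluating the saved model (the results directory matches the documented default):

```
python train.py -p 128 -e 512 -f 3,4,5 -n 128 -b 64 --epochs 10
python test.py -p 128 -c 18 ./results/
```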
You need to know...
- You need to modify the load_data_and_write_to_file function in data_helper.py to match your data file.
- This code uses a single input channel. You could instead build two channels from the embedding vectors, one static and one dynamic, which may perform better; see the sketch after this list.
- The model is saved as an hdf5 file.
- TensorBoard is available.
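For the two-channel variant mentioned above (the static-plus-dynamic design from Kim's original TextCNN paper), the embedding stage could look like this minimal sketch. `pretrained_weights` is a hypothetical pretrained embedding matrix of shape (3407, 512); the convolution stack stays unchanged:

```python
from tensorflow.keras import layers

inputs = layers.Input(shape=(128,), name="input_data")

# Static channel: pretrained vectors, frozen during training.
static = layers.Embedding(3407, 512, weights=[pretrained_weights],
                          trainable=False, name="embedding_static")(inputs)
# Dynamic channel: same initialization, fine-tuned along with the task.
dynamic = layers.Embedding(3407, 512, weights=[pretrained_weights],
                           trainable=True, name="embedding_dynamic")(inputs)

# Give each channel a trailing channel axis, then concatenate on it.
static = layers.Reshape((128, 512, 1))(static)
dynamic = layers.Reshape((128, 512, 1))(dynamic)
x = layers.Concatenate(axis=-1, name="add_channel")([static, dynamic])
# x now has shape (batch, 128, 512, 2): two channels for the Conv2D stack.
```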
Note that the project description data, including the texts, logos, images, and/or trademarks,
for each open source project belongs to its rightful owner.
If you wish to add or remove any projects, please contact us at [email protected].