
Doragd / Text-Classification-PyTorch

Licence: other
Implementation of papers for text classification task on SST-1/SST-2

Programming Languages

python

Projects that are alternatives of or similar to Text-Classification-PyTorch

COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-57.89%)
Mutual labels:  text-classification, sentiment-classification
Text Classification Pytorch
Text classification using deep learning models in Pytorch
Stars: ✭ 683 (+1098.25%)
Mutual labels:  text-classification, sentiment-classification
nsmc-zeppelin-notebook
Movie review dataset Word2Vec & sentiment classification Zeppelin notebook
Stars: ✭ 26 (-54.39%)
Mutual labels:  text-classification, sentiment-classification
NewsMTSC
Target-dependent sentiment classification in news articles reporting on political events. Includes a high-quality data set of over 11k sentences and a state-of-the-art classification model.
Stars: ✭ 54 (-5.26%)
Mutual labels:  text-classification, sentiment-classification
Sentiment-analysis-amazon-Products-Reviews
NLP with NLTK for Sentiment analysis amazon Products Reviews
Stars: ✭ 37 (-35.09%)
Mutual labels:  text-classification, sentiment-classification
Text tone analyzer
A system that analyzes the sentiment of texts and statements.
Stars: ✭ 15 (-73.68%)
Mutual labels:  text-classification, sentiment-classification
ML2017FALL
Machine Learning (EE 5184) in NTU
Stars: ✭ 66 (+15.79%)
Mutual labels:  text-classification, sentiment-classification
cnn-text-classification
Text classification with Convolution Neural Networks on Yelp, IMDB & sentence polarity dataset v1.0
Stars: ✭ 108 (+89.47%)
Mutual labels:  text-classification, sentiment-classification
Nlp Tutorial
A list of NLP(Natural Language Processing) tutorials
Stars: ✭ 1,188 (+1984.21%)
Mutual labels:  text-classification, sentiment-classification
Tensorflow Sentiment Analysis On Amazon Reviews Data
Implementing different RNN models (LSTM,GRU) & Convolution models (Conv1D, Conv2D) on a subset of Amazon Reviews data with TensorFlow on Python 3. A sentiment analysis project.
Stars: ✭ 34 (-40.35%)
Mutual labels:  text-classification, sentiment-classification
Text classification
all kinds of text classification models and more with deep learning
Stars: ✭ 7,179 (+12494.74%)
Mutual labels:  text-classification, textcnn
Context
ConText v4: Neural networks for text categorization
Stars: ✭ 120 (+110.53%)
Mutual labels:  text-classification, sentiment-classification
Tia
Your Advanced Twitter stalking tool
Stars: ✭ 98 (+71.93%)
Mutual labels:  text-classification, sentiment-classification
Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (+117.54%)
Mutual labels:  text-classification, sentiment-classification
Fancy Nlp
NLP for human. A fast and easy-to-use natural language processing (NLP) toolkit, satisfying your imagination about NLP.
Stars: ✭ 233 (+308.77%)
Mutual labels:  text-classification
clustext
Easy, fast clustering of texts
Stars: ✭ 18 (-68.42%)
Mutual labels:  text-classification
Pytorch Transformers Classification
Based on the Pytorch-Transformers library by HuggingFace. To be used as a starting point for employing Transformer models in text classification tasks. Contains code to easily train BERT, XLNet, RoBERTa, and XLM models for text classification.
Stars: ✭ 229 (+301.75%)
Mutual labels:  text-classification
Catalyst
Accelerated deep learning R&D
Stars: ✭ 2,804 (+4819.3%)
Mutual labels:  text-classification
overview-and-benchmark-of-traditional-and-deep-learning-models-in-text-classification
NLP tutorial
Stars: ✭ 41 (-28.07%)
Mutual labels:  text-classification
SharkStock
Automate swing trading using deep reinforcement learning. The deep deterministic policy gradient-based neural network model trains to choose an action to sell, buy, or hold the stocks to maximize the gain in asset value. The paper also acknowledges the need for a system that predicts the trend in stock value to work along with the reinforcement …
Stars: ✭ 63 (+10.53%)
Mutual labels:  sentiment-classification

Text-Classification-PyTorch 🐋

Here is a newcomer 🙇 who wants to become an NLPer, and this is his repository for text classification. Besides TextCNN and TextAttnBiLSTM, more models will be added in the near future.

Thanks for your Star, Fork, and Watch!

Dataset

  • Stanford Sentiment Treebank (SST)
    • SST-1: 5 classes (fine-grained); SST-2: 2 classes (binary)
  • Preprocess
    • Map sentiment values to labels (see the sketch below)
    • Remove tokens that consist entirely of non-alphanumeric characters, such as ...
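As a rough illustration of the label mapping, here is a minimal sketch that follows the cut-off points documented in the SST release; the column names and example sentences are illustrative assumptions, not the repository's actual preprocessing code.

import pandas as pd

# Map a raw SST sentiment value in [0, 1] to a fine-grained class (SST-1):
# [0, 0.2] -> 0 (very negative) ... (0.8, 1.0] -> 4 (very positive).
def to_fine_grained(value):
    return sum(value > cut for cut in (0.2, 0.4, 0.6, 0.8))

# Map to a binary class (SST-2): drop neutral sentences in (0.4, 0.6].
def to_binary(value):
    if 0.4 < value <= 0.6:
        return None              # neutral, filtered out
    return int(value > 0.6)      # 1 = positive, 0 = negative

df = pd.DataFrame({'sentence': ['a gorgeous film', 'a dull mess'],
                   'sentiment': [0.93, 0.12]})
df['sst1_label'] = df['sentiment'].apply(to_fine_grained)
df['sst2_label'] = df['sentiment'].apply(to_binary)
print(df)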

Pre-trained Word Vectors

  • Word2Vec : GoogleNews-vectors-negative300.bin
  • GloVe : glove.840B.300d.txt
    • We use GloVe as the pre-trained word vectors because its OOV rate on SST is lower than Word2Vec's and it also performs better in our experiments.
    • Options for loading word vectors in other formats are still preserved in the code (see the loading sketch below).
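Both formats can be read with gensim (version 3.7.3, as pinned in the requirements). The snippet below is a hedged sketch, not the repository's loader; the file paths follow the data/ folder shown in the Structure section.

from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec

# Word2Vec: the binary word2vec format can be loaded directly.
w2v = KeyedVectors.load_word2vec_format(
    'data/GoogleNews-vectors-negative300.bin', binary=True)

# GloVe: plain-text format; prepend the word2vec header first, then load.
glove2word2vec('data/glove.6B.300d.txt', 'data/glove.6B.300d.w2v.txt')
glove = KeyedVectors.load_word2vec_format('data/glove.6B.300d.w2v.txt', binary=False)

print(w2v.vector_size, glove.vector_size)   # both are 300-dimensional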

Model

Result

  • Baseline from the original paper [1]

    Model              SST-1   SST-2
    CNN-rand           45.0    82.7
    CNN-static         45.5    86.8
    CNN-non-static     48.0    87.2
    CNN-multichannel   47.4    88.1
  • Re-implementation

    Model                SST-1    SST-2
    CNN-rand             34.841   74.500
    CNN-static           45.056   84.125
    CNN-non-static       46.974   85.886
    CNN-multichannel     45.129   85.993
    Attention + BiLSTM   47.015   85.632
    Attention + BiGRU    47.854   85.102

Requirement

Please install the following library requirements first.

pandas==0.24.2
torch==1.1.0
fire==0.1.3
numpy==1.16.2
gensim==3.7.3
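Assuming these versions are pinned in requirements.txt, everything can be installed in one step:

$pip install -r requirements.txt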

Structure

│  .gitignore
│  config.py            # Global configuration
│  datasets.py          # Create DataLoader
│  main.py
│  preprocess.py
│  README.md
│  requirements.txt
│  utils.py
│  
├─checkpoints           # Save checkpoint and best model
│      
├─data                  # pretrained word vectors and datasets
│  │  glove.6B.300d.txt
│  │  GoogleNews-vectors-negative300.bin
│  └─stanfordSentimentTreebank # datasets folder
│          
├─models
│      TextAttnBiLSTM.py
│      TextCNN.py
│      __init__.py
│      
└─output_data           # Preprocessed data and vocabulary, etc.

Usage

  • Set global configuration parameters in config.py

  • Preprocess the datasets

$python preprocess.py
  • Train
$python main.py run

You can also override the parameters defined in config.py and in models/TextCNN.py or models/TextAttnBiLSTM.py from the command line (see the entry-point sketch after the test command).

$python main.py run [--option=VALUE]

For example,

$python main.py run --status='train' --use_model="TextAttnBiLSTM"
  • Test
$python main.py run --status='test' --best_model="checkpoints/BEST_checkpoint_SST-2_TextCNN.pth"
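Because the project depends on fire, a plausible way for main.py to expose run and accept --option=VALUE overrides is sketched below. The class and parameter names are assumptions for illustration, not the repository's actual code.

import fire

class Config:
    status = 'train'          # 'train' or 'test'
    use_model = 'TextCNN'     # or 'TextAttnBiLSTM'
    best_model = 'checkpoints/BEST_checkpoint_SST-2_TextCNN.pth'

def run(**kwargs):
    opt = Config()
    # Every --option=VALUE flag on the command line overrides a default.
    for key, value in kwargs.items():
        setattr(opt, key, value)
    if opt.status == 'train':
        print('training %s ...' % opt.use_model)
    else:
        print('testing checkpoint %s ...' % opt.best_model)

if __name__ == '__main__':
    fire.Fire()   # makes `python main.py run --status=test` call run(status='test')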

Conclusion

  • The TextCNN model extracts n-gram-like features with convolution kernels of different widths, while the TextAttnBiLSTM model uses a BiLSTM to capture semantics and long-range dependencies and combines it with an attention mechanism for classification (see the sketch after this list).
  • TextCNN parameter tuning:
    • GloVe works better than Word2Vec
    • Use a smaller batch size
    • Add weight decay ($l_2$ constraint), learning-rate decay, early stopping, etc.
    • Do not set padding_idx=0 in the embedding layer
  • TextAttnBiLSTM:
    • Apply dropout on the embedding layer, the LSTM layer, and the fully-connected layer
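To make the TextCNN point concrete, here is a minimal PyTorch sketch of n-gram-style convolution over word embeddings followed by max-pooling over time; the hyperparameters are illustrative assumptions, not the repository's exact settings.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNNSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, num_classes=2,
                 kernel_sizes=(3, 4, 5), num_filters=100, dropout=0.5):
        super().__init__()
        # No padding_idx=0, per the tuning note above.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One Conv1d per kernel width, i.e. one "n-gram" detector per width.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, x):                        # x: (batch, seq_len) token ids
        emb = self.embedding(x).transpose(1, 2)  # (batch, embed_dim, seq_len)
        # Convolve, apply ReLU, then max-pool over time for each kernel width.
        feats = [F.relu(conv(emb)).max(dim=2)[0] for conv in self.convs]
        return self.fc(self.dropout(torch.cat(feats, dim=1)))  # class logits

model = TextCNNSketch(vocab_size=20000)
logits = model(torch.randint(0, 20000, (8, 40)))  # 8 sentences of length 40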

Acknowledgements

Reference

[1] Convolutional Neural Networks for Sentence Classification

[2] A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification

[3] Attention-Based Bidirectional LSTM for Text Classification
