akurniawan / pytorch-sentiment-analysis

Licence: other
char-rnn implementation for sentiment analysis on Twitter data

pytorch-rnn-sentiment-analysis

Description

This is a toy project I wrote while learning PyTorch for the first time (it's easy and definitely awesome!). This repo contains implementations of both char-rnn and word-rnn for sentiment analysis on Twitter data.

Beyond sentiment analysis, you can also use this project for sentence classification with multiple classes. Just put your class ids in the CSV and you're good to go!

Implementation Details

  1. You can choose between LSTM and CNN-LSTM for the character decoder
  2. Examples are grouped into batches according to their lengths
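To illustrate the length-based grouping in item 2, here is a minimal sketch in plain Python. It is not the repo's actual code (the README says the data is loaded with torchtext, which has its own batching utilities); the function name `bucket_batches` and the sample tweets are made up for this example.

```python
import random

def bucket_batches(texts, batch_size, seed=0):
    """Group texts of similar length into batches to limit padding.

    Hypothetical helper for illustration only.
    """
    # Sort indices by text length so neighbours have similar lengths.
    order = sorted(range(len(texts)), key=lambda i: len(texts[i]))
    # Slice the sorted order into consecutive batches.
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    # Shuffle the batches (not their contents) so training order still varies.
    random.Random(seed).shuffle(batches)
    return [[texts[i] for i in batch] for batch in batches]

# Made-up sample data.
tweets = ["ok", "so good", "no", "worst day ever", "meh", "love it!!"]
for batch in bucket_batches(tweets, batch_size=2):
    print([len(t) for t in batch], batch)
```

Because each batch holds sequences of similar length, less padding is needed when the batch is turned into a tensor.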

Implementation Limitations

  1. In the current implementation, the dataset tokenizes the input by characters. There is not yet a way to change the tokenization via config.
  2. There is not yet a way to change the RNN cell via config.
  3. There is not yet a way to change the optimizer via config.

How to run?

  1. Install PyTorch
  2. Run pip install -r requirements.txt

Then run python run.py with the following options:

optional arguments:
  -h, --help            show this help message and exit
  --epochs EPOCHS       Number of epochs
  --dataset DATASET     Path for your training, validation and test dataset.
                        As this package uses torch text to load the data,
                        please follow the format by providing the path and
                        filename without its extension
  --batch_size BATCH_SIZE
                        The number of batch size for every step
  --log_interval LOG_INTERVAL
  --save_interval SAVE_INTERVAL
  --validation_interval VALIDATION_INTERVAL
  --char_level CHAR_LEVEL
                        Whether to use the model with character level or word
                        level embedding. Specify the option if you want to use
                        character level embedding
  --model_config MODEL_CONFIG
                        Location of model config
  --model_dir MODEL_DIR
                        Location to save the model

Here is an example of how to run it:

python run.py --model_config config/cnn_rnn.yml --epochs 50 --model_dir models --dataset data/sentiment

Dataset

You can download the raw data from [1]. It contains 1,578,627 classified tweets; each row is labeled 1 for positive sentiment or 0 for negative sentiment. Kudos to [2] for providing the link to the data! However, the data from [1] has 4 columns, while this code needs only the text and the sentiment, so convert the data first by keeping the first and the last columns before feeding it to the algorithm. Alternatively, you can download the data from [4]. It contains the same number of rows as the original, but I have already cleaned it up a bit, so you can run the code without any further modification.
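The column conversion described above could be done along these lines. This is only a sketch using Python's standard csv module: the function name `select_columns`, the header names, and the sample rows are made up, and the column positions are passed as parameters because you should check them against the header of the file you actually downloaded.

```python
import csv
import io

def select_columns(src, dst, sentiment_col, text_col, has_header=True):
    """Copy only the sentiment and text columns from src to dst.

    Hypothetical helper; src and dst are open text file objects.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    if has_header:
        next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue  # skip blank lines
        writer.writerow([row[sentiment_col], row[text_col].strip()])

# Made-up 4-column sample; the real file's layout may differ.
sample = io.StringIO(
    "Sentiment,ColB,ColC,Text\n"
    "1,x,y,what a great day\n"
    '0,x,y,"missed the bus, again"\n'
)
out = io.StringIO()
# Assumption: first column is the sentiment, last column is the text.
select_columns(sample, out, sentiment_col=0, text_col=-1)
print(out.getvalue())
```

For real files, open the source and destination with `open(path, newline="", encoding="utf-8")` and pass the file objects in place of the `StringIO` buffers.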

Want to run with your own data? No problem: create CSV files for training and testing with two columns, the first being the sentiment and the second being the text. Use the same base name for both files and distinguish them with the suffixes .train and .test.
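As a rough sketch of preparing your own files, the snippet below splits labeled rows and writes them with a shared base name. The helper `write_split`, the 80/20 split, and the sample rows are assumptions made for illustration, as is the exact naming (base name followed directly by .train or .test); adjust these to whatever the loader expects.

```python
import csv
import os
import random
import tempfile

def write_split(rows, prefix, test_fraction=0.2, seed=0):
    """Write (label, text) rows to prefix + '.train' and prefix + '.test'.

    Hypothetical helper for illustration only.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    splits = {".test": rows[:n_test], ".train": rows[n_test:]}
    paths = {}
    for suffix, part in splits.items():
        path = prefix + suffix
        with open(path, "w", newline="", encoding="utf-8") as f:
            # Two columns per row: class id first, then the text.
            csv.writer(f).writerows(part)
        paths[suffix] = path
    return paths

# Made-up multi-class sample: the label is a class id, the text is the sentence.
rows = [(2, "loved it"), (0, "terrible"), (1, "it was fine"), (2, "great"), (0, "awful")]
prefix = os.path.join(tempfile.mkdtemp(), "mydata")
paths = write_split(rows, prefix)
print(paths)
```

The shared prefix (here ending in `mydata`) is what you would then pass to --dataset, which takes the path and filename without the extension.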

Reference

[1] http://thinknook.com/wp-content/uploads/2012/09/Sentiment-Analysis-Dataset.zip
[2] http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/
[3] https://karpathy.github.io/2015/05/21/rnn-effectiveness/
[4] https://drive.google.com/file/d/1-1QNrYebNxge9vMP7YJJceehQkZtqAcO