changhuixu / LSTM-sentiment-analysis

Licence: other
LSTM sentiment analysis. Please see my other repo for the SVM and Naive Bayes algorithms.

Programming Languages: python, shell

Projects that are alternatives of or similar to LSTM-sentiment-analysis

Multimodal Sentiment Analysis
Attention-based multimodal fusion for sentiment analysis
Stars: ✭ 172 (+805.26%)
Mutual labels:  sentiment-analysis, lstm
Aspect-Based-Sentiment-Analysis
A python program that implements Aspect Based Sentiment Analysis classification system for SemEval 2016 Dataset.
Stars: ✭ 57 (+200%)
Mutual labels:  review, sentiment-analysis
Datastories Semeval2017 Task4
Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".
Stars: ✭ 184 (+868.42%)
Mutual labels:  sentiment-analysis, lstm
Rnn Text Classification Tf
Tensorflow Implementation of Recurrent Neural Network (Vanilla, LSTM, GRU) for Text Classification
Stars: ✭ 114 (+500%)
Mutual labels:  sentiment-analysis, lstm
Senti4SD
An emotion-polarity classifier specifically trained on developers' communication channels
Stars: ✭ 41 (+115.79%)
Mutual labels:  sentiment-analysis, sentiment
Context
ConText v4: Neural networks for text categorization
Stars: ✭ 120 (+531.58%)
Mutual labels:  sentiment-analysis, lstm
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+16789.47%)
Mutual labels:  sentiment-analysis, lstm
Contextual Utterance Level Multimodal Sentiment Analysis
Context-Dependent Sentiment Analysis in User-Generated Videos
Stars: ✭ 86 (+352.63%)
Mutual labels:  sentiment-analysis, lstm
chronist
Long-term analysis of emotion, age, and sentiment using Lifeslice and text records.
Stars: ✭ 23 (+21.05%)
Mutual labels:  sentiment-analysis, sentiment
Whatsapp-analytics
performing sentiment analysis on the whatsapp chats.
Stars: ✭ 20 (+5.26%)
Mutual labels:  sentiment-analysis, sentiment
Stock Market Prediction Web App Using Machine Learning And Sentiment Analysis
Stock Market Prediction Web App based on Machine Learning and Sentiment Analysis of Tweets (API keys included in code). The front end of the Web App is based on Flask and Wordpress. The App forecasts stock prices of the next seven days for any given stock under NASDAQ or NSE as input by the user. Predictions are made using three algorithms: ARIMA, LSTM, Linear Regression. The Web App combines the predicted prices of the next seven days with the sentiment analysis of tweets to give recommendation whether the price is going to rise or fall
Stars: ✭ 101 (+431.58%)
Mutual labels:  sentiment-analysis, lstm
sentiment analysis dict
sentiment analysis, text classification, dictionary-based, python, classification
Stars: ✭ 111 (+484.21%)
Mutual labels:  sentiment-analysis, dictionary
Twitter Sentiment Analysis
This script can tell you the sentiments of people regarding to any events happening in the world by analyzing tweets related to that event
Stars: ✭ 94 (+394.74%)
Mutual labels:  sentiment-analysis, sentiment
Amazon Product Recommender System
Sentiment analysis on Amazon Review Dataset available at http://snap.stanford.edu/data/web-Amazon.html
Stars: ✭ 158 (+731.58%)
Mutual labels:  sentiment-analysis, lstm
Pytreebank
😡😇 Stanford Sentiment Treebank loader in Python
Stars: ✭ 93 (+389.47%)
Mutual labels:  sentiment-analysis, sentiment
Sentiment
AFINN-based sentiment analysis for Node.js.
Stars: ✭ 2,469 (+12894.74%)
Mutual labels:  sentiment-analysis, sentiment
Text Analytics With Python
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.
Stars: ✭ 1,132 (+5857.89%)
Mutual labels:  sentiment-analysis, sentiment
Dialogue Understanding
This repository contains PyTorch implementation for the baseline models from the paper Utterance-level Dialogue Understanding: An Empirical Study
Stars: ✭ 77 (+305.26%)
Mutual labels:  sentiment-analysis, lstm
Sa Papers
📄 A survey and analysis of Sentiment Analysis papers in Deep Learning 😀😡☹️😭🙄🤢
Stars: ✭ 111 (+484.21%)
Mutual labels:  review, sentiment-analysis
applytics
Perform Sentiment Analysis on reviews of your apps
Stars: ✭ 21 (+10.53%)
Mutual labels:  review, sentiment-analysis

LSTM-sentiment-analysis

Because the LSTM method is computationally intensive, we use only two LSTM layers in our classification model. These two LSTM layers are bidirectional, comprising a forward LSTM and a backward LSTM.
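To illustrate what "bidirectional" means, here is a minimal sketch in plain Python. It does not use a real LSTM: a toy scalar recurrent cell (run_rnn, a hypothetical stand-in with made-up weights) is run once over the sequence in its original order and once in reverse, and the two outputs are paired up per time step. The project's actual layers are not shown in this README, so treat this purely as an illustration of the idea.

```python
import math


def run_rnn(inputs, weight_in=0.5, weight_rec=0.5):
    """Toy scalar recurrent cell standing in for an LSTM (weights are made up).

    h_t = tanh(weight_in * x_t + weight_rec * h_{t-1}), with h_0 = 0.
    """
    state = 0.0
    outputs = []
    for x in inputs:
        state = math.tanh(weight_in * x + weight_rec * state)
        outputs.append(state)
    return outputs


def bidirectional(inputs):
    """Pair a forward pass with a backward pass over the same sequence."""
    forward = run_rnn(inputs)
    # Run over the reversed sequence, then flip the outputs back so that
    # position t of both lists refers to the same input element.
    backward = run_rnn(inputs[::-1])[::-1]
    return list(zip(forward, backward))


for step, (fwd, bwd) in enumerate(bidirectional([1.0, 0.0, -1.0])):
    print(step, round(fwd, 4), round(bwd, 4))
```

At each position the forward value summarizes everything before it and the backward value summarizes everything after it, which is why a bidirectional layer sees context on both sides of a word.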

Feature extraction was done by reading all training reviews, tokenizing all English words, and removing stop words using the nltk package.
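The feature-extraction step can be sketched roughly as follows. This is not the project's code: to stay dependency-free it uses a regular expression and a tiny hard-coded stop-word set in place of nltk's tokenizer and English stop-word list, and lowercasing is an assumption of this sketch.

```python
import re

# Small illustrative stop-word set (an assumption of this sketch);
# the project uses the nltk package for this instead.
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "of", "to", "it", "this", "in"}


def tokenize(review):
    """Lowercase a review, keep alphabetic words, and drop stop words."""
    words = re.findall(r"[a-z]+", review.lower())
    return [word for word in words if word not in STOP_WORDS]


print(tokenize("This movie was a wonderful surprise, and the cast is great!"))
# ['movie', 'wonderful', 'surprise', 'cast', 'great']
```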

Training an LSTM RNN involves two steps. First, run the neural network forward, which sets the cell states. Then go backward, computing derivatives; this step uses the cell states (what the network knows at a given point in time) to work out how to change the network's weights. To update the weights, we use the default Adam optimizer (http://arxiv.org/abs/1412.6980v8), a method for stochastic optimization. The optimizer minimizes the loss function, which here is the mean squared error between the expected output and the actual output.
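The forward pass, backward pass, and Adam update can be seen on a toy problem. The sketch below fits a single weight w in y = w * x by minimizing the mean squared error with the Adam update rule from the paper linked above. It is only an illustration: the one-weight model and the data are made up, and the learning rate is set larger than usual defaults so the toy converges in few steps.

```python
import math


def mse(predictions, targets):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def adam_fit(xs, ys, steps=1000, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Fit y = w * x with Adam on the MSE loss (toy one-weight model)."""
    w = 0.0
    m = 0.0  # running mean of the gradient (first moment)
    v = 0.0  # running mean of the squared gradient (second moment)
    for t in range(1, steps + 1):
        # Forward pass: compute predictions with the current weight.
        preds = [w * x for x in xs]
        # Backward pass: d(MSE)/dw = 2/N * sum((w*x - y) * x).
        grad = 2.0 * sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
        # Adam update with bias-corrected moment estimates.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
w = adam_fit(xs, ys)
print("fitted w:", w, "loss:", mse([w * x for x in xs], ys))
```

The fitted weight should end up close to 2, the value that generated the toy data.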

The input matrix has shape (number_of_samples x maxlen).

number_of_samples here is 25000 reviews. All reviews are transformed into sequences of word indices.

maxlen is the maximum length of each sequence: if a review has more than maxlen words, it is truncated; if it has fewer than maxlen words, the sequence is padded with 0's to give the matrix a regular shape.
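A minimal sketch of the truncation and padding rule is shown below. The text does not say which end of a review is cut or padded, so the side parameter is an assumption of this sketch; pad_or_truncate is a hypothetical helper, not a function from the project.

```python
def pad_or_truncate(sequence, maxlen, pad_value=0, side="pre"):
    """Force a sequence of word indices to exactly maxlen entries.

    side="pre" cuts or pads at the start; side="post" does so at the end.
    """
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    if side not in ("pre", "post"):
        raise ValueError("side must be 'pre' or 'post'")
    if len(sequence) >= maxlen:
        # Too long: keep the last (pre) or first (post) maxlen entries.
        return sequence[-maxlen:] if side == "pre" else sequence[:maxlen]
    # Too short: fill the missing positions with pad_value.
    padding = [pad_value] * (maxlen - len(sequence))
    return padding + sequence if side == "pre" else sequence + padding


print(pad_or_truncate([5, 6, 7], maxlen=5))           # [0, 0, 5, 6, 7]
print(pad_or_truncate([1, 2, 3, 4, 5, 6], maxlen=4))  # [3, 4, 5, 6]
```

Applying this to every review yields rows of equal length, which is what allows the data to be stacked into a (number_of_samples x maxlen) matrix.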

max_features is the dictionary size. The dictionary is built before the data is fed into the LSTM RNN. Its keys are the cleaned words and its values are their indices, which run from 2 to 90000, assigned so that the most frequent word has the lowest index and rarely occurring words have large indices. We can therefore use max_features to filter out uncommon words.
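The dictionary and the max_features filter might look like the sketch below. The helper names are hypothetical, and two details are assumptions not stated in the text: filtered words are simply dropped, and a word is kept only when its index is below max_features.

```python
from collections import Counter


def build_dictionary(tokenized_reviews, first_index=2):
    """Map each word to an index; the most frequent word gets the lowest index."""
    counts = Counter(word for review in tokenized_reviews for word in review)
    # most_common() sorts by descending count; ties keep first-seen order.
    ranked = counts.most_common()
    return {word: index for index, (word, _) in enumerate(ranked, start=first_index)}


def encode(review, dictionary, max_features):
    """Convert words to indices, dropping unknown and uncommon words."""
    return [
        dictionary[word]
        for word in review
        if word in dictionary and dictionary[word] < max_features
    ]


reviews = [["great", "movie"], ["bad", "movie"], ["great", "great", "plot"]]
dictionary = build_dictionary(reviews)
print(dictionary)  # {'great': 2, 'movie': 3, 'bad': 4, 'plot': 5}
print(encode(["great", "bad", "plot", "movie"], dictionary, max_features=4))  # [2, 3]
```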

First, keeping max_features = 20000, we tested the effect of maxlen, which we varied from 25 to 200.

maxlen   time (s)   train accuracy   test accuracy
25       618        0.9757           0.7589
50       1113       0.9876           0.8047
75       1507       0.9882           0.8243
100      2004       0.9813           0.8410
125      2435       0.9774           0.8384
150      2939       0.9725           0.8503
175      3352       0.9819           0.8359
200      3811       0.9831           0.8514

The review lengths are right-skewed (Q1: 67, median: 92, Q3: 152). With a sequence length of 150, about 75% of reviews fit without truncation.
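Quartiles and coverage like these can be computed with the standard library, as in the sketch below. The list of lengths is made up for illustration (chosen so that its quartiles match the ones quoted above); it is not the project's data, and length_summary is a hypothetical helper.

```python
import statistics


def length_summary(lengths, maxlen):
    """Return (Q1, median, Q3, fraction of reviews with at most maxlen words)."""
    q1, median, q3 = statistics.quantiles(lengths, n=4)
    covered = sum(1 for n in lengths if n <= maxlen) / len(lengths)
    return q1, median, q3, covered


# Illustrative review lengths only, not measured from the real corpus.
toy_lengths = [40, 67, 80, 92, 120, 152, 400]
q1, median, q3, covered = length_summary(toy_lengths, maxlen=150)
print(f"Q1={q1}, median={median}, Q3={q3}, covered={covered:.0%}")
```

A long upper tail (here the single 400-word review) is what makes the distribution right-skewed: the gap between Q3 and the median is wider than the gap between the median and Q1.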

Second, keeping maxlen = 150, we tested the effect of max_features, which we varied from 250 to 60000.

max_features   train accuracy   test accuracy
250            0.7828           0.7722
500            0.8392           0.8328
1500           0.8806           0.8554
2500           0.9119           0.8536
5000           0.9324           0.8553
10000          0.9664           0.8412
20000          0.9725           0.8503
30000          0.9850           0.8489
40000          0.9854           0.8321
50000          0.9843           0.8257
60000          0.9854           0.8470

It is interesting to note that the 2,500 most frequent English words largely determine the sentiment of movie reviews. For comparison, Britain's Guardian newspaper estimated in 1986 that the average person's vocabulary grows from roughly 300 words at two years old, through 5,000 words at five years old, to some 12,000 words at the age of 12.

Future improvements

One way to cut down on extraneous words is pyenchant (https://pythonhosted.org/pyenchant/api/enchant.html). The basic idea is to turn the input text into a list of words and fix spelling errors (or correct words that do not belong).
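The idea can be sketched without pyenchant by using the standard library's difflib as a stand-in. This is a different technique from a real spell checker: it only snaps each unknown word to the closest entry in a known vocabulary. The vocabulary, the cutoff value, and the correct_words helper are all assumptions made for this sketch.

```python
import difflib


def correct_words(words, vocabulary, cutoff=0.75):
    """Replace each out-of-vocabulary word with its closest vocabulary match.

    Words with no match at or above the similarity cutoff are left unchanged.
    """
    known = set(vocabulary)
    corrected = []
    for word in words:
        if word in known:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return corrected


vocabulary = ["movie", "wonderful", "terrible", "acting"]
print(correct_words(["movei", "wonderfull", "acting", "xyzzy"], vocabulary))
# ['movie', 'wonderful', 'acting', 'xyzzy']
```

Mapping misspellings onto existing dictionary entries would shrink the number of rare words, so more of the text survives the max_features filter.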

Useful Links

http://iamtrask.github.io/2015/11/15/anyone-can-code-lstm/

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

https://github.com/dmnelson/sentiment-analysis-imdb

https://github.com/asampat3090/sentiment-dl

https://github.com/wenjiesha/sentiment_lstm

http://blog.csdn.net/zouxy09/article/details/8775518/

http://ir.hit.edu.cn/~dytang/

https://apaszke.github.io/lstm-explained.html

https://github.com/cjhutto/vaderSentiment

http://www.nltk.org/book/

http://deeplearning.net/software/theano/install_windows.html
