abdulfatir / Twitter Sentiment Analysis

Licence: MIT
Sentiment analysis on tweets using Naive Bayes, SVM, CNN, LSTM, etc.

Sentiment Analysis on Tweets

Update (21 Sept. 2018): I don't actively maintain this repository. This work was done for a course project, and the dataset cannot be released because I don't own the copyright. However, everything in this repository can be easily modified to work with other datasets. I recommend reading the sloppily written project report, which can be found in docs/.

Dataset Information

We use and compare several methods for sentiment analysis on tweets (a binary classification problem). The training dataset is expected to be a CSV file with rows of the form tweet_id,sentiment,tweet, where tweet_id is a unique integer identifying the tweet, sentiment is either 1 (positive) or 0 (negative), and tweet is the tweet text enclosed in "". Similarly, the test dataset is a CSV file with rows of the form tweet_id,tweet. Please note that CSV headers are not expected and should be removed from the training and test datasets.
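As a rough illustration of the expected training format, the sketch below parses headerless rows with Python's standard csv module. The function name read_training_rows and the sample tweets are made up for this example and are not part of the repository.

```python
import csv
import io


def read_training_rows(handle):
    """Yield (tweet_id, sentiment, tweet) tuples from a headerless CSV stream."""
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        tweet_id, sentiment, tweet = row
        label = int(sentiment)
        if label not in (0, 1):
            raise ValueError(f"unexpected sentiment label: {sentiment!r}")
        yield int(tweet_id), label, tweet


# Hypothetical sample data in the tweet_id,sentiment,tweet layout.
sample = io.StringIO(
    '1,1,"what a great day, loving it"\n'
    '2,0,"worst service ever"\n'
)
rows = list(read_training_rows(sample))
```

Because the tweet is enclosed in double quotes, commas inside the tweet text do not break the three-column layout.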

Requirements

There are some general library requirements for the project and some which are specific to individual methods. The general requirements are as follows.

  • numpy
  • scikit-learn
  • scipy
  • nltk

The library requirements specific to some methods are:

  • keras with TensorFlow backend for Logistic Regression, MLP, RNN (LSTM), and CNN.
  • xgboost for XGBoost.

Note: It is recommended to use the Anaconda distribution of Python.

Usage

Preprocessing

  1. Run preprocess.py <raw-csv-path> on both train and test data. This will generate a preprocessed version of the dataset.
  2. Run stats.py <preprocessed-csv-path> where <preprocessed-csv-path> is the path of the CSV generated by preprocess.py. This prints general statistical information about the dataset and generates two pickle files containing the frequency distributions of unigrams and bigrams in the training dataset.

After the above steps, you should have four files in total: <preprocessed-train-csv>, <preprocessed-test-csv>, <freqdist>, and <freqdist-bi>, which are the preprocessed training dataset, the preprocessed test dataset, the frequency distribution of unigrams, and the frequency distribution of bigrams, respectively.
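To make the idea of the two frequency-distribution files concrete, here is a minimal sketch that counts unigrams and bigrams and round-trips the counts through pickle. It uses collections.Counter from the standard library as a stand-in; how stats.py actually builds and stores its distributions is not shown here, and the function name count_ngrams and the sample tweets are assumptions for illustration.

```python
import pickle
from collections import Counter


def count_ngrams(tweets):
    """Count unigrams and bigrams over whitespace-tokenized tweets."""
    unigrams, bigrams = Counter(), Counter()
    for tweet in tweets:
        words = tweet.split()
        unigrams.update(words)
        # Pair each word with its successor to form bigrams.
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


# Hypothetical preprocessed tweets.
tweets = ["good morning twitter", "good morning all"]
unigrams, bigrams = count_ngrams(tweets)

# Serialize and restore, as one would when writing a pickle file to disk.
restored = pickle.loads(pickle.dumps(unigrams))
```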

For all the methods that follow, change the values of TRAIN_PROCESSED_FILE, TEST_PROCESSED_FILE, FREQ_DIST_FILE, and BI_FREQ_DIST_FILE to your own paths in the respective files. Wherever applicable, the values of USE_BIGRAMS and FEAT_TYPE can be changed to obtain results using different types of features, as described in the report.

Baseline

  1. Run baseline.py. With TRAIN = True it will show the accuracy results on the training dataset.

Naive Bayes

  1. Run naivebayes.py. With TRAIN = True it will show the accuracy results on a 10% validation split.
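The 10% validation split used by this and the following methods could be produced along these lines. This is only a sketch under assumptions: the helper names split_validation and accuracy, the shuffling, and the fixed seed are illustrative choices, not necessarily what the scripts do.

```python
import random


def split_validation(rows, validation_fraction=0.1, seed=0):
    """Shuffle rows and hold out a fraction of them for validation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded for reproducibility
    n_val = round(len(rows) * validation_fraction)
    return rows[n_val:], rows[:n_val]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


train_rows, val_rows = split_validation(range(100))
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```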

Maximum Entropy

  1. Run logistic.py to run the logistic regression model OR run maxent-nltk.py <> to run the MaxEnt model of NLTK. With TRAIN = True it will show the accuracy results on a 10% validation split.

Decision Tree

  1. Run decisiontree.py. With TRAIN = True it will show the accuracy results on a 10% validation split.

Random Forest

  1. Run randomforest.py. With TRAIN = True it will show the accuracy results on a 10% validation split.

XGBoost

  1. Run xgboost.py. With TRAIN = True it will show the accuracy results on a 10% validation split.

SVM

  1. Run svm.py. With TRAIN = True it will show the accuracy results on a 10% validation split.

Multi-Layer Perceptron

  1. Run neuralnet.py. This will validate using 10% of the data and save the best model to best_mlp_model.h5.

Recurrent Neural Networks

  1. Run lstm.py. This will validate using 10% of the data and save a model for each epoch in ./models/. (Please make sure this directory exists before running lstm.py.)

Convolutional Neural Networks

  1. Run cnn.py. This will run the 4-Conv-NN model (a neural network with 4 conv layers) as described in the report. To run other versions of the CNN, just comment out or remove the lines where Conv layers are added. This will validate using 10% of the data and save a model for each epoch in ./models/. (Please make sure this directory exists before running cnn.py.)

Majority Vote Ensemble

  1. To extract penultimate layer features for the training dataset, run extract-cnn-feats.py <saved-model>. This will generate three files: train-feats.npy, train-labels.txt, and test-feats.npy.
  2. Run cnn-feats-svm.py which uses files from the previous step to perform SVM classification on features extracted from CNN model.
  3. Place all the prediction CSV files over which you want to take a majority vote in ./results/ and run majority-voting.py. This will generate majority-voting.csv.
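The voting in the last step can be sketched as follows for binary labels. This is an illustrative sketch, not the code of majority-voting.py: the function name majority_vote, the in-memory dictionaries (standing in for the prediction CSV files), and the rule that ties go to the negative class are all assumptions.

```python
def majority_vote(prediction_sets):
    """Combine per-model predictions (dicts of tweet_id -> 0/1) by majority.

    Ties, which can only occur with an even number of models, are resolved
    as 0 (negative); this tie rule is an arbitrary choice for the sketch.
    """
    combined = {}
    for tweet_id in prediction_sets[0]:
        votes = [preds[tweet_id] for preds in prediction_sets]
        # Predict positive only if strictly more than half the models agree.
        combined[tweet_id] = 1 if 2 * sum(votes) > len(votes) else 0
    return combined


# Hypothetical predictions from three models for three tweets.
model_a = {1: 1, 2: 0, 3: 1}
model_b = {1: 1, 2: 1, 3: 0}
model_c = {1: 0, 2: 0, 3: 1}
final = majority_vote([model_a, model_b, model_c])
```

Using an odd number of prediction files avoids ties altogether.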

Information about other files

  • dataset/positive-words.txt: List of positive words.
  • dataset/negative-words.txt: List of negative words.
  • dataset/glove-seeds.txt: GloVe word vectors from StanfordNLP that match our dataset, used for seeding word embeddings.
  • Plots.ipynb: IPython notebook used to generate the plots in the report.
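Word lists such as positive-words.txt and negative-words.txt can drive a simple lexicon-based classifier. The sketch below shows one such approach; whether any script in this repository uses the lists exactly this way is not stated here, so the function name classify_by_lexicon, the tiny word sets, and the tie rule (ties count as positive) are assumptions.

```python
def classify_by_lexicon(tweet, positive_words, negative_words):
    """Label a tweet 1 (positive) or 0 (negative) by counting lexicon hits."""
    words = tweet.lower().split()
    pos_hits = sum(word in positive_words for word in words)
    neg_hits = sum(word in negative_words for word in words)
    # Ties (including no hits at all) default to positive in this sketch.
    return 1 if pos_hits >= neg_hits else 0


# Hypothetical miniature lexicons standing in for the word-list files.
positive = {"good", "love", "great"}
negative = {"bad", "hate", "awful"}

label_pos = classify_by_lexicon("I love this good phone", positive, negative)
label_neg = classify_by_lexicon("bad bad day", positive, negative)
```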