
Twitter sentiment classification by Daniele Grattarola

This is a TensorFlow implementation of a convolutional neural network (CNN) to perform sentiment classification on tweets.

This code is meant to have educational value, so that you can train the model yourself and play with different configurations; it was not developed to be deployed as-is (although it has been used in professional contexts). The dataset used for training is taken from here (some users have reported that the link to the dataset is sometimes dead, so dataset_downloader.py might not work; I last ran the script successfully on January 20, 2018, but please report any problems to me).

NOTE: this script is for Python 2.7 only

Setup

You'll need TensorFlow >= 1.1.0 and its dependencies installed for the script to work (see here).

Once you've installed and configured TensorFlow, download the source files and cd into the folder:

$ git clone https://gitlab.com/danielegrattarola/twitter-sentiment-cnn.git
$ cd twitter-sentiment-cnn

Before you can use the script, some setup is needed. Download the dataset from the link above by running:

$ python dataset_downloader.py

Read the dataset from the CSV into two files (.pos and .neg) with:

$ python csv_parser.py

And generate a CSV with the vocabulary (and its inverse mapping) with:

$ python vocab_builder.py
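For intuition, the vocabulary step boils down to assigning each word an integer index and keeping the inverse mapping as well. A minimal plain-Python sketch of the idea (the function name and details are illustrative, not the actual internals of vocab_builder.py):

```python
from collections import Counter

def build_vocab(tweets):
    """Map each word to an integer index (most frequent words first)."""
    counts = Counter(word for tweet in tweets for word in tweet.split())
    # Index 0 is conventionally reserved for padding.
    vocab = {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}
    inverse = {i: word for word, i in vocab.items()}
    return vocab, inverse

vocab, inverse = build_vocab(["i love neural networks", "i love tweets"])
# "i" and "love" occur twice each, so they receive the lowest indices.
```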

The files will be created in the twitter-sentiment-dataset/ folder. Finally, create an output/ folder that will contain all session checkpoints needed to restore the trained models:

$ mkdir output

Now everything is set up and you're ready to start training the model.

Usage

The simplest way to run the script is:

$ python twitter-sentiment-cnn.py

which will load the dataset into memory, create the computation graph, and quit. Run the script like this first to check that everything is set up correctly. To run a training session on the full dataset (and save the result, so that the network can be reused later or trained further) run:

$ python twitter-sentiment-cnn.py --train --save

After training, we can test the network as follows:

$ python twitter-sentiment-cnn.py --load path/to/ckpt/folder/ --custom_input 'I love neural networks!'

which will eventually output:

...
Processing custom input: I love neural networks!
Custom input evaluation: POS
Actual output: [ 0.19249919  0.80750078]
...
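The two numbers in "Actual output" are the softmax scores for the two classes, and the printed label is simply the larger of the two. A minimal sketch of that last step (the NEG-then-POS ordering is inferred from the sample runs, so treat it as an assumption):

```python
def label_from_scores(scores):
    """Turn a two-element softmax output into a POS/NEG label."""
    neg, pos = scores  # class ordering inferred from the sample output
    return "POS" if pos > neg else "NEG"

print(label_from_scores([0.19249919, 0.80750078]))  # POS
```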

By running:

$ python twitter-sentiment-cnn.py -h

the script will output a list of all customizable flags and parameters. The parameters are:

  • train: train the network;
  • save: save session checkpoints;
  • save_protobuf: save model as binary protobuf;
  • evaluate_batch: evaluate the network on a held-out batch from the dataset and print the results (for debugging/educational purposes);
  • load: restore a model from the given path;
  • custom_input: evaluate the model on the given string;
  • filter_sizes: comma-separated filter sizes for the convolutional layers (default: '3,4,5');
  • dataset_fraction: fraction of the dataset to load in memory, to reduce memory usage (default: 1.0, i.e. the whole dataset);
  • embedding_size: size of the word embeddings (default: 128);
  • num_filters: number of filters per filter size (default: 128);
  • batch_size: batch size (default: 128);
  • epochs: number of training epochs (default: 3);
  • valid_freq: how many times per epoch to perform validation testing (default: 1);
  • checkpoint_freq: how many times per epoch to save the model (default: 1);
  • test_data_ratio: fraction of the dataset to use for validation (default: 0.1);
  • device: device to use for running the model (can be either 'cpu' or 'gpu').
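As an illustration of what test_data_ratio does, a held-out fraction is carved off the shuffled dataset before training and used for validation. A plain-Python sketch of that split (the helper is hypothetical, not taken from the script):

```python
import random

def split_dataset(samples, test_data_ratio=0.1, seed=0):
    """Shuffle the samples and hold out a fraction for validation."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_data_ratio)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_dataset(list(range(100)), test_data_ratio=0.1)
# 90 training samples, 10 held out for validation
```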

Pre-trained model

User @Horkyze kindly trained the model for three epochs on the full dataset and shared the summary folder for quick deployment. The folder is available on Mega; to load the model, simply unpack the zip file and use the --load flag as follows:

# Current directory: twitter-sentiment-cnn/
$ unzip path/to/run20180201-231509.zip
$ python twitter-sentiment-cnn.py --load path/to/run20180201-231509/ --custom_input "I love neural networks!"

Running this command should give you something like:

======================= START! ========================
	data_helpers: loading positive examples...
	data_helpers: [OK]
	data_helpers: loading negative examples...
	data_helpers: [OK]
	data_helpers: cleaning strings...
	data_helpers: [OK]
	data_helpers: generating labels...
	data_helpers: [OK]
	data_helpers: concatenating labels...
	data_helpers: [OK]
	data_helpers: padding strings...
	data_helpers: [OK]
	data_helpers: building vocabulary...
	data_helpers: [OK]
	data_helpers: building processed datasets...
	data_helpers: [OK]

Flags:
	batch_size = 128
	checkpoint_freq = 1
	custom_input = I love neural networks!
	dataset_fraction = 0.001
	device = cpu
	embedding_size = 128
	epochs = 3
	evaluate_batch = False
	filter_sizes = 3,4,5
	load = output/run20180201-231509/
	num_filters = 128
	save = False
	save_protobuf = False
	test_data_ratio = 0.1
	train = False
	valid_freq = 1

Dataset:
	Train set size = 1421
	Test set size = 157
	Vocabulary size = 274562
	Input layer size = 36
	Number of classes = 2

Output folder: /home/phait/dev/twitter-sentiment-cnn/output/run20180208-112402
Data processing OK, loading network...
Evaluating custom input: I love neural networks!
Custom input evaluation: POS
Actual output: [0.04109644 0.95890355]

NOTE: loading this model won't work if you change anything in the default network architecture, so don't set the --filter_sizes flag.

According to the log.log file provided by @Horkyze, the model had a final validation accuracy of 0.80976, and a validation loss of 53.3314.

I sincerely thank @Horkyze for providing the computational power and sharing the model with me.

Model description

The network implemented in this script is a single layer CNN structured as follows:

  • Embedding layer: takes the tweets (as strings) as input and maps each word to an n-dimensional space, so that each word is represented as a dense vector (see word2vec).
  • Convolution layers: a set of parallel 1D convolutional layers with the given filter sizes and 128 output channels. A filter's size is the number of embedded words that the filter covers.
  • Pooling layers: a set of pooling layers associated to each of the convolutional layers.
  • Concat layer: concatenates the output of the different pooling layers into a single tensor.
  • Dropout layer: performs neuron dropout (some neurons are randomly not considered during training).
  • Output layer: fully connected layer with a softmax activation function to perform classification.
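The conv-and-pool core of this architecture can be sketched in plain Python on toy dimensions (hand-picked filter weights stand in for learned ones; the script of course uses TensorFlow ops for this):

```python
def conv_maxpool(embedded, filt):
    """Slide one filter over consecutive word embeddings and 1-max-pool.

    embedded: list of word vectors (sentence_len x embedding_size)
    filt: filter weights (filter_size x embedding_size)
    """
    size = len(filt)
    activations = []
    for start in range(len(embedded) - size + 1):
        window = embedded[start:start + size]
        # Dot product of the filter with a window of `size` embedded words.
        act = sum(w * x
                  for f_row, e_row in zip(filt, window)
                  for w, x in zip(f_row, e_row))
        activations.append(max(act, 0.0))  # ReLU
    return max(activations)  # 1-max pooling over time

# One feature per (filter size, filter); the concat layer stacks all of
# them into a single vector before dropout and the softmax output layer.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
features = [conv_maxpool(sentence, f)
            for f in ([[1.0, 0.0], [0.0, 1.0]],               # filter size 2
                      [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])]  # filter size 3
```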

The script will automatically log the session with TensorBoard. To visualize the computation graph and training metrics, run:

$ tensorboard --logdir output/path/to/summaries/

and then navigate to localhost:6006 from your browser (you'll see the computation graph in the Graph section).
