MirunaPislar / Sarcasm Detection

License: MIT
Detecting sarcasm on Twitter using both traditional machine learning and deep learning techniques.

Programming Languages

python

Projects that are alternatives to or similar to Sarcasm Detection

Hierarchical Attention Networks
TensorFlow implementation of the paper "Hierarchical Attention Networks for Document Classification"
Stars: ✭ 75 (+2.74%)
Mutual labels:  text-classification, sentiment-analysis, attention-mechanism
Sentiment-analysis-amazon-Products-Reviews
NLP with NLTK for sentiment analysis of Amazon product reviews
Stars: ✭ 37 (-49.32%)
Mutual labels:  sentiment-analysis, text-classification, lstm-neural-networks
Tia
Your Advanced Twitter stalking tool
Stars: ✭ 98 (+34.25%)
Mutual labels:  text-classification, sentiment-analysis, twitter
Tensorflow Sentiment Analysis On Amazon Reviews Data
Implementing different RNN models (LSTM, GRU) and convolution models (Conv1D, Conv2D) on a subset of Amazon reviews data with TensorFlow on Python 3. A sentiment analysis project.
Stars: ✭ 34 (-53.42%)
Mutual labels:  text-classification, sentiment-analysis, lstm-neural-networks
Deep Atrous Cnn Sentiment
Deep-Atrous-CNN-Text-Network: End-to-end word level model for sentiment analysis and other text classifications
Stars: ✭ 64 (-12.33%)
Mutual labels:  deep-neural-networks, text-classification, sentiment-analysis
Datastories Semeval2017 Task4
Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".
Stars: ✭ 184 (+152.05%)
Mutual labels:  sentiment-analysis, attention-mechanism, twitter
Twitterldatopicmodeling
Uses topic modeling to identify context between follower relationships of Twitter users
Stars: ✭ 48 (-34.25%)
Mutual labels:  topic-modeling, twitter, tweets
Learning Social Media Analytics With R
This repository contains code and bonus content which will be added from time to time for the book "Learning Social Media Analytics with R" by Packt
Stars: ✭ 102 (+39.73%)
Mutual labels:  sentiment-analysis, topic-modeling, twitter
TwEater
A Python Bot for Scraping Conversations from Twitter
Stars: ✭ 16 (-78.08%)
Mutual labels:  twitter, tweets, sentiment-analysis
NTUA-slp-nlp
💻Speech and Natural Language Processing (SLP & NLP) Lab Assignments for ECE NTUA
Stars: ✭ 19 (-73.97%)
Mutual labels:  sentiment-analysis, attention-mechanism, lstm-neural-networks
Twitter Sentiment Analysis
This script can tell you people's sentiments regarding any event happening in the world, by analyzing tweets related to that event
Stars: ✭ 94 (+28.77%)
Mutual labels:  sentiment-analysis, twitter, tweets
Text mining resources
Resources for learning about Text Mining and Natural Language Processing
Stars: ✭ 358 (+390.41%)
Mutual labels:  text-classification, sentiment-analysis, topic-modeling
Ml Projects
ML based projects such as Spam Classification, Time Series Analysis, Text Classification using Random Forest, Deep Learning, Bayesian, Xgboost in Python
Stars: ✭ 127 (+73.97%)
Mutual labels:  text-classification, svm, lstm-neural-networks
overview-and-benchmark-of-traditional-and-deep-learning-models-in-text-classification
NLP tutorial
Stars: ✭ 41 (-43.84%)
Mutual labels:  tweets, sentiment-analysis, text-classification
Twitter Sent Dnn
Deep Neural Network for Sentiment Analysis on Twitter
Stars: ✭ 270 (+269.86%)
Mutual labels:  deep-neural-networks, sentiment-analysis, twitter
Chatbot cn
A chatbot for the financial and judicial domains (with some chit-chat ability). Its main modules include information extraction, NLU, NLG and a knowledge graph; the front end is integrated via Django, and RESTful interfaces for the NLP and KG modules have already been wrapped.
Stars: ✭ 791 (+983.56%)
Mutual labels:  text-classification, sentiment-analysis, attention-mechanism
Skater
Python Library for Model Interpretation/Explanations
Stars: ✭ 973 (+1232.88%)
Mutual labels:  deep-neural-networks, lstm-neural-networks
French Sentiment Analysis Dataset
A collection of over 1.5 million tweets translated into French, with their sentiment labels.
Stars: ✭ 35 (-52.05%)
Mutual labels:  sentiment-analysis, tweets
Ml Classify Text Js
Machine learning based text classification in JavaScript using n-grams and cosine similarity
Stars: ✭ 38 (-47.95%)
Mutual labels:  text-classification, sentiment-analysis
Easy Deep Learning With Allennlp
🔮Deep Learning for text made easy with AllenNLP
Stars: ✭ 32 (-56.16%)
Mutual labels:  deep-neural-networks, text-classification

Sarcasm-Detection

Sarcasm is a form of verbal irony that is intended to express contempt or ridicule. Relying on the shared knowledge between the speaker and their audience, sarcasm requires wit to understand and wit to produce. In our daily interactions, we use gestures and mimics, intonation and prosody to hint at the sarcastic intent. Since we do not have access to such paralinguistic cues, detecting sarcasm in written text is a much harder task.

I investigated various methods to detect sarcasm in tweets, using both traditional machine learning (SVMs and logistic regression on discrete features) and deep learning models (CNNs, LSTMs, GRUs, bi-directional LSTMs and attention-based LSTMs), and evaluated them on four different Twitter datasets (details in res/).
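For a flavour of the deep learning side, here is a minimal Keras sketch of a plain LSTM classifier in the spirit of the models listed above (the layer sizes and hyperparameters are illustrative placeholders, not the repository's actual settings):

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

max_words, max_len, embed_dim = 20000, 50, 100  # hypothetical hyperparameters
model = Sequential()
model.add(Embedding(max_words, embed_dim, input_length=max_len))  # learn word embeddings
model.add(LSTM(64, dropout=0.2, recurrent_dropout=0.2))  # encode the tweet as a sequence
model.add(Dense(1, activation='sigmoid'))  # sarcastic vs. non-sarcastic
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])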

This research project was completed in partial fulfilment of the requirements for the degree of Bachelor of Science in Computer Science at the University of Manchester, under the careful supervision of Mr John McNaught, my tutor and mentor.

The overall project achievements are explained in this video: https://www.youtube.com/watch?v=ofrn3T76dHg

Overview

  • src/ contains all the source code used to process, analyse, train on and evaluate the datasets (described in res/) in order to investigate sarcasm detection on Twitter data
  • res/ contains both the raw and the processed datasets, as well as some useful vocabularies and lists of words/emojis that proved very useful in pre-processing the data
  • models/ contains all the pre-trained models contributing to the reported results, as well as all the trained models saved after training under the described parameters and DL architectures
  • plots/ contains a collection of interesting plots, useful for analysing and supporting the obtained results
  • stats/ contains some comparisons between pre-processing phases, as well as raw statistical results collected while training/evaluating
  • images/ contains the visualizations obtained, plus some pictures of the architectures or models used in the report or screencast

Dependencies

The code included in this repository has been tested to work with Python 3.5 on an Ubuntu 16.04 machine, using Keras 2.0.8 with Tensorflow as the backend.

List of requirements
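For example, the core dependencies could be installed along these lines (assuming pip under Python 3.5; consult the requirements list above for the complete set):

pip install keras==2.0.8 tensorflow
# plus, if a requirements file is provided at the repository root:
pip install -r requirements.txt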

Installation and running

  1. Clone the repository and make sure that all the dependencies listed above are installed (a combined example is given after this list).
  2. Download all the resources from here and place them in the res/ directory.
  3. Download the pre-trained models from here and place them in the models/ directory.
  4. Go to the src/ directory.
  5. For a thorough feature analysis, run:
python feature_analysis.py
  6. For training and evaluating a traditional machine learning model, run:
python ml_models.py
  7. For training and evaluating the embeddings model (words and/or emojis/deepmojis), run:
python embeddings_model.py
  8. For training and evaluating various deep learning models, quickly implemented in Keras, run:
python dl_models.py
  9. For training and evaluating the attention-based LSTM model implemented in TensorFlow, run:
python tf_attention.py
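Putting the steps together, a typical session might look like this (the clone URL is inferred from the project name above, and the manual downloads from steps 2 and 3 still apply):

git clone https://github.com/MirunaPislar/Sarcasm-Detection.git
cd Sarcasm-Detection/src
python ml_models.py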

By default, the dataset collected by Ghosh and Veale (2016) is used, but this can easily be replaced by changing the dataset parameter in the code (as with all the other parameters).
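As a purely hypothetical illustration (the actual parameter name and its accepted values are defined in the scripts under src/), the switch could look like:

dataset = 'ghosh'   # default: the Ghosh and Veale (2016) collection
# dataset = '...'   # any other dataset described in res/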

Results

Here are the results obtained on the considered datasets.

[Results image]

Visualizations

You can obtain a nice visualization of a deep layer by extracting the final weights and colouring the hidden units distinctively. Running either of the two files below will produce a .html file in plots/html_visualizations/.

LSTM visualization

Visualize the LSTM weights for a selected example in the test set after you have trained the model (here we use a simpler architecture, with fewer hidden units and no stacked LSTMs, in order to visualize anything sensible). Excitatory units (weight > 0) are coloured in a reddish colour, while inhibitory units (weight < 0) are coloured in a bluish colour. Colour gradients are used to distinguish the heavy weights from the weak ones. Run:

python src/visualize_hidden_units.py

In the sample visualization given below, doctor, late and even lame have heavier weights and therefore contribute more to sarcasm recognition (since they receive more attention). Intuitively, going to the doctor is regarded as an undesirable activity (so it is subject to strong sarcastic remarks), while late and lame are sentiment-bearing expressions, confirming previous findings about sarcastic cues in written and spoken language.
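The colour mapping itself is simple; here is a hypothetical sketch of the scheme described above (the actual implementation lives in src/visualize_hidden_units.py):

def weight_to_rgb(w, w_max):
    # w_max is the largest absolute weight, used for normalisation;
    # heavier weights produce more saturated colours
    intensity = int(255 * min(abs(w) / w_max, 1.0))
    if w > 0:  # excitatory -> reddish
        return 255, 255 - intensity, 255 - intensity
    return 255 - intensity, 255 - intensity, 255  # inhibitory -> bluish

def token_to_html(token, w, w_max):
    r, g, b = weight_to_rgb(w, w_max)
    return '<span style="background-color: rgb(%d,%d,%d)">%s</span>' % (r, g, b, token)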

[LSTM visualization examples]

Other visualizations are available in images/.

Attention visualization

Visualize the attention words over the whole test set (or a selection of it) after you have trained the model. The network pays attention to specific words (supposedly, those that contribute most towards a sarcasm decision). A reddish colour is used to emphasize the attention weights, while colour gradients distinguish the heavy weights from the weak ones. Run:

python src/visualize_tf_attention.py

In the sample visualization given below, strong sentiment-bearing words, stereotypical topics, emojis, punctuation, numerals and sometimes slang or ungrammatical words are receiving more attention from the network and therefore are contributing more to sarcasm recognition.
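For intuition, here is a toy numpy sketch of how such per-word attention weights are typically computed, in a generic additive-attention style (this is not the repository's exact TensorFlow code):

import numpy as np

def attention_weights(H, w):
    # H: (timesteps, hidden) word annotations produced by the LSTM
    # w: (hidden,) learned context vector that scores each word
    scores = np.tanh(H) @ w  # unnormalised relevance score per word
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()  # one weight per word, summing to 1

The words with the largest weights are exactly those rendered in the strongest red in the visualization.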

[Attention visualization example]

Disclaimer

The purpose of this project was not to produce maximally efficient code, but to draw some useful conclusions about sarcasm detection in written text (specifically, in Twitter data). That said, it is not disastrously inefficient; in fact, it should be fast enough for most purposes. Although the code has been verified and reviewed, I cannot guarantee that it is entirely free of bugs or faults, so use it at your own risk.

License

The source code and all my pre-trained models are licensed under the MIT license.

References

[1] Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, and Ruihong Huang. 2013. Sarcasm as contrast between a positive sentiment and negative situation. In EMNLP, volume 13, pages 704–714.

[2] Aniruddha Ghosh and Tony Veale. 2016. Fracking sarcasm using neural network. In Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA 2016), NAACL-HLT.

[3] Tomas Ptacek, Ivan Habernal, and Jun Hong. 2014. Sarcasm detection on Czech and English Twitter. In COLING, pages 213–223.

[4] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

[5] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014).

[6] Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, and Sebastian Riedel. 2016. emoji2vec: Learning emoji representations from their description. In Proceedings of the 4th International Workshop on Natural Language Processing for Social Media at EMNLP 2016 (SocialNLP at EMNLP 2016).

[7] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780.

[8] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2016. Neural machine translation by jointly learning to align and translate. arXiv:1409.0473v7.
