declare-lab / Cascade

This repo contains code to detect sarcasm in text from discussion forums using deep learning.

CASCADE: Contextual Sarcasm Detection in Online Discussion Forums

Code for the paper CASCADE: Contextual Sarcasm Detection in Online Discussion Forums (COLING 2018, Santa Fe, New Mexico).

Description

In this paper, we propose a ContextuAl SarCasm DEtector (CASCADE), which adopts a hybrid approach combining content-driven and context-driven modeling for sarcasm detection in online social media discussions (Reddit).

Requirements

  1. Clone this repo.
  2. Python (2.7 or 3.3-3.6)
  3. Install your preferred build of TensorFlow 1.4.0 (CPU or GPU; from PyPI, compiled from source, etc.).
  4. Install the rest of the requirements: pip install -r requirements.txt
  5. Download the FastText pre-trained embeddings and extract them somewhere.
  6. Download the comments.json dataset file [1] and place it in data/.
  7. If you want to run the Preprocessing steps (optional), install YAJL 2 and continue with the Preprocessing instructions. Otherwise, go directly to the Running CASCADE section.

Preprocessing

User Embeddings

  1. User Embeddings: Stylometric features.

    The file data/comments.json contains Reddit users and their corresponding comments. A user may have multiple comments, so we concatenate all the comments belonging to the same user, separated by the <END> tag:

    cd users
    python create_per_user_paragraph.py
    

    The ParagraphVector algorithm is used to generate the stylometric features. First, train the model:

    python train_stylometric.py
    

    Generate user_stylometric.csv (user stylometric features) using the trained model:

    python generate_stylometric.py
    
  2. User Embeddings: Personality features

    Pre-train a CNN-based model to detect personality features from text. Training uses two datasets. The second dataset [2] can be obtained by requesting it from the original authors.

    python process_data.py [path/to/FastText_embedding]
    python train_personality.py
    

    Generate user_personality.csv (user personality features) using this model:

    python generate_user_personality.py
    

    To use the pre-trained model from our experiments, download the model weights and unzip them inside the folder users/.

  3. User Embeddings: Multi-view fusion

    Merge the user_stylometric.csv and user_personality.csv files into a single merged user_view_vectors.csv file:

    python merge_user_views.py
    

    Multi-view fusion of the user views (stylometric and personality) is performed using GCCA (roughly equivalent to CCA when there are two views). Generate the fused user embeddings user_gcca_embeddings.npz using the following command:

    python user_wgcca.py --input user_embeddings/user_view_vectors.csv --output user_embeddings/user_gcca_embeddings.npz --k 100 --no_of_views 2
    

    This implementation of GCCA has been adapted from the wgcca repo.

    Finally:

    cd ..
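
As a rough illustration of the multi-view fusion step above, the following is a minimal, unweighted GCCA sketch in NumPy (the MAX-VAR formulation). It is not the adapted wgcca code, which is weighted and more scalable. The function name `gcca`, the `reg` ridge parameter, and the toy data are assumptions made for this example only.

```python
import numpy as np


def gcca(views, k, reg=1e-6):
    """Return a shared (n_samples, k) embedding for a list of views.

    Each view is an (n_samples, n_features) array with rows aligned by user.
    Hypothetical helper for illustration; not part of this repo.
    """
    n_samples = views[0].shape[0]
    projection_sum = np.zeros((n_samples, n_samples))
    for view in views:
        centered = view - view.mean(axis=0)
        # Ridge-regularised covariance keeps the solve numerically stable.
        cov = centered.T @ centered + reg * np.eye(centered.shape[1])
        # Add the projection onto this view's column space.
        projection_sum += centered @ np.linalg.solve(cov, centered.T)
    # Symmetrise to remove floating-point asymmetry before eigh.
    projection_sum = (projection_sum + projection_sum.T) / 2.0
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigenvectors = np.linalg.eigh(projection_sum)
    return eigenvectors[:, ::-1][:, :k]


# Toy data standing in for the stylometric and personality views.
rng = np.random.default_rng(0)
latent = rng.normal(size=(50, 3))
stylometric = latent @ rng.normal(size=(3, 6)) + 0.01 * rng.normal(size=(50, 6))
personality = latent @ rng.normal(size=(3, 5)) + 0.01 * rng.normal(size=(50, 5))
fused = gcca([stylometric, personality], k=2)
```

This sketch builds an n_samples by n_samples matrix, so it only suits small inputs. For the full user set, use the user_wgcca.py command shown above.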
    
  4. Discourse Embeddings

    Similar to user stylometric features, create the discourse features for each discussion forum (sub-reddit):

    cd discourse
    python create_per_discourse_paragraph.py
    

    The ParagraphVector algorithm is used to generate the discourse features. First, train the model:

    python train_discourse.py
    

    Generate discourse.csv (discourse features) using the trained model:

    python generate_discourse.py
    

    Finally:

    cd ..
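
The per-user and per-discourse paragraph steps above both group comments by a key and join them with the <END> tag. A minimal sketch of that grouping follows. It assumes each comment is a dict with hypothetical `author`, `subreddit`, and `text` fields, and that the tag is surrounded by single spaces. The actual layout of data/comments.json and the behaviour of the create_per_*_paragraph.py scripts may differ.

```python
from collections import defaultdict

END_TAG = " <END> "  # assumed separator format, spaces included


def build_paragraphs(comments, key):
    """Group comment texts by `key` and join each group with the <END> tag.

    `comments` is an iterable of dicts; the field names used here are
    hypothetical and chosen only for this example.
    """
    grouped = defaultdict(list)
    for comment in comments:
        grouped[comment[key]].append(comment["text"])
    return {group: END_TAG.join(texts) for group, texts in grouped.items()}


comments = [
    {"author": "a", "subreddit": "s1", "text": "hi"},
    {"author": "a", "subreddit": "s2", "text": "yo"},
    {"author": "b", "subreddit": "s1", "text": "ok"},
]
per_user = build_paragraphs(comments, "author")
per_discourse = build_paragraphs(comments, "subreddit")
```

Each resulting paragraph is then treated as one document when training the ParagraphVector model.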
    

Running CASCADE

Hybrid CNN

Hybrid CNN combining user-embeddings and discourse-features with textual modeling.

cd src
python process_data.py [path/to/FastText_embedding]
python train_cascade.py

The CNN codebase has been adapted from the cnn-text-classification-tf repo by Denny Britz.
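
To make the idea of the hybrid model concrete, here is a small NumPy sketch of one plausible way to combine the three signals: convolve over the embedded comment, max-pool over time, and concatenate the result with the user embedding and the discourse vector. It assumes the combination is a plain concatenation. The function names, shapes, and random inputs are illustrative only, and train_cascade.py (TensorFlow) remains the reference implementation.

```python
import numpy as np


def conv_max_pool(embedded, filters):
    """Apply 1-D convolution filters over time, then ReLU and max-pooling.

    embedded: (seq_len, emb_dim) word embeddings of one comment.
    filters:  (n_filters, width, emb_dim) convolution kernels.
    """
    _, width, _ = filters.shape
    n_windows = embedded.shape[0] - width + 1
    # Stack every sliding window: (n_windows, width, emb_dim).
    windows = np.stack([embedded[i:i + width] for i in range(n_windows)])
    # Contract the width and emb_dim axes: (n_windows, n_filters).
    feature_maps = np.tensordot(windows, filters, axes=([1, 2], [1, 2]))
    return np.maximum(feature_maps, 0.0).max(axis=0)


def hybrid_features(embedded, filters, user_embedding, discourse_vector):
    """Concatenate text features with the user and discourse vectors."""
    text_features = conv_max_pool(embedded, filters)
    return np.concatenate([text_features, user_embedding, discourse_vector])


rng = np.random.default_rng(0)
embedded_comment = rng.normal(size=(12, 8))   # 12 tokens, 8-dim embeddings
conv_filters = rng.normal(size=(4, 3, 8))     # 4 filters of width 3
user_embedding = rng.normal(size=100)         # e.g. one fused user embedding
discourse_vector = rng.normal(size=100)       # e.g. one discourse.csv row
features = hybrid_features(
    embedded_comment, conv_filters, user_embedding, discourse_vector
)
```

In a full model, the concatenated vector would feed a classification layer that predicts whether the comment is sarcastic.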

Citation

If you use this code in your work, please cite the paper CASCADE: Contextual Sarcasm Detection in Online Discussion Forums using the following BibTeX entry:

@InProceedings{C18-1156,
  author = 	"Hazarika, Devamanyu
		and Poria, Soujanya
		and Gorantla, Sruthi
		and Cambria, Erik
		and Zimmermann, Roger
		and Mihalcea, Rada",
  title = 	"CASCADE: Contextual Sarcasm Detection in Online Discussion Forums",
  booktitle = 	"Proceedings of the 27th International Conference on Computational Linguistics",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"1837--1848",
  location = 	"Santa Fe, New Mexico, USA",
  url = 	"http://aclweb.org/anthology/C18-1156"
}

References

[1]. Khodak, Mikhail, Nikunj Saunshi, and Kiran Vodrahalli. "A large self-annotated corpus for sarcasm." Proceedings of the Eleventh International Conference on Language Resources and Evaluation. 2018.

[2]. Celli, Fabio, et al. "Workshop on computational personality recognition (shared task)." Proceedings of the Workshop on Computational Personality Recognition. 2013.
