
cbaziotis / datastories-semeval2017-task6

Licence: other
Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to datastories-semeval2017-task6

SentimentAnalysis
Sentiment Analysis: Deep Bi-LSTM+attention model
Stars: ✭ 32 (+60%)
Mutual labels:  word-embeddings, embeddings, lstm, computational-linguistics, semeval, attention-mechanism, nlp-machine-learning, twitter-messages
Datastories Semeval2017 Task4
Deep-learning model presented in "DataStories at SemEval-2017 Task 4: Deep LSTM with Attention for Message-level and Topic-based Sentiment Analysis".
Stars: ✭ 184 (+820%)
Mutual labels:  word-embeddings, embeddings, lstm, attention, glove, attention-mechanism, keras-models, nlp-machine-learning
automatic-personality-prediction
[AAAI 2020] Modeling Personality with Attentive Networks and Contextual Embeddings
Stars: ✭ 43 (+115%)
Mutual labels:  recurrent-neural-networks, lstm, attention, attention-mechanism
sentiment-analysis-of-tweets-in-russian
Sentiment analysis of tweets in Russian using Convolutional Neural Networks (CNN) with Word2Vec embeddings.
Stars: ✭ 51 (+155%)
Mutual labels:  word-embeddings, embeddings, computational-linguistics, nlp-machine-learning
NTUA-slp-nlp
💻Speech and Natural Language Processing (SLP & NLP) Lab Assignments for ECE NTUA
Stars: ✭ 19 (-5%)
Mutual labels:  word-embeddings, attention, attention-mechanism, nlp-machine-learning
word2vec-tsne
Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.
Stars: ✭ 59 (+195%)
Mutual labels:  word-embeddings, embeddings, computational-linguistics, nlp-machine-learning
Image Caption Generator
A neural network to generate captions for an image using CNN and RNN with BEAM Search.
Stars: ✭ 126 (+530%)
Mutual labels:  recurrent-neural-networks, lstm, attention, attention-mechanism
ntua-slp-semeval2018
Deep-learning models of NTUA-SLP team submitted in SemEval 2018 tasks 1, 2 and 3.
Stars: ✭ 79 (+295%)
Mutual labels:  lstm, attention, semeval, attention-mechanism
DeepLearningReading
Deep Learning and Machine Learning mini-projects. Current Project: Deepmind Attentive Reader (rc-data)
Stars: ✭ 78 (+290%)
Mutual labels:  embeddings, attention, nlp-machine-learning
Text Classification Keras
📚 Text classification library with Keras
Stars: ✭ 53 (+165%)
Mutual labels:  lstm, attention, nlp-machine-learning
Ner Lstm
Named Entity Recognition using multilayered bidirectional LSTM
Stars: ✭ 532 (+2560%)
Mutual labels:  recurrent-neural-networks, embeddings, lstm
Linear Attention Recurrent Neural Network
A recurrent attention module consisting of an LSTM cell which can query its own past cell states by the means of windowed multi-head attention. The formulas are derived from the BN-LSTM and the Transformer Network. The LARNN cell with attention can be easily used inside a loop on the cell state, just like any other RNN. (LARNN)
Stars: ✭ 119 (+495%)
Mutual labels:  recurrent-neural-networks, lstm, attention-mechanism
Document Classifier Lstm
A bidirectional LSTM with attention for multiclass/multilabel text classification.
Stars: ✭ 136 (+580%)
Mutual labels:  recurrent-neural-networks, lstm, attention-mechanism
Multimodal Sentiment Analysis
Attention-based multimodal fusion for sentiment analysis
Stars: ✭ 172 (+760%)
Mutual labels:  lstm, attention, attention-mechanism
Embeddingsviz
Visualize word embeddings of a vocabulary in TensorBoard, including the neighbors
Stars: ✭ 40 (+100%)
Mutual labels:  word-embeddings, embeddings, glove
Deep learning nlp
Keras, PyTorch, and NumPy Implementations of Deep Learning Architectures for NLP
Stars: ✭ 407 (+1935%)
Mutual labels:  word-embeddings, recurrent-neural-networks, attention
Hierarchical-Word-Sense-Disambiguation-using-WordNet-Senses
Word Sense Disambiguation using Word Specific models, All word models and Hierarchical models in Tensorflow
Stars: ✭ 33 (+65%)
Mutual labels:  lstm, attention, attention-mechanism
Magnitude
A fast, efficient universal vector embedding utility package.
Stars: ✭ 1,394 (+6870%)
Mutual labels:  word-embeddings, embeddings, glove
Pytorch Sentiment Analysis
Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
Stars: ✭ 3,209 (+15945%)
Mutual labels:  word-embeddings, recurrent-neural-networks, lstm
GTAV-Self-driving-car
Self driving car in GTAV with Deep Learning
Stars: ✭ 15 (-25%)
Mutual labels:  recurrent-neural-networks, keras-models

Overview

This repository contains the source code for the models used for the DataStories team's submission for SemEval-2017 Task 6 “#HashtagWars: Learning a Sense of Humor”. The model is described in the paper "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".

Citation:

@InProceedings{baziotis-pelekis-doulkeridis:2017:SemEval1,
  author    = {Baziotis, Christos  and  Pelekis, Nikos  and  Doulkeridis, Christos},
  title     = {DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison},
  booktitle = {Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)},
  month     = {August},
  year      = {2017},
  address   = {Vancouver, Canada},
  publisher = {Association for Computational Linguistics},
  pages     = {381--386}
}

Siamese-LSTM: a high-level overview of the model for Subtask A.

Notes

  • If you are just interested in the source code for the model, see models/task6A_models.py.

  • The models were trained using Keras 1.2. In order for the project to work with Keras 2, some minor changes will have to be made, as sketched below.
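
Most of these changes are argument renames. A quick sketch of a Keras 1.2 call next to its Keras 2 equivalent (an illustration of the standard Keras porting rules, not an exhaustive guide to this project):

from keras.models import Sequential
from keras.layers import LSTM, Dense

# Keras 1.2 wrote this layer as, e.g.:
#   LSTM(output_dim=150, dropout_W=0.25, dropout_U=0.25)
# Keras 2 renames the arguments:
model = Sequential()
model.add(LSTM(units=150, dropout=0.25, recurrent_dropout=0.25,
               input_shape=(50, 300)))  # e.g. 50 tokens, 300-dim embeddings
model.add(Dense(1, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="adam")

# Keras 1.2: model.fit(X, y, nb_epoch=10)  ->  Keras 2: model.fit(X, y, epochs=10)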

Prerequisites

1 - Install Requirements

pip install -r /datastories-semeval2017-task6/requirements.txt

2 - Download pre-trained Word Embeddings

The models were trained on top of word embeddings pre-trained on a large collection of Twitter messages. We collected a dataset of 330M English Twitter messages posted from 12/2012 to 07/2016. For training the word embeddings we used GloVe. For preprocessing the tweets we used ekphrasis, which is also one of the requirements of this project; see the example below.
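
As an illustration of the kind of tweet normalization ekphrasis performs, here is a preprocessing configuration adapted from the ekphrasis documentation (the exact options used when training these embeddings are defined in the project code):

from ekphrasis.classes.preprocessor import TextPreProcessor
from ekphrasis.classes.tokenizer import SocialTokenizer
from ekphrasis.dicts.emoticons import emoticons

text_processor = TextPreProcessor(
    # terms that will be normalized to placeholder tags
    normalize=['url', 'email', 'percent', 'money', 'phone', 'user',
               'time', 'date', 'number'],
    # terms that will be annotated with extra tags
    annotate={"hashtag", "allcaps", "elongated", "repeated",
              "emphasis", "censored"},
    segmenter="twitter",   # word statistics from Twitter for hashtag segmentation
    corrector="twitter",   # word statistics from Twitter for spell correction
    unpack_hashtags=True,
    unpack_contractions=True,
    tokenizer=SocialTokenizer(lowercase=True).tokenize,
    dicts=[emoticons],     # replace emoticons with their meaning
)

print(" ".join(text_processor.pre_process_doc("CANT WAIT for #Sherlock :)")))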

You can download one of the following word embeddings:

Place the file(s) in the /embeddings folder, so that the program can find them.

Execution

Word Embeddings

In order to specify which word embeddings file you want to use, you have to set the values of WV_CORPUS and WV_DIM in task6A.py and task6A_LOO.py. The default values are:

WV_CORPUS = "datastories.twitter"
WV_DIM = 300

The convention we use to identify each file is:

{corpus}.{dimensions}d.txt

This means that if you want to use another file, for instance GloVe Twitter word embeddings with 200 dimensions, you have to place a file named glove.200d.txt inside the /embeddings folder and set:

WV_CORPUS = "glove"
WV_DIM = 200
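
The embeddings file itself is assumed to be in the plain GloVe text format, i.e. one word per line followed by its vector components. A minimal, hypothetical loader for a {corpus}.{dimensions}d.txt file could look like this (the project has its own loading utilities; this is only a sketch):

import os

import numpy as np

def load_word_vectors(corpus, dim, folder="embeddings"):
    """Read a GloVe-format text file named {corpus}.{dim}d.txt into a dict."""
    filename = os.path.join(folder, "{}.{}d.txt".format(corpus, dim))
    word_vectors = {}
    with open(filename, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word_vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return word_vectors

embeddings = load_word_vectors("glove", 200)  # expects embeddings/glove.200d.txt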

Model Training

You will find the programs for training the Keras models in the /models folder.

models
│   task6A_models.py : contains the Keras models
│   task6A.py        : program for training the model for Task6A
│   task6A_LOO.py    : program for Leave-One-Out cross validation

SemEval-2017 Task 6A: To train a model for SemEval-2017 Task 6A, run task6A.py. Read the source code and configure the program using the corresponding flags.

If you run with the flag PERSIST=True, checkpointing will be enabled. This means that the model weights, along with the corresponding word indices, will be saved to disk:

models/cp_model_task6_sub1.hdf5
models/cp_model_task6_sub1_word_indices.pickle

Usually the network will start to overfit after 1 or 2 epochs, so you can just stop the execution.
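
The saved files can later be reloaded along these lines (a sketch: if the checkpoint contains custom layers, such as the attention layer, they must be supplied through load_model's custom_objects argument):

import pickle

from keras.models import load_model

# Pass any custom layer classes used by the model, e.g.:
#   load_model(..., custom_objects={"Attention": Attention})
model = load_model("models/cp_model_task6_sub1.hdf5")

with open("models/cp_model_task6_sub1_word_indices.pickle", "rb") as f:
    word_indices = pickle.load(f)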

#HashtagWars evaluation: In order to test our model using the evaluation method (Leave-One-Out cross-validation) of Potash et al., "#HashtagWars: Learning a Sense of Humor" (arXiv:1612.03216), you have to run task6A_LOO.py. Read the source code and configure the program using the corresponding flags.
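
Conceptually, this protocol holds out one hashtag file at a time, trains on the remaining hashtags, and evaluates on the held-out one. A schematic sketch (train_model, evaluate_model, and the dataset path are hypothetical placeholders, not code from this repository):

import os

def train_model(train_files):
    # Placeholder: train the Keras model on these hashtag files.
    pass

def evaluate_model(model, test_file):
    # Placeholder: rank the tweets of the held-out hashtag and return a score.
    pass

HASHTAGS_DIR = "dataset/train_dir"  # hypothetical location of the #HashtagWars files
hashtags = sorted(os.listdir(HASHTAGS_DIR))

results = {}
for held_out in hashtags:
    model = train_model([h for h in hashtags if h != held_out])
    results[held_out] = evaluate_model(model, held_out)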

The program will save the results of each run and place them in /models/results/. You can evaluate those results by running /models/results/results_loo.py.

Generate submissions

The submissions/ folder contains a trained model with the corresponding word indices and the generated submission files. If you want to generate new submissions for the SemEval test set, just train a model with task6A.py and move the files

models/cp_model_task6_sub1.hdf5
models/cp_model_task6_sub1_word_indices.pickle

to the submissions/ folder.

You can generate new submissions and evaluate the performance of a model with submissions/submit_task6_1.py.
