pln-fing-udelar / pghumor

License: Apache-2.0
Is This a Joke? Humor Detection in Spanish Tweets

pgHumor: Humor detection in Spanish tweets

This thesis addresses the task of deciding whether a tweet written in Spanish is humorous, using supervised machine learning. It was carried out by Matías Cubero and Santiago Castro, and supervised by Guillermo Moncecchi and Diego Garat. For detailed information, see the final report.

Abstract

Looking at this tweet:

— Yesterday, when leaving work I ran over a unicorn.

— No way, you have a job?

which is the translated version of this one:

— Ayer, al salir del trabajo atropellé a un unicornio.

— No jodas, ¿tenés trabajo?

makes us think: what makes it funny? What is humor? What generates laughter? This project tries to address these questions. Theories of humor exist, but none is completely accurate.

16,488 tweets were fetched from humorous accounts and 22,875 from non-humorous ones (news, philosophical phrases and interesting facts). A web app and an Android app were built so that people could vote on which tweets are actually humorous. 33,531 votes were received from early September to the end of October 2014 (thanks!). It turned out that there was little humor in the humorous accounts:

[Figure: Humor ratio according to the people]

The classifier is built on features that capture informality, certain kinds of formatting, and topics that cause psychological tension, among others. It uses techniques such as SVM, kNN, Decision Trees and Naïve Bayes, and achieves a precision of 83.6% and a recall of 68.9% on the created corpus.
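To put the two reported figures on a single scale, they can be combined into an F1 score (the harmonic mean of precision and recall). The sketch below only derives that value from the numbers quoted above; the F1 score itself is not stated in this README, and the function name f1_score is chosen here for illustration.

```python
# Minimal sketch: derive the F1 score from the precision and recall
# quoted in the abstract. Not part of the pgHumor code base.

def f1_score(precision, recall):
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


precision = 0.836  # reported precision
recall = 0.689     # reported recall

f1 = f1_score(precision, recall)
print("F1 = {:.1%}".format(f1))  # prints "F1 = 75.5%"
```

The harmonic mean sits closer to the lower of the two values, so the result reflects that recall is the weaker of the two figures.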

A demo was also developed to show the obtained results.

Additional work

We want to thank Diego Serra and Ignacio Acuña, who based their High Performance Computing course project on this work, supervised by Sergio Nesmachnow, with the aim of improving the performance of the algorithms that compute the feature values. Their project can be found in the hpc-entrega tag. The continuation of their line of work is in the hpc branch.

Installation

The main dependencies are:

  • Python 2.7 (and some libraries; please see the code)
  • MySQL
  • Freeling (SVN revision number 2588)

Setup

The corpus.sql and chistesdotcom.sql dumps must be loaded into MySQL.

In the file clasificador/config/environment.py, write the Twitter API credentials and the database connection settings. An example of this file follows:

# coding=utf-8
from __future__ import absolute_import, division, print_function, unicode_literals

import os

# Twitter API credentials
os.environ['CONSUMER_KEY'] = '--CONSUMER KEY--'
os.environ['CONSUMER_SECRET'] = '--CONSUMER SECRET--'
os.environ['ACCESS_KEY'] = '--ACCESS KEY--'
os.environ['ACCESS_SECRET'] = '--ACCESS SECRET--'

os.environ['DB_HOST'] = 'localhost'
os.environ['DB_USER'] = 'pghumor'
os.environ['DB_PASS'] = '--PASSWORD--'
os.environ['DB_NAME'] = 'corpus'
os.environ['DB_NAME_CHISTES_DOT_COM'] = 'chistesdotcom'
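
Because the file above stores everything in os.environ, other modules can read the settings back from the environment once it has been imported. The following is a minimal sketch of that pattern, assuming only the variable names shown above; the helper load_db_settings is hypothetical and is not taken from the repository.

```python
import os

# Names of the database settings defined in environment.py above.
REQUIRED_DB_SETTINGS = [
    'DB_HOST',
    'DB_USER',
    'DB_PASS',
    'DB_NAME',
    'DB_NAME_CHISTES_DOT_COM',
]


def load_db_settings(environ=os.environ):
    """Hypothetical helper: collect the database settings into a dict.

    Raises KeyError listing every missing variable, so that a
    misconfigured environment fails early with a clear message.
    """
    missing = [name for name in REQUIRED_DB_SETTINGS if name not in environ]
    if missing:
        raise KeyError('Missing settings: ' + ', '.join(missing))
    return {name: environ[name] for name in REQUIRED_DB_SETTINGS}
```

Passing the mapping as a parameter keeps the helper easy to check with a plain dict instead of the real process environment.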

Also export the Freeling environment variable and persist it for future shells:

export FREELINGSHARE=/usr/local/share/freeling
echo "export FREELINGSHARE=$FREELINGSHARE" >> ~/.bashrc

Run

Start Freeling servers (to compute the features):

./freeling.sh start

And then execute:

clasificador/main.py

To stop the Freeling servers:

./freeling.sh stop

Help

clasificador/main.py --help

Server mode

clasificador/main.py --servidor

To test it:

curl --data-urlencode texto="This is a test" localhost:5000/evaluar
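
The same request can be sent from Python. This is a minimal sketch using only the Python 3 standard library (the project itself targets Python 2.7), assuming the endpoint and the texto form field shown in the curl command above. The helper names are hypothetical, and the response format is not documented here, so the body is returned as raw text, decoded as UTF-8 by assumption.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl example above.
EVALUATE_URL = 'http://localhost:5000/evaluar'


def build_request(text, url=EVALUATE_URL):
    """Build a form-encoded POST request carrying the text to evaluate."""
    # quote_via=quote encodes spaces as %20, as curl --data-urlencode does.
    body = urlencode({'texto': text}, quote_via=quote).encode('ascii')
    # Supplying data makes urllib send the request as a POST.
    return Request(url, data=body)


def evaluate(text, url=EVALUATE_URL):
    """Send the text to a running server and return the raw response body."""
    with urlopen(build_request(text, url)) as response:
        return response.read().decode('utf-8')
```

With the server started in server mode, evaluate("This is a test") sends the same form data as the curl command.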

Run tests

./tests.sh

Citation

If you use this work in research, please cite us:

@inproceedings{castro2016joke,
  title={Is This a Joke? Detecting Humor in Spanish Tweets},
  author={Castro, Santiago and Cubero, Mat{\'\i}as and Garat, Diego and Moncecchi, Guillermo},
  booktitle={Ibero-American Conference on Artificial Intelligence},
  pages={139--150},
  year={2016},
  organization={Springer}
}