
collab-uniba / Emotion_and_Polarity_SO

Licence: other
An emotion classifier of text containing technical content from the SE domain

Programming Languages

OpenEdge ABL
179 projects
Java
68154 projects - #9 most used programming language

Projects that are alternatives of or similar to Emotion and Polarity SO

Senti4SD
An emotion-polarity classifier specifically trained on developers' communication channels
Stars: ✭ 41 (-44.59%)
Mutual labels:  sentiment-analysis, sentiment, emotion, polarity
hfusion
Multimodal sentiment analysis using hierarchical fusion with context modeling
Stars: ✭ 42 (-43.24%)
Mutual labels:  sentiment-analysis, emotion-detection, emotion-recognition
chronist
Long-term analysis of emotion, age, and sentiment using Lifeslice and text records.
Stars: ✭ 23 (-68.92%)
Mutual labels:  sentiment-analysis, sentiment, emotion
XED
XED multilingual emotion datasets
Stars: ✭ 34 (-54.05%)
Mutual labels:  sentiment-analysis, emotion-detection, emotion-recognition
sklearn-audio-classification
An in-depth analysis of audio classification on the RAVDESS dataset. Feature engineering, hyperparameter optimization, model evaluation, and cross-validation with a variety of ML techniques and MLP
Stars: ✭ 31 (-58.11%)
Mutual labels:  emotion, emotion-detection, emotion-recognition
sentiment-analysis-using-python
Large Data Analysis Course Project
Stars: ✭ 23 (-68.92%)
Mutual labels:  classifier, sentiment-analysis, sentiment
afinn-111
AFINN 111 (list of English words rated for valence) in JSON
Stars: ✭ 44 (-40.54%)
Mutual labels:  sentiment, polarity
Personal-Emotional-Stylized-Dialog
A Paper List for Personalized, Emotional, and stylized Dialog
Stars: ✭ 112 (+51.35%)
Mutual labels:  sentiment, emotion
Resnet-Emotion-Recognition
Identifies emotion(s) from user facial expressions
Stars: ✭ 21 (-71.62%)
Mutual labels:  emotion, emotion-recognition
Twitter Sentiment Analysis
This script can tell you the sentiments of people regarding to any events happening in the world by analyzing tweets related to that event
Stars: ✭ 94 (+27.03%)
Mutual labels:  sentiment-analysis, sentiment
AffectiveTweets
A WEKA package for analyzing emotion and sentiment of tweets.
Stars: ✭ 74 (+0%)
Mutual labels:  sentiment, emotion
brand-sentiment-analysis
Scripts utilizing Heartex platform to build brand sentiment analysis from the news
Stars: ✭ 21 (-71.62%)
Mutual labels:  sentiment-analysis, sentiment
Ml Classify Text Js
Machine learning based text classification in JavaScript using n-grams and cosine similarity
Stars: ✭ 38 (-48.65%)
Mutual labels:  classifier, sentiment-analysis
Nlp.js
An NLP library for building bots, with entity extraction, sentiment analysis, automatic language identify, and so more
Stars: ✭ 4,670 (+6210.81%)
Mutual labels:  classifier, sentiment-analysis
Whatsapp-analytics
performing sentiment analysis on the whatsapp chats.
Stars: ✭ 20 (-72.97%)
Mutual labels:  sentiment-analysis, sentiment
Sentiment
AFINN-based sentiment analysis for Node.js.
Stars: ✭ 2,469 (+3236.49%)
Mutual labels:  sentiment-analysis, sentiment
LSTM-sentiment-analysis
LSTM sentiment analysis. Please look at my another repo for SVM and Naive algorithem
Stars: ✭ 19 (-74.32%)
Mutual labels:  sentiment-analysis, sentiment
RECCON
This repository contains the dataset and the PyTorch implementations of the models from the paper Recognizing Emotion Cause in Conversations.
Stars: ✭ 126 (+70.27%)
Mutual labels:  emotion, emotion-recognition
GroupDocs.Classification-for-.NET
GroupDocs.Classification-for-.NET samples and showcase (text and documents classification and sentiment analysis)
Stars: ✭ 38 (-48.65%)
Mutual labels:  sentiment-analysis, sentiment
Text Analytics With Python
Learn how to process, classify, cluster, summarize, understand syntax, semantics and sentiment of text data with the power of Python! This repository contains code and datasets used in my book, "Text Analytics with Python" published by Apress/Springer.
Stars: ✭ 1,132 (+1429.73%)
Mutual labels:  sentiment-analysis, sentiment

EmoTxt

A toolkit for emotion detection from technical text. It is part of the Collab Emotion Mining Toolkit (EMTk).

Fair Use Policy

Please cite the following paper if you intend to use our tool in your own research:

F. Calefato, F. Lanubile, N. Novielli. “EmoTxt: A Toolkit for Emotion Recognition from Text.” In Proceedings of the Seventh International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACII Workshops 2017), San Antonio, USA, Oct. 23–26, 2017, pp. 79–80, ISBN: 978-1-5386-0563-9.

Installation

You will need to install the Git LFS extension to check out this project. Once Git LFS is installed and initialized, simply run:

$ git clone https://github.com/collab-uniba/Emotion_and_Polarity_SO.git
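
If Git LFS has never been set up for your user account, it typically needs to be initialized once before cloning. Assuming the git-lfs package is already installed on your system, this is usually a single command:

$ git lfs install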

Requirements

  • RAM: 8 GB

  • Python 2.7.x

    • Libraries
      • nltk, numpy, scikit_learn, scipy, pattern
        • Installation: open the command line and run $ pip install -r requirements.txt

        • Complete the nltk installation: run the Python interpreter and enter the following commands:

          >>> import nltk        
          >>> nltk.download()
          
  • Java 8+

    • Maven 3.x
      • If you want to build the jar yourself, run the following commands (an illustrative end-to-end sketch is shown after this Requirements list):
        cd java
        mvn clean install
      • The fat jar will be generated in the java/target folder with the name EmotionAndPolarity-0.0.1-SNAPSHOT-jar-with-dependencies.jar. Rename it to Emotion_and_Polarity_SO.jar and move it directly under the java folder.
  • R

    • Libraries:
      • caret, LiblineaR, e1071
        • Installation: open the command line and run

          $ Rscript requirements.R
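
For example, the optional Java build step described above could look like the following sketch; the paths are illustrative and assume the Maven layout mentioned in the Requirements list (the jar is built in java/target, then renamed and moved directly under the java folder):

          cd java
          mvn clean install
          mv target/EmotionAndPolarity-0.0.1-SNAPSHOT-jar-with-dependencies.jar Emotion_and_Polarity_SO.jar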

Usage and dataset

In the following, we first show how to train a new model for emotion classification and then how to test the model on unseen data.

For testing purposes, you can use the sample.csv input file available in the root of the repo. For other, more complex examples, look at the dataset files available under the subfolder ./java/DatasetSO/StackOverflowCSV.

If you are looking for the entire experimental dataset of ~5K Stack Overflow posts annotated with emotion, it is available from this repository.

Training a new model for emotion classification (70%/30% train/test split)

$ sh train.sh -i file.csv -d delimiter [-g] [-p] -e emotion 

where:

  • -i file.csv: the input file, coded in UTF-8 without BOM, containing the input corpus. Please note that gold labels are required for each item in the dataset. The format of the input file is the following:

    id;label;text
    ...
    22;NO;"""Excellent! This is exactly what I needed. Thanks!"""
    23;YES;"""FEAR!!!!!!!!!!!"""
    ...
    
  • -d delimiter: the delimiter used in the csv file (values in {c, sc}, where c stands for comma and sc for semicolon). Please note that all the example files provided here use the semicolon as delimiter, so -d sc is a mandatory option during tests.

  • -g: enables the extraction of n-grams (i.e., unigrams and bigrams). N-gram extraction is mandatory the first time you train a classification model for a given emotion on your own dataset. Because n-gram extraction is computationally expensive, it should be skipped if you retrain the model for the same emotion using the same input file.

  • -p: enables the extraction of features regarding politeness, mood and modality. Because this is computationally expensive, the switch is off by default.

  • -e emotion: the specific emotion for which you want to train a classification model, with values in {joy, anger, sadness, love, surprise, fear}.
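
For instance, a first training run on the provided sample file, targeting the joy emotion (chosen here purely as an example; any emotion from the list above works), could look like this:

$ sh train.sh -i sample.csv -d sc -g -e joy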

As a result, the script will generate the following output files:

  • An output folder named training_<file.csv>_<emotion>/, containing:
    • n-grams/: a subfolder containing the extracted n-grams
    • idfs/: a subfolder containing the IDFs computed for n-grams and WordNet Affect emotion words
    • feature-<emotion>.csv: a .csv file with the features extracted from the input corpus and used for training the model
    • liblinear/:
      • there are two subfolders: DownSampling/ and NoDownSampling/. Each one contains:
        • trainingSet.csv
        • testingSet.csv
        • eight models trained with liblinear, named model_<emotion>_<IDMODEL>.Rda, where IDMODEL is the ID of the liblinear model, with values in {0,...,7}
        • performance_<emotion>_<IDMODEL>.txt, containing the results of the parameter tuning for the model (best C) as performed by caret, the confusion matrix, and the precision, recall, and F-measure for the best cost for the specific emotion
        • predictions_<emotion>_<IDMODEL>.csv, containing the test instances with the predicted labels for the specific emotion
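
For the illustrative training run above (sh train.sh -i sample.csv -d sc -g -e joy), the output layout would therefore look roughly as follows (names are derived from the description above; IDMODEL ranges over 0–7):

    training_sample.csv_joy/
      n-grams/
      idfs/
      feature-joy.csv
      liblinear/
        DownSampling/
          trainingSet.csv
          testingSet.csv
          model_joy_<IDMODEL>.Rda
          performance_joy_<IDMODEL>.txt
          predictions_joy_<IDMODEL>.csv
        NoDownSampling/
          (same files as DownSampling/)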

Emotion detection

$ sh classify.sh -i file.csv -d delimiter -e emotion [-m model] [-f idf] [-o n-grams] [-l] [-p]

where:

  • -i file.csv: the input csv file, with a header and coded in UTF-8 without BOM, containing the corpus to be classified; the format of the input file is the following:

    id;label;text
    ...
    22;NO;"""Excellent! This is exactly what I needed. Thanks!"""
    23;YES;"""FEAR!!!!!!!!!!!"""
    ...
    
  • -d delimiter: the delimiter used in the csv file (values in {c, sc}, where c stands for comma and sc for semicolon). Please note that all the example files provided here use the semicolon as delimiter, so -d sc is a mandatory option during tests.

  • -e emotion: the specific emotion to be detected in the input file or text, defined in {joy, anger, sadness, love, surprise, fear}.

  • -m model: the model file learnt during the training step (e.g., model-anger.rda). If you don't specify a model name, the default model will be used, that is, the one learnt on our Stack Overflow gold standard.

  • -o n-grams: if you specify a model name using -m (i.e., you don't want to use the default model for a given emotion), you are also required to provide the path to the folder containing the dictionaries extracted during the training step. This folder includes the n-grams, i.e., UnigramsList.txt and BigramsList.txt.

  • -f idf: if you specify a model name using -m (i.e., you don't want to use the default model for a given emotion), you are also required to specify the path to the folder containing the dictionaries with the IDFs computed during the training step. The folder includes IDFs for n-grams (uni- and bi-grams) and for the WordNet Affect lists of emotion words.

  • -l: if present, indicates that <file.csv> contains a gold label in the column label.

  • -p: enables the extraction of features regarding politeness, mood and modality. Because this is computationally expensive, the switch is off by default.
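
For instance, to classify sample.csv with a custom joy model produced by the training step, you might run something like the following; the paths are illustrative and assume the training output layout described above (here, the NoDownSampling model with ID 0):

$ sh classify.sh -i sample.csv -d sc -e joy \
    -m training_sample.csv_joy/liblinear/NoDownSampling/model_joy_0.Rda \
    -o training_sample.csv_joy/n-grams \
    -f training_sample.csv_joy/idfs \
    -l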

As a result, the script will create an output folder named classification_<file.csv>_<emotion> containing:

  • predictions_<emotion>.csv: a csv file with header, containing a binary prediction (yes/no) for each line of the input corpus:

    id;predicted
    ...
    22;NO
    23;YES
    ...
    
  • performance_<emotion>.txt: a file created only if the input corpus <file.csv> contains the column label; the file contains several performance metrics (precision, recall, F1, and the confusion matrix).

For example, if you wanted to detect anger in the input file sample.csv, you would have to run:

$ sh classify.sh -i sample.csv -d sc -p -e anger