SEntiMoji / SEntiMoji

Licence: other
data, code, pre-trained models and experiment results for "SEntiMoji: An Emoji-Powered Learning Approach for Sentiment Analysis in Software Engineering"

This repository contains the data, code, pre-trained models and experiment results for the paper "SEntiMoji: An Emoji-Powered Learning Approach for Sentiment Analysis in Software Engineering".

SEntiMoji

This study proposes SEntiMoji, which leverages texts containing emoji from both GitHub and Twitter to improve sentiment analysis and emotion detection in the software engineering (SE) domain. SEntiMoji significantly outperforms the existing SE-customized sentiment analysis and emotion detection methods on representative benchmark datasets.

Overview

  • data/ contains the data used in this study. It contains two subfolders:

  • code/ contains the scripts for the SEntiMoji model. The SEntiMoji variants share the same scripts.

    • SEntiMoji_script/ contains the representation learning code (Deepmoji/deepmoji), the pipeline code for training and evaluating (pipeline.py), the files mapping labels to class indexes (label2index/), and vocabulary dicts for each pre-trained representation model (vocabulary/).
    • Mtest.py is responsible for the McNemar’s test.
  • trained_model/ contains the pre-trained embeddings, representation models, and final sentiment classifier. It contains three subfolders:

    • word_embeddings/ contains the word embeddings trained on GitHub posts.
    • representation_model/ contains the pre-trained representation models used for SEntiMoji (i.e., model_SEntiMoji.hdf5), SEntiMoji-G (i.e., model_SEntiMoji-G.hdf5), and SEntiMoji-T (i.e., model_SEntiMoji-T.hdf5).

    ⚠️ Because the models and embeddings exceed the GitHub file size limit, we manage these large files with git lfs. A plain git clone does not include them, so loading them will fail. Download them in one of the following two ways:

    1. Install git lfs, then run git lfs pull to download the large files.
    2. Open each file on the GitHub website and click the download button to download it directly.
  • result/ contains the detailed results of five-fold cross-validation (summarized in the sheets of result_5fold_sentiment.xlsx and result_5fold_emotion.xlsx) instead of the mean performance shown in the paper. In addition, for each dataset, we show the predicted labels for all folds. In each result file, the first column is the text, the second column is the predicted label, and the third column is the ground truth label.
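
Relating to the git lfs warning above: a file that was cloned without git lfs is a small text pointer rather than the real model, and a Git LFS pointer file begins with the line `version https://git-lfs.github.com/spec/v1`. The following is a minimal sketch, not part of this repository, that scans a folder for such pointers before you try to load anything. The helper names and the `*.hdf5` pattern are assumptions for illustration; extend the patterns to cover the embedding files in your checkout.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like a Git LFS pointer rather than real data."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def find_lfs_pointers(root, patterns=("*.hdf5",)):
    """List files under root that match the patterns and are still LFS pointers."""
    root = Path(root)
    if not root.is_dir():
        return []
    hits = []
    for pattern in patterns:
        for candidate in sorted(root.rglob(pattern)):
            if candidate.is_file() and is_lfs_pointer(candidate):
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    # "trained_model" is the folder named in the overview; adjust to your layout.
    for pointer in find_lfs_pointers("trained_model"):
        print(f"{pointer} is an LFS pointer; run `git lfs pull` to fetch the real file")
```

If the script prints nothing, every matching file under the folder already holds real data.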

Running SEntiMoji

  1. We assume that you're using Python 3.6 with pip installed. As a backend, you need to install either Theano (version 0.9+) or TensorFlow (version 1.3+). To install the dependencies, open the command line and run: pip install -r requirements.txt

  2. To train a sentiment classifier or emotion detector based on the SEntiMoji model (or one of its variants), run the scripts in the code/SEntiMoji_script directory.

  • Train model on provided benchmark datasets.

    • For the sentiment classification task, specify the pretrained model name, task and dataset name on the command line. For example, to train and evaluate the classifier on the Jira dataset using the SEntiMoji representation model, run: python pipeline.py --model SEntiMoji --task sentiment --benchmark_dataset_name Jira

    • For the emotion detection task, specify the pretrained model name, task, dataset name and emotion type on the command line. For example, to train and evaluate the classifier on the Jira LOVE dataset using the SEntiMoji representation model, run: python pipeline.py --model SEntiMoji --task emotion --benchmark_dataset_name Jira --emotion_type love

  • Train model on your own dataset.

    • Your training data file should contain two columns separated by \t: one for the text and the other for the class label. Create a new folder and place the training data file in it.
    • For training, specify the pretrained model name, task, data directory and data filename on the command line. For example, if you save the training data in ./data/train.txt and want to train and evaluate a classifier using the SEntiMoji representation model, run: python pipeline.py --use_own_dataset --model SEntiMoji --task sentiment --own_dataset_dir ./data/ --own_dataset_file train.txt

If you want to try another model or dataset, just change the command-line arguments. Run python pipeline.py --help to see detailed descriptions of the command-line arguments.
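
The two-column, tab-separated training file described above can be produced with a few lines of Python. This is a minimal sketch under stated assumptions: the helper name `write_train_file` is hypothetical, the column order (text first, label second) is assumed since the description does not fix it, and the label strings are placeholders that must match whatever labels your task uses.

```python
import os


def write_train_file(samples, dataset_dir, filename):
    """Write (text, label) pairs as one tab-separated row per sample."""
    os.makedirs(dataset_dir, exist_ok=True)
    path = os.path.join(dataset_dir, filename)
    with open(path, "w", encoding="utf-8") as handle:
        for text, label in samples:
            # Collapse tabs and newlines inside the text so that each row
            # keeps exactly two columns. Column order is an assumption.
            clean = " ".join(str(text).split())
            handle.write(f"{clean}\t{label}\n")
    return path


if __name__ == "__main__":
    # Placeholder samples and labels, for illustration only.
    samples = [
        ("Thanks, this patch works great!", "positive"),
        ("This bug is driving me crazy.", "negative"),
    ]
    print(write_train_file(samples, "./data/", "train.txt"))
```

The resulting ./data/train.txt matches the --own_dataset_dir and --own_dataset_file values used in the example command above.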

  3. To classify text with a trained model, run the scripts in the code/SEntiMoji_script directory. Specify the path of the trained model, the path of your test data, the number of classes and the name of the pretrained model used for training: python classify.py --model_path path_of_obtained_model --test_file_path path_of_test_file --nb_classes number_of_classes --pretrained_model {SEntiMoji,SEntiMoji-T,SEntiMoji-G}

⚠️ Please note that the number of classes and the name of the pretrained model must match the training settings. For example, if you trained with the SEntiMoji pretrained model on binary classification data, set pretrained_model=SEntiMoji and nb_classes=2.
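
To find the value to pass as --nb_classes, you can count the distinct labels in the file you trained on. The sketch below is illustrative only: the helper name `count_classes` is hypothetical, and it assumes the label sits in the second tab-separated column, which you should change via `label_column` if your file is laid out differently.

```python
def count_classes(train_path, label_column=1):
    """Count distinct labels in a tab-separated training file."""
    labels = set()
    with open(train_path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            if len(fields) <= label_column:
                raise ValueError(f"line {line_number} has no column {label_column}")
            labels.add(fields[label_column])
    return len(labels)


if __name__ == "__main__":
    import sys

    # Usage: python count_classes.py path/to/train.txt
    print(count_classes(sys.argv[1]))
```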

  4. To perform McNemar's test to compare the results of two classifiers, run Mtest.py in the code/ directory. Specify the method names, dataset name and task name as command-line arguments.
  • For the sentiment classification task: for example, to run McNemar's test on the results of SEntiMoji and SEntiMoji-T on the Jira dataset, run: python Mtest.py --methodA SEntiMoji --methodB SEntiMoji-T --dataset Jira --task sentiment
  • For the emotion detection task: for example, to run McNemar's test on the results of SEntiMoji and SEntiMoji-T on the Jira LOVE dataset, run: python Mtest.py --methodA SEntiMoji --methodB SEntiMoji-T --dataset Jira --task emotion --emotion_type love

If you want to try another model or dataset, just change the command-line arguments. Run python Mtest.py --help to see detailed descriptions of the command-line arguments.
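
For readers unfamiliar with McNemar's test, the idea is to compare two classifiers on the same samples by looking only at the samples where exactly one of them is correct. The sketch below implements the exact (binomial) two-sided variant with the standard library. It is not the code in Mtest.py, which may use a different variant such as the chi-square approximation; the function name `mcnemar_exact` is an assumption for illustration.

```python
from math import factorial


def _binomial(n, k):
    """Binomial coefficient C(n, k), written to work on Python 3.6."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def mcnemar_exact(truth, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts samples only classifier A got
    right and c counts samples only classifier B got right.
    """
    if not (len(truth) == len(pred_a) == len(pred_b)):
        raise ValueError("all three sequences must have the same length")
    b = c = 0
    for gold, a_label, b_label in zip(truth, pred_a, pred_b):
        a_ok, b_ok = a_label == gold, b_label == gold
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1
    n = b + c
    if n == 0:
        return b, c, 1.0  # the classifiers never disagree on correctness
    # Under the null hypothesis, each discordant sample favours A or B with
    # probability 0.5, so the smaller count follows Binomial(n, 0.5).
    tail = sum(_binomial(n, i) for i in range(min(b, c) + 1))
    p_value = min(1.0, (2 * tail) / (2 ** n))
    return b, c, p_value


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the benchmark datasets.
    truth = ["pos", "neg", "pos", "neg", "pos", "pos"]
    pred_a = ["pos", "neg", "pos", "neg", "neg", "pos"]
    pred_b = ["pos", "pos", "neg", "neg", "neg", "neg"]
    print(mcnemar_exact(truth, pred_a, pred_b))
```

The inputs correspond to the second column (predicted label) of two result files and the third column (ground truth label) they share.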

Declaration

  1. We upload all the benchmark datasets to this repository for convenience. As they were not generated and released by us, we do not claim any rights on them. If you use any of them, please make sure you comply with the licenses they were released under and consider citing the original papers. The scripts of the baseline methods (SentiStrength, SentiStrength-SE, SentiCR, Senti4SD, EmoTxt, DEVA) are not included in this repository. You can download them from their respective homepages.

  2. The large-scale Tweets used to train DeepMoji are not released by Felbo et al. due to licensing restrictions. Therefore, this repository includes the pre-trained DeepMoji model they released rather than the raw Tweet corpus.

  3. The large-scale GitHub data were collected by Lu et al. and have not been released publicly. After obtaining their consent, we release in this repository only the processed emoji-texts used to train our model, to increase reproducibility and replicability.

License

This code and the pretrained models are licensed under the MIT license (https://mit-license.org).

Citation

Please consider citing the following paper when using our code or pretrained models for your application.

@inproceedings{chencao2019,
  title={SEntiMoji: An Emoji-Powered Learning Approach for Sentiment Analysis in Software Engineering},
  author={Zhenpeng Chen and Yanbin Cao and Xuan Lu and Qiaozhu Mei and Xuanzhe Liu},
  booktitle={Proceedings of the 2019 ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE'19},
  year={2019}
}