sebastianruder / Emotion_proposition_store

Construction and Analysis of an Emotion Proposition Store

# Construction and Analysis of an Emotion Proposition Store

This is the repository for my Bachelor's thesis on the construction and analysis of an Emotion Proposition Store. The project makes the following contributions:

  • Designing and evaluating patterns that are frequent and clearly associated with an emotion. These patterns can be used as-is to extract tuples of emotion holders and causes from the web as well as from special domain corpora.
  • Acquiring more than 1,700,000 propositions from the Annotated Gigaword news corpus using these patterns, filtering, and generalizing them by employing co-reference resolution and named-entity recognition (NER). These propositions contain information about the emotion, the emotion holder, and the cause of said emotion.
  • Storing these propositions in an emotion proposition store, which we make available to the research community.
  • Analysing and evaluating the propositions to better understand emotions in news text as well as the capabilities of the resource. Distributional analysis allows us to identify ambiguous concepts as well as single-word and compound expressions that are highly associated with an emotion. Through topic modelling, we explore underlying themes that are associated with certain emotions or shared between different ones.
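
To make the idea of "highly associated with an emotion" concrete, here is a minimal sketch of point-wise mutual information (PMI), one of the two association measures used for the score lists in this repository. It is not the thesis implementation: the function name `pmi_scores` and the toy word/emotion pairs are made up for illustration, and the real pipeline may count, smooth, or filter differently.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return {(word, emotion): PMI in bits} for observed (word, emotion) pairs.

    PMI(w, e) = log2( P(w, e) / (P(w) * P(e)) ), estimated from raw counts.
    Hypothetical helper for illustration only.
    """
    pairs = list(pairs)
    total = len(pairs)
    joint = Counter(pairs)
    word_counts = Counter(word for word, _ in pairs)
    emotion_counts = Counter(emotion for _, emotion in pairs)
    return {
        (word, emotion): math.log2(
            count * total / (word_counts[word] * emotion_counts[emotion])
        )
        for (word, emotion), count in joint.items()
    }


# Toy data (invented): each pair is one occurrence of a word in a cause
# that was extracted for the given emotion.
toy_pairs = [
    ("war", "fear"), ("war", "fear"), ("war", "anger"),
    ("gift", "joy"), ("gift", "joy"), ("tax", "anger"),
]
scores = pmi_scores(toy_pairs)

# Print pairs from most to least strongly associated.
for (word, emotion), score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}\t{emotion}\t{score:.3f}")
```

A positive score means the word co-occurs with the emotion more often than chance would predict, and a score near zero means the two are roughly independent. Plain PMI tends to overrate rare words, which is one reason a frequency threshold or a second measure such as chi-square is often used alongside it.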

## Structure of this repository

This repository is organized as follows:

  • NRC-Emotion-Lexicon-v0.92: The NRC Word-Emotion Association Lexicon by Saif Mohammad.
  • R: R code to generate summaries for agreement with the NRC Emotion Lexicon.
  • anno_gigaword: The forked Annotated Gigaword Java API by Courtney Napoles, Matthew Gormley, and Benjamin Van Durme.
  • annotation: The annotated files of the two annotation tasks. patterns_annotated contains the pattern annotation, while bigrams_annotated contains the annotation of the bigrams.
  • dependencies: The JCommon and JFreeChart libraries used for generating charts.
  • mallet: The pseudo-documents and topic models generated using MALLET.
  • emotion_word_sources: Related work that was used as a source for the patterns.
  • out: The directory of the results.
    • Emotion proposition store: The extracted propositions in shelves of 100,000 lines. Each line has the following tab-separated format: ID \t emotion \t pattern \t emotion holder \t NP cause \t S cause subject \t S cause predicate \t S cause object \t S cause prepositional objects \t cause bag-of-words.
    • Patterns: The pattern templates and the regular expressions.
    • Scores: The lists of unigrams and bigrams ranked by point-wise mutual information (PMI) or chi-square for Plutchik's eight emotions, sorted by source (emotion holder, NP cause, S cause subject + predicate, S cause predicate + object). These can be used as an emotion lexicon.
    • Sentences: The extracted propositions along with the sentences that they were extracted from in chunks of 100,000 lines.
    • Stats: Statistics about the patterns and the extracted propositions.
  • src: The source directory.
  • thesis: The directory for the thesis LaTeX project.
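
As a rough sketch of how the proposition store files could be read, the following Python snippet splits one line into the ten tab-separated fields listed above. The class and field names (`Proposition`, `holder`, `np_cause`, and so on) and the example line are invented for illustration. The snippet also assumes that unused fields are stored as empty strings, which may not match the actual files.

```python
from dataclasses import dataclass, fields


@dataclass
class Proposition:
    """One line of the proposition store (field names are hypothetical)."""
    id: str
    emotion: str
    pattern: str
    holder: str
    np_cause: str
    s_cause_subject: str
    s_cause_predicate: str
    s_cause_object: str
    s_cause_prep_objects: str
    cause_bag_of_words: str


def parse_line(line: str) -> Proposition:
    """Split a tab-separated line into a Proposition, checking the field count."""
    parts = line.rstrip("\n").split("\t")
    expected = len(fields(Proposition))
    if len(parts) != expected:
        raise ValueError(f"expected {expected} fields, got {len(parts)}")
    return Proposition(*parts)


# Made-up example line; real IDs, patterns, and values will look different.
example = "\t".join(
    ["42", "fear", "NP is afraid of NP", "investors", "inflation",
     "", "", "", "", "inflation"]
)
prop = parse_line(example)
print(prop.emotion, prop.holder, prop.np_cause)
```

Checking the field count up front makes malformed lines fail loudly instead of silently shifting values into the wrong column.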