laugustyniak / Awesome Sentiment Analysis

Repository with everything necessary for sentiment analysis and related areas

Awesome Sentiment Analysis

A curated list of awesome sentiment analysis frameworks, libraries, software (by language), and of course academic papers and methods. It also covers NLP libraries useful in sentiment analysis. Inspired by awesome-machine-learning.

If you want to contribute to this list (please do), send me a pull request or contact me @luk_augustyniak

Libraries

  • Python, Textlytics - a set of sentiment analysis examples based on Amazon data, SemEval, IMDB, etc.

  • Java, Polish Sentiment Model - Sentiment analysis for the Polish language using SVM and BoW, packaged in Docker.

  • Python, spaCy - Industrial-strength natural language processing in Python, and one of the best and fastest libraries for NLP. spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. According to its authors, independent research has confirmed that spaCy is the fastest in the world. If your application needs to process entire web dumps, spaCy is the library you want to be using.

  • Python, TextBlob - TextBlob allows you to specify which algorithms you want to use under the hood of its simple API.

  • Python, pattern - The pattern.en module contains a fast part-of-speech tagger for English (identifies nouns, adjectives, verbs, etc. in a sentence), sentiment analysis, tools for English verb conjugation and noun singularization & pluralization, and a WordNet interface.

  • Java, CoreNLP by Stanford - NLP toolkit that includes the sentiment model from Deeply Moving: Deep Learning for Sentiment Analysis.

  • R, TM - R text mining module including tm.plugin.sentiment.

  • Software, GATE - GATE is open source software capable of solving almost any text processing problem.

  • Java, LingPipe - LingPipe is a tool kit for processing text using computational linguistics.

  • Python, NLTK - Natural Language Toolkit.

  • C++, MITIE - MIT Information Extraction.

  • Software, KNIME - KNIME® Analytics Platform is the leading open solution for data-driven innovation, helping you discover the potential hidden in your data, mine for fresh insights, or predict new futures. Our enterprise-grade, open source platform is fast to deploy, easy to scale and intuitive to learn. With more than 1000 modules, hundreds of ready-to-run examples, a comprehensive range of integrated tools, and the widest choice of advanced algorithms available, KNIME Analytics Platform is the perfect toolbox for any data scientist. Our steady course on unrestricted open source is your passport to a global community of data scientists, their expertise, and their active contributions.

  • Software, RapidMiner - software capable of solving almost any text processing problem, including processing text using computational linguistics.

  • Java, OpenNLP - The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.

  • Dragon Sentiment Classifier C# - Dragon Sentiment API is a C# implementation of the Naive Bayes Sentiment Classifier to analyze the sentiment of a text corpus.

  • sentiment: Tools for Sentiment Analysis in R - sentiment is an R package with tools for sentiment analysis, including Bayesian classifiers for positivity/negativity and emotion classification.

  • ASUM Java - Aspect and Sentiment Unification Model for Online Review Analysis.

  • AFINN-based sentiment analysis for Node.js - Sentiment is a Node.js module that uses the AFINN-165 wordlist and Emoji Sentiment Ranking to perform sentiment analysis on arbitrary blocks of input text.

  • SentiMental - Putting the Mental in Sentimental in js - Sentiment analysis tool for node.js based on the AFINN-111 wordlist. Version 1.0 introduces performance improvements making it both the first, and now fastest, AFINN backed Sentiment Analysis tool for node.
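Several of the entries above (for example the Dragon Sentiment Classifier and the R sentiment package) are built on Naive Bayes classifiers over bag-of-words features. The following is only a minimal sketch of that idea in plain Python, not the implementation used by any listed library. The function names and the toy training sentences are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labelled_texts):
    """Count documents per label and word occurrences per label."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_texts:
        doc_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data invented for this example
TRAINING = [
    ("great movie loved it", "positive"),
    ("wonderful acting and a great story", "positive"),
    ("terrible plot hated it", "negative"),
    ("boring and terrible acting", "negative"),
]

model = train_naive_bayes(TRAINING)
print(classify("a great story", model))
print(classify("boring plot", model))
```

With real data you would replace the whitespace tokenizer and the four toy sentences with one of the datasets listed further down.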

Aspect-based Sentiment Analysis

Lexicons, Datasets, Word embeddings, and other resources

Lexicons:

  • Multidomain Sentiment Lexicons - lexicons from 10 domains based on the Amazon Product Dataset, extracted using the method described in paper and used in paper.

  • AFINN - AFINN is a list of English words rated for valence with an integer between minus five (negative) and plus five (positive). The words have been manually labeled by Finn Årup Nielsen in 2009-2011.

  • SentiWordNet paper - Lexical resource based on WordNet

  • SentiWords - Collection of 155,000 English words with a sentiment score included between -1 and 1. Words are in the form lemma#PoS and are aligned with WordNet lists that include adjectives, nouns, verbs and adverbs.

  • SenticNet API - Words with a sentiment score included between -1 and 1.

  • WordStat - Context-specific sentiment analysis dictionary with categories Negative, Positive, Uncertainty, Litigiousness and Modal. This dataset is inspired by two papers, written by Loughran and McDonald (2011) and Young and Soroka (2011).

  • MPQA (Multi-Perspective Question Answering) Subjectivity Lexicon - The MPQA (Multi-Perspective Question Answering) Subjectivity Lexicon is a list of subjectivity clues that is part of OpinionFinder and also helps to determine text polarity.

  • NRC-Canada Lexicons - the web page lists various word association lexicons that capture word-sentiment, word-emotion, and word-colour associations.

  • Sentiment140 - One of the NRC-Canada team's lexicons. The Sentiment140 Lexicon is a list of words and their associations with positive and negative sentiment. The lexicon provides sentiment scores for unigrams, bigrams and unigram-bigram pairs.

  • MSOL - Macquarie Semantic Orientation Lexicon.

  • SemEval-2015 English Twitter Sentiment Lexicon - The lexicon was used as an official test set in the SemEval-2015 shared Task #10: Subtask E. The phrases in this lexicon include at least one of these negators.

  • SemEval-2016 Arabic Twitter Sentiment Lexicon - The lexicon was used as an official test set in the SemEval-2016 shared Task #7: Detecting Sentiment Intensity of English and Arabic Phrases. The phrases in this lexicon include at least one of these negators.

  • SemEval-2016 English Twitter Mixed Polarity Lexicon - This SCL, referred to as the Sentiment Composition Lexicon of Opposing Polarity Phrases (SCL-OPP), includes phrases that have at least one positive and at least one negative word—for example, phrases such as happy accident, best winter break, couldn’t stop smiling, and lazy sundays. We refer to such phrases as opposing polarity phrases. SCL-OPP has 265 trigrams, 311 bigrams, and 602 unigrams annotated with real-valued sentiment association scores through Best-Worst scaling (aka MaxDiff).

  • SemEval-2016 General English Sentiment Modifiers Lexicon - Sentiment Composition Lexicon of Negators, Modals, and Adverbs (SCL-NMA). Negators, modals, and degree adverbs can significantly affect the sentiment of the words they modify. We manually annotate a set of phrases that include negators (such as no and cannot), modals (such as would have been and could), degree adverbs (such as quite and less), and their combinations. Both the phrases and their constituent content words are annotated with real-valued scores of sentiment intensity using the technique Best–Worst Scaling (aka MaxDiff), which provides reliable annotations. We refer to the resulting lexicon as Sentiment Composition Lexicon of Negators, Modals, and Adverbs (SCL-NMA). The lexicon was used as an official test set in the SemEval-2016 shared Task #7: Detecting Sentiment Intensity of English and Arabic Phrases. The objective of that task was to automatically predict sentiment intensity scores for multi-word phrases.

  • The NRC Valence, Arousal, and Dominance Lexicon - The NRC Valence, Arousal, and Dominance (VAD) Lexicon includes a list of more than 20,000 English words and their valence, arousal, and dominance scores. For a given word and a dimension (V/A/D), the scores range from 0 (lowest V/A/D) to 1 (highest V/A/D). The lexicon with its fine-grained real-valued scores was created by manual annotation using Best-Worst Scaling.
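Word lists such as AFINN assign each word an integer valence between -5 and +5, and the SemEval lexicons above highlight how negators change the sentiment of the words they modify. The sketch below shows one simple way such a lexicon could be applied to a text. The mini-lexicon, the negator set, and the sign-flip rule are all illustrative assumptions, not values or rules taken from any of the listed resources.

```python
import re

# Hypothetical mini-lexicon in the AFINN style: word -> integer valence in [-5, 5].
# These entries are illustrative, not copied from the real AFINN list.
TOY_LEXICON = {"good": 3, "great": 4, "bad": -3, "awful": -4, "happy": 3}

# Illustrative negators; a preceding negator flips the sign of the next scored word.
NEGATORS = {"not", "no", "never"}


def score_text(text, lexicon=TOY_LEXICON):
    """Sum word valences, flipping the sign of a word that follows a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        valence = lexicon.get(token, 0)
        if valence:
            total += -valence if negate else valence
            negate = False
    return total


print(score_text("The food was great and the staff were happy"))  # 4 + 3 = 7
print(score_text("Not good, just awful"))  # -3 + -4 = -7
```

Flipping the sign is the crudest possible treatment of negation; the SCL-NMA lexicon above exists precisely because negators, modals, and degree adverbs shift intensity in less uniform ways.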

Datasets:

  • Stanford Sentiment Treebank paper - Sentiment dataset with fine-grained sentiment annotations. The Rotten Tomatoes movie review dataset is a corpus of movie reviews used for sentiment analysis, originally collected by Pang and Lee. In their work on sentiment treebanks, Socher et al. used Amazon's Mechanical Turk to create fine-grained labels for all parsed phrases in the corpus. This competition presents a chance to benchmark your sentiment-analysis ideas on the Rotten Tomatoes dataset. You are asked to label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive. Obstacles like sentence negation, sarcasm, terseness, language ambiguity, and many others make this task very challenging.

  • Amazon product dataset - This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014. This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs). An updated version of the dataset (as of 2018) is available at https://nijianmo.github.io/amazon/index.html.

  • IMDB movie reviews dataset - This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. The authors provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing.

  • Sentiment Labelled Sentences Data Set - This dataset was created for the paper From Group to Individual Labels using Deep Features, Kotzias et al., KDD 2015. It contains sentences labelled with positive or negative sentiment; the score is either 1 (positive) or 0 (negative). The sentences come from three different websites/fields: imdb.com, amazon.com, yelp.com. For each website, there are 500 positive and 500 negative sentences, selected randomly from larger datasets of reviews. The authors attempted to select sentences with a clearly positive or negative connotation; the goal was for no neutral sentences to be selected.

  • sentic.net - concept-level sentiment analysis, that is, performing tasks such as polarity detection and emotion recognition by leveraging semantics and linguistics instead of relying solely on word co-occurrence frequencies.
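Binary-labelled corpora such as the Sentiment Labelled Sentences Data Set pair each sentence with a score of 1 (positive) or 0 (negative). As a rough sketch of how such data might be read, the snippet below assumes one `sentence<TAB>label` pair per line. That layout and the sample sentences are assumptions made for illustration, so check the actual files before relying on it.

```python
import io
from collections import Counter

# Invented sample lines; the tab-separated "sentence<TAB>label" layout is an assumption.
SAMPLE = "Great phone, works well.\t1\nThe battery died in a day.\t0\nLoved the pasta.\t1\n"


def read_labelled_sentences(handle):
    """Yield (sentence, label) pairs, with label 1 for positive and 0 for negative."""
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split on the last tab so tabs inside the sentence do not break parsing.
        sentence, _, label = line.rpartition("\t")
        yield sentence.strip(), int(label)


pairs = list(read_labelled_sentences(io.StringIO(SAMPLE)))
label_counts = Counter(label for _, label in pairs)
print(pairs[0])
print(label_counts)
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.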

Word Embeddings:

  • WordNet2Vec - Corpora Agnostic Word Vectorization Method based on WordNet.

  • GloVe paper - Algorithm for obtaining word vectors. Pretrained word vectors available for download.

  • Word2Vec by Mikolov paper - Google's original code and pretrained word embeddings.

  • Word2Vec Python lib - A reimplementation of Google's word2vec written in Python (Cython). The library also provides doc2vec and topic modelling methods.

  • WN-Affect emotion lexicon - WordNet-Affect is an extension of WordNet Domains, including a subset of synsets suitable to represent affective concepts correlated with affective words. Similarly to our method for domain labels, we assigned to a number of WordNet synsets one or more affective labels (a-labels). In particular, the affective concepts representing emotional state are individuated by synsets marked with the a-label emotion. There are also other a-labels for those concepts representing moods, situations eliciting emotions, or emotional responses.

  • EmoLex NRC Word-Emotion Association Lexicon - the NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were manually done by crowdsourcing.
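Pretrained word vectors like those listed above (GloVe, Word2Vec) are typically compared using cosine similarity. The sketch below assumes a plain-text layout with one word per line followed by its space-separated vector components. The three-dimensional vectors are made up for illustration; real pretrained files have many more dimensions, and their exact format should be checked against the download you use.

```python
import io
import math

# Invented 3-dimensional vectors in a GloVe-like text layout: "word v1 v2 v3".
# Real pretrained files have far more dimensions; the values here are made up.
TOY_VECTORS = """good 0.9 0.1 0.0
great 0.8 0.2 0.1
bad -0.9 0.1 0.0
"""


def load_vectors(handle):
    """Parse "word v1 v2 ..." lines into a dict of word -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


vectors = load_vectors(io.StringIO(TOY_VECTORS))
print(cosine_similarity(vectors["good"], vectors["great"]))
print(cosine_similarity(vectors["good"], vectors["bad"]))
```

In this toy data "good" and "great" point in nearly the same direction while "good" and "bad" point in opposite ones. Real embeddings do not separate sentiment this cleanly, and antonyms often end up close together because they occur in similar contexts.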

SemEval Challenges (International Workshop on Semantic Evaluation) web:

  • SemEval2014

  • SemEval2015

  • SemEval2016

  • SemEval2017

  • SemEval2018 - new challenge for 2018; waiting for confirmation about tasks, etc.

  • Multidimensional Stance Lexicon - A Multidimensional Lexicon for Interpersonal Stancetaking. Pavalanathan, Fitzpatrick, Kiesling, and Eisenstein. ACL 2017.

Tutorials

  • SAS2015 - IPython Notebook with a brief introduction to sentiment analysis in Python @ Sentiment Analysis Symposium 2015. Scikit-learn + BoW + SemEval data.

  • LingPipe Sentiment - This tutorial covers assigning sentiment to movie reviews using language models. There are many other approaches to sentiment. One we use fairly often is sentence-based sentiment with a logistic regression classifier. Contact us if you need more information. For movie reviews we focus on two types of classification problem: subjective (opinion) vs. objective (fact) sentences, and positive (favorable) vs. negative (unfavorable) movie reviews.

  • Stanford's cs224d lectures on Deep Learning for Natural Language Processing - course provided by Richard Socher.

Papers, books

Multimodal sentiment analysis:

Demos

API
