pysentimiento / pysentimiento

Licence: other
A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

Programming Languages: Jupyter Notebook, Python, Shell

pysentimiento: A Python toolkit for Sentiment Analysis and Social NLP tasks

A Transformer-based library for SocialNLP tasks.

Currently supports:

  • Sentiment Analysis (Spanish, English)
  • Emotion Analysis (Spanish, English)
  • Hate Speech Detection (Spanish, English)
  • Named Entity Recognition (Spanish + English)
  • POS Tagging (Spanish + English)

Just do pip install pysentimiento and start using it:

Getting Started

from pysentimiento import create_analyzer
analyzer = create_analyzer(task="sentiment", lang="es")

analyzer.predict("Qué gran jugador es Messi")
# returns AnalyzerOutput(output=POS, probas={POS: 0.998, NEG: 0.002, NEU: 0.000})
analyzer.predict("Esto es pésimo")
# returns AnalyzerOutput(output=NEG, probas={NEG: 0.999, POS: 0.001, NEU: 0.000})
analyzer.predict("Qué es esto?")
# returns AnalyzerOutput(output=NEU, probas={NEU: 0.993, NEG: 0.005, POS: 0.002})

analyzer.predict("jejeje no te creo mucho")
# AnalyzerOutput(output=NEG, probas={NEG: 0.587, NEU: 0.408, POS: 0.005})
"""
Emotion Analysis in English
"""

emotion_analyzer = create_analyzer(task="emotion", lang="en")

emotion_analyzer.predict("yayyy")
# returns AnalyzerOutput(output=joy, probas={joy: 0.723, others: 0.198, surprise: 0.038, disgust: 0.011, sadness: 0.011, fear: 0.010, anger: 0.009})
emotion_analyzer.predict("fuck off")
# returns AnalyzerOutput(output=anger, probas={anger: 0.798, surprise: 0.055, fear: 0.040, disgust: 0.036, joy: 0.028, others: 0.023, sadness: 0.019})

"""
Hate Speech (misogyny & racism)
"""
hate_speech_analyzer = create_analyzer(task="hate_speech", lang="es")

hate_speech_analyzer.predict("Esto es una mierda pero no es odio")
# returns AnalyzerOutput(output=[], probas={hateful: 0.022, targeted: 0.009, aggressive: 0.018})
hate_speech_analyzer.predict("Esto es odio porque los inmigrantes deben ser aniquilados")
# returns AnalyzerOutput(output=['hateful'], probas={hateful: 0.835, targeted: 0.008, aggressive: 0.476})

hate_speech_analyzer.predict("Vaya guarra barata y de poca monta es XXXX!")
# returns AnalyzerOutput(output=['hateful', 'targeted', 'aggressive'], probas={hateful: 0.987, targeted: 0.978, aggressive: 0.969})
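
The outputs above suggest two decision rules: the single-label tasks (sentiment, emotion) report the label with the highest probability, while the multi-label hate speech task reports every label whose probability passes a cutoff. Below is a minimal sketch of both rules in plain Python. The 0.5 cutoff is an assumption that happens to be consistent with the examples above, not a value confirmed from the library's source, and top_label and labels_above are hypothetical helper names that are not part of pysentimiento.

```python
def top_label(probas):
    """Return the label with the highest probability (single-label tasks)."""
    return max(probas, key=probas.get)


def labels_above(probas, threshold=0.5):
    """Return every label whose probability exceeds the threshold (multi-label tasks).

    The 0.5 default is an assumption inferred from the examples, not a documented value.
    """
    return [label for label, p in probas.items() if p > threshold]


# Probabilities copied from the examples above.
sentiment = {"NEG": 0.587, "NEU": 0.408, "POS": 0.005}
hate = {"hateful": 0.835, "targeted": 0.008, "aggressive": 0.476}

print(top_label(sentiment))   # NEG
print(labels_above(hate))     # ['hateful']
```

Working from the probabilities rather than the reported output makes it possible to apply a stricter or looser cutoff for a given application.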

You can also use the pretrained models directly with the transformers library.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("pysentimiento/robertuito-sentiment-analysis")

model = AutoModelForSequenceClassification.from_pretrained("pysentimiento/robertuito-sentiment-analysis")
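
A model loaded this way returns raw logits rather than the probas dictionary that the analyzers produce. A minimal sketch of the conversion in plain Python follows. The logit values are made up for illustration, logits_to_probas is a hypothetical helper name, and the NEG/NEU/POS label order is an assumption that should be checked against model.config.id2label.

```python
import math


def logits_to_probas(logits, id2label):
    """Apply a numerically stable softmax and key the result by label name."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {id2label[i]: e / total for i, e in enumerate(exps)}


# Assumed label order; read the real mapping from model.config.id2label.
id2label = {0: "NEG", 1: "NEU", 2: "POS"}

# Made-up logits standing in for one row of the model's output logits.
probas = logits_to_probas([-1.2, 0.3, 2.5], id2label)
print(max(probas, key=probas.get))  # POS
```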

Preprocessing

pysentimiento features a tweet preprocessor specially suited for tweet classification with transformer-based models.

from pysentimiento.preprocessing import preprocess_tweet

# Replaces user handles and URLs with special tokens
preprocess_tweet("@perezjotaeme debería cambiar esto http://bit.ly/sarasa") # "@usuario debería cambiar esto url"

# Shortens repeated characters
preprocess_tweet("no entiendo naaaaaaaadaaaaaaaa", shorten=2) # "no entiendo naadaa"

# Normalizes laughter
preprocess_tweet("jajajajaajjajaajajaja no lo puedo creer ajajaj") # "jaja no lo puedo creer jaja"

# Handles hashtags
preprocess_tweet("esto es #UnaGenialidad")
# "esto es una genialidad"

# Handles emojis
preprocess_tweet("🎉🎉", lang="en")
# 'emoji party popper emoji emoji party popper emoji'
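
To make the first two normalizations concrete, here is a rough approximation using only regular expressions. It is a sketch under stated assumptions and not the library's implementation: rough_preprocess and its parameters are hypothetical names, and laughter, hashtag, and emoji handling are left out.

```python
import re


def rough_preprocess(text, shorten=2, user_token="@usuario", url_token="url"):
    """Approximate a few of the normalizations shown above with regular expressions."""
    # Replace URLs first so that any "@" inside a URL is not treated as a handle.
    text = re.sub(r"https?://\S+", url_token, text)
    # Replace user handles such as @perezjotaeme.
    text = re.sub(r"@\w+", user_token, text)
    # Collapse runs of more than `shorten` identical characters down to `shorten`.
    text = re.sub(r"(.)\1{%d,}" % shorten, lambda m: m.group(1) * shorten, text)
    return text


print(rough_preprocess("@perezjotaeme debería cambiar esto http://bit.ly/sarasa"))
# @usuario debería cambiar esto url
print(rough_preprocess("no entiendo naaaaaaaadaaaaaaaa", shorten=2))
# no entiendo naadaa
```

For real use, preprocess_tweet remains the better choice, since the models were presumably trained on its output and this sketch does not cover everything it handles.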

Trained models so far

Check CLASSIFIERS.md for details on the reported performances of each model.

Instructions for developers

  1. Clone and install

git clone https://github.com/pysentimiento/pysentimiento
cd pysentimiento
pip install poetry
poetry shell
poetry install

  2. Get the data and put it under data/

Open an issue or email us if you are not able to get it.

  3. Run the script to train the models

Check TRAIN.md for further information on how to train your models.

  4. Upload the models to Hugging Face's Model Hub

Check the "Model sharing and upload" instructions in the Hugging Face docs.

License

pysentimiento is an open-source library. However, please be aware that the models are trained on third-party datasets and are subject to their respective licenses, many of which allow only non-commercial use:

  1. TASS Dataset license (License for Sentiment Analysis in Spanish, Emotion Analysis in Spanish & English)

  2. SemEval 2017 Dataset license (Sentiment Analysis in English)

  3. LinCE Datasets (License for NER & POS tagging)

Suggestions and bugfixes

Please use the repository issue tracker to report bugs and make suggestions (new models, other datasets, other languages, etc.).

Citation

If you use pysentimiento in your work, please cite this paper:

@misc{perez2021pysentimiento,
      title={pysentimiento: A Python Toolkit for Sentiment Analysis and SocialNLP tasks},
      author={Juan Manuel Pérez and Juan Carlos Giudici and Franco Luque},
      year={2021},
      eprint={2106.09462},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Also, please cite the related pre-trained models and datasets for the specific models you use:

%%%%%%%%%%%%%%%%%%%%%%%%%%
% Pretrained models      %
%%%%%%%%%%%%%%%%%%%%%%%%%%
% RoBERTuito
@article{perez2021robertuito,
  title={RoBERTuito: a pre-trained language model for social media text in Spanish},
  author={P{\'e}rez, Juan Manuel and Furman, Dami{\'a}n A and Alemany, Laura Alonso and Luque, Franco},
  journal={arXiv preprint arXiv:2111.09453},
  year={2021}
}
% BETO
@article{canete2020spanish,
  title={Spanish pre-trained bert model and evaluation data},
  author={Canete, Jos{\'e} and Chaperon, Gabriel and Fuentes, Rodrigo and Ho, Jou-Hui and Kang, Hojin and P{\'e}rez, Jorge},
  journal={Pml4dc at iclr},
  volume={2020},
  pages={2020},
  year={2020}
}
% BERTweet
@inproceedings{nguyen2020bertweet,
  title={BERTweet: A pre-trained language model for English Tweets},
  author={Nguyen, Dat Quoc and Vu, Thanh and Nguyen, Anh Tuan},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  pages={9--14},
  year={2020}
}
%%%%%%%%%%%%%%%%%%%%%%%%%%
% Datasets               %
%%%%%%%%%%%%%%%%%%%%%%%%%%
% TASS 2020 (sentiment in Spanish)

@article{garcia2020overview,
  title={Overview of TASS 2020: introducing emotion detection},
  author={Garc{\'\i}a-Vega, Manuel and D{\'\i}az-Galiano, Manuel Carlos and Garc{\'\i}a-Cumbreras, Miguel {\'A} and del Arco, Flor Miriam Plaza and Montejo-R{\'a}ez, Arturo and Jim{\'e}nez-Zafra, Salud Mar{\'\i}a and C{\'a}mara, Eugenio Mart{\'\i}nez and Aguilar, C{\'e}sar Antonio and Cabezudo, Marco Antonio Sobrevilla and others},
  year={2020}
}

% EmoEvent (Emotion Analysis Spanish & English)

@inproceedings{del2020emoevent,
  title={EmoEvent: A multilingual emotion corpus based on different events},
  author={del Arco, Flor Miriam Plaza and Strapparava, Carlo and Lopez, L Alfonso Urena and Mart{\'\i}n-Valdivia, M Teresa},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={1492--1498},
  year={2020}
}

% Hate Speech Detection (Spanish & English)


@inproceedings{hateval2019semeval,
  title={SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter},
  author={Basile, Valerio and Bosco, Cristina and Fersini, Elisabetta and Nozza, Debora and Patti, Viviana and Rangel, Francisco and Rosso, Paolo and Sanguinetti, Manuela},
  booktitle={Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019)},
  year={2019},
  publisher= {Association for Computational Linguistics}
}
% Sentiment Analysis in English

@article{nakov2019semeval,
  title={SemEval-2016 task 4: Sentiment analysis in Twitter},
  author={Nakov, Preslav and Ritter, Alan and Rosenthal, Sara and Sebastiani, Fabrizio and Stoyanov, Veselin},
  journal={arXiv preprint arXiv:1912.01973},
  year={2019}
}

% LinCE (NER & POS Tagging)

@inproceedings{aguilar2020lince,
  title={LinCE: A Centralized Benchmark for Linguistic Code-switching Evaluation},
  author={Aguilar, Gustavo and Kar, Sudipta and Solorio, Thamar},
  booktitle={Proceedings of the 12th Language Resources and Evaluation Conference},
  pages={1803--1813},
  year={2020}
}