monologg / GoEmotions-pytorch

License: Apache-2.0
Pytorch Implementation of GoEmotions 😍😢😱

GoEmotions Pytorch

Pytorch Implementation of GoEmotions with Huggingface Transformers

What is GoEmotions

A dataset of 58,000 Reddit comments labeled with 28 emotion categories (27 emotions plus neutral):

  • admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise + neutral

Training Details

  • Uses bert-base-cased (same as the paper's code).

  • The paper uses 3 taxonomies. I've also prepared the data with the new taxonomy labels for the hierarchical grouping and Ekman taxonomies.

    1. Original GoEmotions (27 emotions + neutral)
    2. Hierarchical Grouping (positive, negative, ambiguous + neutral)
    3. Ekman (anger, disgust, fear, joy, sadness, surprise + neutral)
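
The relationship between the first two taxonomies can be sketched as a lookup from each fine-grained emotion to its sentiment group. The mapping below is an assumption reconstructed from the paper's sentiment grouping, not copied from this repository's data files, and `to_group` is a hypothetical helper. Please verify the assignments against the released data before relying on them.

```python
# Assumed sentiment grouping of the 27 emotions (reconstructed from the paper,
# not taken from this repository). "neutral" is kept as its own label.
GROUP_MAPPING = {
    "positive": [
        "admiration", "amusement", "approval", "caring", "desire", "excitement",
        "gratitude", "joy", "love", "optimism", "pride", "relief",
    ],
    "negative": [
        "anger", "annoyance", "disappointment", "disapproval", "disgust",
        "embarrassment", "fear", "grief", "nervousness", "remorse", "sadness",
    ],
    "ambiguous": ["confusion", "curiosity", "realization", "surprise"],
}

# Invert the mapping once so each fine-grained emotion points to its group.
EMOTION_TO_GROUP = {
    emotion: group for group, emotions in GROUP_MAPPING.items() for emotion in emotions
}


def to_group(labels):
    """Collapse fine-grained labels into sorted, de-duplicated group labels."""
    groups = {EMOTION_TO_GROUP.get(label, label) for label in labels}
    return sorted(groups)


print(to_group(["love", "curiosity"]))  # ['ambiguous', 'positive']
print(to_group(["neutral"]))            # ['neutral']
```

Because the example is multi-label, several fine-grained emotions that fall into the same group collapse into a single group label.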

Vocabulary

  • I've replaced [unused1] and [unused2] with [NAME] and [RELIGION] in the vocab, respectively.
[PAD]
[NAME]
[RELIGION]
[unused3]
[unused4]
...
  • I've also set special_tokens_map.json as below, so the tokenizer won't split [NAME] or [RELIGION] into word pieces.
{
  "unk_token": "[UNK]",
  "sep_token": "[SEP]",
  "pad_token": "[PAD]",
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "additional_special_tokens": ["[NAME]", "[RELIGION]"]
}
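
The vocabulary change in the first bullet above amounts to renaming two reserved entries in place. A minimal sketch, assuming the vocab is a list of tokens in file order; `patch_vocab` and `REPLACEMENTS` are hypothetical names and not part of this repository:

```python
# Hypothetical helper: rename reserved BERT vocab entries in place.
REPLACEMENTS = {"[unused1]": "[NAME]", "[unused2]": "[RELIGION]"}


def patch_vocab(tokens):
    """Return a copy of the vocab list with the reserved entries renamed.

    Positions are preserved, so each new token takes over the id of the
    [unusedN] entry it replaces and the vocab size does not change.
    """
    return [REPLACEMENTS.get(token, token) for token in tokens]


vocab = ["[PAD]", "[unused1]", "[unused2]", "[unused3]", "[unused4]"]
patched = patch_vocab(vocab)
print(patched)  # ['[PAD]', '[NAME]', '[RELIGION]', '[unused3]', '[unused4]']
```

Reusing reserved slots this way keeps the vocabulary size unchanged, so the pretrained embedding matrix does not need to be resized.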

Requirements

  • torch==1.4.0
  • transformers==2.11.0
  • attrdict==2.0.1

Hyperparameters

You can change the parameters in the JSON files in the config directory.

Parameter            Value
Learning rate        5e-5
Warmup proportion    0.1
Epochs               10
Max Seq Length       50
Batch size           16
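
To make the warmup proportion concrete, here is a rough sketch of how these values translate into optimizer steps. It assumes a linear warmup followed by linear decay and no gradient accumulation, neither of which is stated above, and the training-set size is a hypothetical round number. `schedule_steps` and `linear_lr` are illustrative helpers, not functions from this repository.

```python
import math

LEARNING_RATE = 5e-5
WARMUP_PROPORTION = 0.1
EPOCHS = 10
BATCH_SIZE = 16


def schedule_steps(num_train_examples):
    """Return (total_steps, warmup_steps) for the settings above."""
    steps_per_epoch = math.ceil(num_train_examples / BATCH_SIZE)
    total_steps = steps_per_epoch * EPOCHS
    warmup_steps = int(total_steps * WARMUP_PROPORTION)
    return total_steps, warmup_steps


def linear_lr(step, total_steps, warmup_steps):
    """Linear warmup to the peak rate, then linear decay to zero (assumed shape)."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - warmup_steps)


# 40,000 is a hypothetical training-set size, chosen only for round numbers.
total, warmup = schedule_steps(40_000)
print(total, warmup)  # 25000 2500
```

With these assumed numbers, the learning rate climbs from 0 to 5e-5 over the first 2,500 steps and then decays back to 0 by step 25,000.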

How to Run

For --taxonomy, choose one of original, group, or ekman.

$ python3 run_goemotions.py --taxonomy {$TAXONOMY}

$ python3 run_goemotions.py --taxonomy original
$ python3 run_goemotions.py --taxonomy group
$ python3 run_goemotions.py --taxonomy ekman

Results

Best Macro F1 results

Macro F1 (%)    Dev      Test
original        50.16    50.30
group           69.41    70.06
ekman           62.59    62.38
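
For readers unfamiliar with the metric, macro F1 in a multi-label setting is the unweighted mean of the per-label F1 scores. The sketch below shows that definition on toy data. It is not the evaluation code used to produce the table, and the handling of labels with no positives (scored as 0 here) is an assumption.

```python
def macro_f1(y_true, y_pred):
    """Macro F1 over multi-hot rows: per-label F1, then unweighted mean."""
    num_labels = len(y_true[0])
    f1_scores = []
    for j in range(num_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # A label with no true or predicted positives scores 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_labels


# Toy data: 3 examples, 3 labels (hypothetical, not from the dataset).
y_true = [[1, 0, 0], [0, 1, 1], [1, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Because every label counts equally, rare emotions weigh as much as frequent ones, which is one plausible reason the 28-label original taxonomy scores lower than the coarser ones.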

Pipeline

  • Inference for multi-label classification was made possible by creating a new MultiLabelPipeline class.
  • The fine-tuned models are already uploaded to Huggingface S3:
    • Original GoEmotions Taxonomy: monologg/bert-base-cased-goemotions-original
    • Hierarchical Group Taxonomy: monologg/bert-base-cased-goemotions-group
    • Ekman Taxonomy: monologg/bert-base-cased-goemotions-ekman
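
The core idea behind multi-label inference can be sketched in a few lines: apply a sigmoid to each logit independently and keep every label whose score clears the threshold (the usage examples pass 0.3). This is a simplified illustration under that assumption, not the repository's MultiLabelPipeline class. `decode`, the label subset, and the logits are hypothetical.

```python
import math


def sigmoid(x):
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode(logits, id2label, threshold=0.3):
    """Keep every label whose sigmoid score exceeds the threshold."""
    scores = [sigmoid(x) for x in logits]
    kept = [(id2label[i], s) for i, s in enumerate(scores) if s > threshold]
    return {"labels": [label for label, _ in kept], "scores": [s for _, s in kept]}


# Hypothetical logits for one sentence over a 3-label subset.
id2label = {0: "joy", 1: "anger", 2: "surprise"}
result = decode([2.0, -2.0, 0.0], id2label)
print(result["labels"])  # ['joy', 'surprise']
```

Unlike a softmax, the per-label sigmoid lets scores sum to more than 1, which is why a single sentence can come back with several high-confidence labels.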

1. Original GoEmotions Taxonomy

from transformers import BertTokenizer
from model import BertForMultiLabelClassification
from multilabel_pipeline import MultiLabelPipeline
from pprint import pprint

tokenizer = BertTokenizer.from_pretrained("monologg/bert-base-cased-goemotions-original")
model = BertForMultiLabelClassification.from_pretrained("monologg/bert-base-cased-goemotions-original")

goemotions = MultiLabelPipeline(
    model=model,
    tokenizer=tokenizer,
    threshold=0.3
)

texts = [
    "Hey that's a thought! Maybe we need [NAME] to be the celebrity vaccine endorsement!",
    "it’s happened before?! love my hometown of beautiful new ken 😂😂",
    "I love you, brother.",
    "Troll, bro. They know they're saying stupid shit. The motherfucker does nothing but stink up libertarian subs talking shit",
]

pprint(goemotions(texts))

# Output
[{'labels': ['neutral'], 'scores': [0.9750906]},
 {'labels': ['curiosity', 'love'], 'scores': [0.9694574, 0.9227462]},
 {'labels': ['love'], 'scores': [0.993483]},
 {'labels': ['anger'], 'scores': [0.99225825]}]

2. Group Taxonomy

from transformers import BertTokenizer
from model import BertForMultiLabelClassification
from multilabel_pipeline import MultiLabelPipeline
from pprint import pprint

tokenizer = BertTokenizer.from_pretrained("monologg/bert-base-cased-goemotions-group")
model = BertForMultiLabelClassification.from_pretrained("monologg/bert-base-cased-goemotions-group")

goemotions = MultiLabelPipeline(
    model=model,
    tokenizer=tokenizer,
    threshold=0.3
)

texts = [
    "Hey that's a thought! Maybe we need [NAME] to be the celebrity vaccine endorsement!",
    "it’s happened before?! love my hometown of beautiful new ken 😂😂",
    "I love you, brother.",
    "Troll, bro. They know they're saying stupid shit. The motherfucker does nothing but stink up libertarian subs talking shit",
]

pprint(goemotions(texts))

# Output
[{'labels': ['positive'], 'scores': [0.9989434]},
 {'labels': ['ambiguous', 'positive'], 'scores': [0.99801123, 0.99845874]},
 {'labels': ['positive'], 'scores': [0.99930394]},
 {'labels': ['negative'], 'scores': [0.9984231]}]

3. Ekman Taxonomy

from transformers import BertTokenizer
from model import BertForMultiLabelClassification
from multilabel_pipeline import MultiLabelPipeline
from pprint import pprint

tokenizer = BertTokenizer.from_pretrained("monologg/bert-base-cased-goemotions-ekman")
model = BertForMultiLabelClassification.from_pretrained("monologg/bert-base-cased-goemotions-ekman")

goemotions = MultiLabelPipeline(
    model=model,
    tokenizer=tokenizer,
    threshold=0.3
)

texts = [
    "Hey that's a thought! Maybe we need [NAME] to be the celebrity vaccine endorsement!",
    "it’s happened before?! love my hometown of beautiful new ken 😂😂",
    "I love you, brother.",
    "Troll, bro. They know they're saying stupid shit. The motherfucker does nothing but stink up libertarian subs talking shit",
]

pprint(goemotions(texts))

# Output
[{'labels': ['joy', 'neutral'], 'scores': [0.30459446, 0.9217335]},
 {'labels': ['joy', 'surprise'], 'scores': [0.9981395, 0.99863845]},
 {'labels': ['joy'], 'scores': [0.99910116]},
 {'labels': ['anger'], 'scores': [0.9984291]}]
