
crux82 / ganbert-pytorch

License: Apache-2.0
Enhancing BERT training with Semi-supervised Generative Adversarial Networks in PyTorch/HuggingFace

Programming Languages

Jupyter Notebook

Projects that are alternatives to or similar to ganbert-pytorch

policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-63.33%)
Mutual labels:  text-classification, bert, huggingface
ganbert
Enhancing the BERT training with Semi-supervised Generative Adversarial Networks
Stars: ✭ 205 (+241.67%)
Mutual labels:  generative-adversarial-network, semi-supervised-learning, bert
TabFormer
Code & Data for "Tabular Transformers for Modeling Multivariate Time Series" (ICASSP, 2021)
Stars: ✭ 209 (+248.33%)
Mutual labels:  bert, huggingface
BERT-chinese-text-classification-pytorch
This repo contains a PyTorch implementation of a pretrained BERT model for text classification.
Stars: ✭ 92 (+53.33%)
Mutual labels:  text-classification, bert
seededlda
Semi-supervised LDA for theory-driven text analysis
Stars: ✭ 46 (-23.33%)
Mutual labels:  text-classification, semi-supervised-learning
protonet-bert-text-classification
Fine-tune BERT for small-dataset text classification in a few-shot learning manner using ProtoNet
Stars: ✭ 28 (-53.33%)
Mutual labels:  text-classification, bert
DrFAQ
DrFAQ is a plug-and-play question answering NLP chatbot that can be generally applied to any organisation's text corpora.
Stars: ✭ 29 (-51.67%)
Mutual labels:  bert, huggingface
backprop
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+281.67%)
Mutual labels:  text-classification, bert
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+4096.67%)
Mutual labels:  text-classification, bert
classifier multi label
Multi-label text classification with BERT and ALBERT
Stars: ✭ 127 (+111.67%)
Mutual labels:  text-classification, bert
classifier multi label seq2seq attention
Multi-label text classification with BERT and ALBERT, using a seq2seq model with attention and beam search
Stars: ✭ 26 (-56.67%)
Mutual labels:  text-classification, bert
trove
Weakly supervised medical named entity classification
Stars: ✭ 55 (-8.33%)
Mutual labels:  text-classification, bert
Kashgari
Kashgari is a production-level NLP transfer-learning framework built on top of tf.keras for text labeling and text classification; it includes Word2Vec, BERT, and GPT-2 language embeddings.
Stars: ✭ 2,235 (+3625%)
Mutual labels:  text-classification, bert
Bible text gcn
Pytorch implementation of "Graph Convolutional Networks for Text Classification"
Stars: ✭ 90 (+50%)
Mutual labels:  text-classification, semi-supervised-learning
ERNIE-text-classification-pytorch
This repo contains a PyTorch implementation of a pretrained ERNIE model for text classification.
Stars: ✭ 49 (-18.33%)
Mutual labels:  text-classification, bert
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+10993.33%)
Mutual labels:  text-classification, bert
parsbert-ner
🤗 ParsBERT Persian NER Tasks
Stars: ✭ 15 (-75%)
Mutual labels:  bert, huggingface
kwx
BERT, LDA, and TFIDF based keyword extraction in Python
Stars: ✭ 33 (-45%)
Mutual labels:  text-classification, bert
Kevinpro-NLP-demo
All the NLP you need, here. Personal implementations of some fun NLP demos; currently includes PyTorch implementations of 13 NLP applications.
Stars: ✭ 117 (+95%)
Mutual labels:  text-classification, bert
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-60%)
Mutual labels:  text-classification, bert

GAN-BERT (in PyTorch and compatible with HuggingFace)

This is a PyTorch (and HuggingFace) implementation of the GAN-BERT method from https://github.com/crux82/ganbert, which is available in TensorFlow. While the original GAN-BERT was an extension of BERT, this implementation can be adapted to several architectures, from RoBERTa to ALBERT.
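Swapping the underlying encoder amounts to changing the HuggingFace checkpoint name. Here is a minimal sketch (assuming the transformers library; the checkpoint names are standard HuggingFace identifiers):

from transformers import AutoModel, AutoTokenizer

# Any of these checkpoints can back the discriminator's sentence encoder:
# "bert-base-cased", "roberta-base", "albert-base-v2", ...
model_name = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)

inputs = tokenizer("How far is it from Denver to Aspen?", return_tensors="pt")
outputs = encoder(**inputs)
# The vector at the first (CLS) position serves as the sentence representation.
sentence_repr = outputs.last_hidden_state[:, 0, :]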

IMPORTANT: since this implementation differs slightly from the original TensorFlow one, some results may vary. Any feedback or suggestion for improving this first version is appreciated.

GAN-BERT

This is the code for the paper "GAN-BERT: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples", published as a short paper at ACL 2020 by Danilo Croce (Tor Vergata, University of Rome), Giuseppe Castellucci (Amazon), and Roberto Basili (Tor Vergata, University of Rome).

GAN-BERT is an extension of BERT that uses a Generative Adversarial setting to implement an effective semi-supervised learning schema. It allows training BERT with datasets composed of a limited number of labeled examples and larger sets of unlabeled material. GAN-BERT can be used in sequence classification tasks, also involving text pairs.
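Text pairs require no change to the model, since the HuggingFace tokenizer packs a pair into a single sequence. A minimal sketch (the example sentences are illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
# The two texts are joined into one sequence separated by [SEP];
# token_type_ids mark which tokens belong to which segment.
pair_inputs = tokenizer(
    "What county is Modesto, California in?",
    "Modesto is the county seat of Stanislaus County.",
    return_tensors="pt",
)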

As in the original TensorFlow implementation, this code runs the GAN-BERT experiment over the TREC dataset for the fine-grained Question Classification task. This package provides the code as well as the data for running an experiment that uses 2% of the labeled material (109 examples) and 5,343 unlabeled examples. The test set is composed of 500 annotated examples.
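A simple way to mix labeled and unlabeled examples in the same batch is to assign unlabeled examples a dummy label together with a boolean mask that the supervised loss can use to skip them. A hypothetical sketch (the -1 convention and the label_mask name are illustrative, not necessarily the repository's):

import torch

labeled   = [("what is the capital of italy ?", 12)]  # (text, gold class id)
unlabeled = [("how do clouds form ?", -1)]            # no gold label

examples = labeled + unlabeled
texts = [text for text, _ in examples]
label_mask = torch.tensor([y >= 0 for _, y in examples])  # True only for labeled data
labels = torch.tensor([max(y, 0) for _, y in examples])   # dummy 0 where unlabeled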

The Model

GAN-BERT is an extension of the BERT model within the Generative Adversarial Network (GAN) framework (Goodfellow et al., 2014). In particular, the Semi-Supervised GAN (Salimans et al., 2016) is used to make BERT fine-tuning robust in training scenarios where obtaining annotated material is problematic. When fine-tuned with very few labeled examples, the BERT model cannot provide adequate performance. With GAN-BERT we extend the fine-tuning stage by introducing a Discriminator-Generator setting, where:

  • the Generator G is devoted to producing "fake" vector representations of sentences;
  • the Discriminator D is a BERT-based classifier over k+1 categories (a minimal sketch of both components follows the figure below).

Figure: the GAN-BERT model.
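A minimal PyTorch sketch of the two components follows. The layer sizes, noise dimension, and dropout rate are illustrative, not necessarily the repository's exact hyperparameters (768 is the hidden size of bert-base; fine-grained TREC has k = 50 classes):

import torch
import torch.nn as nn

class Generator(nn.Module):
    # Maps random noise to a "fake" sentence representation.
    def __init__(self, noise_dim=100, hidden_dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, noise):
        return self.net(noise)

class Discriminator(nn.Module):
    # Classifies a representation into the k task categories plus one "fake" category.
    def __init__(self, hidden_dim=768, num_labels=50):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.1),
        )
        self.classifier = nn.Linear(hidden_dim, num_labels + 1)  # k + 1 logits

    def forward(self, reps):
        features = self.backbone(reps)  # also used for feature matching in the G loss
        logits = self.classifier(features)
        return features, logits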

D has the role of classifying an example with respect to the k categories of the task of interest, and it should also recognize the examples generated by G (the k+1-th category). G, instead, must produce representations as similar as possible to the ones the model produces for "real" examples. G is penalized when D correctly classifies an example as fake.

In this context, the model is trained on both labeled and unlabeled examples. The labeled examples contribute to the loss function over the task's k categories. The unlabeled examples contribute to the loss functions in that they should not be incorrectly classified as belonging to the k+1 category (i.e., the fake category).
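Concretely, the loss terms could be assembled as in Salimans et al. (2016). The following sketch follows that formulation and is not necessarily the repository's exact loss (which may, for example, add label smoothing):

import torch
import torch.nn.functional as F

def gan_bert_losses(real_logits, fake_logits, real_feats, fake_feats, labels, label_mask):
    # real_logits / fake_logits: (batch, k+1) discriminator outputs; index k is the fake class.
    # labels: (batch,) gold classes; label_mask: (batch,) True where an example is labeled.
    k = real_logits.size(-1) - 1
    eps = 1e-8
    real_probs = F.softmax(real_logits, dim=-1)
    fake_probs = F.softmax(fake_logits, dim=-1)

    # Supervised term: cross-entropy over the k task categories, labeled examples only.
    if label_mask.any():
        log_probs = F.log_softmax(real_logits[:, :k], dim=-1)
        sup_loss = F.nll_loss(log_probs[label_mask], labels[label_mask])
    else:
        sup_loss = real_logits.new_zeros(())

    # Unsupervised terms: real examples (labeled or not) should not fall into the
    # fake class, while generated examples should.
    d_loss = (sup_loss
              - torch.log(1.0 - real_probs[:, k] + eps).mean()
              - torch.log(fake_probs[:, k] + eps).mean())

    # Generator: fool D and match the mean features of the real examples.
    g_loss = (-torch.log(1.0 - fake_probs[:, k] + eps).mean()
              + torch.mean((real_feats.mean(dim=0) - fake_feats.mean(dim=0)) ** 2))
    return d_loss, g_loss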

The resulting model has been demonstrated to learn text classification tasks starting from as few as 50-60 labeled examples and to outperform classically fine-tuned BERT models by a large margin in this setting.

More details are available at https://github.com/crux82/ganbert

Citation

If this software is useful for your research, please cite the following paper:

@inproceedings{croce-etal-2020-gan,
    title = "{GAN}-{BERT}: Generative Adversarial Learning for Robust Text Classification with a Bunch of Labeled Examples",
    author = "Croce, Danilo  and
      Castellucci, Giuseppe  and
      Basili, Roberto",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.191",
    pages = "2114--2119"
}

Acknowledgments

We would like to thank Osman Mutlu and Ali Hürriyetoğlu for their implementation of GAN-BERT in PyTorch, which inspired our porting. You can find their initial repository at this link. We would also like to thank Claudia Breazzano (Tor Vergata, University of Rome), who supported this porting.
