
jingyuanz / protonet-bert-text-classification

Licence: other
Fine-tune BERT for small-dataset text classification in a few-shot learning manner using ProtoNet.

Programming Languages

python
shell

Projects that are alternatives to or similar to protonet-bert-text-classification

WSDM-Cup-2019
[ACM-WSDM] 3rd place solution at WSDM Cup 2019, Fake News Classification on Kaggle.
Stars: ✭ 62 (+121.43%)
Mutual labels:  text-classification, bert
Text and Audio classification with Bert
Text Classification in Turkish Texts with Bert
Stars: ✭ 34 (+21.43%)
Mutual labels:  text-classification, bert
textgo
Text preprocessing, representation, similarity calculation, text search and classification. Let's go and play with text!
Stars: ✭ 33 (+17.86%)
Mutual labels:  text-classification, bert
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-14.29%)
Mutual labels:  text-classification, bert
kwx
BERT, LDA, and TFIDF based keyword extraction in Python
Stars: ✭ 33 (+17.86%)
Mutual labels:  text-classification, bert
Pytorch-NLU
Pytorch-NLU,一个中文文本分类、序列标注工具包,支持中文长文本、短文本的多类、多标签分类任务,支持中文命名实体识别、词性标注、分词等序列标注任务。 Ptorch NLU, a Chinese text classification and sequence annotation toolkit, supports multi class and multi label classification tasks of Chinese long text and short text, and supports sequence annotation tasks such as Chinese named entity recognition, part of speech ta…
Stars: ✭ 151 (+439.29%)
Mutual labels:  text-classification, bert
NSP-BERT
The code for our paper "NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task —— Next Sentence Prediction"
Stars: ✭ 166 (+492.86%)
Mutual labels:  text-classification, bert
Kevinpro-NLP-demo
All NLP you Need Here. Personal implementations of some fun NLP demos; currently includes PyTorch implementations of 13 NLP applications.
Stars: ✭ 117 (+317.86%)
Mutual labels:  text-classification, bert
policy-data-analyzer
Building a model to recognize incentives for landscape restoration in environmental policies from Latin America, the US and India. Bringing NLP to the world of policy analysis through an extensible framework that includes scraping, preprocessing, active learning and text analysis pipelines.
Stars: ✭ 22 (-21.43%)
Mutual labels:  text-classification, bert
Filipino-Text-Benchmarks
Open-source benchmark datasets and pretrained transformer models in the Filipino language.
Stars: ✭ 22 (-21.43%)
Mutual labels:  text-classification, bert
trove
Weakly supervised medical named entity classification
Stars: ✭ 55 (+96.43%)
Mutual labels:  text-classification, bert
Nlp chinese corpus
大规模中文自然语言处理语料 Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+23671.43%)
Mutual labels:  text-classification, bert
classifier multi label
multi-label,classifier,text classification,多标签文本分类,文本分类,BERT,ALBERT,multi-label-classification
Stars: ✭ 127 (+353.57%)
Mutual labels:  text-classification, bert
ganbert-pytorch
Enhancing the BERT training with Semi-supervised Generative Adversarial Networks in Pytorch/HuggingFace
Stars: ✭ 60 (+114.29%)
Mutual labels:  text-classification, bert
classifier multi label seq2seq attention
multi-label,classifier,text classification,多标签文本分类,文本分类,BERT,ALBERT,multi-label-classification,seq2seq,attention,beam search
Stars: ✭ 26 (-7.14%)
Mutual labels:  text-classification, bert
TorchBlocks
A PyTorch-based toolkit for natural language processing
Stars: ✭ 85 (+203.57%)
Mutual labels:  text-classification, bert
BERT-chinese-text-classification-pytorch
This repo contains a PyTorch implementation of a pretrained BERT model for text classification.
Stars: ✭ 92 (+228.57%)
Mutual labels:  text-classification, bert
backprop
Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.
Stars: ✭ 229 (+717.86%)
Mutual labels:  text-classification, bert
text2class
Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT
Stars: ✭ 15 (-46.43%)
Mutual labels:  text-classification, bert
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+8892.86%)
Mutual labels:  text-classification, bert

Introduction

This project targets the problem of insufficient data in text classification tasks. By using few-shot learning techniques (ProtoNet, etc.), performance on such tasks improves and has the potential to improve further. However, ProtoNet+BERT converges much more slowly than ordinary BERT fine-tuning, and GPU memory is a key limitation on further gains (the number of supports at evaluation time cannot be set very large; #TODO to fix this in the future).

* Few-shot multi-class text classification model (only tested on short texts). It is currently initialized with BERT, but can be initialized with Sentence-BERT instead for better results.
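For reference, a hedged sketch of how sentence embeddings could be obtained from a Sentence-BERT model via the sentence-transformers package; the model name is only an example and this is not necessarily how this repo wires the encoder in:

```python
# Hypothetical illustration of the Sentence-BERT alternative mentioned above;
# the model name is an example and this is not the repo's actual initialization code.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
embeddings = encoder.encode(["查询一下我的话费余额", "今天天气怎么样"])  # shape: (2, dim)
```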

Classification Models

  1. ProtoNet+Bert (optimized for few-shot learning; can achieve better performance on some small datasets; see the sketch after this list)
  2. Ordinary Bert classification (for normal datasets; also works in few-shot settings thanks to the strength of BERT pretraining)
  3. A mysterious algorithm from my colleague (optimized for matching tasks; do not train it for normal classification tasks, it is included for experimental purposes and just for fun)
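
A minimal sketch of the ProtoNet classification step behind model 1, assuming a BERT encoder that maps sentences to fixed-size embeddings; the function and variable names below are illustrative and not the repo's actual API:

```python
# Minimal sketch (not the repo's actual API): prototypical-network classification
# on top of sentence embeddings such as those produced by a BERT encoder.
import torch

def prototypical_logits(support_emb, support_labels, query_emb, n_classes):
    """support_emb: [n_support, dim]; query_emb: [n_query, dim]."""
    # One prototype per class: the mean of that class's support embeddings.
    prototypes = torch.stack([
        support_emb[support_labels == c].mean(dim=0) for c in range(n_classes)
    ])                                               # [n_classes, dim]
    # Score queries by negative squared Euclidean distance to each prototype;
    # these "logits" feed cross-entropy during training and argmax at prediction.
    return -torch.cdist(query_emb, prototypes) ** 2  # [n_query, n_classes]

# Toy usage with random stand-ins for BERT embeddings (hidden size 768):
support = torch.randn(15 * 5, 768)                   # 15 classes x 5 shots
labels = torch.arange(15).repeat_interleave(5)       # class ids 0..14
queries = torch.randn(8, 768)
predictions = prototypical_logits(support, labels, queries, 15).argmax(dim=-1)
```

Because the whole BERT encoder is fine-tuned through this episodic loss, convergence is slower and memory usage higher than ordinary fine-tuning, as noted in the Introduction.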

Usage:

  1. Put your data into the ./data folder.
  2. Write your own script (or use one of the pre-given functions in data_formatter.py) to format your training/evaluation data so that each line contains a sentence and its label separated by a tab (a sketch is given after this list).
  3. Modify the configuration in conf/config.py under the Config class for your chosen model (a hedged example appears after the Note below):
    • Mandatory settings:
      • for the Bert classifier: set the number of classes and the max sentence length;
      • for ProtoNet: set "k" and "shot"; k must be between 20% and 100% of the total number of classes, and shot is commonly between 2 and 10 depending on the data size.
    • Optional settings:
      • for the Bert classifier: batch_size
      • for ProtoNet: n_support and eval_n_support (the number of support samples for each class; read the ProtoNet paper for details). You can leave them unchanged; bigger is better, but they may exceed GPU memory limits, especially at evaluation time when the number of classes is large.
      • general settings: learning rate, warmup, paths to essential data/model files, device, etc.
  4. Alternatively, if you are tired of modifying the config file, or you want to train multiple models with different configs, you can use <python scripts/api.py> directly; every setting can be redefined there, overriding what is in config.py. Type <python scripts/api.py -h> for more details.
  5. Choose which of the three training shell scripts to run, depending on your needs.
  6. Predict with the other three shell scripts; don't forget to check all load paths before running.
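
For step 2, a minimal sketch of producing the expected "sentence<TAB>label" layout, one example per line; the file name, sentences, and label names are made up for illustration:

```python
# Sketch for step 2: each line holds a sentence and its label, separated by a tab.
# File name, sentences, and label names below are illustrative only.
examples = [
    ("查询一下我的话费余额", "query_balance"),
    ("帮我订明天去北京的机票", "book_flight"),
    ("今天天气怎么样", "weather"),
]

with open("./data/train.txt", "w", encoding="utf-8") as f:
    for sentence, label in examples:
        f.write(f"{sentence}\t{label}\n")
```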

Requirements: pytorch, transformers, pytorch_pretrained_bert, keras, sklearn, etc.

Note: Recommended hyperparameters are left as they are in conf/config.py, except those that are task-specific. All experiments use bert-chinese-base; other languages have not been tested, but you can always try them (remember to change bert_type in the config).
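
As an illustration of step 3 and the note above, a hedged sketch of the kind of fields such a Config class might contain; the actual field names and defaults in conf/config.py may differ:

```python
# Hypothetical Config sketch for step 3; check conf/config.py for the real field names.
class Config:
    # Mandatory for the ordinary BERT classifier
    num_classes = 15
    max_seq_len = 32

    # Mandatory for ProtoNet: k classes per episode, shot examples per class
    k = 5                 # 20%-100% of the total number of classes
    shot = 5              # commonly 2-10, depending on data size

    # Optional
    batch_size = 32
    n_support = 5         # support samples per class during training
    eval_n_support = 20   # bigger is better, but watch GPU memory at evaluation

    # General settings
    bert_type = "bert-base-chinese"
    learning_rate = 2e-5
    warmup = 0.1
    device = "cuda:0"
```

Step 4's <python scripts/api.py -h> lists which of these settings can be overridden from the command line without editing the file.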

TODO:

  1. Support an unlimited number of support samples at evaluation/prediction time.
  2. Support meta-learning.
  3. Replace the Euclidean distance with RE2 and BCE loss.

Results:

| Dataset | ProtoNet+Bert | Bert | Training size | Test size | Balanced | Class Count |
| --- | --- | --- | --- | --- | --- | --- |
| Intent Classification (downsampled to 1%) | 88.3% | 87.4% | ≈60*15 | 1333 | True | 15 |
| Intent Classification (downsampled to 10%) | 91.9% | 91.7% | ≈600*15 | 1333 | True | 15 |
| Intent Classification | >93.7% (too slow to train) | 94.6% (?) | ≈6000*15 | 1333 | True | 15 |
| Anonymous Dataset 1 | 87.8% | 87.2% | 3200 | 352 | False | 86 |
| Anonymous Dataset 2 | 84.9% | 84.3% | 1300 | 434 | False | 20 |
| Anonymous Dataset 3 | 88.1% | 83.9% | 5000 | 320 | True | 68 |