
CLUEbenchmark / Electra

Pretrained Chinese ELECTRA model: pretraining a Chinese model with adversarial learning

Projects that are alternatives of or similar to Electra

Optimus
Optimus: the first large-scale pre-trained VAE language model
Stars: ✭ 180 (+36.36%)
Mutual labels:  language-model, pretrained-models
PerceptualGAN
Pytorch implementation of Image Manipulation with Perceptual Discriminators paper
Stars: ✭ 119 (-9.85%)
Mutual labels:  gan, adversarial-networks
gap-text2sql
GAP-text2SQL: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training
Stars: ✭ 83 (-37.12%)
Mutual labels:  pretrained-models, language-model
PCPM
Presenting Collection of Pretrained Models. Links to pretrained models in NLP and voice.
Stars: ✭ 21 (-84.09%)
Mutual labels:  pretrained-models, language-model
Adversarialnetspapers
Awesome paper list with code about generative adversarial nets
Stars: ✭ 6,219 (+4611.36%)
Mutual labels:  gan, adversarial-networks
Awesome Sentence Embedding
A curated list of pretrained sentence and word embedding models
Stars: ✭ 1,973 (+1394.7%)
Mutual labels:  language-model, pretrained-models
open clip
An open source implementation of CLIP.
Stars: ✭ 1,534 (+1062.12%)
Mutual labels:  pretrained-models, language-model
Clue
Chinese Language Understanding Evaluation Benchmark (CLUE): datasets, baselines, pre-trained models, corpus, and leaderboard
Stars: ✭ 2,425 (+1737.12%)
Mutual labels:  language-model, pretrained-models
Adversarial video generation
A TensorFlow Implementation of "Deep Multi-Scale Video Prediction Beyond Mean Square Error" by Mathieu, Couprie & LeCun.
Stars: ✭ 662 (+401.52%)
Mutual labels:  gan, adversarial-networks
All About The Gan
All About the GANs (Generative Adversarial Networks): summarized lists for GANs
Stars: ✭ 630 (+377.27%)
Mutual labels:  gan, adversarial-networks
Azureml Bert
End-to-End recipes for pre-training and fine-tuning BERT using Azure Machine Learning Service
Stars: ✭ 342 (+159.09%)
Mutual labels:  language-model, pretrained-models
Text Gan Tensorflow
TensorFlow GAN implementation using Gumbel Softmax
Stars: ✭ 87 (-34.09%)
Mutual labels:  gan, language-model
Man
Multinomial Adversarial Networks for Multi-Domain Text Classification (NAACL 2018)
Stars: ✭ 72 (-45.45%)
Mutual labels:  gan, adversarial-networks
Transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Stars: ✭ 55,742 (+42128.79%)
Mutual labels:  language-model, pretrained-models
Robbert
A Dutch RoBERTa-based language model
Stars: ✭ 120 (-9.09%)
Mutual labels:  language-model
Kogpt2 Finetuning
🔥 Korean GPT-2, KoGPT2 fine-tuning, cased. Trained on Korean song-lyric data 🔥
Stars: ✭ 124 (-6.06%)
Mutual labels:  language-model
Chromagan
Official Implementation of ChromaGAN: An Adversarial Approach for Picture Colorization
Stars: ✭ 117 (-11.36%)
Mutual labels:  adversarial-networks
Pi Rec
🔥 PI-REC: Progressive Image Reconstruction Network With Edge and Color Domain. 🔥 Image-to-image translation, conditional GAN, AI painting
Stars: ✭ 1,619 (+1126.52%)
Mutual labels:  gan
Awesome Gan For Medical Imaging
Awesome GAN for Medical Imaging
Stars: ✭ 1,814 (+1274.24%)
Mutual labels:  gan
Dynamic Memory Networks Plus Pytorch
Implementation of Dynamic memory networks plus in Pytorch
Stars: ✭ 123 (-6.82%)
Mutual labels:  language-model

ELECTRA

Pretrained Chinese ELECTRA model: pretraining a Chinese model with adversarial learning

The code is reposted from Google's official implementation: https://github.com/google-research/electra

For detailed usage instructions, see the official repository linked above.

ELECTRA Chinese tiny model downloads

Google Drive: electra-tiny google-drive

Baidu Drive: electra-tiny baidu-pan (extraction code: rs99)
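
The download is a standard TensorFlow v1 checkpoint produced by the official codebase. A minimal sketch for inspecting its variables after downloading (the checkpoint prefix below is an illustrative placeholder, not the actual file name in the archive):

```python
import tensorflow as tf  # TF 1.15, or tf.compat.v1 semantics under TF 2

# Inspect the downloaded checkpoint. The prefix "electra-tiny/electra_tiny"
# is an assumed placeholder; use whatever prefix the archive actually contains.
reader = tf.train.load_checkpoint("electra-tiny/electra_tiny")
for name, shape in sorted(reader.get_variable_to_shape_map().items()):
    print(name, shape)
```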

Model notes

  1. Same configuration as TinyBERT
  2. The generator is 1/4 the size of the discriminator (see the config sketch below)
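
In the official configure_pretraining.py, the generator size is expressed as a fraction of the discriminator's size. A minimal sketch of the relevant settings for the 1/4 ratio described above (treat the exact field names as belonging to the official config object; verify them against your checkout):

```python
# Sketch of the size relationship described above, in the style of the
# official configure_pretraining.py. generator_hidden_size is a *fraction*
# of the discriminator's hidden size, so 0.25 gives a generator that is
# 1/4 the width of the discriminator.
class PretrainingConfigSketch:
    def __init__(self):
        self.model_size = "tiny"           # custom size, defined in code/util/training_utils.py
        self.generator_layers = 1.0        # generator depth, as a fraction of the discriminator
        self.generator_hidden_size = 0.25  # generator width, as a fraction of the discriminator
```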

How to use official code

Steps

  1. Edit the data paths and the TPU/GPU settings in configure_pretraining.py
  2. Modify model_size: you can define your own model size in code/util/training_utils.py
  3. Input data format: raw input_ids, input_mask, and segment_ids; uniform mask sampling is performed on the fly during training, so there is no need to generate masked input ids offline (a minimal sketch follows this list)
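
A minimal NumPy sketch of that on-the-fly uniform masking: pick a fraction of the real (non-padding) tokens uniformly at random and replace them with [MASK]. The mask_token_id=103 default and the example ids are assumptions matching the standard Chinese BERT vocab; the official implementation additionally avoids masking [CLS]/[SEP].

```python
import numpy as np

# Sketch of on-the-fly uniform mask sampling for ELECTRA pretraining.
# mask_token_id=103 assumes the standard Chinese BERT vocab; the official
# code also excludes special tokens such as [CLS]/[SEP] from masking.
def uniform_mask_sampling(input_ids, input_mask, mask_prob=0.15,
                          mask_token_id=103, rng=None):
    rng = rng or np.random.default_rng()
    input_ids = np.asarray(input_ids)
    # Only real (non-padding) tokens are candidates for masking.
    candidates = np.flatnonzero(np.asarray(input_mask))
    n_mask = max(1, int(round(len(candidates) * mask_prob)))
    positions = rng.choice(candidates, size=n_mask, replace=False)
    masked_ids = input_ids.copy()
    masked_ids[positions] = mask_token_id
    # The generator predicts the original ids at `positions`; the
    # discriminator then classifies which tokens the generator replaced.
    return masked_ids, positions, input_ids[positions]

# Usage (token ids are illustrative):
ids  = [101, 2769, 812, 3221, 102, 0, 0]   # [CLS] ... [SEP] pad pad
mask = [1, 1, 1, 1, 1, 0, 0]
masked_ids, positions, labels = uniform_mask_sampling(ids, mask)
```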

Performance

Generator + discriminator pretraining metrics for electra-tiny:

| metric | value |
| --- | --- |
| disc_accuracy | 0.95093095 |
| disc_auc | 0.9762006 |
| disc_loss | 0.14071295 |
| disc_precision | 0.8018275 |
| disc_recall | 0.6088053 |
| loss | 9.516352 |
| masked_lm_accuracy | 0.46732807 |
| masked_lm_loss | 2.8209455 |
| sampled_masked_lm_accuracy | 0.3504382 |

The model was trained for 1M steps on the 10 GB CLUE Chinese corpus.

Downstream fine-tuning on the CLUE benchmark:

Note: we only use the pretrained electra-tiny with layer-wise learning rate decay, without any distillation or data augmentation. The learning rate is set to 1e-4 for each task, and each task is run for 10 epochs. (As with the official results, these numbers may have large variance.)
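
Layer-wise learning rate decay scales the learning rate down geometrically for layers closer to the input, so the embeddings move less than the top layers during fine-tuning. A minimal sketch of the per-layer schedule (the decay factor 0.8 is an assumption here; the official fine-tuning code exposes it as a configurable hyperparameter):

```python
# Minimal sketch of layer-wise learning-rate decay for fine-tuning:
# layer k (with 0 = embeddings) gets base_lr * decay ** (n_layers + 1 - k),
# so lower layers are updated more conservatively than upper ones.
def layerwise_learning_rates(base_lr, n_layers, decay=0.8):
    rates = {"embeddings": base_lr * decay ** (n_layers + 1)}
    for k in range(1, n_layers + 1):
        rates[f"layer_{k}"] = base_lr * decay ** (n_layers + 1 - k)
    rates["task_head"] = base_lr  # the classification head uses the full lr
    return rates

# e.g. with the 1e-4 learning rate used above and a 4-layer tiny model:
print(layerwise_learning_rates(1e-4, n_layers=4))
```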

| Model | AFQMC | TNEWS | IFLYTEK | CMNLI | WSC | CSL |
| --- | --- | --- | --- | --- | --- | --- |
| Metric | Acc | Acc | Acc | Acc | Acc | Acc |
| ELECTRA-tiny | 70.319 | 54.280 | 53.538 | 73.745 | 64.336 | 78.700 |
| RoBERTa-tiny | 69.904 | 54.150 | 56.808 | 74.037 | 64.336 | 74.133 |

Notes:

  1. ELECTRA may show a performance drop on multi-class classification tasks.
  2. The generator/discriminator size ratio is fairly hacky and interacts with the masking method, among other factors.

Sign up for the NLPCC High-Performance Small Model Evaluation
