
Geotrend-research / smaller-transformers

License: Apache-2.0
Load What You Need: Smaller Multilingual Transformers for PyTorch and TensorFlow 2.0.

Programming Languages

Jupyter Notebook
Python

Projects that are alternatives to or similar to smaller-transformers

BERT-NER
Using pre-trained BERT models for Chinese and English NER with 🤗Transformers
Stars: ✭ 114 (+72.73%)
Mutual labels:  transformers
ttt
A package for fine-tuning Transformers with TPUs, written in Tensorflow2.0+
Stars: ✭ 35 (-46.97%)
Mutual labels:  transformers
TorchBlocks
A PyTorch-based toolkit for natural language processing
Stars: ✭ 85 (+28.79%)
Mutual labels:  transformers
robustness-vit
Contains code for the paper "Vision Transformers are Robust Learners" (AAAI 2022).
Stars: ✭ 78 (+18.18%)
Mutual labels:  transformers
text2keywords
Trained T5 and T5-large model for creating keywords from text
Stars: ✭ 53 (-19.7%)
Mutual labels:  transformers
KoELECTRA-Pipeline
Transformers Pipeline with KoELECTRA
Stars: ✭ 37 (-43.94%)
Mutual labels:  transformers
remixer-pytorch
Implementation of the Remixer Block from the Remixer paper, in Pytorch
Stars: ✭ 37 (-43.94%)
Mutual labels:  transformers
text2class
Multi-class text categorization using state-of-the-art pre-trained contextualized language models, e.g. BERT
Stars: ✭ 15 (-77.27%)
Mutual labels:  transformers
golgotha
Contextualised Embeddings and Language Modelling using BERT and Friends using R
Stars: ✭ 39 (-40.91%)
Mutual labels:  transformers
nuwa-pytorch
Implementation of NÜWA, state of the art attention network for text to video synthesis, in Pytorch
Stars: ✭ 347 (+425.76%)
Mutual labels:  transformers
ParsBigBird
Persian Bert For Long-Range Sequences
Stars: ✭ 58 (-12.12%)
Mutual labels:  transformers
small-text
Active Learning for Text Classification in Python
Stars: ✭ 241 (+265.15%)
Mutual labels:  transformers
spark-transformers
Spark-Transformers: Library for exporting Apache Spark MLLIB models to use them in any Java application with no other dependencies.
Stars: ✭ 39 (-40.91%)
Mutual labels:  transformers
GoEmotions-pytorch
Pytorch Implementation of GoEmotions 😍😢😱
Stars: ✭ 95 (+43.94%)
Mutual labels:  transformers
minicons
Utility for analyzing Transformer based representations of language.
Stars: ✭ 28 (-57.58%)
Mutual labels:  transformers
MISE
Multimodal Image Synthesis and Editing: A Survey
Stars: ✭ 214 (+224.24%)
Mutual labels:  transformers
Product-Categorization-NLP
Multi-Class Text Classification for products based on their description with Machine Learning algorithms and Neural Networks (MLP, CNN, Distilbert).
Stars: ✭ 30 (-54.55%)
Mutual labels:  transformers
serverless-transformers-on-aws-lambda
Deploy transformers serverless on AWS Lambda
Stars: ✭ 100 (+51.52%)
Mutual labels:  transformers
Text and Audio classification with Bert
Text Classification in Turkish Texts with Bert
Stars: ✭ 34 (-48.48%)
Mutual labels:  transformers
robo-vln
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"
Stars: ✭ 34 (-48.48%)
Mutual labels:  transformers

Smaller Multilingual Transformers

This repository shares smaller versions of multilingual transformers that keep the same representations offered by the original ones. The idea came from a simple observation: after massively multilingual pretraining, not all embeddings are needed for fine-tuning and inference. In practice, one rarely needs a model that supports more than 100 languages, as the original mBERT does. We therefore extracted several smaller versions that handle fewer languages. Since most of the parameters of multilingual transformers are located in the embedding layer, our models are up to 64% smaller in size.
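
To make the idea concrete, here is a minimal sketch (not the repository's reduce_model.py) of what extracting a smaller version amounts to: keep only the rows of the token-embedding matrix that correspond to the tokens you want to retain, then update the model configuration accordingly. The list of kept tokens below is a toy placeholder.

import torch
from transformers import AutoModel, AutoTokenizer

# Illustration only: slice mBERT's token-embedding matrix down to a subset of tokens.
model = AutoModel.from_pretrained("bert-base-multilingual-cased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# Toy subset: special tokens plus a few English/French subwords
# (a real subset contains tens of thousands of tokens).
kept_tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]", "the", "le", "la"]
kept_ids = tokenizer.convert_tokens_to_ids(kept_tokens)

old_embeddings = model.get_input_embeddings().weight.data  # shape (119547, 768) for mBERT
new_embeddings = torch.nn.Embedding(len(kept_ids), old_embeddings.size(1))
new_embeddings.weight.data = old_embeddings[kept_ids].clone()
model.set_input_embeddings(new_embeddings)
model.config.vocab_size = len(kept_ids)

A complete reduction also has to rebuild the tokenizer vocabulary so that token ids keep matching the new embedding rows, which is what a full reduction script needs to handle via its vocabulary file (see Generating new Models below).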

The table below compares two of our extracted versions with the original mBERT. It shows the models' size, memory footprint, and the accuracy obtained on the XNLI dataset (cross-lingual transfer from English to French). These measurements were computed on a Google Cloud n1-standard-1 machine (1 vCPU, 3.75 GB).

| Model | Num parameters | Size | Memory | Accuracy |
| --- | --- | --- | --- | --- |
| bert-base-multilingual-cased | 178 million | 714 MB | 1400 MB | 73.8 |
| Geotrend/bert-base-15lang-cased | 141 million | 564 MB | 1098 MB | 74.1 |
| Geotrend/bert-base-en-fr-cased | 112 million | 447 MB | 878 MB | 73.8 |
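
The parameter counts in the table can be reproduced with a short snippet (a sketch; the exact totals may vary slightly with the transformers version and whether the MLM head is counted):

from transformers import AutoModel

# Count parameters and show how much of the model sits in the token embeddings.
for name in ["bert-base-multilingual-cased", "Geotrend/bert-base-en-fr-cased"]:
    model = AutoModel.from_pretrained(name)
    total = sum(p.numel() for p in model.parameters())
    embed = model.get_input_embeddings().weight.numel()
    print(f"{name}: {total / 1e6:.0f}M parameters, {100 * embed / total:.0f}% in token embeddings")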

Reducing the size of multilingual transformers facilitates their deployment on public cloud platforms. For instance, Google Cloud Platform requires that the model size on disk be under 500 MB for serverless deployments (Cloud Functions / Cloud ML).
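
As a quick sanity check before deploying, the on-disk size of a saved model can be measured directly (a sketch, assuming a local output directory named reduced_model):

import os
from transformers import AutoModel

# Save the reduced model and sum the size of the files it produces on disk.
model = AutoModel.from_pretrained("Geotrend/bert-base-en-fr-cased")
model.save_pretrained("reduced_model")
size_mb = sum(
    os.path.getsize(os.path.join(root, f))
    for root, _, files in os.walk("reduced_model")
    for f in files
) / (1024 * 1024)
print(f"{size_mb:.0f} MB on disk")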

For more information, please refer to our paper: Load What You Need.

*** New August 2021: smaller versions of distil-mBERT are now available! ***

| Model | Num parameters | Size | Memory |
| --- | --- | --- | --- |
| distilbert-base-multilingual-cased | 134 million | 542 MB | 1200 MB |
| Geotrend/distilbert-base-en-fr-cased | 69 million | 277 MB | 740 MB |

🚀 To our knowledge, these distil-mBERT based versions are the smallest and fastest multilingual transformers available to date.

Available Models

So far, we have generated a total of 138 models (70 extracted from mBERT and 68 extracted from distil-mBERT). These models have been uploaded to the Hugging Face Model Hub to make them easy to use: https://huggingface.co/Geotrend.

They can be downloaded easily using the transformers library:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("Geotrend/bert-base-en-fr-cased")
model = AutoModel.from_pretrained("Geotrend/bert-base-en-fr-cased")
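
The reduced model can then be used as a drop-in replacement for mBERT on English or French text. A minimal usage sketch, continuing from the two lines above:

import torch

inputs = tokenizer("Load what you need.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)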

More models will be released soon.

Generating new Models

We also share a Python script that lets users generate their own smaller transformers based on a subset of the original vocabulary (the method is not limited to multilingual transformers):

pip install -r requirements.txt

python3 reduce_model.py \
	--source_model bert-base-multilingual-cased \
	--vocab_file vocab_5langs.txt \
	--output_model bert-base-5lang-cased \
	--convert_to_tf False

Where:

  • --source_model is the multilingual transformer to reduce
  • --vocab_file is the path to the vocabulary file listing the tokens to keep (see the sketch after this list for one way to build such a file)
  • --output_model is the name of the final reduced model
  • --convert_to_tf tells the script whether or not to generate a TensorFlow version
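
One way to build such a vocabulary file is to collect the subword tokens that the original tokenizer actually produces on a corpus in the target languages, plus the special tokens. The sketch below is a hypothetical illustration, assuming the usual BERT vocab.txt format of one token per line; vocab_en_fr.txt and the two-sentence corpus are made-up placeholders:

from transformers import AutoTokenizer

# Hypothetical helper: gather the mBERT subwords used on a small English/French corpus.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
corpus = ["Load what you need.", "Chargez ce dont vous avez besoin."]  # replace with a real corpus

kept = set(tokenizer.all_special_tokens)
for sentence in corpus:
    kept.update(tokenizer.tokenize(sentence))

with open("vocab_en_fr.txt", "w", encoding="utf-8") as f:
    for token in sorted(kept):
        f.write(token + "\n")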

How to Cite

@inproceedings{smallermbert,
  title={Load What You Need: Smaller Versions of Multilingual BERT},
  author={Abdaoui, Amine and Pradel, Camille and Sigel, Grégoire},
  booktitle={SustaiNLP / EMNLP},
  year={2020}
}

Contact

Please contact [email protected] with any questions, feedback, or requests.
