
abelriboulot / Onnxt5

License: Apache-2.0
Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

Programming Languages

python

Projects that are alternatives of or similar to Onnxt5

fastT5
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.
Stars: ✭ 421 (+194.41%)
Mutual labels:  inference, transformer, onnx
Paribhasha
paribhasha.herokuapp.com/
Stars: ✭ 21 (-85.31%)
Mutual labels:  sentiment-analysis, summarization, nlp-machine-learning
Delta
DELTA is a deep learning based natural language and speech processing platform.
Stars: ✭ 1,479 (+934.27%)
Mutual labels:  text-classification, inference, text-generation
Text Classification Keras
📚 Text classification library with Keras
Stars: ✭ 53 (-62.94%)
Mutual labels:  text-classification, sentiment-analysis, nlp-machine-learning
Text mining resources
Resources for learning about Text Mining and Natural Language Processing
Stars: ✭ 358 (+150.35%)
Mutual labels:  text-classification, sentiment-analysis, nlp-machine-learning
COVID-19-Tweet-Classification-using-Roberta-and-Bert-Simple-Transformers
Rank 1 / 216
Stars: ✭ 24 (-83.22%)
Mutual labels:  sentiment-analysis, text-classification, transformer
Doc2vec
📓 Long(er) text representation and classification using Doc2Vec embeddings
Stars: ✭ 92 (-35.66%)
Mutual labels:  text-classification, sentiment-analysis, nlp-machine-learning
PlanSum
[AAAI2021] Unsupervised Opinion Summarization with Content Planning
Stars: ✭ 25 (-82.52%)
Mutual labels:  sentiment-analysis, text-generation, summarization
Rust Bert
Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)
Stars: ✭ 510 (+256.64%)
Mutual labels:  translation, sentiment-analysis, transformer
Dan Jurafsky Chris Manning Nlp
My solution to the Natural Language Processing course made by Dan Jurafsky, Chris Manning in Winter 2012.
Stars: ✭ 124 (-13.29%)
Mutual labels:  text-classification, sentiment-analysis, nlp-machine-learning
Scientificsummarizationdatasets
Datasets I have created for scientific summarization, and a trained BertSum model
Stars: ✭ 100 (-30.07%)
Mutual labels:  summarization, transformer
Tia
Your Advanced Twitter stalking tool
Stars: ✭ 98 (-31.47%)
Mutual labels:  text-classification, sentiment-analysis
Njunmt Tf
An open-source neural machine translation system developed by Natural Language Processing Group, Nanjing University.
Stars: ✭ 97 (-32.17%)
Mutual labels:  translation, transformer
Mivisionx
MIVisionX toolkit is a set of comprehensive computer vision and machine intelligence libraries, utilities, and applications bundled into a single toolkit. AMD MIVisionX also delivers a highly optimized open-source implementation of the Khronos OpenVX™ and OpenVX™ Extensions.
Stars: ✭ 100 (-30.07%)
Mutual labels:  inference, onnx
Yolov5 Rt Stack
Yet another yolov5, with its runtime stack for libtorch, onnx, tvm and specialized accelerators. You like torchvision's retinanet? You like yolov5? You love yolort!
Stars: ✭ 107 (-25.17%)
Mutual labels:  inference, onnx
Sa Papers
📄 A compilation and analysis of Sentiment Analysis papers in Deep Learning 😀😡☹️😭🙄🤢
Stars: ✭ 111 (-22.38%)
Mutual labels:  sentiment-analysis, summarization
Nlp Papers
Papers and Book to look at when starting NLP 📚
Stars: ✭ 111 (-22.38%)
Mutual labels:  sentiment-analysis, summarization
Text classification
Text Classification Algorithms: A Survey
Stars: ✭ 1,276 (+792.31%)
Mutual labels:  text-classification, nlp-machine-learning
Kadot
Kadot, the unsupervised natural language processing library.
Stars: ✭ 108 (-24.48%)
Mutual labels:  text-classification, text-generation
Bertqa Attention On Steroids
BertQA - Attention on Steroids
Stars: ✭ 112 (-21.68%)
Mutual labels:  nlp-machine-learning, transformer

ONNX T5

Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX.

This package is still in its alpha stage, so some functionality, such as beam search, is still in development.

Installation

ONNX-T5 is available on PyPI.

pip install onnxt5

For the development version, you can run the following.

git clone https://github.com/abelriboulot/onnxt5
cd onnxt5
pip install -e .

Usage

The simplest way to get started with generation is to use the default pre-trained version of T5 on ONNX that is included in the package.

NOTE: The first time you call get_encoder_decoder_tokenizer, the models are downloaded, which might take a minute or two.

from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer
decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
prompt = 'translate English to French: I was a victim of a series of accidents.'

output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
# output_text: "J'ai été victime d'une série d'accidents."

Other tasks only require changing the prefix in your prompt, for instance for summarization:

prompt = 'summarize: <PARAGRAPH>'
output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
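
For example, a concrete summarization call could look like the following (the input paragraph is only illustrative):

paragraph = ('ONNX Runtime is a cross-platform engine for running machine learning models. '
             'It executes models exported to the ONNX format and applies graph-level '
             'optimizations to speed up inference.')
summary, _ = generative_t5('summarize: ' + paragraph, max_length=50, temperature=0.)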

If you want to get the embeddings of a text, you can run the following.

from onnxt5.api import get_encoder_decoder_tokenizer, run_embeddings_text

decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
prompt = 'Listen, Billy Pilgrim has come unstuck in time.'
encoder_embeddings, decoder_embeddings = run_embeddings_text(encoder_sess, decoder_sess, tokenizer, prompt)
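
If you need a single fixed-size vector per text (for instance for semantic search), a common approach is to mean-pool the encoder embeddings over the token dimension. A minimal sketch, assuming the embeddings come back with shape (sequence_length, hidden_size):

# Mean-pool over the token dimension to obtain one vector for the whole text
# (assumes encoder_embeddings has shape (sequence_length, hidden_size)).
sentence_vector = encoder_embeddings.mean(0)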

ONNXT5 also lets you export and use your own models. See the examples/ folder for more detailed examples.
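
Independent of the helpers shipped in examples/, one generic way to export a T5 encoder yourself is to wrap the Hugging Face module and call torch.onnx.export directly. A minimal sketch, not part of the onnxt5 API (the wrapper class, file name, and opset choice are illustrative):

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

class T5EncoderWrapper(torch.nn.Module):
    # Thin wrapper so the encoder returns a plain tensor, which torch.onnx.export expects.
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
    def forward(self, input_ids):
        return self.encoder(input_ids=input_ids)[0]  # last hidden state

model = T5ForConditionalGeneration.from_pretrained('t5-base')
tokenizer = T5Tokenizer.from_pretrained('t5-base')
dummy_input = tokenizer('translate English to French: Hello', return_tensors='pt').input_ids

torch.onnx.export(
    T5EncoderWrapper(model.encoder), dummy_input, 't5-encoder.onnx',
    input_names=['input_ids'], output_names=['hidden_states'],
    dynamic_axes={'input_ids': {0: 'batch', 1: 'sequence'},
                  'hidden_states': {0: 'batch', 1: 'sequence'}},
    opset_version=12,
)

The decoder side needs a similar wrapper (including the language-model head); the scripts in the examples/ folder cover the full export workflow that GenerativeT5 expects.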

T5 works with tokens such as summarize:, translate English to German:, or question: ... context:. You can see a list of the pretrained tasks and tokens in Appendix D of the original paper.
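
For example, reusing the generative_t5 pipeline from the usage section above, sentiment analysis and question answering only differ by their prefix (the prompts below are illustrative):

# SST-2 sentiment analysis: the model decodes 'positive' or 'negative'
sentiment, _ = generative_t5('sst2 sentence: This film was an utter delight.', max_length=5, temperature=0.)

# Question answering with the question/context prefixes
answer, _ = generative_t5(
    'question: Who wrote Slaughterhouse-Five? '
    'context: Slaughterhouse-Five is a 1969 novel by Kurt Vonnegut.',
    max_length=16, temperature=0.
)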

Functionalities

  • Run any of the tasks T5 was trained on in a single line (translation, summarization, sentiment analysis, completion, generation)
  • Export your own T5 models to ONNX easily
  • Utility functions to generate what you need quickly
  • Up to 4X speedup compared to PyTorch execution for smaller contexts

Benchmarks

The speedup varies heavily with the length of the context. For contexts of fewer than ~500 words, ONNX performs significantly better, with up to a 4X speedup compared to PyTorch. However, the longer the context, the smaller the gain from ONNX, and PyTorch becomes faster above 500 words.
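
To reproduce the comparison on your own hardware, a rough timing sketch along the following lines can be used (the prompt is illustrative, and a proper benchmark would add warm-up runs and average over many repetitions):

import time
from transformers import T5ForConditionalGeneration, T5Tokenizer
from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer

prompt = 'translate English to French: The quick brown fox jumps over the lazy dog.'

# ONNX path
decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
onnx_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
start = time.perf_counter()
onnx_t5(prompt, max_length=100, temperature=0.)
print(f'ONNX: {time.perf_counter() - start:.2f}s')

# PyTorch path
pt_model = T5ForConditionalGeneration.from_pretrained('t5-base')
pt_tokenizer = T5Tokenizer.from_pretrained('t5-base')
input_ids = pt_tokenizer(prompt, return_tensors='pt').input_ids
start = time.perf_counter()
pt_model.generate(input_ids, max_length=100)
print(f'PyTorch: {time.perf_counter() - start:.2f}s')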

GPU Benchmark, Embedding Task

[Figure: GPU benchmark of the embedding task, ONNX vs. PyTorch]

GPU Benchmark, Generation Task

[Figure: GPU benchmark of the generation task, ONNX vs. PyTorch]

Contributing

The project is still in its infancy, so I would love your feedback: what problems you are trying to solve, the issues you're encountering, and the features that would help you. Feel free to shoot me an e-mail (see my profile for the address!) or join our Slack community.

Acknowledgements

This repo builds on the work of Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu from Google, as well as the Hugging Face team's implementation of T5, the work of the Microsoft ONNX and onnxruntime teams (in particular Tianlei Wu), and the work of Thomas Wolf on text generation.

Original T5 Paper

@article{2019t5,
  author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
  title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  journal = {arXiv e-prints},
  year = {2019},
  archivePrefix = {arXiv},
  eprint = {1910.10683},
}

Microsoft onnxruntime repo

HuggingFace implementation of T5
