
huggingface / optimum

License: Apache-2.0
🏎️ Accelerate training and inference of 🤗 Transformers with easy-to-use hardware optimization tools

Programming Languages

Python
139335 projects - #7 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to optimum

fastT5
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.
Stars: ✭ 421 (-25.75%)
Mutual labels:  inference, quantization, onnx, onnxruntime
onnxruntime-rs
Rust wrapper for Microsoft's ONNX Runtime (version 1.8)
Stars: ✭ 149 (-73.72%)
Mutual labels:  inference, onnx, onnxruntime
chainer-fcis
[This project has moved to ChainerCV] Chainer Implementation of Fully Convolutional Instance-aware Semantic Segmentation
Stars: ✭ 45 (-92.06%)
Mutual labels:  training, inference
torch-model-compression
An automated toolset for analyzing and modifying the structure of PyTorch models, including a library of model compression algorithms with automatic model structure analysis.
Stars: ✭ 126 (-77.78%)
Mutual labels:  quantization, onnx
object-size-detector-python
Monitor mechanical bolts as they move down a conveyor belt. When a bolt of an irregular size is detected, this solution emits an alert.
Stars: ✭ 26 (-95.41%)
Mutual labels:  intel, inference
sagemaker-xgboost-container
This is a Docker container based on the open source XGBoost framework (https://xgboost.readthedocs.io/en/latest/) that allows customers to use their own XGBoost scripts in SageMaker.
Stars: ✭ 93 (-83.6%)
Mutual labels:  training, inference
deepvac
PyTorch Project Specification.
Stars: ✭ 507 (-10.58%)
Mutual labels:  quantization, onnx
intruder-detector-python
Build an application that alerts you when someone enters a restricted area. Learn how to use models for multiclass object detection.
Stars: ✭ 16 (-97.18%)
Mutual labels:  intel, inference
graphsignal
Graphsignal Python agent
Stars: ✭ 158 (-72.13%)
Mutual labels:  inference, onnxruntime
motor-defect-detector-python
Predict performance issues with manufacturing equipment motors. Perform local or cloud analytics of the issues found, and then display the data on a user interface to determine when failures might arise.
Stars: ✭ 24 (-95.77%)
Mutual labels:  intel, inference
bert-squeeze
🛠️ Tools for Transformers compression using PyTorch Lightning ⚡
Stars: ✭ 56 (-90.12%)
Mutual labels:  transformers, quantization
studio-lab-examples
Example notebooks for working with SageMaker Studio Lab. Sign up for an account at the link below!
Stars: ✭ 319 (-43.74%)
Mutual labels:  training, inference
ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
Stars: ✭ 281 (-50.44%)
Mutual labels:  quantization, onnx
popart
Poplar Advanced Runtime for the IPU
Stars: ✭ 62 (-89.07%)
Mutual labels:  onnx, graphcore
object-flaw-detector-cpp
Detect various irregularities of a product as it moves along a conveyor belt.
Stars: ✭ 19 (-96.65%)
Mutual labels:  intel, inference
mediapipe plus
The purpose of this project is to apply mediapipe to more AI chips.
Stars: ✭ 38 (-93.3%)
Mutual labels:  inference, onnx
Bmw Labeltool Lite
This repository provides you with an easy-to-use labeling tool for state-of-the-art deep learning training purposes.
Stars: ✭ 145 (-74.43%)
Mutual labels:  training, inference
Dawn Bench Entries
DAWNBench: An End-to-End Deep Learning Benchmark and Competition
Stars: ✭ 254 (-55.2%)
Mutual labels:  training, inference
ai-serving
Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints
Stars: ✭ 122 (-78.48%)
Mutual labels:  inference, onnx
safety-gear-detector-python
Observe workers as they pass in front of a camera to determine if they have adequate safety protection.
Stars: ✭ 54 (-90.48%)
Mutual labels:  intel, inference


Hugging Face Optimum

🤗 Optimum is an extension of 🤗 Transformers, providing a set of performance optimization tools enabling maximum efficiency to train and run models on targeted hardware.

The AI ecosystem evolves quickly, and more and more specialized hardware, each with its own optimizations, emerges every day. As such, Optimum enables users to efficiently use any of these platforms with the same ease inherent to Transformers.

Integration with Hardware Partners

🤗 Optimum aims to broaden the range of hardware on which users can train and fine-tune their models.

To achieve this, we are collaborating with the following hardware manufacturers to provide the best Transformers integration:

  • Graphcore IPUs - IPUs are a completely new kind of massively parallel processor to accelerate machine intelligence. More information here.
  • Habana Gaudi Processor (HPU) - HPUs are designed to maximize training throughput and efficiency. More information here.
  • Intel - Enabling the usage of Intel tools to accelerate end-to-end pipelines on Intel architectures. More information here.
  • More to come soon!

Optimizing models for inference

Along with supporting dedicated AI hardware for training, Optimum also provides inference optimizations for various frameworks and platforms.

Optimum enables the usage of popular compression techniques such as quantization and pruning by supporting ONNX Runtime along with Intel Neural Compressor (INC).

Features                            | ONNX Runtime | Intel Neural Compressor
Post-training Dynamic Quantization  | ✔️            | ✔️
Post-training Static Quantization   | ✔️            | ✔️
Quantization Aware Training (QAT)   | Stay tuned!  | ✔️
Pruning                             | N/A          | ✔️

Installation

🤗 Optimum can be installed using pip as follows:

python -m pip install optimum
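
To verify that the installation succeeded, you can inspect the installed package metadata with pip (a generic pip check, not an Optimum-specific command):

python -m pip show optimum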

If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:

Accelerator                    | Installation
ONNX Runtime                   | python -m pip install optimum[onnxruntime]
Intel Neural Compressor (INC)  | python -m pip install optimum[intel]
Graphcore IPU                  | python -m pip install optimum[graphcore]
Habana Gaudi Processor (HPU)   | python -m pip install optimum[habana]
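
If you target more than one accelerator, pip also lets you combine extras in a single command (this is standard pip behavior rather than an Optimum-specific feature), e.g.

python -m pip install optimum[onnxruntime,intel]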

If you'd like to play with the examples or need the bleeding edge of the code and can't wait for a new release, you can install the base library from source as follows:

python -m pip install git+https://github.com/huggingface/optimum.git

To install the accelerator-specific features from source, append #egg=optimum[accelerator_type] to the pip command, e.g.

python -m pip install git+https://github.com/huggingface/optimum.git#egg=optimum[onnxruntime]
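
Alternatively, if you plan to modify the code, you can clone the repository and do an editable install; this is a standard pip workflow, sketched here assuming you run it from the repository root:

git clone https://github.com/huggingface/optimum.git
cd optimum
python -m pip install -e .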

Quickstart

At its core, 🤗 Optimum uses configuration objects to define parameters for optimization on different accelerators. These objects are then used to instantiate dedicated optimizers, quantizers, and pruners.

Quantization

For example, here's how you can apply dynamic quantization with ONNX Runtime:

from optimum.onnxruntime.configuration import AutoQuantizationConfig
from optimum.onnxruntime import ORTQuantizer

# The model we wish to quantize
model_checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
# The type of quantization to apply
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer = ORTQuantizer.from_pretrained(model_checkpoint, feature="sequence-classification")

# Quantize the model!
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    quantization_config=qconfig,
)

In this example, we've quantized a model from the Hugging Face Hub, but it could also be a path to a local model directory. The feature argument in the from_pretrained() method corresponds to the type of task that we wish to quantize the model for. The result from applying the export() method is a model-quantized.onnx file that can be used to run inference. Here's an example of how to load an ONNX Runtime model and generate predictions with it:

from functools import partial
from datasets import Dataset
from optimum.onnxruntime.model import ORTModel

# Load quantized model
ort_model = ORTModel("model-quantized.onnx", quantizer._onnx_config)
# Create a dataset or load one from the Hub
ds = Dataset.from_dict({"sentence": ["I love burritos!"]})
# Tokenize the inputs
def preprocess_fn(ex, tokenizer):
    return tokenizer(ex["sentence"])

tokenized_ds = ds.map(partial(preprocess_fn, tokenizer=quantizer.preprocessor))
ort_outputs = ort_model.evaluation_loop(tokenized_ds)
# Extract logits!
ort_outputs.predictions
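
Since ort_outputs.predictions holds the raw logits, turning them into class labels only requires an argmax. A minimal sketch, assuming the usual label mapping of this checkpoint (0 = negative, 1 = positive):

import numpy as np

# Map each row of logits to its highest-scoring class id
predicted_class_ids = np.argmax(ort_outputs.predictions, axis=-1)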

Similarly, you can apply static quantization by simply setting is_static to True when instantiating the QuantizationConfig object:

qconfig = AutoQuantizationConfig.arm64(is_static=True, per_channel=False)

Static quantization relies on feeding batches of data through the model to estimate the activation quantization parameters ahead of inference time. To support this, 🤗 Optimum allows you to provide a calibration dataset. The calibration dataset can be a simple Dataset object from the 🤗 Datasets library, or any dataset that's hosted on the Hugging Face Hub. For this example, we'll pick the sst2 dataset that the model was originally trained on:

from optimum.onnxruntime.configuration import AutoCalibrationConfig

# Create the calibration dataset
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=quantizer.preprocessor),
    num_samples=50,
    dataset_split="train",
)
# Create the calibration configuration containing the parameters related to calibration.
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)
# Perform the calibration step: computes the activations quantization ranges
ranges = quantizer.fit(
    dataset=calibration_dataset,
    calibration_config=calibration_config,
    onnx_model_path="model.onnx",
    operators_to_quantize=qconfig.operators_to_quantize,
)
# Quantize the same way we did for dynamic quantization!
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    calibration_tensors_range=ranges,
    quantization_config=qconfig,
)

Graph optimization

Now let's take a look at applying graph optimization techniques such as operator fusion and constant folding. As before, we load a configuration object, but this time by setting the optimization level instead of the quantization approach:

from optimum.onnxruntime.configuration import OptimizationConfig

# Here the optimization level is selected to be 1, enabling basic optimizations such as redundant
# node elimination and constant folding. A higher optimization level will result in a hardware-
# dependent optimized graph.
optimization_config = OptimizationConfig(optimization_level=1)

Next, we load an optimizer to apply these optimizations to our model:

from optimum.onnxruntime import ORTOptimizer

optimizer = ORTOptimizer.from_pretrained(
    model_checkpoint,
    feature="sequence-classification",
)

# Export the optimized model
optimizer.export(
    onnx_model_path="model.onnx",
    onnx_optimized_model_output_path="model-optimized.onnx",
    optimization_config=optimization_config,
)

And that's it - the model is now optimized and ready for inference!
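
To run the optimized graph, you can hand it to ONNX Runtime directly. The sketch below is one way to do it, assuming the exported model exposes the usual input_ids and attention_mask inputs and that you run on CPU:

import onnxruntime
from transformers import AutoTokenizer

# Tokenize a sample sentence as NumPy arrays, which ONNX Runtime consumes directly
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
encoded = tokenizer("I love burritos!", return_tensors="np")

# Create a CPU inference session on the optimized graph and run it
session = onnxruntime.InferenceSession("model-optimized.onnx", providers=["CPUExecutionProvider"])
logits = session.run(None, {"input_ids": encoded["input_ids"], "attention_mask": encoded["attention_mask"]})[0]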

As you can see, the process is similar in each case:

  1. Define the optimization / quantization strategies via an OptimizationConfig / QuantizationConfig object
  2. Instantiate an ORTQuantizer or ORTOptimizer class
  3. Apply the export() method
  4. Run inference

Training

Besides supporting ONNX Runtime inference, 🤗 Optimum also supports ONNX Runtime training, reducing the memory and computation needed during training. This can be achieved by using the ORTTrainer class, which behaves similarly to the Trainer of 🤗 Transformers:

-from transformers import Trainer
+from optimum.onnxruntime import ORTTrainer

# Step 1: Create your ONNX Runtime Trainer
-trainer = Trainer(
+trainer = ORTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
    feature="sequence-classification",
)

# Step 2: Use ONNX Runtime for training and evaluation!🤗
train_result = trainer.train()
eval_metrics = trainer.evaluate()

By replacing Trainer with ORTTrainer, you will be able to leverage ONNX Runtime for fine-tuning tasks.
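
For completeness, here is one way the objects referenced in the snippet above (model, training_args, train_dataset, eval_dataset, compute_metrics, tokenizer) could be set up. This is a minimal sketch using the same sst2 task as the quantization example; the checkpoint, sequence length, and training arguments are illustrative assumptions rather than values prescribed by Optimum:

import numpy as np
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
    default_data_collator,
)

# Illustrative checkpoint; any sequence-classification model from the Hub would do
model_checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=2)

# Tokenize the GLUE sst2 splits with a fixed length so default_data_collator can batch them
dataset = load_dataset("glue", "sst2")

def tokenize_fn(examples):
    return tokenizer(examples["sentence"], padding="max_length", truncation=True, max_length=128)

encoded = dataset.map(tokenize_fn, batched=True)
train_dataset, eval_dataset = encoded["train"], encoded["validation"]

def compute_metrics(eval_pred):
    # Accuracy computed from the argmax of the logits
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

training_args = TrainingArguments(output_dir="ort-sst2", num_train_epochs=1, per_device_train_batch_size=16)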

Check out the examples directory for more sophisticated usage.

Happy optimizing 🤗!
