Serving Runtime
Exposes a serialized machine learning model through an HTTP API written in Java.
This project is under active development
Description
The purpose of this project is to expose a generic HTTP API from serialized machine learning models.
Supported serialized models are:
- ONNX 1.5
- TensorFlow <=1.15, SavedModel or HDF5
- HuggingFace Tokenizer
Prerequisites
- Maven for compiling the project
- Java 11 for running the project
The HDF5 serialization format is supported through a conversion into the SavedModel format. That conversion relies on the following dependencies:
- Python 3.7
- TensorFlow <=1.15 (pip install tensorflow)
For HuggingFace tokenizer:
- Cargo (Rust stable)
HDF5 support (Optional)
If you use the API from the docker image, this step is not necessary as the converter will be built within the image.
The Tensorflow module supports HDF5 files through an executable, h5_converter, which exports the model from an HDF5 file to a Tensorflow SavedModel (.pb).
To generate the converter, use the initialize_tensorflow goal of the Makefile:
make initialize_tensorflow
The generated executable can be found here: evaluator-tensorflow/h5_converter/dist/h5_converter
HuggingFace (Optional)
To build the Java binding, use the initialize_huggingface goal of the Makefile:
make initialize_huggingface
Torch (Optional)
To install libtorch, use the initialize_torch goal of the Makefile:
make initialize_torch
Convert pyTorch model and more
Build & Launch the project locally
Several profiles are available depending on the support you require for the built project.
- full which includes both Tensorflow and ONNX, requires the ONNX support, HDF5 support and Torch support.
- tensorflow which only includes Tensorflow, requires the HDF5 support.
- onnx which only includes ONNX, requires the ONNX support.
- torch which only includes Torch, requires the Torch support.
Set your desired profile:
export MAVEN_PROFILE=<your-profile>
If not specified, the default profile is set to full.
Launch tests
make test MAVEN_PROFILE=$MAVEN_PROFILE
Building JAR
make build MAVEN_PROFILE=$MAVEN_PROFILE
The JAR can then be found in api/target/api-*.jar
Launching JAR
In the following command, replace <jar-path> with the path to your compiled JAR and <model-path> with the directory containing your serialized model.
java -Dfiles.path=<model-path> -jar <jar-path>
If you wish to load a model from an HDF5 file, you will need to specify the path to the executable generated in the HDF5 support section.
java -Dfiles.path=<model-path> -Devaluator.tensorflow.h5_converter.path=<path-to-h5-converter> -jar <jar-path>
Inside the <model-path> it will look for the first file ending with:
- .onnx for an ONNX model
- .pb for a TensorFlow SavedModel
- .h5 for an HDF5 model
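Before launching the JAR, it can help to check which files in a directory carry one of these extensions. The sketch below is only a local sanity check and not the runtime's own lookup code: the function name find_model_candidates is hypothetical, and because the order in which the runtime picks its "first" file is not stated here, the sketch lists every candidate instead of choosing one.

```python
import os

# Extensions named in the list above, mapped to the model format they indicate.
MODEL_EXTENSIONS = {
    ".onnx": "ONNX",
    ".pb": "TensorFlow SavedModel",
    ".h5": "HDF5",
}


def find_model_candidates(model_path):
    """Return {format: [file names]} for files in model_path with a known extension."""
    candidates = {}
    for name in sorted(os.listdir(model_path)):
        if not os.path.isfile(os.path.join(model_path, name)):
            continue  # skip sub-directories
        for extension, model_format in MODEL_EXTENSIONS.items():
            if name.endswith(extension):
                candidates.setdefault(model_format, []).append(name)
    return candidates
```

If the returned dictionary holds more than one file, it may be safer to keep a single model file per directory so that the choice is unambiguous.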
Available parameters
On the launch command you can also specify the following parameters:
- -Dserver.port: the port on which the HTTP server listens
- -Dswagger.title: the title that will be displayed on the Swagger page
- -Dswagger.description: the description that will be displayed on the Swagger page
Build & Launch the project using docker
Building the docker container
make docker-build-api MAVEN_PROFILE=$MAVEN_PROFILE
It will build the docker image serving-runtime-$MAVEN_PROFILE:latest
Running the docker container
In the following command, replace <model-path> with the absolute path of the directory containing your serialized model.
docker run --rm -it -p 8080:8080 -v <model-path>:/deployments/models serving-runtime-$MAVEN_PROFILE:latest
Using the API
By default the API will be running on http://localhost:8080. Reaching this URL in your browser will display the SwaggerUI describing the API for your model.
There are 2 routes available for each model:
- /describe: Describe your model (what are the inputs, outputs and transformations)
- /eval: Send the expected inputs to the model and receive the corresponding outputs
Describe the models inputs and outputs
Each serialized model takes a list of named tensors as inputs and also returns a list of named tensors as outputs.
A named tensor is an N-dimensional array with:
- An identifier name. Example: my-tensor-name
- A data type. Example: integer or double or string
- A shape. Example: (5) for a vector of length 5, (3, 2) for a matrix whose first dimension is of size 3 and whose second dimension is of size 2, etc.
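To make the notion of shape concrete, here is a minimal sketch that computes the shape of a tensor written as nested lists, which is how tensors appear in a JSON body. The helper name infer_shape is hypothetical and not part of the project.

```python
def infer_shape(value):
    """Return the shape of a nested list as a tuple.

    Raises ValueError if the nesting is ragged (inner lists of different shapes).
    """
    if not isinstance(value, list):
        return ()  # a scalar has an empty shape
    if not value:
        return (0,)
    inner_shapes = [infer_shape(item) for item in value]
    if any(shape != inner_shapes[0] for shape in inner_shapes):
        raise ValueError("ragged nested list: inner shapes differ")
    return (len(value),) + inner_shapes[0]
```

For instance, a list of 5 numbers has shape (5,) and a list of 3 rows of 2 numbers has shape (3, 2), matching the two examples above.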
You can get access to the model inputs and outputs by calling the HTTP GET method on the /describe path of the model.
Example of a describe query with curl
curl \
-X GET \
http://<your-model-url>/describe
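The same call can be made from Python with the standard library only. This is a sketch under the assumption that the model is reachable at a base URL you provide; the function names build_describe_request and describe are illustrative.

```python
import json
import urllib.request


def build_describe_request(base_url):
    """Build (without sending) a GET request for the /describe route."""
    return urllib.request.Request(base_url.rstrip("/") + "/describe", method="GET")


def describe(base_url):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(build_describe_request(base_url)) as response:
        return json.load(response)
```

Calling describe("http://localhost:8080") against a running instance would return the description as a Python dictionary.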
Example of a describe response
You will get a JSON object describing the list of input tensors needed to query your model as well as the list of output tensors that will be returned.
{
"inputs": [
{
"name": "sepal_length",
"type": "float",
"shape": [-1]
},
{
"name": "sepal_width",
"type": "float",
"shape": [-1]
},
{
"name": "petal_length",
"type": "float",
"shape": [-1]
},
{
"name": "petal_width",
"type": "float",
"shape": [-1]
}
],
"outputs": [
{
"name": "output_label",
"type": "long",
"shape": [-1]
},
{
"name": "output_probability",
"type": "float",
"shape": [-1, 2]
}
]
}
In this example, the deployed model expects 4 tensors as inputs:
- sepal_length of shape (-1) (i.e. a vector of any size)
- sepal_width of shape (-1) (i.e. a vector of any size)
- petal_length of shape (-1) (i.e. a vector of any size)
- petal_width of shape (-1) (i.e. a vector of any size)
It will answer with 2 tensors as outputs:
- output_label of shape (-1) (i.e. a vector of any size)
- output_probability of shape (-1, 2) (i.e. a matrix whose first dimension is of any size and whose second dimension is of size 2)
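The role of -1 as "any size" can be sketched as a small client-side check. The helpers below are hypothetical and are not taken from the project; they only compare input names and shapes, and the runtime itself may be more lenient than this strict comparison.

```python
def shape_matches(actual, expected):
    """True if `actual` fits `expected`, where -1 in `expected` accepts any size."""
    return len(actual) == len(expected) and all(
        want == -1 or got == want for got, want in zip(actual, expected)
    )


def check_input_names(payload, description):
    """Return a list of problems found when comparing payload keys to /describe inputs."""
    expected = [spec["name"] for spec in description["inputs"]]
    problems = [f"missing input: {name}" for name in expected if name not in payload]
    problems += [f"unknown input: {name}" for name in payload if name not in expected]
    return problems
```

With the description above, a payload that misspells an input name would be reported both as a missing input and as an unknown one.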
Query the model
Once you know what kind of input tensors the model needs, fill the body of your HTTP query with your chosen tensor representation (see below) and send it to the model with a POST method on the path /eval.
Two headers are available for your query:
- The Content-Type header indicating the media type of the input tensor data contained in your message body.
- The (optional) Accept header indicating what media type you want to receive for the output tensors in the response body. If you don't provide an Accept header, the default is application/json.
Supported Content-Type headers
- application/json: A JSON document whose keys are the input tensor names and whose values are the n-dimensional JSON arrays matching your tensors.
- image/png: Byte content representing a PNG-encoded image.
- image/jpeg: Byte content representing a JPEG-encoded image.
image/png and image/jpeg are only available for models taking a single tensor as input. That tensor's shape should also be compatible with an image representation.
- multipart/form-data: A multipart body, each part of which is named by an input tensor.
Each part (i.e. tensor) in the multipart body should have its own Content-Type.
Supported Accept headers
- application/json: A JSON document whose keys are the output tensor names and whose values are the n-dimensional JSON arrays matching your tensors.
- image/png: Byte content representing a PNG-encoded image.
- image/jpeg: Byte content representing a JPEG-encoded image.
image/png and image/jpeg are only available for models returning a single tensor as output. That tensor's shape should also be compatible with an image representation.
- text/html: An HTML document displaying the output tensors representation.
- multipart/form-data: A multipart body, each part of which is named by an output tensor and contains that tensor's JSON representation.
If you want some of the output tensors in a multipart/form-data or text/html response to be interpreted as images, you can specify it as a parameter in the header.
Example: The header text/html; tensor_1=image/png; tensor_2=image/png returns the global response as HTML content. Inside the HTML page, tensor_1 and tensor_2 are displayed as PNG images.
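Assembling such a header by hand is error-prone when several tensors are involved. A minimal sketch of a helper that produces the format shown in the example (the function name build_accept_header is hypothetical):

```python
def build_accept_header(media_type, image_tensors=None):
    """Build an Accept value such as 'text/html; tensor_1=image/png'.

    image_tensors maps an output tensor name to the image media type
    it should be rendered as.
    """
    parts = [media_type]
    for tensor_name, image_type in (image_tensors or {}).items():
        parts.append(f"{tensor_name}={image_type}")
    return "; ".join(parts)
```

Calling it with "text/html" and the two tensors from the example reproduces the header value shown above.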
Tensor interpretable as image
For a tensor to be interpretable as raw image data, it should have a compatible shape in your exported model. Here are the supported ones:
- (x, y, z, 1): Batch of x grayscale images with y pixels height and z pixels width
- (x, 1, y, z): Batch of x grayscale images with y pixels height and z pixels width
- (x, y, z, 3): Batch of x RGB images with y pixels height and z pixels width. The last dimension should be the array of (red, green, blue) components.
- (x, 3, y, z): Batch of x RGB images with y pixels height and z pixels width. The second dimension should be the array of (red, green, blue) components.
- (y, z, 1): Single grayscale image with y pixels height and z pixels width
- (1, y, z): Single grayscale image with y pixels height and z pixels width
- (y, z, 3): Single RGB image with y pixels height and z pixels width. The last dimension should be the array of (red, green, blue) components.
- (3, y, z): Single RGB image with y pixels height and z pixels width. The first dimension should be the array of (red, green, blue) components.
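The list above can be turned into a quick check of whether a shape reported by /describe looks image-compatible. This is a sketch, not the runtime's own detection logic: the function name image_layout is hypothetical, and for shapes that fit more than one layout, such as (3, 4, 3), it assumes the channels-last reading comes first.

```python
def image_layout(shape):
    """Classify a tensor shape against the supported image layouts.

    Returns (batched, channels, channels_last), or None if no layout matches.
    """
    if len(shape) == 4:
        batched, dims = True, tuple(shape[1:])  # drop the batch dimension x
    elif len(shape) == 3:
        batched, dims = False, tuple(shape)
    else:
        return None
    if dims[-1] in (1, 3):  # (y, z, 1) or (y, z, 3)
        return (batched, dims[-1], True)
    if dims[0] in (1, 3):  # (1, y, z) or (3, y, z)
        return (batched, dims[0], False)
    return None
```

A result with channels equal to 1 corresponds to a grayscale image and 3 to an RGB image.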
Examples
Example of a query with curl for a single prediction
In the following example, we want to receive a prediction from our model for the following item:
- sepal_length: 0.1
- sepal_width: 0.2
- petal_length: 0.3
- petal_width: 0.4
curl \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-X POST \
-d '{
"sepal_length": 0.1,
"sepal_width": 0.2,
"petal_length": 0.3,
"petal_width": 0.4
}' \
http://<your-model-url>/eval
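An equivalent request can be built in Python with the standard library. This sketch assumes a base URL of your choosing; the function names build_eval_request and evaluate are illustrative and not part of the project.

```python
import json
import urllib.request


def build_eval_request(base_url, inputs, accept="application/json"):
    """Build (without sending) a POST request for the /eval route."""
    body = json.dumps(inputs).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/eval",
        data=body,
        headers={"Content-Type": "application/json", "Accept": accept},
        method="POST",
    )


def evaluate(base_url, inputs):
    """Send the inputs to the model and decode the JSON response."""
    with urllib.request.urlopen(build_eval_request(base_url, inputs)) as response:
        return json.load(response)
```

The inputs argument is a dictionary keyed by the input tensor names reported by /describe, for example sepal_length, sepal_width, petal_length and petal_width.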
Example of response for a single prediction
- HTTP Status code:
200
- Header:
Content-Type: application/json
{
"output_label": 0,
"output_probability": [0.88, 0.12]
}
In this example, our model predicts the output_label for our input item to be 0 with the following probabilities:
- 88% chance of being 0
- 12% chance of being 1
Example of query with curl for several predictions in one call
In the following example, we want to receive a prediction from our model for the following two items:
First Item
- sepal_length: 0.1
- sepal_width: 0.2
- petal_length: 0.3
- petal_width: 0.4
Second Item
- sepal_length: 0.2
- sepal_width: 0.3
- petal_length: 0.4
- petal_width: 0.5
Query
curl \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' \
-X POST \
-d '{
"sepal_length": [0.1, 0.2],
"sepal_width": [0.2, 0.3],
"petal_length": [0.3, 0.4],
"petal_width": [0.4, 0.5]
}' \
http://<your-model-url>/eval
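Note that the body groups values per tensor, not per item: each input name maps to a list holding one value for every item. A small sketch of that conversion, using a hypothetical helper named items_to_payload:

```python
def items_to_payload(items):
    """Turn a list of per-item dicts into one dict of per-tensor lists."""
    if not items:
        return {}
    names = list(items[0])
    for item in items:
        if set(item) != set(names):
            raise ValueError("all items must have the same input names")
    return {name: [item[name] for item in items] for name in names}
```

Passing the two items described above yields the same JSON structure as the body of the curl query.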
Example of response for several predictions in one call
- HTTP Status code:
200
- Header:
Content-Type: application/json
{
"output_label": [0, 1],
"output_probability": [
[0.88, 0.12],
[0.01, 0.99]
]
}
In this example, our model predicts the output_label for our first input item to be 0 with the following probabilities:
- 88% chance of being 0
- 12% chance of being 1
It also predicts the output_label for our second input item to be 1 with the following probabilities:
- 1% chance of being 0
- 99% chance of being 1
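Reading a batched response item by item is the reverse operation: the i-th entry of every output tensor belongs to the i-th input item. A minimal sketch, with a hypothetical helper named split_predictions and the assumption that every output tensor has the batch as its first dimension:

```python
def split_predictions(response):
    """Turn a dict of per-tensor lists into a list of per-item dicts."""
    names = list(response)
    size = len(response[names[0]]) if names else 0
    return [{name: response[name][i] for name in names} for i in range(size)]
```

Applied to the response above, it returns one dictionary per input item, each holding that item's output_label and output_probability.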
Related links
- Contribute: https://github.com/ovh/serving-runtime/blob/master/CONTRIBUTING.md
- Report bugs: https://github.com/ovh/serving-runtime/issues
License
See https://github.com/ovh/serving-runtime/blob/master/LICENSE