Optimizing Models for CPU-based Deployments in Keras


Overview

Model optimization plays an important role in deployment. Today, it is much easier to train a machine learning or deep learning model than to deploy it to production. Deploying such models involves many factors, and optimization is one of the most important. This article discusses how to optimize your machine learning models for CPU-based deployments.

Introduction to ONNX

Open Neural Network Exchange, known as ONNX, is an open format for representing machine learning models, with an accompanying Python library. The main objective of ONNX is to act as a common format for deep learning models so that they can move easily from one framework to another. It was originally developed by Facebook and Microsoft.

If you want to know more about ONNX, this link might be useful to you.

Advantages of ONNX

  • You can easily convert models from frameworks such as TensorFlow, Keras, or PyTorch to ONNX.
  • ONNX models typically have lower latency than the equivalent framework-native models.
  • You can easily deploy ONNX models in C, C++, or Java environments.

Optimizing Keras Models with ONNX

This section will walk through converting a Keras model to the ONNX format.

Setup

We will require a few libraries. The libraries are listed below:

Note: You can install these libraries via pip. For example, if you want to install numpy, you can run this command in your terminal.
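The original library list is not reproduced here, but a typical set for this workflow (an assumption on my part) is TensorFlow, tf2onnx, ONNX Runtime, and NumPy, all of which can be installed with pip in one go:

```shell
# Install the libraries used in the rest of this article (assumed set).
pip install tensorflow tf2onnx onnxruntime numpy
```

If you only need a single package, the same pattern applies, e.g. `pip install numpy`.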

Convert the Model

We will use a pre-trained Keras model and convert it into an ONNX model. You can replicate this conversion with your own custom Keras model.

Let's look into the code,

After executing the above code, we get two models, i.e., model-resnet50-final.h5 and model-resnet50-final.onnx. This is how we can convert Keras models into ONNX format.

Inference of the ONNX Model

Now let's infer the results with the ONNX model.

Output

Comparison of Evaluation Metrics, Model Size, Latency, and Throughput

In this section, let's compare the two models and see which one is better.

Let's consider the size of both models. The file size of the Keras model is 317.4 MB, whereas the ONNX model is about 286.3 MB, a reduction of roughly 10%.

Now let's test the model loading time for Keras and ONNX.

Output:

Let's calculate the time for the ONNX model.
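A matching sketch for the ONNX side, assuming the `model-resnet50-final.onnx` file from the conversion step:

```python
import time

import onnxruntime as ort

# Time how long it takes to create a CPU inference session.
start = time.perf_counter()
session = ort.InferenceSession(
    "model-resnet50-final.onnx", providers=["CPUExecutionProvider"]
)
elapsed = time.perf_counter() - start
print(f"ONNX model load time: {elapsed:.2f} s")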

Output:

From these timings, you can easily point out the winner.

Now it's time to capture the inference results.

Output:

We can conclude that the ONNX model is about 2.7x faster than the Keras model.

Conclusion

  • In this article, we discussed model optimization with ONNX.
  • We also walked through the process of optimizing a Keras model by converting it to ONNX.
  • Finally, we compared the ONNX model with the Keras model in terms of size, load time, and inference latency.