Saving and Loading Models in Keras
Overview
After training and optimizing a model, the next important step is saving and loading it, because we need to deploy the trained model or resume its training later. This article explains every aspect of saving and loading models in Keras in depth.
Pre-requisites
To understand every aspect of this article, it is very important to have a basic idea of the following topics:
- Model Checkpoints in Keras.
- Callbacks in Keras.
Introduction
Any Deep Learning (DL) model trained on a dataset must eventually be deployed to make real-time predictions. The model can be saved as a whole or in parts, i.e., its components can be saved separately. A model is saved to retrain it, to deploy it, or to inspect and examine it, for example, to understand why it associates particular input data points with particular labels (explainable AI, or XAI). This article discusses every aspect of saving and loading a model, i.e., saving the model as a whole or saving its components.
Saving Model Composition
Any Deep Learning (DL) model is made up of the following components:
Architecture: The architecture specifies the structure of the model's layers and how the neurons are connected within and across layers.
Weights: The weights are the model's state, i.e., the numerical values responsible for the model's predictions.
Optimizer: The optimizer is defined when we compile the model. It is responsible for updating the weights during the backward propagation of errors.
Loss or Metric: The loss measures the model's performance; the lower the loss, the higher the performance. Metrics measure the model's performance during training, testing, or prediction.
In this section, we will save the model's components in the seven steps shown below:
Step 1: Importing Libraries
The code snippets import the libraries we will use to create the Model in Keras.
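A minimal sketch of these imports, assuming TensorFlow's bundled Keras API:

```python
# Minimal imports for this walkthrough (assuming TensorFlow's bundled Keras)
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
```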
Step 2. Data Preprocessing
For explanation purposes, I have used the MNIST-Digit dataset, which consists of 70,000 greyscale (single-channel) images of handwritten digits from 0-9. The dataset is split into two sections, i.e., a train set of 60,000 images and a test set of 10,000 images.
I have also preprocessed the dataset by normalizing the pixel values, converting the labels into categorical (one-hot) values, and reshaping the images to (28, 28, 1) because we will implement a 2D Convolutional Neural Network model.
The dataset is loaded from the Keras MNIST-Digit dataset, which is already divided into training and testing sets. Each pixel value is then divided by 255 so that it lies between 0 and 1, and each image is reshaped to (28, 28, 1). The below code snippets depict the preprocessing steps for the MNIST-Digit dataset.
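A sketch of the preprocessing described above (scaling, reshaping, and one-hot encoding the labels):

```python
# Load MNIST: 60,000 training and 10,000 test images of shape (28, 28)
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale pixel values to the [0, 1] range
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add the channel dimension expected by Conv2D layers
x_train = x_train.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

# One-hot encode the labels (10 classes)
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

print("Train:", x_train.shape, "Test:", x_test.shape)
```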
The dataset shapes and sample counts are shown below:
Output
Step 3. Model Creation
After preprocessing the dataset, it's time to create the Model we will train. For the sake of simplicity, I have constructed a simple 2D-Convolutional Neural Network model with the following layers and activations:
Classification Layer: Since we have ten classes (ten different digits to classify), the classification layer comprises one dense layer with ten neurons and the Softmax activation function. Softmax outputs the probability of each class for the given input, ranging between 0 and 1.
Convolutional Neural Network: The model architecture also contains two Convolutional Neural Network layers with 64 and 128 feature maps, respectively. Each Convolutional layer is followed by a max pooling layer.
Max Pooling Layer: I have added a max pooling layer after each Convolutional layer. It extracts the maximum value from each patch of the feature map, so the feature maps obtained from the Convolutional Neural Network layers are down-sampled.
Dropout Layer: The dropout layer with a probability of 0.5 signifies that during training time, fifty percent of the neurons in the layer will be deactivated randomly to reduce the chance of overfitting.
The below code snippets depict code for creating the Model as discussed above.
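A sketch of the architecture described above; the (3, 3) kernel size and (2, 2) pool size are assumptions, since the text does not state them:

```python
# Two Conv2D blocks followed by dropout and a 10-way softmax classifier.
# Kernel size (3, 3) and pool size (2, 2) are assumed values.
model = keras.Sequential([
    layers.Conv2D(64, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])

model.summary()
```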
After model creation, we display the summary in the output cell. The model summary lists each layer's name, its output shape, and its trainable and non-trainable parameters. The summary of the created model is shown below:
Output
Step 4. Compiling and Training the Model
In this section, we are going to compile our Model. But first, we must specify the loss function, optimizer, and metrics to compile the Model.
- categorical_crossentropy is used as our loss function because our dataset is multiclass.
- Adam was selected as the optimizer to propagate the error backward. Adam is an extension of Stochastic Gradient Descent that combines ideas from Root Mean Square Propagation (RMSProp) and the Adaptive Gradient Algorithm (AdaGrad).
- For metrics, we have used accuracy for simplicity. You can use any metric based on your problem statement.
The below snippets depict the code for model compilation.
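A minimal compilation call matching the choices above:

```python
model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"],
)
```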
After successfully compiling the model, our final step is to train it. The argument validation_split denotes the fraction of the training data that is held out for validation. In our case, it is 0.1, which signifies that ten percent of the training data will be used for validation, and the remaining ninety percent will be used for training the model with a batch size of 128. The below snippets depict the code for model training.
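A sketch of the training call; the number of epochs is illustrative:

```python
# Hold out 10% of the training data for validation
history = model.fit(
    x_train, y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.1,
)
```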
Finally, while executing the training code snippets, we will see the output shown in the image below.
Output
Step 5. Saving Model Weights
Once the model training is completed, we need to save the model weights. We can use the saved weights to deploy the model, or we can retrain the model if its predictions are not up to the mark. The below snippets save the model weights after training in .h5 format, which can later be used for making predictions.
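For example (the file name is illustrative):

```python
# Save only the trained weights in HDF5 format
model.save_weights("mnist_cnn_weights.h5")
```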
Step 6. Saving Model Architecture
Model architecture is one of the basic building blocks of the model; the shapes of the model's weights are determined by it. The below snippets depict the syntax to save the model architecture in JSON format.
Syntax
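The method is called on the model instance:

```python
model.to_json(**kwargs)
```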
Argument
kwargs: Additional keyword arguments which will be passed to json.dumps()
The below snippets will save the model architecture in a JSON file.
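A sketch of saving the architecture to disk (the file name is illustrative):

```python
# Serialize the architecture to a JSON string and write it to a file
json_config = model.to_json()
with open("model_architecture.json", "w") as json_file:
    json_file.write(json_config)
```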
Step 7. Saving Optimizer
Optimizers are used to adjust the parameters of the model during training. We specified Adam as the optimizer to adjust the weights of our model during the training process. The snippets below save the optimizer's state, i.e., its internal weights and the configuration arguments it was initialized with, to a NumPy file.
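A sketch of one way to do this; it assumes a tf.keras 2.x optimizer that still exposes get_weights() and get_config() (newer optimizer classes may not):

```python
# Capture the optimizer's internal state (slot variables) and its configuration
optimizer_state = {
    "weights": model.optimizer.get_weights(),
    "config": model.optimizer.get_config(),
}

# np.save pickles the Python dictionary into a .npy file
np.save("mnist_cnn_optimizer.npy", optimizer_state, allow_pickle=True)
```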
Saving and Loading only Architecture
This section discusses how we save and load the model architecture. Model architecture is the basic building block of any Machine Learning (ML) or Deep Learning (DL) model because the model weights, shapes, and other parts are constructed based on it. Saving the model architecture allows us to load different weights into it, as long as those weights come from a model with the same architecture. Saving and loading the model architecture comprises the seven steps discussed below.
Step 1: Importing Libraries
The code snippets import the libraries we will use to create the Model in Keras.
Step 2. Data Preprocessing
For explanation purposes, I have used the MNIST-Digit dataset, which consists of 70,000 greyscale (single-channel) images of handwritten digits from 0-9. The dataset is split into two sections, i.e., a train set of 60,000 images and a test set of 10,000 images.
I have also preprocessed the dataset by normalizing the pixel values, converting the labels into categorical (one-hot) values, and reshaping the images to (28, 28, 1) because we will implement a 2D Convolutional Neural Network model.
The dataset is loaded from the Keras MNIST-Digit dataset, which is already divided into training and testing sets. Each pixel value is then divided by 255 so that it lies between 0 and 1, and each image is reshaped to (28, 28, 1). The below code snippets depict the preprocessing steps for the MNIST-Digit dataset.
The dataset shapes and the number of training and test samples are shown below:
Output
Step 3. Model Creation
After preprocessing the dataset, it's time to create the Model we will train. For the sake of simplicity, I have constructed a simple 2D-Convolutional Neural Network model with the following layers and activations:
Classification Layer: Since we have ten classes (ten different digits to classify), the classification layer comprises one dense layer with ten neurons and the Softmax activation function. Softmax outputs the probability of each class for the given input, ranging between 0 and 1.
Convolutional Neural Network: The model architecture also contains two Convolutional Neural Network layers with 64 and 128 feature maps, respectively. Each Convolutional layer is followed by a max pooling layer.
Max Pooling Layer: I have added a max pooling layer after each Convolutional layer. It extracts the maximum value from each patch of the feature map, so the feature maps obtained from the Convolutional Neural Network layers are down-sampled.
Dropout Layer: A dropout layer with a probability of 0.5 signifies that during training time fifty percent of the neurons in the layer will be deactivated randomly to reduce the chance of overfitting.
The below code snippets depict code for creating the Model as discussed above.
After model creation, we display the summary in the output cell. The model summary lists each layer's name, its output shape, and its trainable and non-trainable parameters. The summary of the created model is shown below:
Output
Step 4. Compile the Model
In this section, we are going to compile our Model. But, first, we must specify the loss function, optimizer, and metrics for compiling the Model.
- categorical_crossentropy is used as our loss function because our dataset is multiclass.
- Adam was selected as the optimizer to propagate the error backward. Adam is an extension of Stochastic Gradient Descent that combines ideas from Root Mean Square Propagation (RMSProp) and the Adaptive Gradient Algorithm (AdaGrad).
- For metrics, we have used accuracy for simplicity. You can use any metric based on your problem statement.
The below snippets depict the code for model compilation.
Step 5. Saving Model Architecture
After compiling the model, we extract the model architecture using the model.to_json() function, which returns the architecture as a JSON string. This string is then saved to a JSON file, which will be loaded back into a model in the next steps.
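For example (the file name is illustrative):

```python
# Extract the architecture as JSON and persist it for later reloading
with open("model_architecture.json", "w") as f:
    f.write(model.to_json())
```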
Step 6. Loading Model Architecture
We load the model architecture from the JSON file that stores it and then display the summary of the reconstructed model. The below code snippets depict the syntax for loading a model architecture from a JSON file.
Syntax
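The loading function lives in keras.models:

```python
keras.models.model_from_json(json_string, custom_objects=None)
```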
Argument
json_string: JSON string encoding the model configuration.
Type: String
custom_objects: Optional dictionary mapping names to custom classes or functions to be considered during deserialization.
Type: Dictionary
The below code snippets depict how we can load the model architecture only, which is already saved in the file.
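A sketch of rebuilding the model from the saved JSON file:

```python
from tensorflow.keras.models import model_from_json

# Read the JSON configuration and reconstruct an untrained model from it
with open("model_architecture.json", "r") as f:
    loaded_model = model_from_json(f.read())

loaded_model.summary()
```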
After loading, we display the summary in the output cell. The model summary lists each layer's name, its output shape, and its trainable and non-trainable parameters. The loaded model architecture is shown below:
Output:
Step 7. Compiling and Training the Model
In this section, we are going to compile our Model. But, first, we must specify the loss function, optimizer, and metrics for compiling the Model.
- categorical_crossentropy is used as our loss function because our dataset is multiclass.
- Adam was selected as the optimizer to propagate the error backward. Adam is an extension of Stochastic Gradient Descent that combines ideas from Root Mean Square Propagation (RMSProp) and the Adaptive Gradient Algorithm (AdaGrad).
- For metrics, we have used accuracy for simplicity. You can use any metric based on your problem statement.
The below snippets depict the code for model compilation.
After successfully compiling the model, our final step is to train it. The argument validation_split denotes the fraction of the training data that is held out for validation. In our case, it is 0.1, which signifies that ten percent of the training data will be used for validation, and the remaining ninety percent will be used for training the model with a batch size of 128. The below snippets depict the code for model training.
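A sketch of compiling and training the reconstructed model:

```python
# The reconstructed model starts from fresh (random) weights
loaded_model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"],
)

loaded_model.fit(
    x_train, y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.1,
)
```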
Finally, while executing the training code snippets, we will see the output shown in the image below. The loaded model architecture is now trained and has new weights.
Output
Saving and Loading Only Weights
Step 1: Importing Libraries
The code snippets import the libraries we will use to create the Model in Keras.
Step 2. Data Preprocessing
For explanation purposes, I have used the MNIST-Digit dataset, which consists of 70,000 greyscale (single-channel) images of handwritten digits from 0-9. The dataset is split into two sections, i.e., a train set of 60,000 images and a test set of 10,000 images.
I have also preprocessed the dataset by normalizing the pixel values, converting the labels into categorical (one-hot) values, and reshaping the images to (28, 28, 1) because we will implement a 2D Convolutional Neural Network model.
The dataset is loaded from the Keras MNIST-Digit dataset, which is already divided into training and testing sets. Each pixel value is then divided by 255 so that it lies between 0 and 1, and each image is reshaped to (28, 28, 1). The below code snippets depict the preprocessing steps for the MNIST-Digit dataset.
The dataset shapes and the number of training and test samples are shown below.
Output
Step 3. Model Creation
After preprocessing the dataset, it's time to create the Model we will train. For the sake of simplicity, I have constructed a simple 2D-Convolutional Neural Network model with the following layers and activations:
Classification Layer: Since we have ten classes (ten different digits to classify), the classification layer comprises one dense layer with ten neurons and the Softmax activation function. Softmax outputs the probability of each class for the given input, ranging between 0 and 1.
Convolutional Neural Network: The model architecture also contains two Convolutional Neural Network layers with 64 and 128 feature maps, respectively. Each Convolutional layer is followed by a max pooling layer.
Max Pooling Layer: I have added a max pooling layer after each Convolutional layer. It extracts the maximum value from each patch of the feature map, so the feature maps obtained from the Convolutional Neural Network layers are down-sampled.
Dropout Layer: Dropout layer with a probability of 0.5 signifies that during training time, fifty percent of the neurons in the layer will be deactivated randomly to reduce the chance of overfitting.
The below code snippets depict code for creating the Model as discussed above.
After model creation, we display the summary in the output cell. The model summary lists each layer's name, its output shape, and its trainable and non-trainable parameters. The summary of the created model is shown below:
Output
Step 4. Compile the Model
In this section, we are going to compile our Model. But first, we must specify the loss function, optimizer, and metrics to compile the Model.
- categorical_crossentropy is used as our loss function because our dataset is multiclass.
- Adam was selected as the optimizer to propagate the error backward. Adam is an extension of Stochastic Gradient Descent that combines ideas from Root Mean Square Propagation (RMSProp) and the Adaptive Gradient Algorithm (AdaGrad).
- For metrics, we have used accuracy for simplicity. You can use any metric based on your problem statement.
The below snippets depict the code for model compilation.
Step 5. Saving Weights before Training
In this section, we save the model weights without training. These weights are not optimized; they are just the random initial values. The below code snippet saves the model weights before training.
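For example (the file name is illustrative):

```python
# At this point the weights are just the random initial values
model.save_weights("weights_before_training.h5")
```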
Step 6. Saving Weights During Training
After successfully compiling the model, our final step is to train it with a callback. The argument validation_split denotes the fraction of the training data that is held out for validation. In our case, it is 0.1, which signifies that ten percent of the training data will be used for validation, and the remaining ninety percent will be used for training the model with a batch size of 128.
It is important to save the model weights after every epoch or at a specific interval. This gives us fallback or backup weights if training stops partway due to unfavorable circumstances. That's where model checkpointing comes into action. Model checkpointing saves the model weights after every epoch or after a specific interval, and these weights can be used to deploy the model or resume training. The below code snippets save the weights after every epoch.
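A sketch using the ModelCheckpoint callback; the file-name pattern and epoch count are illustrative:

```python
from tensorflow.keras.callbacks import ModelCheckpoint

# Save only the weights at the end of every epoch; the epoch number is
# embedded in the file name so earlier checkpoints are not overwritten
checkpoint = ModelCheckpoint(
    filepath="weights_epoch_{epoch:02d}.h5",
    save_weights_only=True,
    save_freq="epoch",
)

model.fit(
    x_train, y_train,
    batch_size=128,
    epochs=5,
    validation_split=0.1,
    callbacks=[checkpoint],
)
```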
Output
Step 7. Saving Weights After Training
Once the model has been trained, it is time to save its weights. The below code snippet shows how we can save the weights of the trained model.
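For example:

```python
# Save the optimized weights obtained after training
model.save_weights("weights_after_training.h5")
```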
Step 8. Loading Weight and Evaluating
Since we have saved the model weights, it is now time to load them back into the model and evaluate it on the test set.
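A sketch of loading the saved weights into a model with the same architecture and evaluating it:

```python
# The model must have the same architecture (and be compiled) before loading
model.load_weights("weights_after_training.h5")

loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss: {loss:.4f}, test accuracy: {accuracy:.4f}")
```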
Output
Saving and Loading Entire Models
In this section, I will discuss how we can save and load a full Keras model, i.e., its architecture, weights, and optimizer state, all at once. A saved model can be used for many purposes, the most common being deployment and resuming training. The whole process of saving and loading the model comprises the six steps discussed below.
Step 1: Importing Libraries
The code snippets import the libraries we will use to create the Model in Keras.
Step 2. Data Preprocessing
For explanation purposes, I have used the MNIST-Digit dataset, which consists of 70,000 greyscale (single-channel) images of handwritten digits from 0-9. The dataset is split into two sections, i.e., a train set of 60,000 images and a test set of 10,000 images.
I have also preprocessed the dataset by normalizing the pixel values, converting the labels into categorical (one-hot) values, and reshaping the images to (28, 28, 1) because we will implement a 2D Convolutional Neural Network model.
The dataset is loaded from the Keras MNIST-Digit dataset, which is already divided into training and testing sets. Each pixel value is then divided by 255 so that it lies between 0 and 1, and each image is reshaped to (28, 28, 1). The below code snippets depict the preprocessing steps for the MNIST-Digit dataset.
The dataset shapes and the number of training and test samples are shown below.
Output
Step 3. Model Creation
After preprocessing the dataset, it's time to create the Model we will train. For the sake of simplicity, I have constructed a simple 2D-Convolutional Neural Network model with the following layers and activations:
Classification Layer: Since we have ten classes (ten different digits to classify), the classification layer comprises one dense layer with ten neurons and the Softmax activation function. Softmax outputs the probability of each class for the given input, ranging between 0 and 1.
Convolutional Neural Network: The model architecture also contains two Convolutional Neural Network layers with 64 and 128 feature maps, respectively. Each Convolutional layer is followed by a max pooling layer.
Max Pooling Layer: I have added a max pooling layer after each Convolutional layer. It extracts the maximum value from each patch of the feature map, so the feature maps obtained from the Convolutional Neural Network layers are down-sampled.
Dropout Layer: The dropout layer with a probability of 0.5 signifies that during training time, fifty percent of the neurons in the layer will be deactivated randomly to reduce the chance of overfitting.
The below code snippets depict code for creating the Model as discussed above.
After model creation, we display the summary in the output cell. The model summary lists each layer's name, its output shape, and its trainable and non-trainable parameters. The summary of the created model is shown below:
Output
Step 4. Compiling and Training the Model
In this section, we are going to compile our Model. But first, we must specify the loss function, optimizer, and metrics to compile the Model.
- categorical_crossentropy is used as our loss function because our dataset is multiclass.
- Adam was selected as the optimizer to propagate the error backward. Adam is an extension of Stochastic Gradient Descent that combines ideas from Root Mean Square Propagation (RMSProp) and the Adaptive Gradient Algorithm (AdaGrad).
- For metrics, we have used accuracy for simplicity. You can use any metric based on your problem statement.
The below snippets depict the code for model compilation.
After successfully compiling the model, our final step is to train it. The argument validation_split denotes the fraction of the training data that is held out for validation. In our case, it is 0.1, which signifies that ten percent of the training data will be used for validation, and the remaining ninety percent will be used for training the model with a batch size of 128. The below snippets depict the code for model training.
Finally, while executing the training code snippets, we will see the output shown below.
Output
Step 5. Saving Model
After training the Model, the next step is to save the Model. Below I have depicted the syntax for saving the Keras model.
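In tf.keras 2.x the call has roughly this signature (some versions also accept an include_optimizer flag):

```python
model.save(
    filepath,
    overwrite=True,
    save_format=None,
    signatures=None,
    options=None,
    save_traces=True,
)
```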
Arguments:
filepath: Path where the files will be saved.
Type: String
overwrite: Whether to silently overwrite any existing file at the target location or to ask the user with a prompt.
Type: Boolean
save_format: Whether to save the model in the .h5 format or the TensorFlow SavedModel (tf) format. By default, the format is tf.
Type: String
signatures: Signatures to save with the model. Only applicable to the tf (SavedModel) format.
Type: Callable or dictionary of callables
options: Only applicable to the SavedModel format. A tf.saved_model.SaveOptions object specifying options for saving to SavedModel.
Type: tf.saved_model.SaveOptions
save_traces: It is only applicable to the SavedModel format. When enabled, the SavedModel will store the function traces for each layer. If this argument is disabled, only the configs of each layer are stored. By default, it is set to True. Disabling this will decrease serialization time and reduce file size, but it will require that all custom layers/models implement a get_config() method.
Type: Boolean
In this scenario, I will save all the model components at once, i.e., its architecture, weights, training configuration, and optimizer state.
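A sketch of saving the whole model; the file names are illustrative:

```python
# HDF5 format: a single file containing architecture, weights, and optimizer state
model.save("mnist_cnn.h5")

# Alternatively, the default TensorFlow SavedModel format (a directory)
model.save("mnist_cnn_savedmodel")
```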
Step 6. Loading Model
Since we saved the model in Step 5, it is now time to load it and make predictions on the test dataset. The below syntax depicts how to load the model.
Syntax
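In tf.keras 2.x the loading call is roughly:

```python
keras.models.load_model(filepath, custom_objects=None, compile=True)
```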
Arguments
filepath: Location of the file where the saved Model is stored.
Type: String
custom_objects: It is optional. If we have implemented any custom object or function, we need to specify the respective object or function as a dictionary.
Type: Dictionary
compile: Whether to compile the model while loading. By default, it is set to True.
Type: Boolean
The code snippets below depict the code for loading the saved Model, printing the summary, and evaluating the model performance on the Test Dataset.
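A sketch of loading the saved model and evaluating it on the test set:

```python
from tensorflow.keras.models import load_model

# Load the full model: architecture, weights, and optimizer state
loaded_model = load_model("mnist_cnn.h5")
loaded_model.summary()

loss, accuracy = loaded_model.evaluate(x_test, y_test, verbose=0)
print(f"Test loss: {loss:.4f}, test accuracy: {accuracy:.4f}")
```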
Output
Conclusion
In this article, we have studied Model Saving and Model Loading. The following are the key takeaways from this article:
- With respect to saving and loading, any Deep Learning model is composed of an architecture, weights, and an optimizer state.
- We can save the model components independently, i.e., we can save the weights, architecture, or optimizer state separately.
- After saving them independently, we can combine the weights, optimizer state, and model architecture to restore the last model state.
- Model weights can be saved before, during, and after the training procedure.
- We can retrain the model or resume its training if the model weights are saved.