Customizing Training Loops in Keras with TensorFlow

Overview

Models in Keras are typically trained with the fit method. Sometimes, however, you need more control over the training process. To get it, you must create a custom training loop: custom code that gives you fine-grained control over the forward and backward passes. This article will show you how.

Pre-requisites

To get the most out of this article, both theoretically and practically, readers should have a basic understanding of the following topics:

  • Losses, optimizers, and metrics in TensorFlow/Keras.
  • Data input pipelines in TensorFlow/Keras.

Introduction

Deep learning models can be built and trained with ease using TensorFlow. However, we may need to implement low-level operations to speed up training or alter the default behavior, such as a custom loss function or any other logic not covered by the predefined functions and classes.

This article will look at how to write a custom TensorFlow/Keras training loop that can, on average, run slightly faster than model.fit().

What is a Customized Training Loop?

Keras is a high-level library, and among all the deep learning libraries, we love it for exactly that. It abstracts away most of the low-level machinery TensorFlow uses to run computations on data on the GPU.

Keras already has a predefined training loop that works for most projects and problem statements, but it is exactly that: predefined. A customized training loop has the same objective as the built-in .fit, but it gives more flexibility to write and integrate new logic, and full fine-grained control over the whole training process.

Why is Customizing Training Loops Important?

To have fine-grained control over the whole training process, we write our own training loop that iterates over the batches of data and fits the model on each batch. It has the same objective as .fit, but with a custom training loop we control every minute step while training Machine Learning (ML) or Deep Learning (DL) models. Furthermore, a custom training loop allows us to incorporate custom loss functions, custom metrics, and any other customized logic our problem statement requires.

Creating the Custom Loop

The sections above covered what a customized training loop is and why customizing the training loop is important in Keras; in this section, we will learn how to implement one. By writing custom training loops, you can build loss functions that rely on information outside your immediate batch of training inputs and labels, or track various model performance measures with greater ease. Think of it as an extension of your neural network toolkit.

Step 1. Importing Required Libraries

The first step is to import the required libraries so that their classes and functions can be used in our code.
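A minimal sketch of the imports used throughout this article, assuming TensorFlow 2.x (NumPy and Matplotlib are used later for data handling and visualization):

```python
# TensorFlow 2.x and the helper libraries used throughout this article.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
```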

Step 2. Data Loading

For explanation purposes, I have used the Fashion-MNIST dataset, which consists of grayscale (single-channel) 28×28 images of clothing items from 10 classes. The dataset is split into two sections: a training set of 60,000 images and a test set of 10,000 images.

  • The pixel values are scaled by dividing each by 255 so that every pixel lies between 0 and 1, and the images are then reshaped to 28×28×1.

  • The dataset is preprocessed by normalizing the pixels, converting the labels into categorical (one-hot) values, and reshaping each image to 28×28×1, because we will implement a 2D Convolutional Neural Network model.

The dataset is loaded from Keras's Fashion-MNIST classification dataset, which is already divided into training and test sets, so the requirement of splitting the data is already satisfied.

The below code snippets depict the loading and initial preprocessing steps of the Fashion-MNIST dataset.
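A minimal sketch of these steps, using the standard tf.keras.datasets.fashion_mnist loader; the variable names x_train, y_train, x_test, and y_test are illustrative:

```python
import tensorflow as tf

# Fashion-MNIST ships pre-split: 60,000 training and 10,000 test images.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# One-hot encode the integer labels (10 clothing classes).
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)

print("Train:", x_train.shape, y_train.shape)  # (60000, 28, 28) (60000, 10)
print("Test: ", x_test.shape, y_test.shape)    # (10000, 28, 28) (10000, 10)
```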

The dataset shapes and sample counts are shown below:

Output:

Step 3: Creating the Data Pipeline

The data input pipeline lets us exploit the underlying machinery of the tf.data API, such as parallel processing and efficient GPU utilization, which reduces training time. Most importantly, if we train the model using a data input pipeline, we can easily transport the model to a production or UAT environment. In other words, a data pipeline is a set of tools used to extract, transform, and load data from one or more sources into a target system. It is divided into three steps: data sourcing, processing, and delivery.

In this section, we will create a data input pipeline that implements the following functions:

batch(): Batch size is a machine learning term that refers to the number of training examples utilized in one iteration. The batch function groups the data samples into batches, which are then fed to the model one batch at a time during training.

shuffle(): The shuffle function randomly reorders the samples in the dataset.

map(): The map function is used to preprocess the samples in the dataset. We can use the lambda function inside the map or the custom/TensorFlow function inside the map function. The called function will be applied to all the samples in the dataset.

prefetch(): The prefetch function is used when we want to keep the next batch or batches ready so that, once the GPU finishes the forward and backward passes on the current batch, the next batch is immediately available. This is useful when the batched dataset is produced by the CPU and consumed by the GPU. We add .prefetch(1) at the end of the pipeline (after batching) so that at least one batch is ready at any time; we can even prefetch more than one batch.

In the below code snippets, I will load the Fashion-MNIST dataset into a tf.data pipeline. I have created two pipelines, one for the training data and one for the test data, using tf.data.Dataset.from_tensor_slices, which loads the dataset directly from NumPy arrays.

In the below snippets, I have built the data input pipelines for the Fashion-MNIST classifier model. Scaling data is common practice in deep learning because the network's weights and biases are initialized to small values. First, the preprocess function takes the image and label as arguments. Each pixel is then divided by 255 so that its value lies between 0 and 1 (a process also known as pixel scaling), and finally the images are reshaped to 28×28×1 and returned with their labels.

The map function is called on the dataset input pipeline, invoking the preprocess function for every sample; map sends one sample at a time to preprocess. The next step is shuffling the dataset, i.e., randomly rearranging its order to remove any association between consecutive samples. Finally, we batch the dataset with a batch size of 32 and apply prefetch with AUTOTUNE. Prefetch keeps at least one batch ready at any point so there is no delay when feeding batches into the training phase of the model.

An almost identical pipeline is applied to the test data, except that we neither shuffle the samples nor prefetch any batches.
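A sketch of the two pipelines described above, assuming the x_train, y_train, x_test, and y_test arrays from the loading step; the names train_ds and test_ds are illustrative:

```python
import tensorflow as tf

def preprocess(image, label):
    # Pixel scaling: map each pixel into [0, 1], then add the channel
    # dimension so each image becomes 28x28x1.
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.reshape(image, (28, 28, 1))
    return image, label

# Training pipeline: map -> shuffle -> batch -> prefetch.
train_ds = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(buffer_size=10_000)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

# Test pipeline: same preprocessing and batching, but no shuffle or prefetch.
test_ds = (
    tf.data.Dataset.from_tensor_slices((x_test, y_test))
    .map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(32)
)
```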

Step 4: Visualization of the Dataset

Data visualization is one of the most important parts of any Machine Learning (ML) / Deep Learning (DL) project, so here I have visualized the dataset using the Matplotlib library.

I will plot 10 data samples from the training dataset, creating a subplot grid with five rows and two columns. In the code below, I extract ten samples from the training dataset and display them along with their associated labels using Matplotlib; the same approach applies to the test set:
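A sketch of such a plot, assuming the train_ds pipeline defined above; class_names holds the standard Fashion-MNIST label names:

```python
import matplotlib.pyplot as plt
import tensorflow as tf

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Take one batch from the training pipeline and plot its first 10 images
# in a 5-row by 2-column grid with their decoded labels.
images, labels = next(iter(train_ds))
plt.figure(figsize=(6, 12))
for i in range(10):
    plt.subplot(5, 2, i + 1)
    plt.imshow(images[i, :, :, 0], cmap="gray")
    plt.title(class_names[int(tf.argmax(labels[i]))])
    plt.axis("off")
plt.show()
```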

Output:

Step 5: Model Creation

Now, after preprocessing the dataset, it's time to create the model we will train. For simplicity, I have constructed a simple 2D Convolutional Neural Network (CNN) model.

The neural network expects the dataset to be in a specific shape. Therefore, when training models with Keras, we pass the input shape as (image_width, image_height, number_of_channels). Failing to do so will result in an error such as:

ValueError: Input 0 of layer "conv2d" is incompatible with the layer: expected min_ndim=4, found ndim=3.

Input Layer: The first layer of the model is the Input layer, which specifies the input shape of the dataset, i.e., (None, 28, 28, 1): the number of samples in the batch, the height and width of the image, and the number of image channels.

Convolutional Layers: The architecture includes two Convolutional Neural Network layers with 64 and 128 feature maps, respectively, each with a 3×3 kernel.

Max Pooling Layers: After each Convolutional Neural Network layer, I have added a max pooling layer with a pooling size of 2×2. The max pooling layer extracts the maximum value from each patch of the feature map, down-sampling the feature maps produced by the convolutional layers.

Dropout Layers: A dropout layer with a probability of 0.5 signifies that during training time, fifty percent of the neurons in the layer will be deactivated randomly to reduce the chance of overfitting.

Classification Layer: The last layer is the classification layer. It uses the Softmax activation function, which outputs the probability of each class for a given data point, in the range 0 to 1. This layer has ten neurons because we have to classify ten different objects.

The code snippets for the model are shown below:
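A sketch of a model consistent with the description above (the exact layer ordering and activation choices are assumptions based on the text):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),             # (None, 28, 28, 1)
    layers.Conv2D(64, (3, 3), activation="relu"),  # 64 feature maps, 3x3 kernel
    layers.MaxPooling2D((2, 2)),                   # 2x2 down-sampling
    layers.Conv2D(128, (3, 3), activation="relu"), # 128 feature maps, 3x3 kernel
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),                           # drop 50% of units in training
    layers.Dense(10, activation="softmax"),        # 10-way classification layer
])

model.summary()
```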

After model creation, we display the summary in the output cell. The model summary lists each layer's name, its output shape, and the number of trainable and non-trainable parameters. The summary of the created model is shown below:

Output:

Step 6: Training Loop

After creating our model, it is time to customize the training loop in Keras. In Machine Learning (ML), training is divided into interlinked sub-processes. This section focuses on creating the customized training loop in Keras.

Step 6.1: Keras Training Workflow

This is the theoretical part of this section: we will walk through the training workflow step by step in the sub-steps below.

Step 6.2: Creating Loss Function

In this step, we will define the loss function. We will use CategoricalCrossentropy because the labels are one-hot encoded; if the labels were integers instead, SparseCategoricalCrossentropy would be used. The goal is to reduce the error between the true and predicted values. The CategoricalCrossentropy function takes probability predictions and returns the average loss. We set from_logits to False because our model's classification layer already applies a softmax.

Step 6.3: Creating Optimiser

An optimizer uses the computed gradients to adjust the model's weights and biases to minimize the loss. This iterative process aims to find the model parameters that produce the least error. We use the common Adam optimizer to propagate the error backward. Adam is an extension of Stochastic Gradient Descent that combines Root Mean Square Propagation (RMSProp) and the Adaptive Gradient Algorithm (AdaGrad).

Step 6.4: Creating Metric for Measure

Keras metrics are functions used to measure how well your deep learning model performs. The choice of metric in Machine Learning depends on the model's objective and problem statement. Here I use CategoricalAccuracy() for both the training and test sets.

Step 6.5: Graph Creation

In the sections above, I declared the individual components we will use to train our model. In this section, I will combine them and explain how they work together and how the training graph is created.

The training loop feeds the training images to the network while computing the metrics. We use CategoricalAccuracy to compute the accuracy because the labels are one-hot encoded, not integers; if the labels were integers, SparseCategoricalAccuracy would be used instead. The breakdown of the training steps is shown below:

  • Pass the training data to the network for one epoch.

  • Obtain the training images and labels for each batch.

  • Open a tf.GradientTape() context

  • Obtain predictions by passing the batch through the model

  • Calculate the loss by computing the categorical cross-entropy between the predictions and the labels

  • Exit the tf.GradientTape() context

  • Calculate the gradients of the loss with respect to the model's trainable weights

  • Update model parameters using the Adam optimizer.

  • Calculate the metric value and update the respective metric for visualizations.

  • Repeat the process for the specified number of epochs.

At the end of each epoch, we also evaluate the model so that we have an idea of its performance; that is why I validate the model after every epoch. The model validation process on the test dataset is shown below:

  • Pass the testing data to the network

  • Obtain predictions by passing the batch through the model

  • Calculate the loss by computing the categorical cross-entropy between the predictions and the labels

  • Calculate the metric value and update the respective metric for visualizations.

  • Repeat the process for every batch in the dataset.

In TensorFlow/Keras, every execution takes the form of a graph. TensorFlow graphs are composed of two components:

  • Edges - the tensors (data) that flow between the nodes.
  • Nodes - the mathematical operations to be performed.

The metric for training and testing the dataset, optimizer, and loss function object initialization is shown below:
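A sketch of that initialization, consistent with the choices above; the variable names loss_fn, optimizer, train_acc_metric, and test_acc_metric are illustrative:

```python
import tensorflow as tf

# Labels are one-hot encoded and the model already ends in a softmax,
# so we use CategoricalCrossentropy with from_logits=False.
loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=False)

# Adam optimizer with its default learning rate.
optimizer = tf.keras.optimizers.Adam()

# Separate accuracy metrics for the training and test sets.
train_acc_metric = tf.keras.metrics.CategoricalAccuracy()
test_acc_metric = tf.keras.metrics.CategoricalAccuracy()
```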

The custom training loop snippets below are turned into a TensorFlow graph by applying the @tf.function decorator to the computationally intensive step functions. The decorator compiles the decorated function into a callable TensorFlow graph, which executes faster than eager execution.
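A sketch of the train step, assuming the model, loss_fn, optimizer, and train_acc_metric objects defined earlier:

```python
@tf.function  # compile this step into a TensorFlow graph
def train_step(images, labels):
    # Record the forward pass so the tape can compute gradients.
    with tf.GradientTape() as tape:
        predictions = model(images, training=True)
        loss = loss_fn(labels, predictions)
    # Outside the tape: compute gradients and apply the Adam update.
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    # Accumulate the training accuracy for this batch.
    train_acc_metric.update_state(labels, predictions)
    return loss
```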

The above function is for the train_step; the test_step follows the same process discussed above, minus the gradient computation and weight update.
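A sketch of the test step and the outer epoch loop that ties everything together (EPOCHS is an illustrative value; the original snippet may use a different count):

```python
@tf.function
def test_step(images, labels):
    # Forward pass only: no gradient tape and no weight updates.
    predictions = model(images, training=False)
    loss = loss_fn(labels, predictions)
    test_acc_metric.update_state(labels, predictions)
    return loss

EPOCHS = 5  # illustrative value

for epoch in range(EPOCHS):
    for images, labels in train_ds:
        train_step(images, labels)
    for images, labels in test_ds:
        test_step(images, labels)
    print(f"Epoch {epoch + 1}: "
          f"train acc = {float(train_acc_metric.result()):.4f}, "
          f"test acc = {float(test_acc_metric.result()):.4f}")
    # Reset the metrics so each epoch's accuracy is independent.
    # (Older TensorFlow versions spell this reset_states().)
    train_acc_metric.reset_state()
    test_acc_metric.reset_state()
```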

After training the model, we can see in the output section that the batches of data are fed to the model and the model's loss decreases after each epoch.

Output:

Step 7: Model Evaluation

We created and trained our model using a custom training loop in the steps above. In this step, we will evaluate the model using our test set. To evaluate the trained model, we perform the following steps:

  1. Pass the testing data to the network.
  2. Obtain predictions by passing each batch through the model.
  3. Calculate the loss by computing the categorical cross-entropy between the predictions and the labels.
  4. Calculate the metric value and update the respective metric for visualizations.
  5. Repeat the process for every batch in the dataset.

The below code snippets depict the model evaluation process:
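A sketch of the evaluation, reusing the test_step function and test_ds pipeline defined earlier:

```python
# Run the trained model over every test batch and average the loss;
# the accuracy metric accumulates across batches.
test_acc_metric.reset_state()
total_loss, num_batches = 0.0, 0
for images, labels in test_ds:
    total_loss += float(test_step(images, labels))
    num_batches += 1

print(f"Test loss:     {total_loss / num_batches:.4f}")
print(f"Test accuracy: {float(test_acc_metric.result()):.4f}")
```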

Output:

Other Approaches to Customize Training Loops in Keras

There are many ways to implement model training with custom training loops. Above, I explained how to train a model using a custom training loop written with low-level TensorFlow code. Below, I discuss two other options for accomplishing this task:

  • TensorLayer: an often-overlooked project that aims to provide the fundamental building blocks of deep learning models without giving up control over the minute details.
  • K.function: a higher level of abstraction than raw TensorFlow custom training loops, but lower-level than .fit. It can be used to accomplish the task, but its higher abstraction imposes some restrictions.

Conclusion

In this article, we have studied the Customizing Training Loops in Keras with TensorFlow. The following are the key takeaways:

  • A Keras customized training loop gives us more flexibility and fine-grained control over the training process.

  • A Keras customized training loop allows us to add custom loss functions, optimizers, and metrics.

  • A Keras customized training loop works best with a data input pipeline, i.e., when the data is batched.

  • Finally, a customized training loop can be more efficient and faster than the predefined .fit method.