Heap Sort
This article covers the heap sort algorithm, an efficient comparison-based sorting technique in computer programming. It works by building a min-heap or max-heap from the array's elements, so that the root holds the minimum or maximum value. Heap sort involves two main operations: building a heap (H) from the array elements, and repeatedly deleting the heap's root element. Before going deeper into heap sort, it is essential to understand the concept of a heap. Essentially, heap sort treats the array elements as a special complete binary tree, which makes the sorting process efficient and results in an ordered array.
What is Heap Sort?
Heap sort is an in-place, comparison-based sorting algorithm. In-place means that no extra array is needed; the sorting is done within the space of the given array. As the name suggests, heap sort makes use of the heap data structure to sort the given array. It is one of the most preferred sorting algorithms because its worst-case time complexity is O(n log n).
Similar to selection sort, heap sort also divides the given array into two regions –
- Sorted region – built on the right side of the list/array
- Unsorted region – the rest of the unsorted list/array
Initially, the sorted region is left empty, and we only have the unsorted region in the array. The largest element from the unsorted region is picked iteratively and added to the sorted region.
The elements in the unsorted region of the array are visualized by using a unique type of complete binary tree known as a heap.
Heap sort differs from selection sort in that it does not scan the unsorted region linearly. Instead, it arranges the unsorted region in a heap data structure, which helps in finding the desired element quickly and efficiently.
To better understand the above statement, let us first understand what binary trees, complete binary trees, and heaps are.
What are Binary Trees?
A binary tree is a tree data structure in which every node has at most two child nodes, that is, zero, one, or two. These two children are generally referred to as the left child and the right child, respectively.
The top-most node is known as the root node, while the nodes with no children are known as leaf nodes.
Notice how the nodes A, B & C contain at most two child nodes. Therefore, the above-given diagram represents a binary tree.
Let us now understand how a binary tree can be represented in an array.
Array Representation of Binary Trees
One of the most interesting properties of a binary tree is that we can arrange all of its elements in an array. The indexes of the elements in this array can then be used to find the parent or children of any node.
To further explain this, consider the above-given tree. It has 7 nodes, and each node can be arranged as an element of an array –
Now, if the index of a given element is i,
$2i + 1$: The element at this index is the left child of the element at i.
$2i + 2$: The element at this index is the right child of the element at i.
$\lfloor (i - 1)/2 \rfloor$: The element at the lower bound (floor) of this index is the parent of the element at i.
Consider i = 1. Placing 1 in the above equations will give:
- $2(1) + 1 = 3$. Here, it is clear that the element at index 3, i.e. D, is the left child of the element at index 1, i.e. B.
- $2(1) + 2 = 4$. Here, according to the equation, the element at index 4, i.e. E, is the right child of the element at index 1, i.e. B.
- $\lfloor (1 - 1)/2 \rfloor = 0$. Here, according to the equation, the element at index 0, i.e. A, is the parent of the element at index 1.
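The index relations can be checked with a few lines of Python. This is only a minimal sketch: the helper names `left_child`, `right_child` and `parent` are illustrative, and the node labels A to G are assumed to be stored level by level in a list.

```python
# Index arithmetic for a binary tree stored level by level in a list.
def left_child(i):
    return 2 * i + 1

def right_child(i):
    return 2 * i + 2

def parent(i):
    # Floor division gives the "lower bound"; only meaningful for i > 0,
    # because the root (index 0) has no parent.
    return (i - 1) // 2

tree = ["A", "B", "C", "D", "E", "F", "G"]  # assumed labels for the 7 nodes

i = 1  # node B
print(tree[left_child(i)])   # D
print(tree[right_child(i)])  # E
print(tree[parent(i)])       # A
```

Because the relations are pure arithmetic, no pointers need to be stored: the position of an element in the list is enough to locate its relatives.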
Array implementation of binary trees is important since it helps in understanding the heap data structure.
Complete Binary Tree
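A complete binary tree is a binary tree in which every level except possibly the last is completely filled, and the nodes of the last level are placed as far left as possible. In other words, when the elements are arranged in an array, no position in the sequence is skipped.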
Consider a binary tree –
If we were to represent it as an array, we can not do the following –
Doing this will violate the equations that tell us the index of parents and children of a given node.
Notice how according to the equations, E should be the right child of B, which is not true.
Thus, to preserve the equations, the array should rather be formed this way –
In the above-given diagram of the array, there are empty spaces between the elements. Therefore, according to the definition of a complete binary tree, we can say that the above-given tree is not a complete binary tree.
The following trees are complete binary trees since they have no empty spaces in them.
All perfect binary trees (trees in which every level is completely filled, sometimes also called full binary trees) are complete binary trees; however, not all complete binary trees are perfect.
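The "no empty spaces" test can be sketched in Python as follows. The function name `is_complete` and the use of `None` for an empty slot are assumptions made for illustration, and the two sample lists are hypothetical trees rather than the ones from the diagrams.

```python
def is_complete(tree):
    """Return True if no empty slot (None) appears before a filled slot."""
    seen_gap = False
    for node in tree:
        if node is None:
            seen_gap = True
        elif seen_gap:
            # A node placed after a gap means a position was skipped.
            return False
    return True

print(is_complete(["A", "B", "C", "D", "E"]))         # True
print(is_complete(["A", "B", "C", None, None, "F"]))  # False
```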
Now let’s move on to understanding heap data structure.
What is Heap Data Structure?
A heap is a special type of binary tree that follows the following conditions –
- The given binary tree should be a complete binary tree.
- It should satisfy the heap property. It means that every parent node must be greater than or equal to all of its children, or every parent node must be smaller than or equal to all of its children.
If all the nodes (including the root) are greater than or equal to their child nodes, it is known as a max-heap. But if all the nodes (including the root) are smaller than or equal to their child nodes, it is known as a min-heap.
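For a tree stored in an array, the max-heap condition can be verified by comparing each element with its parent. The sketch below assumes a level-by-level list; the name `is_max_heap` is illustrative.

```python
def is_max_heap(arr):
    """Check that every element is no larger than its parent."""
    for i in range(1, len(arr)):
        parent = (i - 1) // 2
        if arr[parent] < arr[i]:
            return False
    return True

print(is_max_heap([15, 10, 5, 8, 6]))  # True
print(is_max_heap([10, 8, 5, 15, 6]))  # False: 15 is larger than its parent 8
```

A list is automatically a complete binary tree in this representation, so only the ordering condition needs to be checked.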
Now that we understand what heaps are, let us move on to understand the working of the Heap sort algorithm.
Working of Heap Sort Algorithm
To sort an array in ascending order, the working of the heap sort algorithm can be explained in the following steps –
- For the given set of elements, create a max-heap by visualizing all the elements of the array in a binary tree.
- Heapify the binary tree using the elements in the unsorted region of the array.
- Heapify is the process of rearranging the elements to form a tree that maintains the properties of a heap.
- Here, we have to build a max-heap since we want to sort the array in ascending order. Heapifying helps us maintain the property that every node should have a greater value than its children nodes. We will discuss heapify in detail.
- Once the heap is formed, delete the root element from the heap, and add this element to the sorted region of the array. Here, since we are removing the root element from a max-heap, we will obtain the largest element from the unsorted region each time an element is removed.
- Repeat steps 2 & 3 until all the elements from the unsorted region are added to the sorted region.
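The idea behind these steps can be tried out quickly with Python's standard `heapq` module. This is only an illustration under different assumptions from the description above: `heapq` provides a min-heap, so the smallest element is removed each time, and the result is collected in a separate list instead of a sorted region inside the same array. The function name `sort_with_heapq` is hypothetical.

```python
import heapq

def sort_with_heapq(values):
    heap = list(values)   # copy, so the input is left untouched
    heapq.heapify(heap)   # arrange the copy as a min-heap
    result = []
    while heap:
        # The root of a min-heap is the smallest remaining element.
        result.append(heapq.heappop(heap))
    return result

print(sort_with_heapq([10, 8, 5, 15, 6]))  # [5, 6, 8, 10, 15]
```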
How to "Heapify" a Binary Tree?
Heapify is the process of rearranging the elements to form a tree that maintains the properties of the heap data structure.
After inserting the elements into a heap, they may or may not satisfy the heap properties. If they do not, we need to rearrange the elements of the erroneous heap to make it a heap again. Refer to the heap data structure article.
Consider a list/array with the elements 10, 8, 5, 15, 6 in it. To heapify these elements and form a max-heap, let us follow the steps given below –
- Visualize all the elements of the list as a complete binary tree. To visualize an array as a binary tree, refer to the part where we have discussed the array representation of the binary tree.
Notice how the above-given binary tree is a complete binary tree but does not satisfy the properties of a max-heap since element 8 has an element greater than itself as its child.
- Start by comparing the values of the children nodes with that of the parent. If the value of the parent is smaller than the value of a child, swap the parent with the larger of its two children. This process is repeated until every node satisfies the properties of a max-heap. Here, we start by comparing 8 with 15 and 6. Since 15 is greater than 8, we will swap their positions.
Again, the property of max-heap is not satisfied since 15 is greater than 10. Therefore, we will once again perform the above step.
Now that we have obtained a max-heap, we can stop this step.
One interesting thing to note here is that a node can be heapified if all the children nodes are already heapified. This is the reason why we start from the bottom-most sub-tree.
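This step is performed using recursion. We create a function called heapify() that compares a node with its children, swaps it with the larger child if needed, and then calls itself recursively on the affected sub-tree. To build the heap, heapify() is run on every non-leaf node, from the bottom-most sub-tree up to the root node (0th index), until a max-heap is obtained.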
This function works by dividing the tree into smaller sub-trees and then comparing the values of parents with that of children in each sub-tree.
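A recursive heapify() along these lines might look as follows in Python. This is a minimal sketch, not a definitive implementation: the parameter names and the helper `build_max_heap` are assumptions made for illustration.

```python
def heapify(arr, n, i):
    """Restore the max-heap property for the sub-tree rooted at index i,
    assuming both child sub-trees are already max-heaps.
    Only the first n elements of arr belong to the heap."""
    largest = i
    left = 2 * i + 1
    right = 2 * i + 2
    if left < n and arr[left] > arr[largest]:
        largest = left
    if right < n and arr[right] > arr[largest]:
        largest = right
    if largest != i:
        # Swap with the larger child, then fix the sub-tree that changed.
        arr[i], arr[largest] = arr[largest], arr[i]
        heapify(arr, n, largest)

def build_max_heap(arr):
    n = len(arr)
    # Leaves are already heaps, so start at the last non-leaf node.
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)

data = [10, 8, 5, 15, 6]
build_max_heap(data)
print(data)  # [15, 10, 5, 8, 6]
```

The two swaps performed on this input are 8 with 15, and then 10 with 15, matching the walk-through above.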
How to Delete Elements From a Heap?
Let us consider the following max-heap that we created in the last step-
To delete the elements from a heap, we follow the under-given steps –
- Swap the root element with the last element and then disconnect the last element from the heap.
Here, we have removed the root element that is 15 from the heap and swapped its position with the last element that is 6.
By doing this, we now have an array with both the sorted and the unsorted regions in it. Notice how we picked the largest element from the heap and placed it at its correct sorted position. After the first iteration of heap sort, 15 is sorted.
But now, the remaining heap is not a max-heap anymore. So, as the next step, we should heapify it once again.
- Heapify the tree again
Here, we have once again heapified the given tree to form a max-heap.
Now, to sort the remaining array, we can repeat the above-given steps until there are no more elements left in the tree –
We have sorted the given array using the heap sort algorithm.
Implementation of Heap Sort Algorithm
1. Java Program for Heap Sort
2. Python Program for Heap Sort
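A minimal Python sketch of the steps described above is given here. It is written under a few assumptions: the function names `heapify` and `heap_sort` are illustrative, and heapify() is written as a loop rather than with recursion, so that no call stack grows during sorting.

```python
def heapify(arr, n, i):
    """Sift arr[i] down within arr[:n] until the max-heap property holds."""
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and arr[left] > arr[largest]:
            largest = left
        if right < n and arr[right] > arr[largest]:
            largest = right
        if largest == i:
            return
        arr[i], arr[largest] = arr[largest], arr[i]
        i = largest

def heap_sort(arr):
    n = len(arr)
    # Build a max-heap from the whole array (the unsorted region).
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    # Move the root to the sorted region on the right, then repair the heap.
    for end in range(n - 1, 0, -1):
        arr[0], arr[end] = arr[end], arr[0]
        heapify(arr, end, 0)

data = [10, 8, 5, 15, 6]
heap_sort(data)
print(data)  # [5, 6, 8, 10, 15]
```

The list is sorted in place: `heap_sort` returns nothing and rearranges the list that was passed in.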
3. C Program for Heap Sort
4. C++ Program for Heap Sort
Heap Sort Complexity
1. Heap Sort Time Complexity
The height of a complete binary tree with n nodes is approximately log n (more precisely, the floor of log2 n). This is because a node in a binary tree can have at most 2 child nodes, so the number of nodes at each level is at most double the number of nodes at the previous level. Put the other way round, each level holds about half as many nodes as the level below it.
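It means, for a binary tree with the height h in which every level is full,
Total number of nodes, $n = 2^0 + 2^1 + 2^2 + \dots + 2^h$
From mathematical induction, we also know that $2^0 + 2^1 + 2^2 + \dots + 2^h = 2^{h+1} - 1$
Therefore, $n = 2^{h+1} - 1$, which gives $h = \log_2(n + 1) - 1$
Therefore, the height of a binary tree is almost equal to $\log_2 n$.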
This indicates that if the number of nodes doubles, the tree will increase by only one level. From this, we can conclude that the complexity of the heapify() function would be nearly equal to O(logn).
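The relation between the number of nodes and the height can be checked numerically. The sketch below assumes the array representation discussed earlier; the function name `height` is illustrative, and the height is counted as the number of parent steps from the last node up to the root.

```python
import math

def height(n):
    """Height of a complete binary tree with n nodes stored in an array."""
    h = 0
    i = n - 1            # index of the last node
    while i > 0:
        i = (i - 1) // 2  # move to the parent
        h += 1
    return h

for n in (1, 7, 8, 1000, 2000):
    # The two printed values agree: height and floor(log2(n)).
    print(n, height(n), math.floor(math.log2(n)))
```

Going from 1000 to 2000 nodes, for instance, adds exactly one level.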
Now, to perform sorting, the heapify() function is called on the order of n times: about n/2 times while building the heap and n - 1 times while removing the roots. So, the total complexity of heapSort() would be about n * log n, i.e. O(n log n).
Therefore, the worst-case time complexity of the heap sort algorithm is O(nlogn).
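The bound can be written out as a short calculation. This is a sketch under the assumption that a single heapify() call costs at most $c \log_2 n$ for some constant $c$ and $n \ge 2$; the symbols $T(n)$ and $c$ are introduced here only for illustration.

```latex
\begin{aligned}
T(n) &\le \underbrace{\left\lfloor \tfrac{n}{2} \right\rfloor \cdot c \log_2 n}_{\text{building the heap}}
       + \underbrace{(n - 1) \cdot c \log_2 n}_{\text{removing the root } n - 1 \text{ times}} \\
     &\le \tfrac{3}{2}\, c\, n \log_2 n \\
     &= O(n \log n)
\end{aligned}
```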
Now let us discuss the time complexity in best, average, and the worst case.
- Worst Case: O(n log n). Each of the n - 1 root removals can move an element down the full height of the heap, and this does not depend strongly on how the list was arranged initially.
- Average Case: O(n log n). The average case is when the list is arranged in a jumbled order.
- Best Case: O(n log n) when the elements are distinct. Even if the list is already arranged in the desired order, it still has to be turned into a max-heap, and every root removal is followed by a heapify() call. The factor of log n disappears, giving O(n), only in the special case where all the elements are equal, because heapify() then never needs to swap.
2. Heap Sort Space Complexity
The space complexity of the heap sort algorithm is O(1). This is because we use a fixed number of variables, and we do not need any extra memory space apart from the loop variables and auxiliary variables that include temp, n, index, and largest. It means that an array containing a thousand elements can be sorted without allocating a second array. Note that a recursive heapify() additionally uses call-stack space proportional to the height of the heap, i.e. O(log n), unless it is written as a loop.
Conclusion
Heap sort exhibits consistent performance: its average-case time complexity is the same as its worst-case time complexity, O(n log n), and its memory usage is minimal.
Takeaways
Complexity of heap sort
- Time complexity - O(n log n)
- Space complexity - O(1)