Heapsort with Max-Heap and Heapify - Thorough Guide and Animated Walk-Through

Let's visualize the full process of converting an array to a max-heap and sorting the heap with Heapsort.

Heapsort is a comparison-based sorting algorithm that relies on maintaining a max-heap to quickly find the largest value on each iteration.

Heapsort has an O(n log n) runtime, and, since sorting is performed in place, space complexity is constant O(n) total - O(1) auxiliary.

This visualization is a bit more complex with multiple moving parts, so make sure to pause/manually step through the code highlighter as necessary.

*Note: you will likely need to be somewhat familiar with tree data structures to follow along.

Though heapsort can seem complex at first glance, once we break down each piece, it's actually a fairly intuitive and accessible algorithm.

Create a max-heap first

Let's start by creating a max-heap. A max-heap is a tree data structure where the left and right child nodes are always less than the parent.

We'll get the middle element of the array that we want to sort: length // 2 - 1

Next, we loop in the range between this middle element to the first element and run heapify on every iteration.

Then, finally, on line 25 we return the array once it has been converted to a max-heap.

heap_sort

class Heap:
    @staticmethod
    def _swap(array, a, b):
        array[a], array[b] = array[b], array[a]

    def _heapify(self, heap: [int], i: int, length: int):
        largest = i
        left = i * 2 + 1
        right = i * 2 + 2

        if left < length and heap[left] > heap[largest]:
            largest = left
        if right < length and heap[right] > heap[largest]:
            largest = right

        if largest != i:
            self._swap(heap, i, largest)
            self._heapify(heap, i, length)

    def create_max_heap(self, array: [int]):
        length = len(array)
        start = length // 2 - 1
        for i in range(start, -1, -1):
            self._heapify(array, i, length)
        return array

    def heap_sort(self, heap):
        end = len(heap) - 1
        for i in range(end, -1, -1):
            self._swap(heap, 0, i)
            self._heapify(heap, 0, i)
        return heap
    

Let's heapify our heap

Heapify essentially loops through the entire tree making sure each tree and subtree are valid heaps.

This is a max-heap:

    6
/ \
4 5
/ \
2 1

*Note: min-heap is merely the reverse of a max-heap

This is NOT a heap:

    2
/ \
6 3
/ \
1 4

So in the snippet below, we start at the middle element in our unsorted array minus one. All items on the right side are leaves so we don't need to start on them:

Why are all leaves on the right?

A tree can be represented as a flat array. Let's look at an example:

[ 1, 3, 5 ]

i = 0
left = i * 2 + 1
right = i * 2 + 2

So, our sample array could also be represented as this tree:

  1
/ \
3 5

heap_sort

class Heap:
    @staticmethod
    def _swap(array, a, b):
        array[a], array[b] = array[b], array[a]

    def _heapify(self, heap: [int], i: int, length: int):
        largest = i
        left = i * 2 + 1
        right = i * 2 + 2

        if left < length and heap[left] > heap[largest]:
            largest = left
        if right < length and heap[right] > heap[largest]:
            largest = right

        if largest != i:
            self._swap(heap, i, largest)
            self._heapify(heap, i, length)

    def create_max_heap(self, array: [int]):
        length = len(array)
        start = length // 2 - 1
        for i in range(start, -1, -1):
            self._heapify(array, i, length)
        return array

    def heap_sort(self, heap):
        end = len(heap) - 1
        for i in range(end, -1, -1):
            self._swap(heap, 0, i)
            self._heapify(heap, 0, i)
        return heap
    

Our array data

60411523032541655061473819

We start our heapify function at the middle item minus 1 (left-side are all subtree roots, right-side are all leaves).

Next, we start by initializing largest to the root node (i) and hold reference to the left and right leaf positions:

heap_sort

class Heap:
    @staticmethod
    def _swap(array, a, b):
        array[a], array[b] = array[b], array[a]

    def _heapify(self, heap: [int], i: int, length: int):
        largest = i
        left = i * 2 + 1
        right = i * 2 + 2

        if left < length and heap[left] > heap[largest]:
            largest = left
        if right < length and heap[right] > heap[largest]:
            largest = right

        if largest != i:
            self._swap(heap, i, largest)
            self._heapify(heap, i, length)

    def create_max_heap(self, array: [int]):
        length = len(array)
        start = length // 2 - 1
        for i in range(start, -1, -1):
            self._heapify(array, i, length)
        return array

    def heap_sort(self, heap):
        end = len(heap) - 1
        for i in range(end, -1, -1):
            self._swap(heap, 0, i)
            self._heapify(heap, 0, i)
        return heap
    

array

60411523032541655061473819

First iteration starts at the last possible root and we get reference to it's child leaves (in this case it only has a single child).

If the left node exists and is a bigger number than the current largest, then we go ahead and change largest to left instead of root.

Same with the right node, if the right is larger than either left or root, then we set it as the largest node.

So essentially we look at the entire subtree (containing 3 nodes; root, left, and right) and figure out which node is the largest of the 3.

heap_sort

class Heap:
    @staticmethod
    def _swap(array, a, b):
        array[a], array[b] = array[b], array[a]

    def _heapify(self, heap: [int], i: int, length: int):
        largest = i
        left = i * 2 + 1
        right = i * 2 + 2

        if left < length and heap[left] > heap[largest]:
            largest = left
        if right < length and heap[right] > heap[largest]:
            largest = right

        if largest != i:
            self._swap(heap, i, largest)
            self._heapify(heap, i, length)

    def create_max_heap(self, array: [int]):
        length = len(array)
        start = length // 2 - 1
        for i in range(start, -1, -1):
            self._heapify(array, i, length)
        return array

    def heap_sort(self, heap):
        end = len(heap) - 1
        for i in range(end, -1, -1):
            self._swap(heap, 0, i)
            self._heapify(heap, 0, i)
        return heap
    

Now if largest is not the root node (i) then we need to go ahead and swap the largest node up to the root in-order to have a valid max-heap within this subtree.

heap_sort

class Heap:
    @staticmethod
    def _swap(array, a, b):
        array[a], array[b] = array[b], array[a]

    def _heapify(self, heap: [int], i: int, length: int):
        largest = i
        left = i * 2 + 1
        right = i * 2 + 2

        if left < length and heap[left] > heap[largest]:
            largest = left
        if right < length and heap[right] > heap[largest]:
            largest = right

        if largest != i:
            self._swap(heap, i, largest)
            self._heapify(heap, i, length)

    def create_max_heap(self, array: [int]):
        length = len(array)
        start = length // 2 - 1
        for i in range(start, -1, -1):
            self._heapify(array, i, length)
        return array

    def heap_sort(self, heap):
        end = len(heap) - 1
        for i in range(end, -1, -1):
            self._swap(heap, 0, i)
            self._heapify(heap, 0, i)
        return heap
    

Before heapify

601182

Before we run heapify we have our root:6, and two child nodes (left:1 and right:8)

- tree -
    6
   / \
  1   8

After heapify

801162

Once we swap the root and right nodes, we now have a proper heap

- tree -
    8
   / \
  1   6

On line 16 we checked if largest != i, this is because if there were no swaps (i.e., the root node (i) was already the largest node) since we travel from the end upwards, this means there is no reason to check any nodes further down, all nodes below this one are guaranteed to be correct.

However, if there WAS a change, a swap between another node means we need to continue checking down the path of the swapped node.

So we only continue "heapifying" if there was a swap, and notice we run heapify AFTER the swap, which means the next iteration is going to use the swapped node as the new root node - we run heapify on i which, after swapping, is now a child node instead of the root.

heap_sort

class Heap:
    @staticmethod
    def _swap(array, a, b):
        array[a], array[b] = array[b], array[a]

    def _heapify(self, heap: [int], i: int, length: int):
        largest = i
        left = i * 2 + 1
        right = i * 2 + 2

        if left < length and heap[left] > heap[largest]:
            largest = left
        if right < length and heap[right] > heap[largest]:
            largest = right

        if largest != i:
            self._swap(heap, i, largest)
            self._heapify(heap, i, length)

    def create_max_heap(self, array: [int]):
        length = len(array)
        start = length // 2 - 1
        for i in range(start, -1, -1):
            self._heapify(array, i, length)
        return array

    def heap_sort(self, heap):
        end = len(heap) - 1
        for i in range(end, -1, -1):
            self._swap(heap, 0, i)
            self._heapify(heap, 0, i)
        return heap
    

And that wraps up the max-heap part of our algorithm - once all iterations are done we will have a perfect max-heap where all child nodes are smaller than their parents.

Breaking down the heapsort algorithm

The hardest part of this algorithm was the heapify method discussed above, as you can see below, heapsort simply utilizes the heapify method again along with one more swap:

heap_sort

class Heap:
    @staticmethod
    def _swap(array, a, b):
        array[a], array[b] = array[b], array[a]

    def _heapify(self, heap: [int], i: int, length: int):
        largest = i
        left = i * 2 + 1
        right = i * 2 + 2

        if left < length and heap[left] > heap[largest]:
            largest = left
        if right < length and heap[right] > heap[largest]:
            largest = right

        if largest != i:
            self._swap(heap, i, largest)
            self._heapify(heap, i, length)

    def create_max_heap(self, array: [int]):
        length = len(array)
        start = length // 2 - 1
        for i in range(start, -1, -1):
            self._heapify(array, i, length)
        return array

    def heap_sort(self, heap):
        end = len(heap) - 1
        for i in range(end, -1, -1):
            self._swap(heap, 0, i)
            self._heapify(heap, 0, i)
        return heap
    

We iterate through our heap, starting at the end and traveling backward.

We start by swapping the first element with the i'th element. Remember, since we are using a max-heap this first element is the guaranteed highest number in our heap, so we swap it with the last number in our heap.

After swap

10301162143254651564738509

After swapping with first and last item, we have 1 number completely sorted (50). However, our max-heap is no longer valid and we need to run heapify again.

However now our heap is no longer a max-heap, so we need to run heapify again to get our heap back in order.

Notice that we change the heapify length argument to i on each iteration. This allows heapify to ignore the sorted end of the array.

heap_sort

class Heap:
    @staticmethod
    def _swap(array, a, b):
        array[a], array[b] = array[b], array[a]

    def _heapify(self, heap: [int], i: int, length: int):
        largest = i
        left = i * 2 + 1
        right = i * 2 + 2

        if left < length and heap[left] > heap[largest]:
            largest = left
        if right < length and heap[right] > heap[largest]:
            largest = right

        if largest != i:
            self._swap(heap, i, largest)
            self._heapify(heap, i, length)

    def create_max_heap(self, array: [int]):
        length = len(array)
        start = length // 2 - 1
        for i in range(start, -1, -1):
            self._heapify(array, i, length)
        return array

    def heap_sort(self, heap):
        end = len(heap) - 1
        for i in range(end, -1, -1):
            self._swap(heap, 0, i)
            self._heapify(heap, 0, i)
        return heap
    

After heapify

30025116214314651564738509

Once again, we have a proper max-heap structure. We can simply swap the first element up behind 50 now to have 2 numbers sorted.

Once we have a complete heap again, we can guarantee that the first item in our heap is once again the biggest possible number, so we swap it to the end, right behind our previous number. And now we have 2 items sorted!

After swapping on the 2nd iteration

30251162143146515647308509

We now have two items fully sorted.

We can continue iterating through until our array is fully sorted.

heap_sort

class Heap:
    @staticmethod
    def _swap(array, a, b):
        array[a], array[b] = array[b], array[a]

    def _heapify(self, heap: [int], i: int, length: int):
        largest = i
        left = i * 2 + 1
        right = i * 2 + 2

        if left < length and heap[left] > heap[largest]:
            largest = left
        if right < length and heap[right] > heap[largest]:
            largest = right

        if largest != i:
            self._swap(heap, i, largest)
            self._heapify(heap, i, length)

    def create_max_heap(self, array: [int]):
        length = len(array)
        start = length // 2 - 1
        for i in range(start, -1, -1):
            self._heapify(array, i, length)
        return array

    def heap_sort(self, heap):
        end = len(heap) - 1
        for i in range(end, -1, -1):
            self._swap(heap, 0, i)
            self._heapify(heap, 0, i)
        return heap

Author:

*no spam, just fresh content