# Finding Median Value of an Array

We are given a problem where we are asked to find the **median** value of an **unsorted** array of size `n`. Let’s get started.

## What’s Being Asked?

First off, it’s important to make clear what is being asked. The **median** is simply defined as the middle value. This is different from finding the **mean**. To get the mean we simply add up all the values and divide by the number of values in the array:

```python
def mean(nums: list):
    return sum(nums) / len(nums)
```

Or written another way:

```python
def mean(nums: list):
    total, length = 0, 0
    for n in nums:
        total += n
        length += 1
    return total / length
```

Both calls to `mean([3, -1, 2, 4, 1, -4, 6, 5])`, for example, will return the value `2.0`.

Another important piece of information we are given is that the array is **unsorted**. Why is this important? Because if the array were guaranteed to be sorted we could easily obtain the **median** (middle value) by:

```python
def median(sorted_nums: list):
    # (n - 1) // 2 is the lo-middle for even n and the true middle for odd n
    midpoint = (len(sorted_nums) - 1) // 2
    return sorted_nums[midpoint]
```

Calling `median([-4, -1, 1, 2, 3, 4, 5, 6])` will give you `2`.

Note: when referring to the “middle” of an array of even length it’s possible to consider a hi- and a lo-middle; however, we’ll only consider the lo-middle in this article.
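To make the distinction concrete, here is a small illustrative snippet (not part of the original problem) computing both candidate indices for an even-length array:

```python
nums = [-4, -1, 1, 2, 3, 4, 5, 6]

lo_middle = (len(nums) - 1) // 2  # index 3 for length 8
hi_middle = len(nums) // 2        # index 4 for length 8

print(nums[lo_middle], nums[hi_middle])
# => 2 3
```

The statistical median of an even-length list is often defined as the average of these two values, but we stick with the lo-middle convention throughout.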

We’re done, right? Unfortunately not, because our input array is not sorted. Does it make sense to sort it first? Maybe for a small input, but sorting can take serious processing time if your input is very large. Let’s discuss a few approaches we could take.

## Naive Approach

To get the median value of the array `[3, -1, 2, 4, 1, -4, 6, 5]` we can use knowledge we touched upon in the previous section. We know how to find the index of the middle position. Remember:

```python
midpoint = (len(sorted_nums) - 1) // 2
print(midpoint)
# => 3
```

So now we know that index 3 of our array is where the **median** lives, if and only if our array is already sorted. But our array isn’t sorted, so how does this help us? Well, we know that the fourth *smallest* value (since indexing starts at 0) will be the **median** value.

```python
def median(nums: list):
    target = (len(nums) - 1) // 2
    count = 0
    while count < target:
        # find the smallest value and remove it..
        nums.remove(min(nums))
        # keep track of removals..
        count += 1
    return min(nums)

print(median([3, -1, 2, 4, 1, -4, 6, 5]))
# => 2
```

Let’s walk through what we did.

- First, we define and set a `target` variable, which tells us how many minimum values we will have to discard from the array before returning the correct result.
- Next, we set a `count` variable to 0 to keep track of how many values we have discarded.
- Then we enter a while loop, removing the smallest value each time until `count == target`.
- Finally, we return the minimum value of `nums`, which is `min(nums)`.
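To make the walkthrough concrete, here is the state of `nums` after each removal for our example (a trace sketch, following the same steps as the function above):

```python
nums = [3, -1, 2, 4, 1, -4, 6, 5]
target = (len(nums) - 1) // 2  # 3 removals before the answer

for step in range(target):
    smallest = min(nums)
    nums.remove(smallest)
    print(f"removed {smallest}: {nums}")

# removed -4: [3, -1, 2, 4, 1, 6, 5]
# removed -1: [3, 2, 4, 1, 6, 5]
# removed 1:  [3, 2, 4, 6, 5]

print(min(nums))
# => 2
```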

This is a viable approach that will work with any array of length greater than 1, but each pass rescans the array for the minimum, so it takes O(n^2) time overall. We can do better.

## Divide & Conquer Approach?

What would happen if we used a divide & conquer approach? In other words, dividing the array in half and then searching for the value that belongs at index 3. Writing it out we get:

But now what? It looks like the first steps of **Merge Sort**, doesn’t it? Since we know having a sorted array will help us, let’s continue with this approach in implementing **Merge Sort** in order to access the **median** value.

```python
def median(nums: list):
    sorted_nums = mergeSort(nums)
    midpoint = (len(sorted_nums) - 1) // 2
    return sorted_nums[midpoint]

def mergeSort(nums: list):
    # lists of length 0 or 1 are already sorted
    if len(nums) <= 1:
        return nums
    mid = len(nums) // 2
    left = mergeSort(nums[:mid])
    right = mergeSort(nums[mid:])
    return merge(left, right)

def merge(left: list, right: list):
    lLen, rLen = len(left), len(right)
    sorted_nums = [0] * (lLen + rLen)
    lIdx, rIdx = 0, 0
    pIdx = 0
    # interleave the two sorted halves
    while lIdx < lLen and rIdx < rLen:
        if left[lIdx] < right[rIdx]:
            sorted_nums[pIdx] = left[lIdx]
            lIdx += 1
        else:
            sorted_nums[pIdx] = right[rIdx]
            rIdx += 1
        pIdx += 1
    # copy any leftovers from the left half..
    while lIdx < lLen:
        sorted_nums[pIdx] = left[lIdx]
        lIdx += 1
        pIdx += 1
    # ..and from the right half
    while rIdx < rLen:
        sorted_nums[pIdx] = right[rIdx]
        rIdx += 1
        pIdx += 1
    return sorted_nums

print(median([3, -1, 2, 4, 1, -4, 6, 5]))
# => 2
```

This took a lot more work, but in the end we have a sorted array and can easily obtain the **median** value.
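For comparison, the same result comes from Python’s built-in `sorted` (Timsort, also O(n * log n)); `median_builtin` is just a name for this sketch:

```python
def median_builtin(nums: list):
    # sort, then take the lo-middle index
    sorted_nums = sorted(nums)
    return sorted_nums[(len(sorted_nums) - 1) // 2]

print(median_builtin([3, -1, 2, 4, 1, -4, 6, 5]))
# => 2
```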

**Merge Sort** is an efficient divide & conquer algorithm with the following time/space analysis:

```
Time:
  Worst:   O(n * log n)
  Best:    O(n * log n)
  Average: O(n * log n)
Space: O(n)
```

The additional space comes from the `merge` function, where we create a brand-new array to hold the sorted values. However, where Merge Sort excels over Quick Sort is in how it reliably divides the array in half at every level. The bulk of the work is in the `merge` function.

Note: analyzing divide & conquer algorithms isn’t as straightforward as analyzing non-recursive algorithms where everything is contained in one function. To analyze the time complexity you can use either the recursion-tree method or the Master Theorem.
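For example, Merge Sort’s running time satisfies a recurrence of two half-size subproblems plus a linear merge, which the Master Theorem (with a = 2, b = 2, f(n) = Θ(n)) resolves to O(n * log n):

```
T(n) = 2 * T(n / 2) + O(n)
T(n) = O(n * log n)
```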

## Modified QuickSort Approach

Let’s take a look at another sorting algorithm and exploit a feature it has over **Merge Sort**: the **partition** step, in which the **pivot** is moved into its correct position before we split the array and continue the algorithm. Therefore, if we know the target index (explained above) and we know the pivot’s position after one partition, this tells us how to proceed. There are three cases:

| Case | What this means… |
|---|---|
| `Target index == Pivot index` | We have found the target index! |
| `Target index < Pivot index` | We can partition the left-hand partition. |
| `Target index > Pivot index` | We can partition the right-hand partition. |

Here’s a diagram showing the first partitioning:

The next step in **QuickSort** would be to divide the array into two smaller arrays (the left and right ends, excluding the pivot). However, since we only care about the target position, we only have to consider the half containing the target and continue until the pivot index equals the target. Let’s continue with our example:

And we’re finished! Try it yourself with our original example `[3, -1, 2, 4, 1, -4, 6, 5]` and see if you get `2` as the answer.

Finally, here is the code used:

```python
def median(nums: list):
    """
    a = left-most index
    z = right-most index
    """
    a, z = 0, len(nums) - 1
    target = z // 2
    return medianQuickSort(nums, a, z, target)

def medianQuickSort(nums: list, a: int, z: int, target: int):
    # one element left; it must be the target
    if a == z:
        return nums[a]
    pIdx = partition(nums, a, z)
    if target == pIdx:
        # pivot == target, so return value
        return nums[pIdx]
    elif target < pIdx:
        # target < pivot; => inspect left
        return medianQuickSort(nums, a, pIdx - 1, target)
    else:
        # target > pivot; => inspect right (pivot is already placed)
        return medianQuickSort(nums, pIdx + 1, z, target)

def partition(nums: list, a: int, z: int):
    # Lomuto partition: nums[z] is the pivot
    i = a - 1
    for j in range(a, z):
        if nums[j] < nums[z]:
            nums[j], nums[i + 1] = nums[i + 1], nums[j]
            i += 1
    # move the pivot into its final position
    nums[z], nums[i + 1] = nums[i + 1], nums[z]
    return i + 1

print(median([3, -1, 2, 4, 1, -4, 6, 5]))
# => 2
```
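To convince ourselves the approach generalizes beyond this one example, here is a compact, self-contained restatement of the same quickselect idea (function names are my own), checked against a sort-based reference on random inputs, including odd lengths:

```python
import random

def quickselect(nums, a, z, target):
    # iteratively-recursive selection of the value at sorted index `target`
    if a == z:
        return nums[a]
    # Lomuto partition with nums[z] as the pivot
    i = a - 1
    for j in range(a, z):
        if nums[j] < nums[z]:
            i += 1
            nums[i], nums[j] = nums[j], nums[i]
    nums[i + 1], nums[z] = nums[z], nums[i + 1]
    p = i + 1
    if target == p:
        return nums[p]
    if target < p:
        return quickselect(nums, a, p - 1, target)
    return quickselect(nums, p + 1, z, target)

def median_quickselect(nums):
    # copy the input so the caller's list is not reordered
    return quickselect(list(nums), 0, len(nums) - 1, (len(nums) - 1) // 2)

def median_by_sorting(nums):
    # reference implementation: sort, then take the lo-middle
    return sorted(nums)[(len(nums) - 1) // 2]

random.seed(0)
for _ in range(200):
    sample = [random.randint(-50, 50) for _ in range(random.randint(1, 25))]
    assert median_quickselect(sample) == median_by_sorting(sample)

print("all checks passed")
```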

## Conclusion

**QuickSort** is a divide & conquer sorting algorithm, which can be faster than its competitors (e.g. **Merge Sort** or **Heap Sort**) provided the pivot choices work out well. Let’s quickly compare the three:

```
Merge Sort:             Heap Sort:              QuickSort:
--------------------    --------------------    --------------------
Worst:   O(n * log n)   Worst:   O(n * log n)   Worst:   O(n^2)
Average: O(n * log n)   Average: O(n * log n)   Average: O(n * log n)
Best:    O(n * log n)   Best:    O(n * log n)   Best:    O(n * log n)
Space:   O(n)           Space:   O(1)           Space:   O(log n)
```

The gotcha with **QuickSort** is picking a **pivot** efficiently. We also picked a split point in **Merge Sort**. Where, you may ask? Right down the middle, which explains the `log n` part of the `O(n * log n)` analysis above.

The main difference between **QuickSort** and **Merge Sort** is the stage at which the sorting work happens. **Merge Sort** sorts during the merge stage (after dividing), while **QuickSort** sorts during the partition stage (before dividing). We found this useful, as it gives us the opportunity to choose which half to pursue when looking for the **median**.

## References

- BigO Cheat Sheet - Good reference for checking your work, however it’s good to know why you get these results.
- Algorithm Lecture 11: Median (Part 1) - Guided walkthrough of finding Median using modified QuickSort - Part 1.
- Algorithm Lecture 12: Median (Part 2) - Guided walkthrough of finding Median using modified QuickSort - Part 2.