Algorithm complexity

Time complexity of an algorithm defines the amount of time taken by the algorithm to run as a function of the input size. Similarly, the space complexity of an algorithm gives a measure of the amount of space (memory) taken by the algorithm to run for a specific length of the input. These complexity metrics describe how the time and space requirements of an algorithm grow with an increasing amount of data (the inputs on which the algorithm has to work).

For example, consider the problem of adding two numbers. Here, we will look at each digit pair in the two integers, add them (along with any carry from the previous pair), and then move to the next digit pair. If we had to denote the time taken to perform this addition, we could model it as T(n) = c * n:

  • Here T(n) is the time taken to add two integers of n digits.
  • c is the time taken to add one pair of digits.

Intuitively, we can sense that the time taken will be proportional to the number of digits in the number.
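To make the linear model concrete, here is an illustrative Go sketch (not from the original text; the function name and digit representation are our own) that adds two numbers stored as slices of decimal digits, least-significant digit first. The single loop over the digit pairs is exactly the c * n cost in the model:

func addDigits(a, b []int) []int {
    // addDigits adds two non-negative numbers given as slices of decimal
    // digits in least-significant-first order, for example 123 -> []int{3, 2, 1}.
    n := len(a)
    if len(b) > n {
        n = len(b)
    }
    result := make([]int, 0, n+1)
    carry := 0
    // One constant-time step per digit pair: this loop is the c * n term.
    for i := 0; i < n; i++ {
        sum := carry
        if i < len(a) {
            sum += a[i]
        }
        if i < len(b) {
            sum += b[i]
        }
        result = append(result, sum%10)
        carry = sum / 10
    }
    if carry > 0 {
        result = append(result, carry)
    }
    return result
}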

Before we go into details, let's review a key mathematical notion: order of growth. For any two monotonic functions f(n) and g(n), we say that f(n) = O(g(n)) when there exist constants c > 0 and n0 > 0 such that:

f(n) ≤ c * g(n), for all n ≥ n0

This is depicted by this graph (courtesy of Wikipedia):

The implication is that the function f(n) does not grow faster than g(n), or that the function g(n) is an upper bound for f(n), for all sufficiently large n. Thus, if we can model T(n) in the preceding form, we get a worst-case running time for an algorithm for any given n!
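For instance, if f(n) = 3n + 5, then f(n) ≤ 4n for all n ≥ 5, so f(n) = O(n) with c = 4 and n0 = 5.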

As a concrete example of how complexity impacts scalability, let's look at two ways of sorting an array of integers. Bubble sort works by comparing adjacent elements in the array and swapping them if they are out of order. Thus, in every top-level pass, the largest remaining element bubbles up to the end of the array. A Golang implementation is given here:

func bubbleSort(array []int) {
    swapped := true

    // Keep making passes over the array until a full pass completes
    // without a single swap, which means the array is sorted.
    for swapped {
        swapped = false
        for i := 0; i < len(array)-1; i++ {
            // Swap adjacent elements that are out of order.
            if array[i+1] < array[i] {
                array[i+1], array[i] = array[i], array[i+1]
                swapped = true
            }
        }
    }
}
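As a quick usage sketch (illustrative only; the main wrapper and the fmt import are additions, not part of the original listing), here is how the function can be called on the array we are about to trace:

package main

import "fmt"

func main() {
    array := []int{15, 1, 4, 3, 8}
    bubbleSort(array)
    fmt.Println(array) // prints [1 3 4 8 15]
}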

Coming back to the implementation, there are two for loops that go over the array. As described earlier, each pass of the top-level for loop pushes the next-largest element to the end of the yet-unsorted portion of the array. Let's run this through an example input array, say [ 15 1 4 3 8 ].

First pass of the outer for loop:

  • [ 15 1 4 3 8 ] –> [ 1 15 4 3 8 ]: swap since 15 > 1
  • [ 1 15 4 3 8 ] –> [ 1 4 15 3 8 ]: swap since 15 > 4
  • [ 1 4 15 3 8 ] –> [ 1 4 3 15 8 ]: swap since 15 > 3
  • [ 1 4 3 15 8 ] –> [ 1 4 3 8 15 ]: swap since 15 > 8

Here is the second pass:

  • [ 1 4 3 8 15 ] –> [ 1 4 3 8 15 ]: no swap since 1 < 4
  • [ 1 4 3 8 15 ] –> [ 1 3 4 8 15 ]: swap since 4 > 3
  • [ 1 3 4 8 15 ] –> [ 1 3 4 8 15 ]: no swaps for the remaining pairs

At this point, the array is already sorted, but our algorithm needs one whole pass without any swap to know it is sorted. The next pass will keep swapped as false and then the code will bail out. In the worst case, we will need n * n comparisons; that is, n² operations. Assuming each operation takes unit time, this algorithm is thus said to have a worst-case complexity of O(n²), or quadratic complexity.

Quicksort is another way of solving the problem. It is an example of the divide-and-conquer strategy of algorithm design, and is based on the idea of choosing one element of the array as a pivot and partitioning the array around it, so that the elements to the left of the pivot are less than the pivot's value, while those on the right are greater. A Go implementation is shown here (it uses the math/rand package to pick the pivot):

func quickSort(array []int) []int {
    if len(array) <= 1 {
        return array
    }
    left, right := 0, len(array)-1

    // Pick a pivot randomly and move it to the end.
    pivot := rand.Intn(len(array))
    array[pivot], array[right] = array[right], array[pivot]

    // Partition: move every element smaller than the pivot to the left.
    for i := range array {
        if array[i] < array[right] {
            array[i], array[left] = array[left], array[i]
            left++
        }
    }

    // Put the pivot in its final place.
    array[left], array[right] = array[right], array[left]

    // Recurse on the two partitions on either side of the pivot.
    quickSort(array[:left])
    quickSort(array[left+1:])

    return array
}

As you can see from the code, at each step, the following is true:

  • There is a linear scan of the data.
  • The input is divided into two parts and the code recurses on each part.

In mathematical form, the time taken (assuming the pivot splits the array into roughly equal halves) will be as follows:

T(n) = 2T(n/2) + n

This recurrence works out to n log n, so for quicksort the time complexity is O(n log n).
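To see where the n log n comes from, expand the recurrence a few levels (a sketch, assuming n is a power of two and T(1) is a constant):

T(n) = 2T(n/2) + n
     = 4T(n/4) + 2n
     = 8T(n/8) + 3n
     ...
     = n * T(1) + n * log2(n)

The subproblems reach size 1 after log2(n) halvings, and each level of the expansion contributes a total of n work, which gives roughly n log n operations overall.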

Well, the code is slightly more complicated, and we have changed the complexity from n² to n log n. Is this really worth it? To understand the difference, let's say you had to sort an array of a million elements. Bubble sort would need, in the worst case, on the order of 10¹² operations, whereas quicksort would need just about 20 × 10⁶ operations! If each operation takes a millisecond, bubble sort would need more than 30 years, while quicksort would complete the task in around five and a half hours! A very significant improvement in the scalability of the algorithm.
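A rough way to see this difference for yourself is to time both implementations on the same random data. The following standalone sketch is illustrative (the slice size, variable names, and timing approach are our own choices); it assumes the bubbleSort and quickSort functions above are available in the same package:

package main

import (
    "fmt"
    "math/rand"
    "time"
)

func main() {
    const n = 50000
    data := make([]int, n)
    for i := range data {
        data[i] = rand.Intn(n)
    }

    // Time bubble sort on a copy of the data.
    bubbleInput := make([]int, n)
    copy(bubbleInput, data)
    start := time.Now()
    bubbleSort(bubbleInput)
    fmt.Println("bubble sort took:", time.Since(start))

    // Time quicksort on another copy of the same data.
    quickInput := make([]int, n)
    copy(quickInput, data)
    start = time.Now()
    quickSort(quickInput)
    fmt.Println("quicksort took: ", time.Since(start))
}

Even at fifty thousand elements, the gap between the two is hard to miss.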

The following figure gives a graphical comparison of the number of operations required for various Big-O complexities:

Thus, it is extremely important to analyze and profile your code and identify sub-optimal algorithms to improve the scalability of the code.
