Merge sort follows the principle of divide and conquer, wherein the input vector is first divided into two halves, then each half is independently sorted and later merged into a single sorted vector. Its key features are as follows:
The following R code recursively implements merge sort:
Merge_Sort <- function(V) {
if(length(V) == 0) stop("Not enough elements to sort")
## Merge function to sort two halves or sub-vectors
merge_fn <- function(first_half, second_half) {
result <- c()
while(length(first_half) > 0 && length(second_half) > 0) {
if(first_half[1] <= second_half[1]) {
result <- c(result, first_half[1])
first_half <- first_half[-1]
} else {
result <- c(result, second_half[1])
second_half <- second_half[-1]
}
}
if(length(first_half) > 0) result <- c(result, first_half)
if(length(second_half) > 0) result <- c(result, second_half)
return(result)
}
## Recursively split the parent vector into two halves (sub-
vectors)
if(length(V) <= 1) V else {
middle <- length(V) / 2
first_half <- V[1:floor(middle)]
second_half <- V[floor(middle+1):length(V)]
first_half <- Merge_Sort(first_half)
second_half <- Merge_Sort(second_half)
if(first_half[length(first_half)] <= second_half[1]) {
c(first_half, second_half)
} else {
merge_fn(first_half, second_half)
}
}
}
The R code comprises two subcodes. One explains how to execute the merge operation (merge_fn
), and the other how to operate the main function (Merge_Sort
) recursively. The former function executes the merge operation on two input vectors (or two halves of a subvector), whereas the latter function recursively splits the main vector (V
) to its lowest possible half (log n levels of recursion), and accordingly, performs the merge operation.
Figure 5.6 illustrates the methodology of merge sort in operation:
Figure 5.6: Illustration of merge sort
One of the main drawbacks of merge sort is memory management. It requires almost twice the memory required by most of the sorting algorithms. Initially, the main input vector is recursively split into multiple subarrays. These subarrays are again recursively merged into multiple secondary vectors, until a final sorted vector is obtained. Thus, a complete execution requires two sets of supplementary vectors (one while splitting, and the other while merging); a bypass of either step is extremely difficult to implement.
Despite the fact of its recursive implementation, analyzing merge sort asymptotically is not very difficult. The analysis can be divided into two stages:
Thus, the functional form of the system runtime of the merge sort algorithm in terms of the number of execution steps is θ(n log n), because each log n level of recursion requires an n number of total merge operation steps. As the cost function is independent of the order of the initial input vector, the asymptote for the best, average, and worst-cases remain the same.