Bin sort and radix sort

Bin sort is one of the most efficient algorithms, wherein an input vector is split into multiple bins, and then sorting is performed within each bin. The elements are assigned to the bins based on the computations performed on each element. The bins can be a list of multiple vectors or a linked list. The current execution uses a list of multiple vectors as bins. The following R code performs the bin sort operation on a numeric vector (V) containing n elements. The maxValue variable denotes the element with maximum value within the input vector:

Bin_Sort=function(V,n,maxValue){ 
  bin <-list("binValues"=list(), "nElement"=NA) 
  ## create empty bins 
  for(i in 1:n){ 
    bin[["binValues"]][[i]]<-NA 
    bin[["nElement"]][i]<-0 
  } 
  ## add elements into suitable bins 
  bin <- addItem(V=V,bin=bin,maxValue=maxValue,n=n) 
  ## bind all bins into a single sorted vector 
  output <- bindSorted_vec(bin=bin,n=n) 
  return(output) 
}

Initially, an empty bin is created, which contains a list (binValues) and a vector (nElement). The list (binValues) is meant to act as bins to hold elements of the input vector (V), and the vector (nElement) is meant to track the count of elements in each bin.

The functions addItem and insertItem are meant to allocate each element into bins in a sorted order. The function insertItem gets activated when a new element is being inserted into a bin already containing elements. While inserting, the value of the new element is compared with the existing elements. Accordingly, the position is assigned to the new element, ensuring the sorting order (ascending in our case):

# add item to bin 
addItem=function(V,bin,maxValue,n){ 
  for(i in 1:n){ 
    val<-V[i] 
    ix<-ceiling((val*n)/maxValue) 
    if(is.na(bin[["binValues"]][[ix]][1])){ 
      bin[["binValues"]][[ix]][1]<-val 
      bin[["nElement"]][ix]<-1 
    } else 
    { 
      bin <- insertItem(val=val, ix=ix,bin=bin) 
    } 
  } 
  return(bin) 
} 
 
# insert a item into a bin ensuring sorting 
insertItem=function(val, ix,bin){ 
  nElement<-bin[["nElement"]][ix] 
  pos<-NULL 
  for(i in 1:nElement){ 
    if(val<bin[["binValues"]][[ix]][i]){ 
      pos<-i 
    } 
  } 
  if(is.null(pos)){ 
    bin[["binValues"]][[ix]][nElement+1]<-val 
  } else if(pos==1) { 
    bin[["binValues"]][[ix]]<-c(val, bin[["binValues"]][[ix]][1]) 
  } else 
  { 
    bin[["binValues"]][[ix]]<-c(bin[["binValues"]][[ix]][1:(pos-
    1)], val, bin[["binValues"]][[ix]][pos:nElement]) 
  } 
  bin[["nElement"]][ix]<-nElement+1 
  return(bin) 
}

Some of the key features of the functions addItem and insertItem are as follows:

  • Direct computations are performed on the element values prior to assigning them to each bin. The computation depends on the length of the input vector (n) and the maximum value in the input vector (maxValue). This also restricts the input vector to be of integer type rather than numeric.
  • The length of the binValue list is restricted to n. In other words, the total number of bins is n.
  • The vector nElement keeps track of the elements in each bin.
  • The function insertItem intrinsically ensures sorting among the elements within each bin. Whenever a new element needs to be inserted, its position is first determined based on its value, and then inserted accordingly.

Once all the elements of the input vector are allocated to the respective bins, the bins are then bound into a single output vector in an order from 1 to n. During the bind process, the relative positions of the elements within each bin are maintained. Hence, the output received is a completely sorted vector (ascending order in our case):

# bind the list into a sorted vector 
bindSorted_vec=function(bin,n){ 
  output <- c() 
  currentIx<-1 
  for(i in 1:n){ 
    if(!is.na(bin[["binValues"]][[i]][1])){ 
      nElement<-bin[["nElement"]][i] 
      for(m in 1:nElement){ 
        output[currentIx]<-bin[["binValues"]][[i]][m] 
        currentIx<-currentIx+1 
      } 
    } 
  } 
  return(output) 
}

The following example shows the working of the bin sort algorithm in R:

> V<-c(20,12,65,8,10,16,43,35,23,88,2,56,41,27,67,55)
> n<-16
> maxValue<-88
> Bin_Sort(V=V,n=n,maxValue=maxValue)
[1]  2  8 10 12 16 20 23 27 35 41 43 55 56 65 67 88

The performance of the bin sort algorithm is θ(n) for most of the scenarios. It is evaluated based on the number of operations required to place an element into a bin and then taking out all the elements from the bins into an output vector. However, when the input vector becomes very large, the number of traversing operations required for placement of each element increases considerably, and the performance is drastically affected.

Bucket sort is another representation of the bin sort algorithm, wherein the elements are initially assigned to each bin, and each bin is subjected to a different sorting technique. There is also no initial check on the elements being inserted into non-empty bins. Once all the elements are placed into their respective bins based on a computation criterion, each bin is then exposed to a different sorting algorithm. These individually sorted bins are later bound into a single vector of sorted elements.

Radix sort, on the other hand, is an improvised version of bin sort, wherein the number of bins can be restricted to a smaller number (generally 10 bins), and relative positioning of elements while assigning them into non-empty bins is not required. Consider a vector of n elements ranging from 0 to 999 which needs to be sorted in ascending order. Let us also define bins from 1 to 10 such that bin 1 is meant to store elements with the digit 1, bin 2 is meant to store elements with the digit 2, and so on. We can begin assigning elements to each bin based on their units digit. If the units digit of an element is 1, then the element will be placed in bin 1, and if the units digit is 0, the element will be placed in bin 10, and so on. Also, while inserting elements into non-empty bins, the relative positions need not to taken into account as was the case in bin sort. Once all the elements are inserted into the respective bins based on their units digit, all the 10 bins will then be bound into a single vector (without disturbing the overall order of the bins, that is, the first bin follows the second bin, which follows the third bin, and so on) using the bindSorted_vec function. Similarly, the process continues for the tens digit and the and hundreds digit. The following R code implements the radix sort algorithm:

# add item to bin 
addItem=function(V,bin,digLength,n){ 
  for(i in 1:n){ 
    val<-V[i] 
    ## Extract the required digit from the number 
    ix<-floor((val/digLength) %% 10)+1 
    ## Assign element to each bin 
    bin[["binValues"]][[ix]][bin[["nElement"]][ix]+1]<-val 
    ## Track count of elements in each bin 
    bin[["nElement"]][ix]<-bin[["nElement"]][ix] + 1 
  } 
  return(bin) 
} 
 
# bind the list into a sorted vector 
bindSorted_vec=function(bin){ 
  output <- c() 
  currentIx<-1 
  for(i in 1:10){ 
    if(!is.na(bin[["binValues"]][[i]][1])){ 
      nElement<-bin[["nElement"]][i] 
      for(m in 1:nElement){ 
        output[currentIx]<-bin[["binValues"]][[i]][m] 
        currentIx<-currentIx+1 
      } 
    } 
  } 
  return(output) 
} 
 
# radixsort Algorithm 
radix_Sort=function(V,n,maxValue,digLength){ 
  for(digLength in c(10^(0:digLength))) 
  { 
  bin <-list("binValues"=list(), "nElement"=NA) 
  # create empty bins 
  for(i in 1:10){ 
    bin[["binValues"]][[i]]<-NA 
    bin[["nElement"]][i]<-0 
  } 
  bin <- addItem(V=V,bin=bin,digLength=digLength,n=n) 
  V <- bindSorted_vec(bin=bin) 
  } 
  return(V) 
}

The following example shows the working of the radix sort algorithm in R:

> V<-c(67,54,10,988,15,5,16,43,35,23,88,2,103,83)
> n<-14
> maxValue<-988
> digLength <- 2
> radix_Sort(V=V,n=n,maxValue=maxValue,digLength=digLength)
[1]   2   5  10  15  16  23  35  43  54  67  83  88 103 988

Figure 5.14, 5.15, and 5.16 illustrate the implementation of the radix sort algorithm. Consider an integer vector (V) with 14 elements, with the maximum element as 988, and the length of digits as 2 (one less than the length of the maximum element):

Bin sort and radix sort

Figure 5.14: Iterations 0 and 1 of radix sort

Iteration 0 in radix sort uses the units digit from rightmost to arrange data in bins. For example, 10 with 0 in right most goes to the first bin and 43 goes to the third bin. Similarly, next iteration will use tens digit as shown in Figure 5.15:

Bin sort and radix sort

Figure 5.15: Iteration 2 of radix sort

The output from the tens digit is then reallocated using the hundreds digit (leftmost digit) as shown in Figure 5.16:

Bin sort and radix sort

Figure 5.16: Iteration 3 of radix sort

Now, let's analyze the performance of the radix sort algorithm. The asymptote of radix sort is θ(n) for all types of best, worst, and average-case scenarios irrespective of the length of the input vector. The asymptote primarily depends on the maximum number of digits for a given input vector and the base of the computation. In our algorithm, we have used a base of 10 for performing computations on each element prior to assigning them to the respective bins. The asymptote can be rewritten as θ(nk + sk), where n represents the total length of the input vector, s represents the base, and k represents the length of the maximum element in the input vector. However, if the length of the input vector is large and most of the values are distinct, then the asymptotic complexity of radix sort changes to Ω(nlog n). Also, if the range of elements is large, then the radix sort algorithm will show its best performance in terms of the Ω(nlog n) asymptote.

Nevertheless, the radix sort algorithm is very difficult to implement efficiently. The implementation requires a number of loop iterations, which affects the runtime performance of the algorithm. The following loops form an integral part of radix sort, which is shown in the preceding three images:

  • Loop to initialize the position of the digit (digLength) for an element
  • Loop to create empty bins
  • Loop to perform radix/index computation on each element prior to assigning elements into the respective bins, and to keep a track on the count of elements within each bin
  • Loop to extract elements from each bin and assign them to an output vector

Also, radix sort is limited to the integer type of input vectors. Vectors with real numbers and arbitrary element lengths need to be handled with extra care.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset