Self-organizing lists

So far, we have learned that the performance of search can be enhanced by sorting the vectors based on their key values prior to the search operations. However, there seems to be an another approach for sorting vectors, which is not based on key values but on the expected frequency of accessing the key values for comparison purposes. This kind of sorting based on expected frequency of access can sometimes be cheaper compared to sorting based on key values, thereby increasing the performance of search operations.

Consider a vector V sorted based on the frequency of access of key values, but not on the value of its elements. In other words, the elements with a higher probability (pi) of getting compared with the search element S is placed first, followed by the element with the second highest probability, and so on. The search for element S is performed sequentially on all the sorted elements in the vector. Upon multiple search iterations, the expected number of comparisons required for one search is given as follows:

Self-organizing lists

Here, the cost to access the first element is 1 (only a single element to compare with S), and the probability of access of this first element is p1. Similarly, the cost to access the second element is 2 (as the first and second elements need to be compared with, and its probability of access is p2, and so on. Assuming the possibility of search on each of the elements in the vector, the sum of all the probabilities from p1 to pn is infinity.

This approach of sorting based on the frequency of comparisons has some disadvantages. Primarily, it is very difficult to determine the probabilities of access in advance if the corresponding vector has not been iterated over a bunch of search elements. Moreover, the records which had a higher frequency of access initially may not continue to stay for long time. Thus, the probability of access for some elements might change over time. These constraints led to the concept of self-organizing lists, wherein the pattern of element access is also taken into account along with their frequency of access. These self-organizing vectors are based on heuristics, examples of some heuristics for self-organizing vectors are as follows:

  • Count: This is the most basic heuristic of self-organizing vectors. Here, the count refers to the number of comparisons or accesses being made with the elements of the vectors. The count of each key value is stored, which is used in ordering of the vector. In parallel, the elements are also moved toward the left of the vector as its count starts increasing more than the elements preceding it. The main drawback of this heuristic is that the ordering is very hostile to the change in frequency of access over time. In other words, once the element gets a higher count, it nearly always remains toward the left of the vector regardless of further changes to other elements' counts. Also, this method requires additional memory to store the count information.
  • Move-to-front: In this heuristic, once the element S is found, the corresponding element in the vector is moved toward the first position, and all the other elements are pushed back by one positon. This kind of heuristic is called move-to-front. This is easy to implement in linked lists against vectors. In vectors, bringing a near-end element toward the front of the vector requires the displacement of a large number of preceding elements. The cost of move-to-front is almost twice the cost required by the count heuristic, wherein n searches are performed on the vector and the elements are, accordingly, order based on the frequency of access. It performs better in scenarios where the elements are accessed frequently for a brief period of time, as these elements will be near the front of the vector during this period of access. However, it performs poorly when elements are processed repeatedly in a sequential order.
  • Transpose: The heuristic of swapping adjacent elements based on their frequency of access is termed as transpose. It performs well in both linked lists and vectors. The transpose will inherently move the most frequently accessed elements to the front of the vector. The elements which were initially accessed frequently and moved to the front will start to slowly drift backward once they are no longer accessed frequently. Thus, it performs well for scenarios where there is change in the frequency of access. In some situations, it performs poorly. Assume a sequence of search operations wherein the last and second-last elements are accessed alternately. Then these elements will get swapped for each iteration, but neither of them will move toward the front of the vector. However, these kinds of situations are rare. This can be resolved if the accessed elements are moved forward by some fixed number of positions instead of swapping with their adjacent preceding element.

Now let's understand each heuristic using an example. Consider a numeric vector, V of eight elements arranged in an order of key values, as shown in the following diagram:

Self-organizing lists

Figure 6.6: Example of numeric vector arranged in order of key values

Now, let's perform a series of 12 search operations in the following order of elements (S):

Self-organizing lists

Figure 6.7: Vector of search operations to be performed

Heuristic 1 - Count

In count based heuristic, the frequency of access of the elements start moving forward. After the first three searches, the element 6 will be first, followed by 4, and so on. The total cost of all these accesses will be 45 comparisons. The following R code implements self-organizing lists, and returns a sorted vector based on sequential search. The input is a vector V, the sequence of search elements is S, the number of elements to be searched (that is, length of S) is n_search, and the number of elements in the input vector (that is, the length of V) is n:

SOL_count <- function(V,S,n_search,n) 
{ 
  if(is.null(V)) stop("NO elements in input vector") 
  if(is.null(S)) stop("NO elemens to search") 
  i=1 
  count <- as.list(sapply(1:n,function(x) 0)) 
  names(count) <- V 
  cl <- class(V) 
  while(i<=n_search) 
  { 
    if(Sequential_search(V,S[i],n)$present){ 
      key <- Sequential_search(V,S[i],n)$key 
      count[key][[1]] <- count[key][[1]] + 1 
      count <- count[order(-unlist(count))] 
      V <- as.numeric(names(count)) 
    } 
    i=i+1 
  } 
  return(V) 
}

The final sorted vector based on count is as follows:

Heuristic 1 - Count

Figure 6.8: Output from Heuristic 1 - Count

Heuristic 2 - Move-to-front

Here, upon finding the element S, the element is moved toward the front of the vector. The following R code implements the move-to-search heuristic of self-organizing lists, and returns a sorted vector. The input is a vector V, the sequence of search elements is S, the number of elements to be searched (that is-length of S: n_search) and number of elements in the input vector (that is, the length of V: n):

SOL_move <- function(V,S,n_search,n) 
{ 
  if(is.null(V)) stop("NO elements in input vector") 
  if(is.null(S)) stop("NO elemens to search") 
  i=1 
  while(i<=n_search) 
  { 
    if(Sequential_search(V,S[i],n)$present){ 
      if(Sequential_search(V,S[i],n)$key !=1){ 
        key <- Sequential_search(V,S[i],n)$key 
        temp <- V[key] 
        V <- V[-key] 
        V <- c(temp,V) 
      } 
    } 
    i <- i+1 
  } 
  return(V) 
}

The total cost of these accesses will be 54 comparisons, and the final sorted vector is as follows:

Heuristic 2 - Move-to-front

Figure 6.9: Output from Heuristic 2 - Move-to-front

Heuristic 3 - Transpose

Here, the elements, once found, are transposed with the adjacent element till it moves toward the front. The following R code implements the transpose heuristic of self-organizing lists, and returns the sorted vector. The input is a vector V, the sequence of search elements is S, the number of elements to be searched (that is, the length S) is n_search, and the number of elements in the input vector (that is, the length of V) is n:

SOL_transpose <- function(V,S,n_search,n) 
{ 
  if(is.null(V)) stop("NO elements in input vector") 
  if(is.null(S)) stop("NO elemens to search") 
  i=1 
  while(i<=n_search) 
  { 
    if(Sequential_search(V,S[i],n)$present){ 
      if(Sequential_search(V,S[i],n)$key !=1){ 
        key <- Sequential_search(V,S[i],n)$key 
        temp <- V[key-1] 
        V[key-1] <- V[key] 
        V[key] <- temp 
      } 
    } 
    i <- i+1 
  } 
  return(V) 
}

The total cost of these accesses will be 62 comparisons, and the final sorted vector is as follows:

Heuristic 3 - Transpose

Figure 6.10: Output from Heuristic 3 - Transpose

The asymptote of self-organizing lists based on system runtime is O(log n), which is similar to binary search trees; however, the former performs better in many scenarios. The main advantage of self-organizing lists is the non-requirement of a pre-sorted vector, as sorting itself requires a certain cost. Also, the cost to insert a new element is also low, as its position need not be determined, which is compulsory in the case of insertion in a sorted vector. Self-organizing lists are simple to implement, and show better performance even for smaller vectors (or lists). Thus, with a minor change in the algorithm, the performance of a sequential search can be enhanced using self-organizing lists without any prerequisite for a sorted vector.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset