Chapter 5. Efficient Searching – Binary Search and Sorting

What is searching? Searching is trying to locate a given value in a collection of values. For example, you are given an array of integers, and your problem is to check whether the integer 5 is in that array. This is a search problem. In addition to just deciding whether the element 5 is in the array, we may be interested in its location as well when it is found. This is also a search problem.

Another interesting take on it would be to imagine a dictionary, that is, an array of values and associated values. For example, you have an array of names of students and their marks, as shown in the following table:

Name

Marks

Tom

63

Harry

70

Merry

65

Aisha

85

Abdullah

72

...

The list continues. Suppose, our system lets the student view his/her own marks. They would type their name and the system would show their marks. For simplicity, let's assume that there are no duplicate names. Now, we have to search for the name provided and return the corresponding values. This is, thus, another search problem. As we will see, search problems are quite ubiquitous in programming.

In this chapter, you will learn the following:

  • Search algorithms
  • Efficient searching in a sorted list
  • Some sorting algorithms

Search algorithms

Suppose you are given an array of values and you are required to check whether a particular value is in that array, what is the most natural way to find that element? Well, it seems that the natural way is to go through each element one by one and check whether they match the given value. If any one does, we have found that element and we can return the index; if not, we can return -1 at the end of processing all the elements to report that such an element could not be found. This is what we would call a linear search. The following demonstrates a linear search algorithm in an array:

    public static <E, F extends E> int linearSearch(E[] values, 
    valueToLookup) { 
        for (int i = 0; i < values.length; i++) { 
            if (values[i].equals(valueToLookup)) { 
                return i; 
            } 
        } 
        return -1; 
    }

The function linearSearch takes an array of values and a value to search in it, and returns the index if the value is found. If not, it returns a special value, -1. The program simply goes through each element and checks whether the current element matches with the value to lookup; if it does, then it just returns the current index, otherwise it keeps looking. At the end of the array, it returns the special value, -1. Now the following piece of code should return -1 in the first case and the value 5 in the second case:

        Integer[] integers = new Integer[]{232,54,1,213,654,23,6,72,21}; 
        System.out.println(ArraySearcher.linearSearch(integers,5)); 
        System.out.println(ArraySearcher.linearSearch(integers,23));

Now, if we want to solve the student-marks problem described in the introduction of this chapter, we just need to have the marks of the students stored in a different array in the same order, as follows:

    static String[] students = new String[]{"Tom","Harry","Merry","Aisha", "Abdullah"}; 
    static int[] marks = new int[]{63,70, 65, 85, 72}; 

Now we can write a function to search for a name:

    public static Integer marksForName(String name){ 
        int index = linearSearch(students, name); 
        if(index>=0){ 
            return marks[index]; 
        }else{ 
            return null; 
        } 
    }

First, we look for the name of the students in the list of students. If the name is found, the corresponding index would be assigned to the variable index and the value would be greater than or equal to zero. In such a case, we return the marks stored in the same index as of the marks array. If it is not found, we return null. To lookup the marks for Merry, for example, we call as shown here:

        System.out.println(marksForName("Merry"));

We correctly obtain her marks, that are, 65.

What is the complexity of linear search? We have a for loop that moves through each element of an array of length n (say); in the worst case, we would go through all the elements, so the worst case complexity is θ(n). Even on an average, we would be visiting half the elements before we hit the correct element, so the average case complexity is θ(n/2) = θ(n).

Binary search

Is a linear search the best we can do? Well, it turns out that if we are looking at an arbitrary array, this is what we have to do. After all, in an arbitrary array, there is no way to know whether an element is there without potentially looking at all of them. More specifically, we cannot say for sure that some element does not exist, without verifying all the elements. The reason is that the value of one element does not say anything about the values of the other elements.

But, what information can one element have about other elements in an array? One way to make elements have information about the other elements is to have a sorted array instead of just an arbitrary array. What is a sorted array? A sorted array is an array that has all the elements arranged in order of their values. When an array is sorted, every element contains the information that everything on the left is smaller than that particular element, and everything on the right is bigger (or the other way round if the order of the elements is opposite, but we will consider the arrays that are sorted in an increasing order from left to right). This information can, amazingly, make this search a lot faster. Here is what we do:

  • Check the middle element of the array. If it matches the element we are searching for, we are done.
  • If the middle element is smaller than the value we are searching for, search on the subarray on the right of the current array. This is because everything on the left is even smaller.
  • If the middle element is bigger than the value we are searching for, search only in the left sub-array.

To avoid creating copies of the array while creating a sub-array, we just pass on the whole array, but we remember the start and end positions we are looking at. The start is included in the range and end is excluded. So, only elements on the right of the start position and the left of the end position are included in the subarray being searched. The following figure gives a visual understanding of binary search:

Binary search

Figure 1: Binary Search.

An arrow representing moving one element to another during the search.

But, before implementing this algorithm, we need to understand the concept of Comparable. Comparable is an interface in the Java standard library that looks like this:

package java.lang; 
public interface Comparable<T> { 
    public int compareTo(T o); 
} 

Any class implementing this interface has to compare a different object with itself. It is required that the type parameter, T, must be instantiated with the same class that implements it, like this:

public class Integer implements Comparable<Integer>{
    public int compareTo(Integer o){
        …
    } 
}

The compareTo method intends to compare an object of the same type. If the current object (the one that the this reference refers to) is smaller than the object passed, compareTo must return a negative value. If the object passed is smaller, the method must return a positive value. Otherwise, if they are equal, it must return 0. The following conditions are required to be fulfilled by the compareTo method:

  • If a.compareTo(b) == 0, then a.equals(b) must be true
  • If a.compareTo(b) < 0 and b.compareTo(c) < 0, then a.compareTo(c) <0
  • If a.compareTo(b) <0, then b.compareTo(a) > 0
  • If b.equals(c) is true and a.compareTo(b) <0, then a.compareTo(c) <0
  • If b.equals(c) is true and a.compareTo(b) >0, then a.compareTo(c) >0

Basically, the conditions are the same for a total order where equality is represented by the equals method. It basically generalizes the concept of the < and <= operators, which are there for numbers. Of course, the compareTo method for the Wrapper objects are implemented exactly as the < and <= operators are on the primitives inside them.

Now, we write the search function to do the search, as per the preceding steps:

    private static <E extends Comparable<E>, 
      F extends E> int binarySearch( E[] sortedValues, 
      F valueToSearch, int start, int end) { 
        if(start>=end){ 
            return -1; 
        } 
        int midIndex = (end+start)/2; 
        int comparison = sortedValues[midIndex].compareTo(valueToSearch); 
        if(comparison==0){ 
            return midIndex; 
        }else if(comparison>0){ 
            return binarySearch(sortedValues, valueToSearch, start, midIndex); 
        }else{ 
            return binarySearch(sortedValues, valueToSearch, midIndex+1, end); 
        } 
    }

Note that in this case, we have mandated that the objects in the array must be comparable so that we can know if an object is greater than or less than another object. It does not matter exactly how this relationship is determined; the array must be sorted using the same comparison – that the comparison between two consecutive elements will make sure the element on the left is smaller than the one on the right, as provided by Comparable.

The first if condition checks whether the array passed is empty, if so, obviously the element to be searched is not found and we return -1 to represent this. Then, we find the midIndex and recursively search for the element in either the left or the right subarray. Once we have this function, we create another wrapper function to run the search without having to mention the start and the end positions:

    public static <E extends Comparable<E>, F extends E> int binarySearch( 
            E[] sortedValues, F valueToSearch) { 
        return binarySearch(sortedValues, valueToSearch, 0, sortedValues.length); 
    }

Complexity of the binary search algorithm

In every step, we are partitioning the total array into two parts, barring the one element that we are comparing. In the worst case, that is, the case where the element searched is not present in the array, we will have to get down all the way to the point where we are dealing with an empty array, in which case we return -1 and stop recursing. We must have had an array of only one element in the previous step. For our analysis, we will consider this step as the last step. So, let's have a sorted array of n elements and T(.) is the time required for searching in the array. So, we have the following:

T(n) = T((n-1)/2) + C, where C is a constant.

In general, the two search branches in every step would be of different sizes, one potentially being of size one less than that of the other part. But we will ignore these small differences, which hardly matter for a large n. Hence, we will work with the following equation:

T(n) = T(n/2) + C

Now, let's assume that n is an integral power of 2 so that n = 2m for some integer m. So, we have this:

T(2m) = T(2m-1) + C

Now we take another function S(.), such that S(m) = T(2m) for all m. Then, we have:

S(m) = S(m-1) + C

This is the formula for an arithmetic progression. And hence, we have this:

S(m) = mC + D, where D is also a constant.
=> T(2m) = mC + D
=> T(n) = C lg(n) + D

So, we have the asymptotic complexity of T(n):

T(n) = O(lg(n))

The function, T(n), grows only as fast as the logarithm of the size of the array passed, which is really slow. This makes binary search an extremely efficient algorithm.

To sort of field test the algorithms, we run both linear and binary search algorithms on an array of a hundred million elements with the following code:

        int arraySize = 100000000; 
        Long array[] = new Long[arraySize]; 
        array[0] = (long)(Math.random()*100); 
        for(int i=1;i<array.length;i++){ 
            array[i] = array[i-1] + (long)(Math.random()*100); 
        } 

        //let us look for an element using linear and binary search 
        long start = System.currentTimeMillis(); 
        linearSearch(array, 31232L); 
        long linEnd = System.currentTimeMillis(); 
        binarySearch(array, 31232L); 
        long binEnd = System.currentTimeMillis(); 

        System.out.println("linear search time :=" + (linEnd -start)); 
        System.out.println("binary search time :=" + (binEnd -linEnd));

On my computer, the linear search took 282 milliseconds and binary search took 0 milliseconds. Also, note that the value we are looking for is expected to be quite near to the beginning of the array; in case of values near the middle, a binary search would have an even higher advantage.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset