
9. Searching and Sorting

Michael Inden, Zurich, Switzerland

Searching and sorting are two elementary topics of computer science in the field of algorithms and data structures. Python provides efficient implementations for both of them and thus takes a lot of work off your shoulders. However, understanding the underlying algorithms helps in choosing the most suitable variant for a particular use case. I only skim over the topic of searching here since it is built in and does not offer many variations, except for binary search, which is covered in Section 9.1.3.

In this chapter, you will primarily dedicate yourself to some essential sorting algorithms, and along the way you will pick up some algorithmic tricks.

9.1 Introduction Search

When managing data, sometimes you need to search for items, such as customers with the first name “Carsten” or an invoice with a specific order date. Conveniently, containers such as lists, sets, and dictionaries offer various methods and functions with which you can search for elements.

9.1.1 Search with in(), index(), and count()

Sometimes you just need to check if certain data is present in a container. The keyword in helps with this. Occasionally you may also want to get the position of the element. Then you use index() , which triggers a ValueError in case of non-existence:
3 in [1, 2, 3] # => True
[1, 2, 3].index(2) # => 1
[1, 2, 3].index(4) # => ValueError
Furthermore, it is important to remember that index() always returns the index of the first occurrence. If there are several identical elements and all their occurrences are to be determined, you can help yourself with a list comprehension in combination with enumerate(). If necessary, count() returns the number of identical elements.
print([1, 2, 3, 2].index(2)) # => 1
print([i for i, val in enumerate([1, 2, 3, 2, 4, 2]) if val == 2]) # => [1, 3, 5]
print([1, 2, 3, 2, 4, 2].count(2)) # => 3
For dictionaries you can get the keys (keys()), the values (values()), or their combination with items(). For queries, there is another short form for the keys:
programmers = {"Michael": "Python",
               "Tim": "C++",
               "Karthi": "Java"}
if "Karthi" in programmers.keys():
    print("Karthi is here")
# Python Shortcut
if "Karthi" in programmers:
    print("Karthi is here II")
if "C++" in programmers.values():
    print("someone knows C++")
if ("Michael", "Python") in programmers.items():
    print("Michael knows Python")
In addition, all() can be used to check whether all elements of a collection are included. With any(), you can determine whether there is at least one match.
print(all(elem in [2, 3, 5, 7, 9] for elem in [7, 2]))  # => True
print(any(elem in [2, 3, 5, 7, 9] for elem in [7, 2]))  # => True
print(any(elem in [2, 3, 5, 7, 9] for elem in [4]))  # => False

9.1.2 Search with rindex() and rfind()

For strings, there are the functions rindex() and rfind() that find the position of the last occurrence of a desired substring, that is, they search from the end of the string:
print("Hello".rindex("l")) # => 3
print("Hello".rfind("l")) # => 3
print("Hello".rfind("x")) # => -1
# print("Hello".rindex("x")) # => ValueError: substring not found
Unfortunately, there is no such thing for lists. However, you can program it quite easily yourself as a function named last_index_of(), a name known from Java that is, in my opinion, easier to understand. If no element is found, the return value is -1.
def last_index_of(values, search_for):
    for pos in range(len(values) - 1, -1, -1):
        if values[pos] == search_for:
            return pos
    return -1
print(last_index_of([1, 2, 3, 2, 4, 2, 5, 2], 2))  # => 7

9.1.3 Binary Search

A brute-force way to search is to iterate over all elements until the desired element is found or you reach the end of the dataset, like this:
def find(values, search_for):
    for i, current_value in enumerate(values):
        if current_value == search_for:
            return i
    return -1

The same applies to the search methods just mentioned. Even if it is not directly visible, all of them iteratively look at the elements of the data structure until they find what they are looking for. They all result in linear running time. In contrast, there is an efficient search called binary search, which offers logarithmic running time. But there is a prerequisite: binary search requires sorted data. If you have to sort the data explicitly first, the advantage over a linear search largely disappears, especially with small datasets.

For larger volumes of data, however, the logarithmic running time of a binary search is significantly better than that of a linear search. The low running time is achieved by the algorithm splitting the range to be processed in half in each step and then continuing the search in the appropriate chunk. Figure 9-1 illustrates the basic procedure, with discarded parts marked in gray.
Figure 9-1

Schematic sequence for binary search

In the figure, the arrow points between the elements in the first step—depending on the implementation of binary_search(), the left or right element directly adjacent to the center is used for comparison if the number of elements is even.
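By the way, for everyday use Python's standard library already provides binary search in the bisect module. The following is a minimal sketch (not part of the original example code) showing how a containment check could be built on top of bisect_left(), which returns the insertion position of the searched value:
import bisect

def binary_search_builtin(sorted_values, search_for):
    # bisect_left() returns the position where search_for would be inserted
    pos = bisect.bisect_left(sorted_values, search_for)
    return pos < len(sorted_values) and sorted_values[pos] == search_for

print(binary_search_builtin([1, 2, 3, 4, 5, 7, 8, 9], 5))  # => True
print(binary_search_builtin([1, 2, 3, 4, 5, 7, 8, 9], 6))  # => False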

9.2 Introduction Sort

In this section, I introduce some sorting algorithms that form the basis for the later exercises.

9.2.1 Insertion Sort

Insertion Sort is best illustrated by sorting a deck of cards in your hand. Typically, you start on the left side, then take the next card to the right, and insert it appropriately into the already sorted left part, which usually causes some cards to move to the right. With this procedure, you can skip the first card, since it is sorted by itself, and start with the second card. Let’s look at this for the number sequence 4, 2, 7, 9, 1. The respective new element to be sorted in is marked. The already sorted part on the left is separated with || from the unsorted part on the right.
4 || ➁ 7 9 1
2 4 || ➆ 9 1
2 4 7 || ➈ 1
2 4 7 9 || ➀
1 2 4 7 9

In the example, you start with the value 2. For each number, you have to determine the correct insertion position. There are two ways to do this, as described below.

Determine Insertion Position

Starting from the current position, move to the left as long as the compared values are larger. Alternatively, you can also start from the beginning and move one position to the right as long as the compared values are smaller.
def find_insert_pos_from_current(values, current_pos):
    insert_pos = current_pos
    while insert_pos > 0 and values[insert_pos - 1] > values[current_pos]:
        insert_pos -= 1
    return insert_pos
def find_insert_pos_from_start(values, current_pos):
    insert_pos = 0
    while insert_pos < current_pos and values[insert_pos] < values[current_pos]:
        insert_pos += 1
    return insert_pos
HINT: STABLE SORTING

When sorting, keeping elements of the same value in their original order in the collection is referred to as a Stable Sort. This is often preferable behavior because it prevents data associated with the elements from getting out of order.

For the example, find_insert_pos_from_current() results in a stable sort, but the second function does not. However, if you replace the < with <=, the resulting sorting algorithm also becomes stable:
while insert_pos < current_pos and values[insert_pos] <= values[current_pos]:

This is because a newly considered element of equal value is then always placed behind all already positioned elements of the same value.
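As a small illustration of stability, consider Python's built-in sorted(), which is guaranteed to be stable; the names and ages below are made up solely for this example:
people = [("Tim", 35), ("Ann", 29), ("Bob", 35), ("Joe", 29)]
# equal ages keep their original relative order: Ann before Joe, Tim before Bob
print(sorted(people, key=lambda person: person[1]))
# => [('Ann', 29), ('Joe', 29), ('Tim', 35), ('Bob', 35)]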

Implementation of Insertion Sort

After identifying the correct insertion position for a value, all values from that position up to (but excluding) the currently considered position have to be shifted by one position to the right. Finally, the value is inserted at the determined position.
def insertion_sort(values):
    for current_pos in range(1, len(values)):
        current_val = values[current_pos]
        insert_pos = find_insert_pos_from_current(values, current_pos)
        move_right(values, current_pos, insert_pos)
        values[insert_pos] = current_val
def move_right(values, current_pos, insert_pos):
    move_pos = current_pos
    while move_pos > insert_pos:
        values[move_pos] = values[move_pos - 1]
        move_pos -= 1

This code shows a well-understandable implementation that focuses on comprehensibility and not on speed. In fact, it is possible to combine some actions cleverly and thus avoid multiple runs. Later, in Exercise 4, you will deal with exactly this optimization.

Let’s try out the implementation on the command line:
>>> values = [ 4, 2, 7, 9, 1 ]
>>> insertion_sort(values)
>>> print(values)
[1, 2, 4, 7, 9]

9.2.2 Selection Sort

Selection Sort is another intuitive method for sorting. It offers two variations, one based on the minimum and the other on the maximum. In the minimum version, the values to be sorted are traversed from front to back. In each step, the minimum is determined from the section that is still unsorted. This is moved forward by swapping it with the current element. This causes the sorted area to grow from the front and the remaining unsorted section to shrink. For the version based on the maximum, the data to be sorted is processed from the back to the front. The respective maximum is placed at the end so that the sorted area grows from the back.

To gain a better understanding, let’s reproduce this for a small set of values. For this purpose, the respective current minimum or maximum is specially marked. The sorted part is separated from the unsorted part with ||. You can easily observe how the sorted part grows.
    Min                      Max
    ->                       <-
    || 4 2 7 9 ➀            4 2 7 ➈ 1 ||
1:  1 || ➁ 7 9 4            4 2 ➆ 1 || 9
2:  1 2 || 7 9 ➃            ➃ 2 1 || 7 9
3:  1 2 4 || 9 ➆            1 ➁ || 4 7 9
4:  1 2 4 7 || 9            1 || 2 4 7 9
The implementation of the version concerning the minimum is as follows:
def selection_sort_min(values):
    for i in range(len(values) - 1):
        min_idx = i
        # find minimum
        for j in range(i + 1, len(values)):
            if values[j] < values[min_idx]:
                min_idx = j
        # swap current value with minimum
        tmp = values[min_idx]
        values[min_idx] = values[i]
        values[i] = tmp

If you only look at algorithms at this low level, it is usually difficult to understand them. Of course, the final algorithms used in frameworks must be as optimal as possible. This requires estimations with the O-notation, which are easier to perform at this low level than at a higher level of abstraction, since at the higher level all constructs, including invoked functions or methods, must be taken into account. However, to learn and get started, it is much more suitable to program comprehensibly first and then optimize in further steps.

Let’s execute the sorting in the following main():
def main():
    values = [4, 2, 7, 9, 1]
    selection_sort_min(values)
    print(values)
We get the expected output:
[1, 2, 4, 7, 9]
OPINION: START WITH COMPREHENSIBILITY

How can Selection Sort be described on a higher level of abstraction? To do this, let’s use some auxiliary functions that you created for arrays. In the corresponding chapter’s introduction, you learned about the function swap() for swapping elements and the function find_min_pos() for finding the position of the smallest element, which was created as the solution to Exercise 11 in Section 6.3.11. Conveniently, they can also be used for lists without modification.

By using these functions, the actual procedure becomes almost immediately apparent. You traverse the values from the beginning and, in each case, find the minimum of the remaining part and swap it with the value of the current position:
def selection_sort_min_readable(values):
    for cur_idx in range(len(values) - 1):
        min_idx = find_min_pos(values, cur_idx, len(values))
        swap(values, min_idx, cur_idx)
The code uses the two helper functions from the module array_utils. As usual, this module bundles several helper functions developed in Chapter 6. The two helper functions called in the code above are shown again here to make it easier to try out the example on the command line:
def find_min_pos(values, start_pos, end_pos):
    min_pos = start_pos
    for i in range(start_pos + 1, end_pos):
        if values[i] < values[min_pos]:
            min_pos = i
    return min_pos
def swap(values, pos1, pos2):
    temp = values[pos1]
    values[pos1] = values[pos2]
    values[pos2] = temp
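Let's try it out on the command line, assuming the functions shown above are in scope:
>>> values = [4, 2, 7, 9, 1]
>>> selection_sort_min_readable(values)
>>> values
[1, 2, 4, 7, 9]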

9.2.3 Merge Sort

Merge Sort is based on a divide-and-conquer approach. It recursively splits the values to be sorted into smaller and smaller subparts of about half the previous size until they consist of only one or possibly no element. Afterwards, the subparts are combined again. In this merging step, the sorting happens by combining the subparts appropriately based on their respective values. The process can be illustrated as shown in Figure 9-2.
Figure 9-2

Merge Sort procedure

The splitting algorithm can be implemented recursively and in an easily comprehensible way—though also somewhat inefficiently—as long as you are allowed to create new lists. The implementation of the function merge(values1, values2) was already presented as the solution to Exercise 12 in Section 5.3.12. It is used here:
def merge_sort(to_sort):
    # recursive termination: length 0 (only if initially empty array) or 1
    if len(to_sort) <= 1:
        return to_sort
    # recursive descent: divide into two halves
    mid_pos = len(to_sort) // 2
    left = to_sort[0: mid_pos]
    result_left = merge_sort(left)
    right = to_sort[mid_pos: len(to_sort)]
    result_right = merge_sort(right)
    # combine the partial results into larger sorted data set
    return merge(result_left, result_right)
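The merge() function itself is not repeated in this chapter. For reference, a minimal sketch of such a two-way merge, assuming both inputs are already sorted, could look like this:
def merge(values1, values2):
    result = []
    i = j = 0
    # take the smaller front element first; <= keeps the merge stable
    while i < len(values1) and j < len(values2):
        if values1[i] <= values2[j]:
            result.append(values1[i])
            i += 1
        else:
            result.append(values2[j])
            j += 1
    # append the remainder of whichever input is not yet exhausted
    result.extend(values1[i:])
    result.extend(values2[j:])
    return result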
Let’s execute the sorting in the following main():
def main():
    unsorted_values = [4, 2, 7, 9, 1]
    sorted_values = merge_sort(unsorted_values)
    print(sorted_values)
You get the expected output:
[1, 2, 4, 7, 9]
HINT: ANALOGY FROM REAL LIFE LEADS TO OPTIMIZATION
The analogy to sorting a deck of cards is suitable for Merge Sort as well. If you need to sort a fairly large pile of cards, you can divide it into many, much smaller piles, sort them separately, and then merge them successively. However, instead of reducing the piles down to one card, it is a good idea to sort the smaller piles using another method, often Insertion Sort, which has a running time of O(n) for small, ideally nearly ordered values. This is useful for fine-tuning. Ingeniously, Merge Sort makes this easy as pie:
def merge_sort_with_insertion_sort(to_sort):
    # recursive termination including mini-optimization
    if len(to_sort) < 5:
        insertion_sort(to_sort)
        return to_sort
    # recursive descent: divide into two halves
    mid_pos = len(to_sort) // 2
    left = to_sort[0: mid_pos]
    result_left = merge_sort_with_insertion_sort(left)
    right = to_sort[mid_pos: len(to_sort)]
    result_right = merge_sort_with_insertion_sort(right)
    # combine the partial results into larger sorted data set
    return merge(result_left, result_right)

Finally, I would like to point out that the limit at which one should switch to Insertion Sort has been set quite arbitrarily to the value 5 here. Presumably, values between 10 and 20 elements are more practical. However, it is best to rely on the knowledge of algorithm professionals who create mathematically sound estimates of running times.

9.2.4 Quick Sort

Just like Merge Sort, Quick Sort is based on a divide-and-conquer approach and splits the values to be sorted into smaller and smaller subparts. A special element (called a pivot) is chosen that determines the grouping or processing. For simplicity, you can choose the first element of the subparts to be sorted as the pivot element, but other ways are conceivable. In Quick Sort, sorting is done based on this pivot element by arranging all elements of the parts according to their value to the left (less than or equal to) or to the right (greater than) of the pivot. This way, the pivot element is placed in the correct position. The whole process is repeated recursively for the left and right parts until the parts consist of only one element. The processes are shown in Figure 9-3.
Figure 9-3

Quick Sort

Let’s start with an implementation for lists since this is more easily accessible and understandable. With lists, breaking down the contents into smaller and larger elements is easy to implement. Later, combining the results of the recursive computations is also straightforward. The whole implementation is intentionally optimized not for speed but for comprehensibility.

For partitioning, you collect all elements that are less than or equal to or greater than the value of the pivot element in respectively one separate result list. You skip the first element because it is the pivot element and then apply the appropriate filter condition within a list comprehension.
def quick_sort(values):
    # recursive termination
    if len(values) <= 1:
        return values
    # collect less than or equal to / greater than pivot
    pivot = values[0]
    below_or_equals = [value for value in values[1:] if value <= pivot]
    aboves = [value for value in values[1:] if value > pivot]
    # recursive descent
    sorted_lowers_part = quick_sort(below_or_equals)
    sorted_uppers_part = quick_sort(aboves)
    # assemble
    return sorted_lowers_part + [pivot] + sorted_uppers_part

The whole thing is quite intuitive for lists and when not optimized for performance. It becomes considerably more awkward if you want to realize the partitioning in place (i.e., directly in the original array or list itself). You can see this for yourself later when solving Exercise 6. You will now take a look at the basic procedure.

Let’s execute the sorting in the following main() :
def main():
    unsorted_values = [4, 2, 7, 9, 1]
    sorted_values = quick_sort(unsorted_values)
    print(sorted_values)
We get the expected output:
[1, 2, 4, 7, 9]

Inplace Implementation

The basic algorithm can be implemented as follows, although the realization of the partitioning, as already mentioned, will be a practice exercise:
def quick_sort_inplace(values):
    quick_sort_in_range(values, 0, len(values) - 1)
def quick_sort_in_range(values, left, right):
    # recursive termination
    if left >= right:
        return
    partition_index = partition(values, left, right)
    # recursive descent
    quick_sort_in_range(values, left, partition_index - 1)
    quick_sort_in_range(values, partition_index + 1, right)
HINT: AVOIDING SIDE EFFECTS BY COPYING
If the original data set should be left unchanged, you can first create a copy of it and then call the inplace function:
def quick_sort_with_copy(values):
    copied_values = values.copy()
    quick_sort_inplace(copied_values)
    return copied_values

9.2.5 Bucket Sort

Bucket Sort is an interesting sorting method whose algorithm is only outlined below since the implementation is the subject of Exercise 7.

Bucket Sort is a two-step procedure for sorting data. First, the values are collected in special containers called buckets. Then, these values are transferred appropriately into a sorted list. For the algorithm to be feasible, the elements to be sorted must come from a limited range of values (in this simple variant, non-negative integers with a known maximum). For example, this applies to the age information of persons, where you can assume a range of values from 0 to 150.
ages = [10, 50, 22, 7, 42, 111, 50, 7]

This definition of a maximum number of different values means that a corresponding number of containers, the buckets, can store the values or, more precisely, their frequency. One bucket is provided for each possible value.

Step 1: Distribution to buckets
At first, the initial set of data is traversed, and each value's occurrence is recorded in the buckets. For the age information above, the distribution is as follows:
bucket[7] = 2
bucket[10] = 1
bucket[22] = 1
bucket[42] = 1
bucket[50] = 2
bucket[111] = 1

All other buckets store the value 0.

Step 2: Preparation of the sorted result
In a final step, the buckets are traversed from the beginning. Each value is inserted into the result as many times as its count stored in the bucket indicates. This produces the following sorted result:
result = [7, 7, 10, 22, 42, 50, 50, 111]
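As an aside, the counting step corresponds to what collections.Counter from the standard library provides. The following rough sketch mirrors the two steps with a Counter instead of a real bucket array; it is only an illustration of the idea, not the solution to the later exercise:
from collections import Counter

ages = [10, 50, 22, 7, 42, 111, 50, 7]
counts = Counter(ages)  # step 1: record the frequency of each value
# step 2: emit each value as often as it was counted, in ascending order
result = [age for age in sorted(counts) for _ in range(counts[age])]
print(result)  # => [7, 7, 10, 22, 42, 50, 50, 111]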

9.2.6 Final Thoughts

Many of the more intuitive algorithms, such as Insertion Sort and Selection Sort, have the disadvantage of a running time of O(n²). However, Insertion Sort has a positive and remarkable feature: as long as the input data is (nearly) sorted, Insertion Sort becomes extremely performant with O(n).

Quick Sort and Merge Sort are usually very efficient, with a running time of O(n · log(n)). Still, their source code is more complex, especially when working in place. For frameworks and larger datasets, performance is essential. Potentially unfavorable for Merge Sort, on the other hand, is the creation of many copies of subranges. The same applies to Quick Sort and its partitioning. For both, however, there are variants that work in place. Interestingly, the respective divisions of the subranges to be sorted are quite easy to express by recursion, but the partitioning or merging part is more complex and more difficult to implement, in particular if you work in place. For Merge Sort, you will find an example in the provided PyCharm project. For Quick Sort, you may try it in Exercise 6.

That leaves Bucket Sort. This algorithm sometimes even runs in linear time, which is O(n). However, in contrast to the other sorting algorithms presented, it is not generally applicable, because of the already mentioned restriction concerning the range of allowed values.

9.3 Exercises

9.3.1 Exercise 1: Contains All (★★✩✩✩)

The task is to create function contains_all(values, search_values) that checks if all passed values are present in the given list. Explicitly do not use the Python standard functionality of all(). Program this yourself.

Examples

Input                             Search values    Result
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]    [7, 2]           True
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]    [5, 11]          False

9.3.2 Exercise 2: Partitioning (★★★★✩)

The challenge is to suitably sort or arrange a mixed sequence of the letters A and B in a single pass so that all As occur before the Bs. This can also be extended to three letters.

Examples

Input                 Result
“ABAABBBAAABBBA”      “AAAAAAABBBBBBB”
“ABACCBBCAACCBBA”     “AAAAABBBBBCCCCC”

Exercise 2a: Partitioning Two Letters (★★★✩✩)

Write function partition2(text) that takes a given sequence built out of the two letters A and B and turns it into an ordered sequence where all As occur before the Bs.

Exercise 2b: Partitioning Three Letters (★★★★✩)

Write function partition3(text) that partitions a sequence built of the three letters A, B, and C given as str into an ordered sequence where all As occur before Bs and they in turn before Cs. Instead of letters, this can be thought of for colors of a flag. Then it is known as the Dutch Flag Problem.

9.3.3 Exercise 3: Binary Search (★★✩✩✩)

Exercise 3a: Binary Search Recursive (★★✩✩✩)

Write recursive function binary_search(values, search_for) that performs a search for the desired value in a sorted list.

Examples

Input                       Search value    Result
[1, 2, 3, 4, 5, 7, 8, 9]    5               True
[1, 2, 3, 4, 5, 7, 8, 9]    6               False

Exercise 3b: Binary Search Iterative (★★✩✩✩)

Your task is to convert the recursive function into an iterative one. As a modification, it should return the position of the search value, or -1 if not found, instead of True or False. Additionally, it should be named binary_search_iterative(values, search_value).

Examples

Input                       Search value    Result
[1, 2, 3, 4, 5, 7, 8, 9]    5               4
[1, 2, 3, 4, 5, 7, 8, 9]    6               -1

9.3.4 Exercise 4: Insertion Sort (★★✩✩✩)

The introductory Section 9.2.1 showed a simplified, easy-to-follow realization of Insertion Sort. In this exercise, the goal is to optimize the whole thing by now finding the insertion position and performing the necessary swapping and insertion in one go. Write an optimized version of insertion_sort(values).

Example

Input                          Result
[7, 2, 5, 1, 6, 8, 9, 4, 3]    [1, 2, 3, 4, 5, 6, 7, 8, 9]

9.3.5 Exercise 5: Selection Sort (★★✩✩✩)

Write a variation of Selection Sort that uses the maximum instead of the minimum and has the following signature: selection_sort_max_inplace(values).

What needs to be modified so that the sort algorithm leaves the original data unchanged and returns a new sorted list? Implement this requirement in function selection_sort_max_copy(values).

Example

Input                          Result
[7, 2, 5, 1, 6, 8, 9, 4, 3]    [1, 2, 3, 4, 5, 6, 7, 8, 9]

9.3.6 Exercise 6: Quick Sort (★★★✩✩)

I described Quick Sort in the introductory Section 9.2.4. Whereas the splitting into two ranges with values less than or equal to and greater than the pivot element can be implemented very easily when creating new lists, it is more challenging to perform the partitioning of a list in place. You need to implement the partitioning with the function partition(values, left, right). In the following, the already existing source code is shown once again:
def quick_sort_inplace(values):
    quick_sort_in_range(values, 0, len(values) - 1)
def quick_sort_in_range(values, left, right):
    # recursive termination
    if left >= right:
        return
    partition_index = partition(values, left, right)
    # recursive descent
    quick_sort_in_range(values, left, partition_index - 1)
    quick_sort_in_range(values, partition_index + 1, right)

Examples

Input                                Result
[5, 2, 7, 1, 4, 3, 6, 8]             [1, 2, 3, 4, 5, 6, 7, 8]
[5, 2, 7, 9, 6, 3, 1, 4, 8]          [1, 2, 3, 4, 5, 6, 7, 8, 9]
[5, 2, 7, 9, 6, 3, 1, 4, 2, 3, 8]    [1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9]

9.3.7 Exercise 7: Bucket Sort (★★✩✩✩)

In the introductory Section 9.2.5, a Bucket Sort algorithm was described. In this exercise, you want to create function bucket_sort(values, expected_max) that implements this sorting algorithm for a list of values and an expected maximum value.

Example

Input                              Maximum value    Result
[10, 50, 22, 7, 42, 111, 50, 7]    150              [7, 7, 10, 22, 42, 50, 50, 111]

9.3.8 Exercise 8: Search in Rotated Data (★★★★✩)

In this exercise, your task is to implement an efficient binary search in a sorted sequence of integer values. The challenge is that the values are ordered but rotated within themselves. As a consequence, the smallest element may not be at the front of the data. Additionally, the largest element often does not reside at the end of the data (except in the special case of a rotation by 0 positions).

Tip

Be careful also to check the special case of a rotation of 0, or of a multiple of the length of the dataset, which again corresponds to a rotation by 0 positions.

Exercise 8a: Flank Change Efficient (★★★★✩)

Write function find_flank_pos(values) that efficiently finds the position of a flank change in a given sorted sequence of n integer values, say 25, 33, 47, 1, 2, 3, 5, 11, in logarithmic time, which is O(log(n)). Write two functions min_value(values) and max_value(values) based on find_flank_pos(values) that, according to their names, determine the minimum and maximum, respectively, from the given sorted but rotated sequence of values.

Examples

Input                           Flank position      Minimum    Maximum
[25, 33, 47, 1, 2, 3, 5, 11]    3                   1          47
[5, 11, 17, 25, 1, 2]           4                   1          25
[6, 1, 2, 3, 4, 5]              1                   1          6
[1, 2, 3, 4, 5, 6]              0 (special case)    1          6

Exercise 8b: Binary Search in Rotated Data (★★★★✩)

Write function binary_search_rotated(values, search_for) that efficiently searches in a sorted sequence of integer values, say the number sequence 25, 33, 47, 1, 2, 3, 5, 11, for a given value and returns its position or -1 if not found.

Examples

Input                           Flank position      Search value    Result
[25, 33, 47, 1, 2, 3, 5, 11]    3                   47              2
[25, 33, 47, 1, 2, 3, 5, 11]    3                   3               5
[25, 33, 47, 1, 2, 3, 5, 11]    3                   13              -1
[1, 2, 3, 4, 5, 6, 7]           0 (special case)    5               4
[1, 2, 3, 4, 5, 6, 7]           0 (special case)    13              -1

9.4 Solutions

9.4.1 Solution 1: Contains All (★★✩✩✩)

The task is to create function contains_all(values, search_values) that checks if all passed values are present in the given list. Explicitly do not use the Python standard functionality of all(). Program this yourself.

Examples

Input                             Search values    Result
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]    [7, 2]           True
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]    [5, 11]          False

Algorithm: For your implementation, you repeatedly apply the test with in, once for each of the passed elements to be checked for containment:
def contains_all(values, search_values):
    for current in search_values:
        if current not in values:
            return False
    return True
Python shortcut: Of course, this can be written more compactly with the help of all(). Nevertheless, a helper function is probably useful to keep the calling source code as understandable as possible.
def contains_all_v2(values, search_values):
    return all(elem in values for elem in search_values)

Verification

Let’s define a list with the numbers from 0 to 9 and check if the values 7 and 2, as well as 5 and 11, are present there:
@pytest.mark.parametrize("values, search_values, expected",
                         [([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [7, 2], True),
                          ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [5, 11], False)])
def test_contains_all(values, search_values, expected):
    assert contains_all(values, search_values) == expected
@pytest.mark.parametrize("values, search_values, expected",
                         [([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [7, 2], True),
                          ([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [5, 11], False)])
def test_contains_all_v2(values, search_values, expected):
    assert contains_all_v2(values, search_values) == expected

9.4.2 Solution 2: Partitioning (★★★★✩)

The challenge is to suitably sort or arrange a mixed sequence of the letters A and B in a single pass so that all As occur before the Bs. This can also be extended to three letters.

Examples

Input                 Result
“ABAABBBAAABBBA”      “AAAAAAABBBBBBB”
“ABACCBBCAACCBBA”     “AAAAABBBBBCCCCC”

Solution 2a: Partitioning Two Letters (★★★✩✩)

Write function partition2(text) that takes a given sequence built out of the two letters A and B and turns it into an ordered sequence where all As occur before the Bs.

Algorithm: Although you may initially be tempted to compare all possible positions, an ingenious and performant solution exists that solves the task in one pass. Work with two position pointers, low and high, which mark the boundaries of the already partitioned areas at the front (for the As) and at the back (for the Bs). These areas are initially empty and grow until the pointers meet. The following procedure is used: When an A is found, the lower position pointer (low) is incremented. When a B is found, it is swapped to the back. Afterwards, the position pointer of the Bs (high) is decreased, expanding the already correctly partitioned area.
def partition2(char_values):
    low = 0
    high = len(char_values) - 1
    while low <= high:
        if char_values[low] == 'A':
            low += 1
        else:
            swap_positions(char_values, low, high)
            high -= 1
            # low must remain, because theoretically also a
            # B can be exchanged to the front
    return "".join(char_values)
def swap_positions(values, pos1, pos2):
    values[pos1], values[pos2] = values[pos2], values[pos1]

Because a B may also move to the front when swapping, the lower position pointer must stay unchanged. In one of the next steps, the B will then move to the back again. This tricky algorithm makes it possible to arrange all As in front of the Bs in a single pass.

Solution 2b: Partitioning Three Letters (★★★★✩)

Write function partition3(text) that partitions a sequence built of the three letters A, B, and C given as str into an ordered sequence where all As occur before Bs and they in turn before Cs. Instead of letters, this can be thought of for colors of a flag. Then it is known as the Dutch Flag Problem.

Algorithm: The extension from two to three letters (or colors) employs similar ideas as before, but with a few more tricks and special treatments. You start again at the beginning of the array or list, but use the three position markers low, mid, and high. Initially, low and mid are located at position 0, and high at the end position. If an A is found, it is swapped with the element at the low position, and afterwards low and mid shift by one to the right. If you read a B, only the middle position is shifted towards the end. If the current character is a C, it is swapped to the back. The position marker for the upper area (high) is then reduced by 1.
def partition3(char_values):
    low = 0
    mid = 0
    high = len(char_values) - 1
    while mid <= high:
        if char_values[mid] == 'A':
            swap_positions(char_values, low, mid)
            low += 1
            mid += 1
        elif char_values[mid] == 'B':
            mid += 1
        else:
            swap_positions(char_values, mid, high)
            high -= 1
            # low, mid must remain unchanged, because also a B or C
            # can be swapped to the front
    return "".join(char_values)

Verification

To check functionality, you use two strings consisting of a shuffled sequence of the letters A and B or A, B, and C, respectively:
def test_partition2():
    assert partition2(list("ABAABBBAAABBBA")) == "AAAAAAABBBBBBB"
def test_partition3():
    assert partition3(list("ABACCBBCAACCBBA")) == "AAAAABBBBBCCCCC"

9.4.3 Solution 3: Binary Search (★★✩✩✩)

Solution 3a: Binary Search Recursive (★★✩✩✩)

Write recursive function binary_search(values, search_for) that performs a search for the desired value in a sorted list.

Examples

Input                       Search value    Result
[1, 2, 3, 4, 5, 7, 8, 9]    5               True
[1, 2, 3, 4, 5, 7, 8, 9]    6               False

Algorithm: Divide the list into two halves. Determine the value in the middle and check whether you need to continue searching in the lower or the upper half. This can easily be decided based on the given sort order.

value_center == search_for ⇒ found, end

value_center < search_for ⇒ continue searching in the upper part

value_center > search_for ⇒ continue searching in the lower part

The implementation in Python strictly follows the description. As usual, be especially careful at the boundaries of the list or array to avoid making careless mistakes.
def binary_search(sorted_values, search_for):
    mid_pos = len(sorted_values) // 2
    # recursive termination
    if search_for == sorted_values[mid_pos]:
        return True
    # there are still at least 2 numbers
    if len(sorted_values) > 1:
        if search_for < sorted_values[mid_pos]:
            # recursive descent: search further in the lower part
            lower_half = sorted_values[0: mid_pos]
            return binary_search(lower_half, search_for)
        if search_for > sorted_values[mid_pos]:
            # recursive descent: continue search in the upper part
            upper_half = sorted_values[mid_pos + 1: len(sorted_values)]
            return binary_search(upper_half, search_for)
    return False
To try it out, execute the following code:
def main():
    sorted_values = [1, 2, 3, 4, 5, 7, 8, 9]
    print("Given: ", sorted_values)
    print("search for 5:", binary_search(sorted_values, 5))
    print("search for 6:", binary_search(sorted_values, 6))
The expected result is as follows:
Given:  [1, 2, 3, 4, 5, 7, 8, 9]
search for 5: True
search for 6: False
Optimized algorithm: The solution shown is not really optimal because parts of the original data are permanently copied to perform further searches. The entire process can be done completely without potentially time-consuming copying with the help of two index variables. The following solution is certainly preferable:
def binary_search_optimized(values, search_value):
    return binary_search_in_range(values, search_value, 0, len(values) - 1)
def binary_search_in_range(values, search_for, left, right):
    if right >= left:
        mid_idx = (left + right) // 2
        if search_for == values[mid_idx]:
            return True
        # recursive descent: search in the lower / upper part further
        if search_for < values[mid_idx]:
            return binary_search_in_range(values, search_for,
                                          left, mid_idx - 1)
        else:
            return binary_search_in_range(values, search_for,
                                          mid_idx + 1, right)
    return False

Solution 3b: Binary Search Iterative (★★✩✩✩)

Your task is to convert the recursive function into an iterative one. As a modification, it should return the position of the search value, or -1 if not found, instead of True or False. Additionally, it should be named binary_search_iterative(values, search_value).

Examples

Input                       Search value    Result
[1, 2, 3, 4, 5, 7, 8, 9]    5               4
[1, 2, 3, 4, 5, 7, 8, 9]    6               -1

Algorithm: Based on the recursive version just shown, the iterative implementation can be derived quite easily. Use two position markers, left and right, which initially point to the beginning and the end (positions 0 and len(values) - 1). These two markers determine the index boundaries within which the search continues. At first, you compare the value in the middle with the one you are searching for. If the values are equal, you return the index. Otherwise, you divide the search area into two parts and continue until either the search is successful or the left and right position markers cross each other.
def binary_search_iterative(values, search_for):
    left = 0
    right = len(values) - 1
    while right >= left:
        mid_idx = (left + right) // 2
        if search_for == values[mid_idx]:
            return mid_idx
        if search_for < values[mid_idx]:
            right = mid_idx - 1
        else:
            left = mid_idx + 1
    return -1

Verification

For testing, you use the following inputs, which show the correct operation:
@pytest.mark.parametrize("sorted_values, search_for, expected",
                         [([1, 2, 3, 4, 5, 7, 8, 9], 5, True),
                          ([1, 2, 3, 4, 5, 7, 8, 9], 6, False)])
def test_binary_search(sorted_values, search_for, expected):
    assert binary_search(sorted_values, search_for) == expected
@pytest.mark.parametrize("sorted_values, search_for, expected",
                         [([1, 2, 3, 4, 5, 7, 8, 9], 5, True),
                          ([1, 2, 3, 4, 5, 7, 8, 9], 6, False)])
def test_binary_search_optimized(sorted_values, search_for, expected):
    assert binary_search_optimized(sorted_values, search_for) == expected
@pytest.mark.parametrize("sorted_values, search_for, expected",
                         [([1, 2, 3, 4, 5, 7, 8, 9], 5, 4),
                          ([1, 2, 3, 4, 5, 7, 8, 9], 6, -1)])
def test_binary_search_iterative(sorted_values, search_for, expected):
    assert binary_search_iterative(sorted_values, search_for) == expected

9.4.4 Solution 4: Insertion Sort (★★✩✩✩)

Section 9.2.1 showed a simplified, easy-to-follow realization of Insertion Sort. In this exercise, the goal is to optimize the whole thing by now finding the insertion position and performing the necessary swapping and insertion in one go. Write an optimized version of insertion_sort(values).

Example

Input                          Result
[7, 2, 5, 1, 6, 8, 9, 4, 3]    [1, 2, 3, 4, 5, 6, 7, 8, 9]

Algorithm: For all elements, you perform the following procedure, which is described exemplarily for the value sequence 2 4 3 1 7. Let’s consider 3 as the value to be sorted in. Starting from its position, you must swap it with its left neighbor as long as the neighbor’s value is greater than the current one and you have not yet reached the very front of the list. In this case, you swap the 3 only with the 4. Next, you need to swap the 1 all the way to the front. Finally, the 7 is already in the right position.
def insertion_sort(values):
    for i in range(1, len(values)):
        # check if current element is larger than predecessor
        current_idx = i
        while current_idx > 0 and values[current_idx - 1] > values[current_idx]:
            swap_positions(values, current_idx - 1, current_idx)
            current_idx -= 1
def swap_positions(values, pos1, pos2):
    values[pos1], values[pos2] = values[pos2], values[pos1]

The function for swapping the values of two positions can be written very compactly in Python using tuple notation, even without parentheses.

Verification

Verify that the implementation produces the desired result for the given sequence of numbers using a unit test:
def test_insertion_sort():
    values = [7, 2, 5, 1, 6, 8, 9, 4, 3]
    insertion_sort(values)
    assert values == [1, 2, 3, 4, 5, 6, 7, 8, 9]

9.4.5 Solution 5: Selection Sort (★★✩✩✩)

Write a variation of Selection Sort that uses the maximum instead of the minimum and has the following signature: selection_sort_max_inplace(values).

What needs to be modified so that the sort algorithm leaves the original data unchanged and returns a new sorted list? Implement this requirement in function selection_sort_max_copy(values).

Example

Input                          Result
[7, 2, 5, 1, 6, 8, 9, 4, 3]    [1, 2, 3, 4, 5, 6, 7, 8, 9]

Algorithm: The list to be sorted is traversed from back to front while the largest element of the still unsorted section is moved to the current position at the back. By calling the function find_max_pos(), you determine the position of the maximum within the remaining unsorted subrange. This function was created as a solution to Exercise 11 in Section 6.3.11 for arrays; it can also be used for lists without modification. Subsequently, the element is moved to the back accordingly by swapping it with the current element. This reduces the size of the remaining, not-yet-sorted part until it consists only of the foremost element.
def selection_sort_max_inplace(values):
    for i in range(len(values) - 1, 0, -1):
        max_pos = find_max_pos(values, 0, i + 1)
        swap_positions(values, max_pos, i)
The function with the copy functionality is trivial to implement if you have created the previous function:
def selection_sort_max_copy(values):
    copy = list(values)
    selection_sort_max_inplace(copy)
    return copy
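The function find_max_pos() itself is not repeated in this chapter. A sketch analogous to the find_min_pos() shown in Section 9.2.2 could look like this:
def find_max_pos(values, start_pos, end_pos):
    max_pos = start_pos
    for i in range(start_pos + 1, end_pos):
        if values[i] > values[max_pos]:
            max_pos = i
    return max_pos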

Verification

Verify that the implementation produces the desired result for the given sequence of numbers using a unit test:
def test_selection_sort_max_inplace():
    values = [7, 2, 5, 1, 6, 8, 9, 4, 3]
    selection_sort_max_inplace(values)
    assert values == [1, 2, 3, 4, 5, 6, 7, 8, 9]

9.4.6 Solution 6: Quick Sort (★★★✩✩)

I described Quick Sort in the introductory Section 9.2.4. Whereas the splitting into two ranges with values less than or equal to and greater than the pivot element can be implemented very easily when creating new lists, it is more challenging to perform the partitioning of a list in place. Now the partitioning is to be implemented with the function partition(values, left, right). The already existing source code is shown once again:
def quick_sort_inplace(values):
    quick_sort_in_range(values, 0, len(values) - 1)
def quick_sort_in_range(values, left, right):
    # recursive termination
    if left >= right:
        return
    partition_index = partition(values, left, right)
    # recursive descent
    quick_sort_in_range(values, left, partition_index - 1)
    quick_sort_in_range(values, partition_index + 1, right)

Examples

Input                                Result
[5, 2, 7, 1, 4, 3, 6, 8]             [1, 2, 3, 4, 5, 6, 7, 8]
[5, 2, 7, 9, 6, 3, 1, 4, 8]          [1, 2, 3, 4, 5, 6, 7, 8, 9]
[5, 2, 7, 9, 6, 3, 1, 4, 2, 3, 8]    [1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9]

Algorithm: Your goal is to subdivide an array or a list (or a range of it) into two parts by passing the lower start and upper end index and choosing a value at a special position (e.g., the foremost element) as the pivot element. Now the two parts are rearranged. All elements with values smaller than or equal to the pivot element should reside in the lower part. Furthermore, all elements with a value larger than the pivot element should reside in the upper part. Here, the two indices left_index and right_index each move inward as long as the conditions values[left_index] <= pivot (for the left) and pivot < values[right_index] (for the right) hold. If an inappropriately ordered element is found on the left side, the examination continues on the right side. If an inappropriately ordered element is found there as well, the two are swapped. This process is repeated as long as the position markers do not cross each other. Finally, the element at the right_index position is swapped with the pivot element. There is also the special case that the array or list has only two elements. In this case, you also have to make sure that the right value is actually larger than that of the pivot.
def partition(values, left, right):
    pivot = values[left]
    left_index = left + 1
    right_index = right
    while left_index < right_index:
        # move the position left_index to the right, as long as value
        # less than or equal to pivot and left limit less than right limit
        while values[left_index] <= pivot and left_index < right_index:
            left_index += 1
        # move the position right_index to the left, as long as value greater
        # than pivot and right limit greater than or equal to left limit
        while pivot < values[right_index] and right_index >= left_index:
            right_index -= 1
        if left_index < right_index:
            swap_positions(values, left_index, right_index)
    # special case: 2-element list in wrong order where the loop body was never
    # executed (left_index == right_index), as well as the normal case at the end
    if values[right_index] < pivot:
        swap_positions(values, left, right_index)
    return right_index
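To illustrate a single partitioning step, here is what one might expect on the command line for the first example list (a sketch of the behavior, not part of the original test suite):
>>> values = [5, 2, 7, 1, 4, 3, 6, 8]
>>> partition(values, 0, len(values) - 1)
4
>>> values
[4, 2, 3, 1, 5, 7, 6, 8]
The pivot 5 ends up at index 4, with all smaller values to its left and all larger values to its right.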

Verification

Let’s define the three lists from the introductory examples and use them to check the implementation of Quick Sort:
@pytest.mark.parametrize("values, expected",
                         [([5, 2, 7, 1, 4, 3, 6, 8],
                           [1, 2, 3, 4, 5, 6, 7, 8]),
                          ([5, 2, 7, 9, 6, 3, 1, 4, 8],
                           [1, 2, 3, 4, 5, 6, 7, 8, 9]),
                          [[5, 2, 7, 9, 6, 3, 1, 4, 2, 3, 8],
                           [1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9]]])
def test_quick_sort_inplace(values, expected):
    quick_sort_inplace(values)
    assert values == expected

9.4.7 Solution 7: Bucket Sort (★★✩✩✩)

In the introductory Section 9.2.5, a Bucket Sort algorithm was described. In this exercise, you want to create function bucket_sort(values, expected_max) that implements this sorting algorithm for a list of values and an expected maximum value.

Example

Input                              Maximum value    Result
[10, 50, 22, 7, 42, 111, 50, 7]    150              [7, 7, 10, 22, 42, 50, 50, 111]

Algorithm: Bucket Sort is one of the most straightforward sorting algorithms to implement and also one of the fastest with linear running time—but with the prerequisite of a limited range of values.

First, you create buckets that store the counts of values. Afterwards, Bucket Sort is implemented in two steps:
  1. Traverse all input values and assign them to the corresponding buckets. If there are several equal elements, you have to increment the counter.

  2. The final step is to reconstruct the values based on the counter values.
The described procedure is implemented in Python as follows:
def bucket_sort(values, expected_max):
    buckets = [0] * (expected_max + 1)
    collect_into_buckets(values, buckets)
    results = [0] * len(values)
    fill_result_from_buckets(buckets, results)
    return results
This outlines the algorithm in its basic form. Only the implementation of the two helper functions remains, which is also straightforward. To calculate the count of the respective numbers, you iterate through the original values and increment the counter in the bucket corresponding to the current value.
def collect_into_buckets(values, buckets):
    for current in values:
        buckets[current] += 1
Based on the quantities in the buckets, the generation of the result is just a little bit more complex. For this purpose, you traverse all buckets. If the bucket at index i contains a quantity greater than 0, this index value has to be copied to the target as often as specified there, solved here by the while loop. You only have to track the position in the target list separately.
def fill_result_from_buckets(buckets, results):
    result_pos = 0
    for i, count in enumerate(buckets):
        while count > 0:
            results[result_pos] = i
            count -= 1
            result_pos += 1

Verification

Write a short test function to check your implementation of Bucket Sort with some values:
@pytest.mark.parametrize("values, max, expected",
                         [([10, 50, 22, 7, 42, 111, 50, 7], 150,
                           [7, 7, 10, 22, 42, 50, 50, 111]),
                          ([10, 50, 22, 7, 42, 111, 50, 7], 120,
                           [7, 7, 10, 22, 42, 50, 50, 111]),
                          [[5, 2, 7, 9, 6, 3, 1, 4, 2, 3, 8], 10,
                           [1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9]]])
def test_bucket_sort(values, max, expected):
    result = bucket_sort(values, max)
    assert result == expected

9.4.8 Solution 8: Search in Rotated Data (★★★★✩)

In this exercise, your task is to implement an efficient binary search in a sorted sequence of integer values. The challenge is that the values are ordered but rotated within themselves. As a consequence, the smallest element may not be at the front of the data. Additionally, the largest element often does not reside at the end of the data (except in the special case of a rotation by 0 positions).

Tip

Be careful also to check the special case of a rotation of 0, or of a multiple of the length of the dataset, which again corresponds to a rotation by 0 positions.

Solution 8a: Flank Change Efficient (★★★★✩)

Write function find_flank_pos(values) that efficiently finds the position of a flank change in a given sorted sequence of n integer values, say 25, 33, 47, 1, 2, 3, 5, 11, in logarithmic time, which is O(log(n)). Write two functions min_value(values) and max_value(values) based on find_flank_pos(values) that, according to their names, determine the minimum and maximum, respectively, from the given sorted but rotated sequence of values.

Examples

Input                           Flank position      Minimum    Maximum
[25, 33, 47, 1, 2, 3, 5, 11]    3                   1          47
[5, 11, 17, 25, 1, 2]           4                   1          25
[6, 1, 2, 3, 4, 5]              1                   1          6
[1, 2, 3, 4, 5, 6]              0 (special case)    1          6

Preliminary considerations for the algorithm: Let's start with a brute-force version based on linear search so you can check your optimized version against it later on. For the search, you only need to check, for each element from front to back, whether the successor of a value is smaller than the current element:
def find_flank_pos_simple(values):
    for i, value in enumerate(values):
        next_idx = (i + 1) % len(values)
        if value > values[next_idx]:
            return next_idx
    raise Exception("should never reach here!")

Of course, when traversing, you also have to consider the special case that the flank change takes place at the very end of the list. Then you have a non-rotated list as a base.

Algorithm: So, how can you proceed to achieve logarithmic running time? In this case, you take advantage of the fact that the value sequence is sorted. The search ranges can always be divided in half, following the idea of binary search. Because of the rotation, however, you must be careful concerning the indices.

There are three comparisons to be made:
  • Case A: With the predecessor: If it is larger, you have found the flank change.

  • Case B: With the leftmost element: If it is larger than the current element, then the flank change must happen somewhere in between. So, you can exclude the right half.

  • Case C: With the rightmost element: If this is smaller, the flank change must happen on the right side. You can exclude the left half.

At the very beginning, it is crucial to check for the special case of the non-rotated initial dataset. This can be determined by the fact that the far left value is smaller than that on the far right.

With these preliminary considerations, the following implementation emerges:
def find_flank_pos(values):
    return find_flank_pos_in_range(values, 0, len(values) - 1)
def find_flank_pos_in_range(values, left, right):
    mid_pos = left + (right - left) // 2
    mid_value = values[mid_pos]
    # special case no rotation
    if values[left] < values[right]:
        return 0
    prev_index = mid_pos - 1
    if prev_index < 0:
        prev_index = len(values) - 1
    # case A: value to the left of this is larger, then you got a
    # flank change
    if values[prev_index] > mid_value:
        return mid_pos
    if values[left] > mid_value:
        # case B: flank change must be on the left, since first value
        # larger than in the middle
        return find_flank_pos_in_range(values, left, mid_pos + 1)
    if values[right] < mid_value:
        # case C: flank change must be on the right, as last value
        # smaller than in the middle
        return find_flank_pos_in_range(values, mid_pos + 1, right)
    raise Exception("should not reach here")

Based on this function, it is possible to write the functions for determining the minimum and maximum quite simply, with the knowledge that the position of the flank change contains the minimum and the maximum is one position to the left of it.

Due to the convenient Python characteristic of supporting negative indexes for access from the end of the data structure, no correction needs to be made for a rotation of 0.
def min_value(values):
    flank_pos = find_flank_pos(values)
    return values[flank_pos]
def max_value(values):
    flank_pos = find_flank_pos(values)
    return values[(flank_pos - 1) % len(values)]
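A quick check with the first example from the table above:
>>> min_value([25, 33, 47, 1, 2, 3, 5, 11])
1
>>> max_value([25, 33, 47, 1, 2, 3, 5, 11])
47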

Verification

Test the determination of the flank change using the following parameterized test; in particular, the special case of non-rotated input values is also verified:
@pytest.mark.parametrize("values, expected",
                         [([25, 33, 47, 1, 2, 3, 5, 11], 3),
                          ([6, 7, 1, 2, 3, 4, 5], 2),
                          ([1, 2, 3, 4, 5, 6, 7], 0)])
def test_find_flank_pos(values, expected):
    flank_pos = find_flank_pos(values)
    assert flank_pos == expected

Solution 8b: Binary Search in Rotated Data (★★★★✩)

Write function binary_search_rotated(values, search_for) that efficiently searches in a sorted sequence of integer values, say the number sequence 25, 33, 47, 1, 2, 3, 5, 11, for a given value and returns its position or -1 if not found.

Examples

Input                           Flank position      Search value    Result
[25, 33, 47, 1, 2, 3, 5, 11]    3                   47              2
[25, 33, 47, 1, 2, 3, 5, 11]    3                   3               5
[25, 33, 47, 1, 2, 3, 5, 11]    3                   13              -1
[1, 2, 3, 4, 5, 6, 7]           0 (special case)    5               4
[1, 2, 3, 4, 5, 6, 7]           0 (special case)    13              -1

Algorithm: After being able to efficiently determine the flank change in O(log(n)), one possibility is to enlarge the list. Thereby, you cut out the front part of the list and append it at the end (this is feasible for medium-sized lists). Afterwards, you can invoke the binary search that was developed in Exercise 3:
25 | 27 | 33 | 2 | 3 | 5     =>   | 2 | 3 | 5 | 25 | 27 | 33

However, this procedure causes quite a bit of effort. So how can you improve it?

For this purpose, you adapt the binary search to specify a lower and an upper bound. You pick up the idea of the list expansion but make it virtual. Let's take a look at the search for the 47 in the number sequence from the exercise, as illustrated in Figure 9-4.
Figure 9-4

Rotated binary search procedure

Based on these preliminary ideas, you proceed with the binary search. First, you determine the position of the flank change and use it to specify your search value range. Next, you perform a normal binary search, but you use the modulo operator to bring the extended value range back into the list’s boundaries and determine the comparison value based on this.
def binary_search_rotated(values, search_for):
    flank_pos = find_flank_pos(values)
    return binary_search_rotated_in_range(values, search_for, flank_pos,
                                          flank_pos - 1 + len(values))
def binary_search_rotated_in_range(values, search_for, start, end):
    if start > end:
        return -1
    mid_pos = start + (end - start) // 2
    adjusted_mid = mid_pos % len(values)
    mid_value = values[adjusted_mid]
    if mid_value == search_for:
        return adjusted_mid
    if search_for < mid_value:
        return binary_search_rotated_in_range(values, search_for,
                                              start, mid_pos - 1)
    if search_for > mid_value:
        return binary_search_rotated_in_range(values, search_for,
                                              mid_pos + 1, end)
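A quick check with two of the example values from the table above:
>>> binary_search_rotated([25, 33, 47, 1, 2, 3, 5, 11], 47)
2
>>> binary_search_rotated([25, 33, 47, 1, 2, 3, 5, 11], 13)
-1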

Python-specific algorithm: I have just described a general algorithm. In Python, you can take advantage of the fact that indices can also be negative and then operate from the end of the list or array.

As before, you take up the idea of performing the binary search with a shifted index. Instead of virtually extending the input data and mapping back into the real value range with modulo, you simply use the fact that the value range is shifted by n positions. Thus, instead of searching between 0 and n - 1, you search in the range between flank_pos - n and flank_pos - 1. Then, to make this work, the termination condition needs to move after the check for value equality, and you additionally need to check for matching start and end. In the actual call, you need to calculate the start and end positions appropriately and cover the special case of no rotation.
def binary_search_rotated2(values, search_for):
    flank_pos = find_flank_pos(values)
    start = flank_pos - len(values)
    end = flank_pos - 1
    if flank_pos == 0:
        start = 0
        end = len(values) - 1
    return binary_search_rotated_helper2(values, search_for, start, end)
def binary_search_rotated_helper2(values, search_for, start, end):
    mid_pos = start + (end - start) // 2
    mid_value = values[mid_pos]
    if mid_value == search_for:
        return mid_pos % len(values)
    if start == end:
        return -1
    if search_for < mid_value:
        return binary_search_rotated_helper2(values, search_for,
                                             start, mid_pos - 1)
    if search_for > mid_value:
        return binary_search_rotated_helper2(values, search_for,
                                             mid_pos + 1, end)

Verification

To check the functionality, you use the value combinations from the introductory example:
@pytest.mark.parametrize("values, search_for, expected",
                         [([25, 33, 47, 1, 2, 3, 5, 11], 47, 2),
                          ([25, 33, 47, 1, 2, 3, 5, 11], 3, 5),
                          ([25, 33, 47, 1, 2, 3, 5, 11], 13, -1),
                          ([1, 2, 3, 4, 5, 6, 7], 5, 4),
                          ([1, 2, 3, 4, 5, 6, 7], 13, -1)])
def test_binary_search_rotated(values, search_for, expected):
    pos = binary_search_rotated(values, search_for)
    assert pos == expected

9.5 Summary: What You Learned

Even though you will hardly ever program a search or sorting algorithm yourself nowadays, it is still helpful for your algorithmic understanding to have done so once.

Simple implementations of (linear) searches tend to have a running time of O(n). You learned how to benefit from sorted datasets by using binary search, a trickier search that reduces the running time to O(log(n)).

Similar observations apply to sorting: while naive implementations often have a running time of O(n²), this can usually be reduced to O(n · log(n)) with Merge Sort and Quick Sort. It is fascinating to see how a fixed range of values can have a significant effect: Bucket Sort, with a running time of O(n), plays out its strengths under these constraints.

As a nice challenge for the end, you solved a binary search in rotated data sets, where the values are sorted but shifted by some positions.
