Searching and sorting are two elementary topics of computer science in the field of algorithms and data structures. Python provides efficient implementations for both of them and thus takes a lot of work off your shoulders. However, understanding the underlying algorithms helps in choosing the most suitable variant for a particular use case. I only skim over the topic of searching here since it is built in and does not offer a lot of variations, except for binary search, which is covered in section 9.1.1.
In this chapter, you will primarily dedicate yourself to some essential sorting algorithms because you will learn some algorithmic tricks in the meantime.
9.1 Introduction Search
When managing data, sometimes you need to search for items, such as customers with the first name “Carsten” or an invoice with specific order date. Conveniently, all containers like lists, sets and dictionaries offer various methods or functions, such as those with which you can search for elements.
9.1.1 Search with in(), index(), and count()
9.1.2 Search with rindex() and rfind()
9.1.3 Binary Search
The same applies to the search methods just mentioned. Even if not directly visible, all of them iteratively look at all elements of the data structure until they find what they are looking for. They search all result in a linear running time. In contrast, there is an efficient search called binary search, which offers logarithmic running time. But there is a prerequisite: binary search requires sorted data. If you have to sort data explicitly first, then the advantage over a linear search is hardly given, especially with small datasets.
In the figure, the arrow points between the elements in the first step—depending on the implementation of binary_search(), the left or right element directly adjacent to the center is used for comparison if the number is even.
9.2 Introduction Sort
In this section, I introduce some sorting algorithms that form the basis for the later exercises.
9.2.1 Insertion Sort
In the example, you start with the value 2. For each number, you have to determine the correct insertion position. There are two ways to do this, as described below.
Determine Insertion Position
When sorting elements of the same value, keeping their original order in the collection is referred to as a Stable Sort . This is often a preferable behavior because it prevents data associated with the elements from getting out of order.
This is due to the fact that a most recently found element of the same value is always placed behind all elements of the same value.
Implementation of Insertion Sort
This code shows a well-understandable implementation that focuses on comprehensibility and not on speed. In fact, it is possible to combine some actions cleverly and thus avoid multiple runs. Later, in Exercise 4, you will deal with exactly this optimization.
9.2.2 Selection Sort
Selection Sort is another intuitive method for sorting. It offers two variations, one based on the minimum and the other on the maximum. In the minimum version, the values to be sorted are traversed from front to back. In each step, the minimum is determined from the section that is still unsorted. This is moved forward by swapping it with the current element. This causes the sorted area to grow from the front and the remaining unsorted section to shrink. For the version based on the maximum, the data to be sorted is processed from the back to the front. The respective maximum is placed at the end so that the sorted area grows from the back.
If you only look at algorithms at this low level, it is usually difficult to understand and comprehend them. Of course, the final algorithms used in frameworks must be as optimal as possible. This requires estimations with the O-notation. This is easier to perform on the low level than on the high level since then all constructs, including invoked functions or methods, must be considered. However, to learn and get started, it is much more suitable to program comprehensively first and then optimize in further steps.
How can Selection Sort be described on a higher level of abstraction? To do this, let’s use some auxiliary functions that you created for arrays. In the corresponding chapter’s introduction, you learned about the method swap() for swapping elements and the function find_min_pos() for finding the position of the smallest element, which was created as solution for Exercise 11 in Section 6.3.11. Conveniently, they can also be used for lists without modification.
9.2.3 Merge Sort
Finally, I would like to point out that the limit at which one should switch to Insertion Sort has been set here quite arbitrarily to the value 5. Presumably, values between 10 and 20 elements are quite practical. However, it would be best if you rely on the knowledge of algorithm professionals who create mathematically sound estimates for running times.
9.2.4 Quick Sort
Let’s start with an implementation for lists since this is more easily accessible and understandable. As a result, breaking down the contents of a list into smaller and larger elements is easy to implement. Later, combining the results of the recursive computations is also straightforward. The whole implementation is intentional, not optimized for speed but for comprehensibility.
The whole thing is quite intuitive for lists and when not optimized for performance. It becomes considerably more awkward if you want to realize the partitioning for inplace (i. e., directly in the original array or list itself). You can see this for yourself later when solving Exercise 6. You will now take a look at the basic procedure.
Inplace Implementation
9.2.5 Bucket Sort
Bucket Sort is an interesting sorting method whose algorithm is only outlined below since the implementation is the subject of Exercise 7.
This definition of a maximum number of different values means that a corresponding number of containers, the buckets, can store the values or, more precisely, their frequency. One bucket is provided for each possible value.
All other buckets store the value 0.
9.2.6 Final Thoughts
Many of the more intuitive algorithms, such as Insertion Sort and Selection Sort, possess the disadvantage of having a running time of O(n2). However, Insertion Sort has a positive and remarkable feature: As long as the output data is (nearly) sorted, Insertion Sort becomes extremely performant with O(n).
Quick Sort and Merge Sort are usually very efficient with a running time of O(n . log(n)).1 Still, they also have higher complexity of the source code, especially when working inplace. For frameworks and larger datasets, performance is essential. Potentially unfavorable for Merge Sort, on the other hand, is the creation of many copies of subranges. The same applies to Quick Sort and its partitioning. For both, however, some variants do this inplace. Interestingly, the respective divisions of the subranges to be sorted are quite easy to express by recursion, but the partitioning or merging part is then more complex and more difficult to implement. This holds in particular if you work inplace. For Merge Sort, you will find an example in the provided PyCharm project. For Quick Sort, you may try it in Exercise 6.
Bucket Sort remains. This algorithm sometimes runs even in linear running time, which is O(n). However, in contrast to the other sorting algorithms presented, it is not generally applicable since it has the already mentioned restriction concerning the number of allowed values.
9.3 Exercises
9.3.1 Exercise 1: Contains All (★★✩✩✩)
The task is to create function contains_all(values, search_values) that checks if all passed values are present in the given list. Explicitly do not use the Python standard functionality of all(). Program this yourself.
Examples
Input | Search values | Result |
---|---|---|
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] | [7, 2] | True |
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] | [5, 11] | False |
9.3.2 Exercise 2: Partitioning (★★★★✩)
The challenge is to suitably sort or arrange a mixed sequence of the letters A and B in a single pass so that all As occur before the Bs. This can also be extended to three letters.
Examples
Input | Result |
---|---|
“ABAABBBAAABBBA” | “AAAAAAABBBBBBB” |
“ABACCBBCAACCBBA” | “AAAAABBBBBCCCCC” |
Exercise 2a: Partitioning Two Letters (★★★✩✩)
Write function partition2(text) that takes a given sequence built out of the two letters A and B and turns it into an ordered sequence where all As occur before the Bs.
Exercise 2b: Partitioning Three Letters (★★★★✩)
Write function partition3(text) that partitions a sequence built of the three letters A, B, and C given as str into an ordered sequence where all As occur before Bs and they in turn before Cs. Instead of letters, this can be thought of for colors of a flag. Then it is known as the Dutch Flag Problem.
9.3.3 Exercise 3: Binary Search (★★✩✩✩)
Exercise 3a: Binary Search Recursive (★★✩✩✩)
Write recursive function binary_search(values, search_for) that performs a search for the desired value in a sorted list.
Examples
Input | Search values | Result |
---|---|---|
[1, 2, 3, 4, 5, 7, 8, 9] | 5 | True |
[1, 2, 3, 4, 5, 7, 8, 9] | 6 | False |
Exercise 3b: Binary Search Iterative (★★✩✩✩)
Your task is to convert the recursive function into an iterative one. As a modification it should return the position of the search value or -1 for not found instead of True or False. Additionally, it should be named binary_search_iterative(values, search_value).
Examples
Input | Search values | Result |
---|---|---|
[1, 2, 3, 4, 5, 7, 8, 9] | 5 | 4 |
[1, 2, 3, 4, 5, 7, 8, 9] | 6 | -1 |
9.3.4 Exercise 4: Insertion Sort (★★✩✩✩)
The introductory Section 9.2.1 showed a simplified, easy-to-follow realization of Insertion Sort. In this exercise, the goal is to optimize the whole thing by now finding the insertion position and performing the necessary swapping and insertion in one go. Write an optimized version of insertion_sort(values).
Example
Input | Result |
---|---|
[7, 2, 5, 1, 6, 8, 9, 4, 3] | [1, 2, 3, 4, 5, 6, 7, 8, 9] |
9.3.5 Exercise 5: Selection Sort (★★✩✩✩)
Write a variation of Selection Sort that uses the maximum instead of the minimum and has the following signature: selection_sort_max_inplace(values).
What needs to be modified so that the sort algorithm leaves the original data unchanged and returns a new sorted list? Implement this requirement in function selection_sort_max_copy(values).
Example
Input | Result |
---|---|
[7, 2, 5, 1, 6, 8, 9, 4, 3] | [1, 2, 3, 4, 5, 6, 7, 8, 9] |
9.3.6 Exercise 6: Quick Sort (★★★✩✩)
Examples
Input | Result |
---|---|
[5, 2, 7, 1, 4, 3, 6, 8] | [1, 2, 3, 4, 5, 6, 7, 8] |
[5, 2, 7, 9, 6, 3, 1, 4, 8] | [1, 2, 3, 4, 5, 6, 7, 8, 9] |
[5, 2, 7, 9, 6, 3, 1, 4, 2, 3, 8] | [1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9] |
9.3.7 Exercise 7: Bucket Sort (★★✩✩✩)
In the introductory Section 9.2.5, a Bucket Sort algorithm was described. In this exercise, you want to create function bucket_sort(values, expected_max) that implements this sorting algorithm for a list of values and an expected maximum value.
Example
Input | Maximum value | Result |
---|---|---|
[10, 50, 22, 7, 42, 111, 50, 7] | 150 | [7, 7, 10, 22, 42, 50, 50, 111] |
9.3.8 Exercise 8: Search in Rotated Data (★★★★✩)
In this exercise, your task is to implement an efficient binary search in a sorted sequence of integer values. The challenge is that the values are ordered but rotated within themselves. According to that, the smallest element may not be at the front of the data. Additionally, the largest element does often not reside at the end of the data (except in the special case of a rotation by 0 positions).
Be careful also to check the special case of a rotation of 0 or a multiple of the length of the data set that would again correspond to a rotation for the value 0.
Exercise 8a: Flank Change Efficient (★★★★✩)
Write function find_flank_pos(values) that efficiently finds the position of a flank change in a given sorted sequence of n integer values, say 25, 33, 47, 1, 2, 3, 5, 11 in logarithmic time, which is O(log(n)). Write two functions min_value(values) and max_value(values) based on find_flank_pos(values) that, according to their names, determine the minimum and maximum, respectively, from the given sorted but rotated sequence of values.
Examples
Input | Flank position | Minimum | Maximum |
---|---|---|---|
[25, 33, 47, 1, 2, 3, 5, 11] | 3 | 1 | 47 |
[5, 11, 17, 25, 1, 2] | 4 | 1 | 25 |
[6, 1, 2, 3, 4, 5] | 1 | 1 | 6 |
[1, 2, 3, 4, 5, 6] | 0 (special case) | 1 | 6 |
Exercise 8b: Binary Search in Rotated Data (★★★★✩)
Write function binary_search_rotated(values, search_for) that efficiently searches in a sorted sequence of integer values, say the number sequence 25, 33, 47, 1, 2, 3, 5, 11, for a given value and returns its position or -1 if not found.
Examples
Input | Flank position | Search value | Result |
---|---|---|---|
[25, 33, 47, 1, 2, 3, 5, 11] | 3 | 47 | 2 |
[25, 33, 47, 1, 2, 3, 5, 11] | 3 | 3 | 5 |
[25, 33, 47, 1, 2, 3, 5, 11] | 3 | 13 | -1 |
[1, 2, 3, 4, 5, 6, 7] | 0 (special case) | 5 | 4 |
[1, 2, 3, 4, 5, 6, 7] | 0 (special case) | 13 | -1 |
9.4 Solutions
9.4.1 Solution 1: Contains All (★★✩✩✩)
The task is to create function contains_all (values, search_values) that checks if all passed values are present in the given list. Explicitly do not use the Python standard functionality of all(). Program this yourself.
Examples
Input | Search values | Result |
---|---|---|
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] | [7, 2] | True |
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] | [5, 11] | False |
Verification
9.4.2 Solution 2: Partitioning (★★★★✩)
The challenge is to suitably sort or arrange a mixed sequence of the letters A and B in a single pass so that all As occur before the Bs. This can also be extended to three letters.
Examples
Input | Result |
---|---|
“ABAABBBAAABBBA” | “AAAAAAABBBBBBB” |
“ABACCBBCAACCBBA” | “AAAAABBBBBCCCCC” |
Solution 2a: Partitioning Two Letters (★★★✩✩)
Write function partition2(text) that takes a given sequence built out of the two letters A and B and turns it into an ordered sequence where all As occur before the Bs.
Because a B may also move to the front when swapping, the lower position pointer must stay unchanged. In one of the next steps, the B will then move to the back again. This tricky algorithm makes it possible to arrange all As in front of the Bs in a single pass.
Solution 2b: Partitioning Three Letters (★★★★✩)
Write function partition3(text) that partitions a sequence built of the three letters A, B, and C given as str into an ordered sequence where all As occur before Bs and they in turn before Cs. Instead of letters, this can be thought of for colors of a flag. Then it is known as the Dutch Flag Problem.
Verification
9.4.3 Solution 3: Binary Search (★★✩✩✩)
Solution 3a: Binary Search Recursive (★★✩✩✩)
Write recursive function binary_search(values, search_for) that performs a search for the desired value in a sorted list.
Examples
Input | Search values | Result |
---|---|---|
[1, 2, 3, 4, 5, 7, 8, 9] | 5 | True |
[1, 2, 3, 4, 5, 7, 8, 9] | 6 | False |
Algorithm Divide the list into two halves. Determine the value in the middle and see if you need to search further in the top or bottom half. This can be easily determined based on the given sort order.
Valuecenter == search_for ⇒ found, end
Valuecenter < search_for ⇒ continue searching in lower part
Valuecenter > search_for ⇒ continue searching in upper part
Solution 3b: Binary Search Iterative (★★✩✩✩)
Your task is to convert the recursive function into an iterative one. As a modification it should return the position of the search value or -1 for not found instead of True or False. Additionally, it should be named binary_search_iterative(values, search_value).
Examples
Input | Search values | Result |
---|---|---|
[1, 2, 3, 4, 5, 7, 8, 9] | 5 | 4 |
[1, 2, 3, 4, 5, 7, 8, 9] | 6 | -1 |
Verification
9.4.4 Solution 4: Insertion Sort (★★✩✩✩)
Section 9.2.1 showed a simplified, easy-to-follow realization of Insertion Sort. In this exercise, the goal is to optimize the whole thing by now finding the insertion position and performing the necessary swapping and insertion in one go. Write an optimized version of insertion_sort(values).
Example
Input | Result |
---|---|
[7, 2, 5, 1, 6, 8, 9, 4, 3] | [1, 2, 3, 4, 5, 6, 7, 8, 9] |
The function to swap the values of two positions can be written very compactly in Python using the tuple notation and also still without parentheses.
Verification
9.4.5 Solution 5: Selection Sort (★★✩✩✩)
Write a variation of Selection Sort that uses the maximum instead of the minimum and has the following signature: selection_sort_max_inplace(values).
What needs to be modified so that the sort algorithm leaves the original data unchanged and returns a new sorted list? Implement this requirement in function selection_sort_max_copy(values).
Example
Input | Result |
---|---|
[7, 2, 5, 1, 6, 8, 9, 4, 3] | [1, 2, 3, 4, 5, 6, 7, 8, 9] |
Verification
9.4.6 Solution 6: Quick Sort (★★★✩✩)
Examples
Input | Result |
---|---|
[5, 2, 7, 1, 4, 3, 6, 8] | [1, 2, 3, 4, 5, 6, 7, 8] |
[5, 2, 7, 9, 6, 3, 1, 4, 8] | [1, 2, 3, 4, 5, 6, 7, 8, 9] |
[5, 2, 7, 9, 6, 3, 1, 4, 2, 3, 8] | [1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9] |
Verification
9.4.7 Solution 7: Bucket Sort (★★✩✩✩)
In the introductory Section 9.2.5, a Bucket Sort algorithm was described. In this exercise, you want to create function bucket_sort(values, expected_max) that implements this sorting algorithm for a list of values and an expected maximum value.
Example
Input | Maximum value | Result |
---|---|---|
[10, 50, 22, 7, 42, 111, 50, 7] | 150 | [7, 7, 10, 22, 42, 50, 50, 111] |
Algorithm Bucket Sort is one of the most straightforward sorting algorithms to implement and also one of the fastest with linear running time—but with the prerequisite of a limited range of values.
- 1.
Traverse all input values and assign them to the corresponding buckets. If there are several same elements, you have to increment the counter.
- 2.
The final step is to reconstruct the values based on the counter values.
Verification
9.4.8 Solution 8: Search in Rotated Data (★★★★✩)
In this exercise, your task is to implement an efficient binary search in a sorted sequence of integer values. The challenge is that the values are ordered but rotated within themselves. According to that, the smallest element may not be at the front of the data. Additionally, the largest element does often not reside at the end of the data (except in the special case of a rotation by 0 positions).
Be careful also to check the special case of a rotation of 0 or a multiple of the length of the data set that would again correspond to a rotation for the value 0.
Solution 8a: Flank Change Efficient (★★★★✩)
Write function find_flank_pos(values) that efficiently finds the position of a flank change in a given sorted sequence of n integer values, say 25, 33, 47, 1, 2, 3, 5, 11, in logarithmic time, which is O(log(n)). Write two functions min_value(values) and max_value(values) based on find_flank_pos(values) that, according to their names, determine the minimum and maximum, respectively, from the given sorted but rotated sequence of values.
Examples
Input | Flank position | Minimum | Maximum |
---|---|---|---|
[25, 33, 47, 1, 2, 3, 5, 11] | 3 | 1 | 47 |
[5, 11, 17, 25, 1, 2] | 4 | 1 | 25 |
[6, 1, 2, 3, 4, 5] | 1 | 1 | 6 |
[1, 2, 3, 4, 5, 6] | 0 (special case) | 1 | 6 |
Of course, when traversing, you also have to consider the special case that the flank change takes place at the very end of the list. Then you have a non-rotated list as a base.
Algorithm So, how can you proceed to achieve logarithmic running time? In this case, you take advantage of the fact that the value sequence is sorted. The search ranges can always be divided in half, following the idea of binary search. Because there is a rotation, however, you must be careful concerning the indices.
Case A: With the predecessor: If it is larger, you have found the flank change.
Case B: With the leftmost element: If it is larger than the current element, then the flank change must happen somewhere in between. So, you can exclude the right half.
Case C: With the rightmost element: If this is smaller, the flank change must happen on the right side. You can exclude the left half.
At the very beginning, it is crucial to check for the special case of the non-rotated initial dataset. This can be determined by the fact that the far left value is smaller than that on the far right.
Based on this method, it is possible to write the functions for determining minimum and maximum quite simply as follows with the knowledge that the position of the flank change contains the minimum and the position of the maximum is a position to the left of it.
Verification
Solution 8b: Binary Search in Rotated Data (★★★★✩)
Write function binary_search _rotated(values, search_for) that efficiently searches in a sorted sequence of integer values, say the number sequence 25, 33, 47, 1, 2, 3, 5, 11, for a given value and returns its position or -1 if not found.
Examples
Input | Flank position | Search value | Result |
---|---|---|---|
[25, 33, 47, 1, 2, 3, 5, 11] | 3 | 47 | 2 |
[25, 33, 47, 1, 2, 3, 5, 11] | 3 | 3 | 5 |
[25, 33, 47, 1, 2, 3, 5, 11] | 3 | 13 | -1 |
[1, 2, 3, 4, 5, 6, 7] | 0 (special case) | 5 | 4 |
[1, 2, 3, 4, 5, 6, 7] | 0 (special case) | 13 | -1 |
However, this procedure causes quite a bit of effort. So how can you improve it?
Python-specific algorithm I have just described a general algorithm. In Python, you can take advantage of the fact that the indices can also be negative and then operate from the end of the list or array.
Verification
9.5 Summary: What You Learned
Even if you will hardly ever program a search or sorting algorithm nowadays yourself, it is still helpful for algorithmic understanding to have done this once.
Simple implementations of (linear) searches tend to have a running time of O(n). You learned how you can benefit from sorted data sets helping you to use binary search as a trickier search and reducing the running time down to O(log(n)).
Similar observations apply for sorting: While naive implementations often have a running time of O(n2), this can usually be reduced to O(n × log(n)) with Merge Sort and Quick Sort. It is fascinating to see how a fixed range of values can have a significant effect. Bucket Sort with a running time of O(n) plays out its strengths with these constraints.
As a nice challenge for the end, you solved a binary search in rotated data sets, where the values are sorted but shifted by some positions.