Profiling with timeit

timeit is a module that allows you to time pieces of code. It is part of the standard Python library. We will time the NumPy sort function with several different array sizes. The classic quicksort and merge sort algorithms have an average running time of O(nlogn); so we will try to fit our result to such a model.

How to do it...

We will require arrays to sort.

  1. Create arrays to sort.

    We will create arrays of varying sizes containing random integer values:

    times = numpy.array([])
    
    for size in sizes:
        integers = numpy.random.random_integers(1, 10 ** 6, size)
  2. Measure time.

    In order to measure time, we need to create a timer and give it a function to execute and specify the relevant imports. After that, sort 100 times to get some data about the sorting times:

    def measure():
        timer = timeit.Timer('dosort()', 'from __main__ import dosort')
    
        return timer.timeit(10 ** 2)
  3. Build measurement time arrays.

    Build the measurement time arrays by appending times one by one:

    times = numpy.append(times, measure())
  4. Fit to nlogn.

    Fit the times to the theoretical model of nlogn. Because we are varying the array size as powers of two, this is easy:

    fit = numpy.polyfit(sizes * powersOf2, times, 1)

The following is the complete timing code:

import numpy
import timeit
import matplotlib.pyplot

# This program measures the performance of the NumPy sort function
# and plots time vs array size.
integers = []
def dosort():
    integers.sort()

def measure():
    timer = timeit.Timer('dosort()', 'from __main__ import dosort')

    return timer.timeit(10 ** 2)

powersOf2 = numpy.arange(0, 19)
sizes = 2 ** powersOf2

times = numpy.array([])

for size in sizes:
    integers = numpy.random.random_integers(1, 10 ** 6, size)
    times = numpy.append(times, measure())

fit = numpy.polyfit(sizes * powersOf2, times, 1)
print fit
matplotlib.pyplot.title("Sort array sizes vs execution times")
matplotlib.pyplot.xlabel("Size")
matplotlib.pyplot.ylabel("(s)")
matplotlib.pyplot.semilogx(sizes, times, 'ro')
matplotlib.pyplot.semilogx(sizes, numpy.polyval(fit, sizes * powersOf2))
matplotlib.pyplot.show()

The resulting plot for the running time versus array size is shown in the following screenshot:

How to do it...

How it works...

We measured the average running time of the NumPy sort function. The following functions were used in this recipe:

Function

Description

random_integers

Creates an array of random integers given a range for the values and array size.

append

Appends a value to a NumPy array.

polyfit

Fits data to a polynomial of a given degree.

polyval

Evaluates a polynomial and returns the corresponding value for a certain "x" value.

semilogx

Plots data using logarithmic scale on the X-axis.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset