An introduction to tasks in jug

Tasks are the basic building blocks of jug. A task is composed of a function and values for its arguments. Consider this simple example:

def double(x): 
    return 2*x 

In this chapter, the code examples will generally have to be typed in script files. Commands that should be typed at the shell will be indicated by prefixing them with $.

One task could be "call double with argument 3". Another task would be "call double with argument 642.34". Using jug, we can build these tasks as follows:

from jug import Task 
t1 = Task(double, 3) 
t2 = Task(double, 642.34) 

Save this to a file called jugfile.py (which is just a regular Python file). Now, we can run jug execute to run the tasks. This is something you type on the command line, not at the Python prompt, so we show it marked with a dollar sign ($):

$ jug execute 

You will also get some feedback on the tasks (jug will say that two tasks named double were run). Run jug execute again and it will tell you that it did nothing! It does not need to. In this case, we gained little, but if the tasks took a long time to compute, it would have been very useful.

You may notice that a new directory named jugfile.jugdata also appeared on your hard drive, containing a few weirdly named files. This is the memoization cache. If you remove it, jug execute will run all your tasks again.
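To make the idea of a memoization cache concrete, here is a minimal sketch in plain Python. It is not how jug is implemented: the cached_call helper and the way it builds the file name (a hash of the function name and arguments) are assumptions made for this illustration only. It does show the same behavior, though: the first call computes and saves the result, and the second call just loads it.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


def cached_call(cache_dir, func, *args):
    """Run func(*args) once; later identical calls load the saved result.

    Returns (result, computed), where computed is True only when the
    function was actually executed. Hypothetical helper, not part of jug.
    """
    # Build a file name from the function name and its arguments.
    key = hashlib.sha1(pickle.dumps((func.__name__, args))).hexdigest()
    path = Path(cache_dir) / key
    if path.exists():
        # Result is already on disk: load it instead of recomputing.
        return pickle.loads(path.read_bytes()), False
    result = func(*args)
    path.write_bytes(pickle.dumps(result))
    return result, True


def double(x):
    return 2 * x


cache_dir = tempfile.mkdtemp()  # stands in for a directory like jugfile.jugdata
first = cached_call(cache_dir, double, 3)
second = cached_call(cache_dir, double, 3)
print(first, second)
```

Deleting the files in cache_dir would force the next call to compute the result again, which mirrors what happens when you remove the jugfile.jugdata directory.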

Often, it's good to distinguish between pure functions, which simply take their inputs and return a result, and more general functions that can perform actions (such as reading from files, writing to files, accessing global variables, modifying their arguments, or anything that the language allows). Some programming languages, such as Haskell, even distinguish pure from impure functions in the type system.

With jug, your tasks do not need to be perfectly pure. It's even recommended that you use tasks to read in your data or write out your results. However, accessing and modifying global variables will not work well: the tasks may be run in any order on different processors. Global constants are the exception, but even these may confuse the memoization system (if a value is changed between runs). Similarly, you should not modify the input values. Jug has a debug mode (use jug execute --debug), which slows down your computation, but will give you useful error messages if you make this sort of mistake.
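The following plain Python sketch illustrates why modifying input values is risky. The two normalize functions are hypothetical examples, not part of jug. The pure version leaves its argument untouched, while the in-place version changes the list that the caller (and any other task sharing it) still holds, so what the next task sees would depend on the order in which tasks ran.

```python
def normalize(values):
    """Pure: builds and returns a new list, leaving the input untouched."""
    total = sum(values)
    return [v / total for v in values]


def normalize_in_place(values):
    """Impure: overwrites the caller's list while computing the result."""
    total = sum(values)
    for i, v in enumerate(values):
        values[i] = v / total
    return values


data = [1.0, 3.0]

pure_result = normalize(data)
data_after_pure = list(data)      # snapshot: still the original values

impure_result = normalize_in_place(data)
data_after_impure = list(data)    # snapshot: the input itself has changed

print(data_after_pure, data_after_impure)
```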

The preceding code works, but is a bit cumbersome. You are always repeating the Task(function, argument) construct. Using a bit of Python magic, we can make the code even more natural as follows:

from jug import TaskGenerator 
from time import sleep 
 
@TaskGenerator 
def double(x): 
    sleep(4) 
    return 2*x 
 
@TaskGenerator 
def add(a, b): 
    return a + b 
 
@TaskGenerator 
def print_final_result(oname, value): 
    with open(oname, 'w') as output: 
        output.write('Final result: {}\n'.format(value))
 
 
y = double(2) 
z = double(y) 
y2 = double(7) 
z2 = double(y2) 
print_final_result('output.txt', add(z,z2)) 

Except for the use of TaskGenerator, the preceding code could be a standard Python file! However, using TaskGenerator, it actually creates a series of tasks, and it is now possible to run it in a way that takes advantage of multiple processors. Behind the scenes, the decorator transforms your functions so that they do not actually execute when called, but create a Task object. We also take advantage of the fact that we can pass tasks to other tasks and this results in a dependency being generated.
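To see how a decorator can turn a function call into a task, consider the following toy version written in plain Python. ToyTask and toy_task_generator are names invented for this sketch. They are not jug's classes, and jug's real implementation does much more (saving results and coordinating between processes, for instance). The point is only that calling a decorated function returns an object describing the call, and that passing one such object to another call records a dependency.

```python
class ToyTask:
    """A deferred call: a function plus its arguments (not jug's Task)."""

    def __init__(self, func, args):
        self.func = func
        self.args = args

    def dependencies(self):
        # Any argument that is itself a task must be computed first.
        return [a for a in self.args if isinstance(a, ToyTask)]

    def run(self):
        # Resolve task arguments recursively, then call the function.
        resolved = [a.run() if isinstance(a, ToyTask) else a
                    for a in self.args]
        return self.func(*resolved)


def toy_task_generator(func):
    """Decorator: calling the function builds a ToyTask instead of running it."""
    def build(*args):
        return ToyTask(func, args)
    return build


@toy_task_generator
def double(x):
    return 2 * x


@toy_task_generator
def add(a, b):
    return a + b


z = double(double(2))
z2 = double(double(7))
total = add(z, z2)   # nothing has been computed yet

print(type(total).__name__, len(total.dependencies()), total.run())
```

Unlike jug, this toy version recomputes everything each time run() is called and works in a single process only.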

You may have noticed that we added a sleep(4) call to double in the preceding code. This simulates running a long computation. Otherwise, this example is so fast that there is no point in using multiple processors.

We start by running jug status, which results in the output shown in the following screenshot:

Now, we start two processes simultaneously (using the & operator, which is the traditional Unix way of starting processes in the background):

$ jug execute &
$ jug execute &  

Now, we run jug status again:

We can see that the two initial double tasks are running at the same time. After about 8 seconds, the whole process will finish and the output.txt file will be written.
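The 8-second figure can be checked with a small back-of-the-envelope simulation. This is a simplified model, not a description of how jug schedules work: we assume each double task takes exactly 4 seconds, that add and print_final_result take no time, and that a free process always picks up the task that can start earliest. The makespan function and the task labels are made up for this sketch.

```python
# Task durations in seconds and dependencies, mirroring the jugfile above.
durations = {"y": 4, "z": 4, "y2": 4, "z2": 4, "add": 0, "print": 0}
deps = {"y": [], "z": ["y"], "y2": [], "z2": ["y2"],
        "add": ["z", "z2"], "print": ["add"]}


def makespan(durations, deps, workers):
    """Estimate the total running time with a simple greedy schedule."""
    finish = {}                  # task name -> time at which it finishes
    free_at = [0] * workers      # time at which each worker becomes free
    pending = list(durations)

    def earliest(task):
        # A task cannot start before all of its dependencies have finished.
        return max([finish[d] for d in deps[task]], default=0)

    while pending:
        # Only tasks whose dependencies are all scheduled are candidates.
        ready = [t for t in pending if all(d in finish for d in deps[t])]
        task = min(ready, key=earliest)
        worker = min(range(workers), key=lambda i: free_at[i])
        start = max(free_at[worker], earliest(task))
        finish[task] = start + durations[task]
        free_at[worker] = finish[task]
        pending.remove(task)
    return max(finish.values())


two_processes = makespan(durations, deps, workers=2)
one_process = makespan(durations, deps, workers=1)
print(two_processes, one_process)
```

Under these assumptions, two processes finish in 8 seconds and a single process needs 16, because the two chains of double calls are independent of each other until add combines them.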

By the way, if your file was called anything other than jugfile.py, you would then have to specify it explicitly on the command line. For example, if your file was called analysis.py, you would run the following command:

$ jug execute analysis.py  

This is the only disadvantage of not using the name jugfile.py. So, feel free to use more meaningful names.
