Reusing partial results

Suppose you want to add a new feature (or even a set of features). As we saw in Chapter 12, Computer Vision, this is easy to do by changing the feature computation code. Done naively, however, this means recomputing all the existing features as well, which is wasteful, particularly if you want to test new features and techniques quickly.

We now add a set of features, namely another type of texture feature called local binary patterns. These are implemented in mahotas; we just need to call a function, but we wrap it in TaskGenerator:

@TaskGenerator
def compute_lbp(fname):
    # Imported here so that the task is self-contained
    import mahotas as mh
    from mahotas.features import lbp
    imc = mh.imread(fname)
    im = mh.colors.rgb2grey(imc)
    # The parameters 'radius' and 'points' are set to typical values;
    # check the documentation for their exact meaning
    return lbp(im, radius=8, points=6)
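
To give a rough idea of what a local binary pattern encodes, here is a minimal sketch of the simplest variant: each pixel is compared with its eight immediate neighbours and the outcomes are packed into one byte. This is not what mahotas runs; its lbp function samples `points` neighbours on a circle of the given `radius` and returns a compressed histogram of codes. The helper names lbp_code and lbp_histogram are invented for this illustration.

```python
# Simplified illustration only: fixed 3x3 neighbourhood, no interpolation,
# no rotation invariance. Function names are hypothetical, not mahotas API.

def lbp_code(patch):
    """Return the 8-bit pattern code for the centre pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left corner
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Count the codes of all interior pixels of a 2D list of intensities."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

The histogram, rather than the per-pixel codes, is what serves as the texture feature vector: it summarises how often each local pattern occurs, regardless of where in the image it appears.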

We extend the previous loop with an extra function call:

lbps = [] 
for fname in sorted(images): 
    # the rest of the loop as before 
    lbps.append(compute_lbp(fname)) 
lbps = to_array(lbps) 

We then call accuracy with the new features, both on their own and combined with the earlier ones:

scores_lbps = accuracy(lbps, labels) 
combined_all = hstack([chists, haralicks, lbps]) 
scores_combined_all = accuracy(combined_all, labels) 
 
print_results([
    ('base', scores_base),
    ('chists', scores_chist),
    ('lbps', scores_lbps),
    ('combined', scores_combined),
    ('combined_all', scores_combined_all),
])

Now, when you run jug execute again, the new features will be computed, but the old features will be loaded from the cache. This is where jug is at its most powerful: it ensures that you always get up-to-date results while sparing you from recomputing anything that is already cached. You will also see that adding this feature set improves on the previous methods.
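
The reuse of old results can be pictured with a small, self-contained sketch. It assumes a scheme in which each result is stored under a hash of the function name and its arguments; this is an illustration of the idea, not jug's actual implementation, and TinyCache, fake_texture, and fake_lbp are hypothetical names.

```python
import hashlib
import pickle


class TinyCache:
    """Toy in-memory result store keyed by (function name, arguments)."""

    def __init__(self):
        self.store = {}
        self.computed = []  # names of the functions that really ran

    def key(self, func, args):
        payload = pickle.dumps((func.__name__, args))
        return hashlib.sha1(payload).hexdigest()

    def run(self, func, *args):
        k = self.key(func, args)
        if k not in self.store:
            # Cache miss: compute the result and remember it
            self.store[k] = func(*args)
            self.computed.append(func.__name__)
        return self.store[k]


# Stand-ins for real feature functions (hypothetical)
def fake_texture(fname):
    return len(fname)


def fake_lbp(fname):
    return fname.upper()


cache = TinyCache()
images = ['a.jpg', 'b.jpg']

# First run: only the old feature exists
for fname in images:
    cache.run(fake_texture, fname)

# Second run: a new feature is added next to the old one
cache.computed.clear()
for fname in images:
    cache.run(fake_texture, fname)  # served from the store
    cache.run(fake_lbp, fname)      # computed for the first time
print(cache.computed)
```

On the second pass, only the newly added function is executed; the calls to the old one are answered from the store because their keys are unchanged.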

We cannot cover every feature of jug in this chapter, but here is a summary of the most interesting ones that the main text did not cover:

  • jug invalidate: This declares that all results from a given function should be considered invalid and in need of recomputation. It also invalidates any downstream computation that depended (even indirectly) on the invalidated results, so that it too is recomputed.
  • jug status --cache: If jug status takes too long, you can use the --cache flag to cache the status and speed it up. Note that this will not detect any changes to the jugfile, but you can always use --cache --clear to remove the cache and start again.
  • jug cleanup: This removes any extra files in the memoization cache. This is a garbage collection operation.

There are other, more advanced features, which allow you to look at values that have been computed inside the jugfile. Read up on features such as barriers in the jug documentation online at http://jug.rtfd.org.
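
To picture the downstream effect of jug invalidate, the following sketch walks a small dependency graph and collects everything that becomes stale when one function's results are invalidated. It is a plain breadth-first traversal written for illustration, not jug's own code; the task names in the graph are hypothetical labels loosely modelled on this section's pipeline.

```python
from collections import deque


def invalidate(dependents, start):
    """Return every task that must be recomputed once `start` is invalid.

    `dependents` maps a task name to the tasks that consume its result.
    """
    stale = {start}
    queue = deque([start])
    while queue:
        task = queue.popleft()
        for child in dependents.get(task, []):
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale


# Hypothetical dependency graph: producer -> consumers
dependents = {
    'compute_lbp': ['to_array_lbps'],
    'to_array_lbps': ['accuracy_lbps', 'hstack_all'],
    'compute_texture': ['to_array_haralicks'],
    'to_array_haralicks': ['hstack_all'],
    'hstack_all': ['accuracy_combined_all'],
}

stale = invalidate(dependents, 'compute_lbp')
print(sorted(stale))
```

In this graph, invalidating compute_lbp marks the combined features and their accuracy as stale as well, while the texture features, which do not depend on it, remain valid.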