We want to show in this recipe how to solve a "real-world" task—how to use matplotlib to visualize our directory occupancy.
In this recipe, you will learn how to visualize a filesystem tree with relative sizes.
We all have big hard drives that sometimes contain stuff that we usually forget about. It would be nice to see what is inside such a directory, and what the biggest file inside it is.
Although there are many more sophisticated and elaborate software products for this job, we want to demonstrate how this is achievable using Python and matplotlib.
Let's perform the following steps:
draw()
, that does the plotting.import os import sys import matplotlib.pyplot as plt import matplotlib.cm as cm import numpy as np def build_folders(start_path): folders = [] for each in get_directories(start_path): size = get_size(each) if size >= 25 * 1024 * 1024: folders.append({'size' : size, 'path' : each}) for each in folders: print "Path: " + os.path.basename(each['path']) print "Size: " + str(each['size'] / 1024 / 1024) + " MB" return folders def get_size(path): assert path is not None total_size = 0 for dirpath, dirnames, filenames in os.walk(path): for f in filenames: fp = os.path.join(dirpath, f) try: size = os.path.getsize(fp) total_size += size #print "Size of '{0}' is {1}".format(fp, size) except OSError as err: print str(err) pass return total_size def get_directories(path): dirs = set() for dirpath, dirnames, filenames in os.walk(path): dirs = set([os.path.join(dirpath, x) for x in dirnames]) break # we just want the first one return dirs def draw(folders): """ Draw folder size for given folder""" figsize = (8, 8) # keep the figure square ldo, rup = 0.1, 0.8 # leftdown and right up normalized fig = plt.figure(figsize=figsize) ax = fig.add_axes([ldo, ldo, rup, rup], polar=True) # transform data x = [os.path.basename(x['path']) for x in folders] y = [y['size'] / 1024 / 1024 for y in folders] theta = np.arange(0.0, 2 * np.pi, 2 * np.pi / len(x)) radii = y bars = ax.bar(theta, radii) middle = 90/len(x) theta_ticks = [t*(180/np.pi)+middle for t in theta] lines, labels = plt.thetagrids(theta_ticks, labels=x, frac=0.5) for step, each in enumerate(labels): each.set_rotation(theta[step]*(180/np.pi)+ middle) each.set_fontsize(8) # configure bars colormap = lambda r:cm.Set2(r / len(x)) for r, each in zip(radii, bars): each.set_facecolor(colormap(r)) each.set_alpha(0.5) plt.show()
if __name__ == '__main__': if len(sys.argv) is not 2: print "ERROR: Please supply path to folder." sys.exit(-1) start_path = sys.argv[1] if not os.path.exists(start_path): print "ERROR: Path must exits." sys.exit(-1) folders = build_folders(start_path) if len(folders) < 1: print "ERROR: Path does not contain any folders." sys.exit(-1) draw(folders)
$ pythonch04_rec11_filesystem.py /usr/
We will start from the bottom of the code, after if __name__ == '__main__'
because this is the place where our program starts.
Using the module sys
, we pick up the command-line arguments; they represent the path to the directory we want to visualize.
The function build_folders
builds the list of dictionaries, each containing the size and path that it found inside the given start_path
. This function calls get_directories
, which returns a list of all the subdirectories in start_path
. Later, for each directory found, we calculated the sizes in bytes using the get_size
function.
For debugging purposes, we print our dictionary so that we are able to compare the figure against what our data looks like.
After we have built the folders as a list of dictionaries, we pass them to a function, draw
, that performs all the work of transforming the data to the right dimensions (here, we are using the polar coordinate system), constructing the polar figure, and drawing all the bars, ticks, and labels.
Strictly speaking, we should divide this job into smaller functions, especially if this code is to be further developed.