Credit: Frank Fejes
You need to compute the total size of a directory (or set of directories) in a way that works under both Windows and Unix-like platforms.
There are easier platform-dependent solutions, such as Unix’s du, but Python also makes it quite feasible to have a cross-platform solution:
import os
from os.path import join, isdir, islink

class DirSizeError(Exception): pass

def dir_size(start, follow_links=0, start_depth=0, max_depth=0, skip_errs=0,
             units='b'):
    # Get a list of all names of files and subdirectories in directory start
    try: dir_list = os.listdir(start)
    except OSError:
        # If start is a directory, we probably have permission problems
        if isdir(start):
            raise DirSizeError('Cannot list directory %s' % start)
        else:  # otherwise, just re-raise the error so that it propagates
            raise
    total = 0
    for item in dir_list:
        # Get statistics on each item--file and subdirectory--of start
        path = join(start, item)
        try: stats = os.stat(path)
        except OSError:
            if not skip_errs:
                raise DirSizeError('Cannot stat %s' % path)
            continue  # skip this item rather than reuse stale statistics
        # The size in bytes is the st_size field of the stat result, so:
        total += stats.st_size
        # recursive descent if warranted
        if isdir(path) and (follow_links or not islink(path)):
            size = dir_size(path, follow_links, start_depth+1, max_depth,
                            skip_errs, units)
            total += size
            if max_depth and (start_depth < max_depth):
                print_path(path, size, units)
    return total

def print_path(path, size, units='b'):
    if units == 'k':
        print('%-8d%s' % (size // 1024, path))
    elif units == 'm':
        print('%-5d%s' % (size // 1024 // 1024, path))
    else:
        print('%-11d%s' % (size, path))

def usage(name):
    print("usage: %s [-bkLm] [-d depth] directory [directory...]" % name)
    print('  -b           Display in Bytes (default)')
    print('  -k           Display in Kilobytes')
    print('  -m           Display in Megabytes')
    print('  -L           Follow symbolic links (meaningful on Unix only)')
    print('  -d, --depth  # of directories down to print (default = 0)')

if __name__ == '__main__':
    # When used as a script:
    import sys, getopt
    units = 'b'
    follow_links = 0
    depth = 0
    try:
        opts, args = getopt.getopt(sys.argv[1:], "bkLmd:", ["depth="])
    except getopt.GetoptError:
        usage(sys.argv[0])
        sys.exit(1)
    for o, a in opts:
        if o == '-b': units = 'b'
        elif o == '-k': units = 'k'
        elif o == '-L': follow_links = 1
        elif o == '-m': units = 'm'
        elif o in ('-d', '--depth'):
            try: depth = int(a)
            except ValueError:
                print("Not a valid integer: (%s)" % a)
                usage(sys.argv[0])
                sys.exit(1)
    if len(args) < 1:
        print("No directories specified")
        usage(sys.argv[0])
        sys.exit(1)
    else:
        paths = args
    for path in paths:
        try: size = dir_size(path, follow_links, 0, depth, units=units)
        except DirSizeError as x: print("Error: %s" % x)
        else: print_path(path, size, units)
Unix-like platforms have the du command, but that doesn’t help when you need to get information about disk-space usage in a cross-platform way. This recipe has been tested under both Windows and Unix, although it is most useful under Windows, where the normal way of getting this information requires using a GUI. In any case, the recipe’s code can be used either as a module (in which case you’ll normally call only the dir_size function) or as a command-line script.
Typical use as a script is:
C:\> python dir_size.py "c:\Program Files"
This will give you some idea of where all your disk space has gone. To help you narrow the search, you can, for example, display each subdirectory:
C:\> python dir_size.py --depth=1 "c:\Program Files"
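To see how the script's option string turns a command line like this one into options and arguments, here is a small sketch that hands getopt an explicit argument list instead of sys.argv. The list itself is made up for illustration.

```python
import getopt

# Hypothetical argument list, standing in for sys.argv[1:]
argv = ['--depth=1', '-k', r'c:\Program Files']

# Same short and long option specifications as the recipe's script
opts, args = getopt.getopt(argv, 'bkLmd:', ['depth='])

print(opts)  # (option, value) pairs; plain flags carry an empty value
print(args)  # whatever is left over: the directories to measure
```

Options that take a value, such as --depth, arrive as strings, which is why the script converts the value with int before using it.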
The recipe’s operation is based on recursive descent. os.listdir provides a list of names of all the files and subdirectories of a given directory. If dir_size finds a subdirectory, it calls itself recursively. An alternative architecture might be based on os.path.walk, which handles the recursion on our behalf and just does callbacks to a function we specify, for each subdirectory it visits. However, here we need to be able to control the depth of descent (e.g., to allow the useful --depth command-line option, which turns into the max_depth argument of the dir_size function). This control is easier to attain when we administer the recursion directly, rather than letting os.path.walk handle it on our behalf.
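For comparison, here is a minimal sketch of the walk-based alternative, written against os.walk, which fills this role in Python 3 (os.path.walk was removed there). The function name walk_sizes and its return shape are assumptions made for this illustration, not part of the recipe. Unlike dir_size, it counts only file sizes, never follows links, and silently skips entries it cannot read.

```python
import os

def walk_sizes(start, max_depth=0):
    """Return (total_bytes, report) for the tree rooted at start.

    report holds (path, bytes) pairs for subdirectories no more than
    max_depth levels below start. Only file sizes are counted here.
    """
    start = os.path.abspath(start)
    sizes = {}    # totals for directories visited so far
    report = []
    # topdown=False visits children first, so a subdirectory's total
    # is already known when its parent is processed.
    for dirpath, dirnames, filenames in os.walk(start, topdown=False):
        total = 0
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # unreadable entry: skipped in this sketch
        for name in dirnames:
            # Links to directories and unreadable ones were never walked,
            # so they have no recorded total and contribute 0.
            total += sizes.pop(os.path.join(dirpath, name), 0)
        sizes[dirpath] = total
        if dirpath != start:
            depth = os.path.relpath(dirpath, start).count(os.sep) + 1
            if depth <= max_depth:
                report.append((dirpath, total))
    return sizes.get(start, 0), report
```

Here the depth limit only filters the report, and the depth has to be recomputed from each path. In the recursive version it is simply passed down as an argument, which is the kind of control the discussion above has in mind.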