Chapter 13. Controlling Execution

Python directly exposes, supports, and documents many of its internal mechanisms. This may help you understand Python at an advanced level, and lets you hook your own code into such Python mechanisms, controlling them to some extent. For example, “Python built-ins” covers the way Python arranges for built-ins to be visible. This chapter covers some other advanced Python techniques; Chapter 16 covers issues specific to testing, debugging, and profiling. Other issues related to controlling execution are about using multiple threads and processes, covered in Chapter 14, and about asynchronous processing, covered in Chapter 18.

Site and User Customization

Python provides a specific “hook” to let each site customize some aspects of Python’s behavior at the start of each run. Customization by each single user is not enabled by default, but Python specifies how programs that want to run user-provided code at startup can explicitly support such customization (a rarely used facility).

The site and sitecustomize Modules

Python loads the standard module site just before the main script. If Python is run with option -S, Python does not load site. -S allows faster startup but saddles the main script with initialization chores. site’s tasks are:

  • Putting sys.path in standard form (absolute paths, no duplicates).

  • Interpreting each .pth file found in the Python home directory, adding entries to sys.path, and/or importing modules, as each .pth file indicates.

  • Adding built-ins used to print information in interactive sessions (exit, copyright, credits, license, and quit).

  • In v2 only, setting the default Unicode encoding to 'ascii' (in v3, the default encoding is built-in as 'utf-8'). v2’s site source code includes two blocks, each guarded by if 0:, one to set the default encoding to be locale-dependent, and the other to completely disable any default encoding between Unicode and plain strings. You may optionally edit site.py to select either block, but this is not a good idea, even though a comment in site.py says “if you’re willing to experiment, you can change this.”

  • In v2 only, trying to import sitecustomize (should import sitecustomize raise an ImportError exception, site catches and ignores it). sitecustomize is the module that each site’s installation can optionally use for further site-specific customization beyond site’s tasks. It is best not to edit site.py, since any Python upgrade or reinstallation would overwrite customizations.

  • After sitecustomize is done, removing the attribute sys.setdefaultencoding from the sys module, so that the default encoding can’t be changed.
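As an illustration of how site interprets .pth files, here is a hypothetical mypaths.pth placed in a site-packages directory (the paths and module names are made up for the example). Each non-comment, non-import line is a directory to add to sys.path; lines starting with import are executed as Python code; lines starting with # are comments:

```
# mypaths.pth -- hypothetical example
# a plain line names a directory to append to sys.path:
/opt/myproject/libs
# a line starting with 'import' is executed when site processes the file:
import sys; sys.path.append('/opt/myproject/extra')
```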

User Customization

Each interactive Python interpreter session starts by running the script named by the environment variable PYTHONSTARTUP. Outside of interactive interpreter sessions, there is no automatic per-user customization. To request per-user customization, a Python (v2 only) main script can explicitly import user. The v2 standard library module user, when loaded, first determines the user’s home directory, as indicated by the environment variable HOME (or, failing that, HOMEPATH, possibly preceded by HOMEDRIVE on Windows systems only). If the environment does not indicate a home directory, user uses the current directory. If the user module finds a file named .pythonrc.py in the indicated directory, user executes that file, with the built-in Python v2 function execfile, in user’s own global namespace.

Scripts that don’t import user do not run .pythonrc.py; no Python v3 script does, either, since the user module is not defined in v3. Of course, any given script is free to arrange other specific ways to run whatever user-specific startup module it requires. Such application-specific arrangements, even in v2, are more common than importing user. A generic .pythonrc.py, as loaded via import user, needs to be usable with any application that loads it. Specialized, application-specific startup files only need to follow whatever convention a specific application documents.

For example, your application MyApp.py could document that it looks for a file named .myapprc.py in the user’s home directory, as indicated by the environment variable HOME, and loads it in the application’s main script’s global namespace. You could then have the following code in your main script:

import os
homedir = os.environ.get('HOME')
if homedir is not None:
    userscript = os.path.join(homedir, '.myapprc.py')
    if os.path.isfile(userscript):
        with open(userscript) as f:
            exec(f.read())

In this case, the .myapprc.py user customization script, if present, has to deal only with MyApp-specific user customization tasks. This approach is better than relying on the user module, and works just as well in v3 as it does in v2.

Termination Functions

The atexit module lets you register termination functions (i.e., functions to be called at program termination, “last in, first out”). Termination functions are similar to clean-up handlers established by try/finally or with. However, termination functions are globally registered and get called at the end of the whole program, while clean-up handlers are established lexically and get called at the end of a specific try clause or with statement. Termination functions and clean-up handlers are called whether the program terminates normally or abnormally, but not when the program ends by calling os._exit (which is why you normally call sys.exit instead). The atexit module supplies a function called register:

register

register(func,*args,**kwds)

Ensures that func(*args,**kwds) is called at program termination time.
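For instance, the following sketch spawns a child interpreter to show the last-in, first-out ordering of termination functions (the child program’s text is just for illustration):

```python
import subprocess
import sys

# Child program: registers two termination functions, then ends normally.
child_program = """
import atexit
atexit.register(print, 'first registered')
atexit.register(print, 'second registered')
print('main done')
"""

# Run it and capture stdout: handlers fire in reverse registration order.
out = subprocess.run([sys.executable, '-c', child_program],
                     capture_output=True, text=True).stdout
print(out)
# main done
# second registered
# first registered
```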

Dynamic Execution and exec

Python’s exec statement (built-in function, in v3) can execute code that you read, generate, or otherwise obtain during a program’s run. exec dynamically executes a statement or a suite of statements. In v2, exec is a simple keyword statement with the following syntax:

exec code [in globals[, locals]]

code can be a string, an open file-like object, or a code object. globals and locals are mappings. In v3, exec is a built-in function with the syntax:

exec(code, globals=None, locals=None)

code can be a string, bytes, or code object. globals is a dict; locals, any mapping.

If both globals and locals are present, they are the global and local namespaces in which code runs. If only globals is present, exec uses globals as both namespaces. If neither is present, code runs in the current scope.

Never run exec in the current scope

Running exec in the current scope is a very bad idea: it can bind, rebind, or unbind any global name. To keep things under control, use exec, if at all, only with specific, explicit dictionaries.

Avoiding exec

A frequently asked question about Python is “How do I set a variable whose name I just read or built?” Literally, for a global variable, exec allows this, but it’s a bad idea. For example, if the name of the variable is in varname, you might think to use:

exec(varname + '= 23')

Don’t do this. An exec like this in current scope makes you lose control of your namespace, leading to bugs that are extremely hard to find, and making your program unfathomably difficult to understand. Keep the “variables” that you need to set, not as variables, but as entries in a dictionary, say mydict. You could then use:

exec(varname+'=23', mydict)

While this is not quite as terrible as the previous example, it is still a bad idea. Keeping such “variables” as dictionary entries means that you don’t have any need to use exec to set them. Just code:

mydict[varname] = 23

This way, your program is clearer, direct, elegant, and faster. There are some valid uses of exec, but they are extremely rare: just use explicit dictionaries instead.

Strive to avoid exec

Use exec only when it’s really indispensable, which is extremely rare. Most often, it’s best to avoid exec and choose more specific, well-controlled mechanisms: exec weakens your control of your code’s namespace, can damage your program’s performance, and exposes you to numerous hard-to-find bugs and huge security risks.

Expressions

exec can execute an expression, because any expression is also a valid statement (called an expression statement). However, Python ignores the value returned by an expression statement. To evaluate an expression and obtain the expression’s value, see the built-in function eval, covered in Table 7-2.
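A quick contrast of the two (a minimal sketch):

```python
# exec runs the expression as a statement and discards its value;
# eval computes the expression and returns its value.
discarded = exec('2 + 3 * 4')   # exec always returns None
computed = eval('2 + 3 * 4')
print(discarded, computed)
# None 14
```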

Compile and Code Objects

To make a code object to use with exec, call the built-in function compile with the last argument set to 'exec' (as covered in Table 7-2).

A code object c exposes many interesting read-only attributes whose names all start with 'co_', such as:

co_argcount

Number of parameters of the function of which c is the code (0 when c is not the code object of a function, but rather is built directly by compile)

co_code

A bytestring with c’s bytecode

co_consts

The tuple of constants used in c

co_filename

The name of the file c was compiled from (the string that is the second argument to compile, when c was built that way)

co_firstlineno

The initial line number (within the file named by co_filename) of the source code that was compiled to produce c, if c was built by compiling from a file

co_name

The name of the function of which c is the code ('<module>' when c is not the code object of a function but rather is built directly by compile)

co_names

The tuple of all identifiers used within c

co_varnames

The tuple of local variables’ identifiers in c, starting with parameter names

Most of these attributes are useful only for debugging purposes, but some may help advanced introspection, as exemplified later in this section.
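For example, inspecting the code object of a small function (the function itself is just for illustration):

```python
def average(a, b):
    total = a + b
    return total / 2

c = average.__code__
print(c.co_name)       # 'average'
print(c.co_argcount)   # 2
print(c.co_varnames)   # ('a', 'b', 'total') -- parameters first, then locals
```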

If you start with a string that holds some statements, first use compile on the string, then call exec on the resulting code object—that’s better than giving exec the string to compile and execute. This separation lets you check for syntax errors separately from execution-time errors. You can often arrange things so that the string is compiled once and the code object executes repeatedly, which speeds things up. eval can also benefit from such separation. Moreover, the compile step is intrinsically safe (both exec and eval are extremely risky if you execute them on code that you don’t trust), and you may be able to perform some checks on the code object, before it executes, to lessen the risk (though never truly down to zero).
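A minimal sketch of the compile-once, execute-many pattern:

```python
# Compile the statement once; a SyntaxError, if any, surfaces here,
# separately from any runtime errors during execution.
code = compile('x = x + 1', '<counter>', 'exec')

namespace = {'x': 0}
for _ in range(3):
    exec(code, namespace)   # execute the precompiled code object
print(namespace['x'])       # 3
```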

A code object has a read-only attribute co_names, which is the tuple of the names used in the code. For example, say that you want the user to enter an expression that contains only literal constants and operators—no function calls or other names. Before evaluating the expression, you can check that the string the user entered satisfies these constraints:

def safer_eval(s):
    code = compile(s, '<user-entered string>', 'eval')
    if code.co_names:
        raise ValueError('No names {!r} allowed in expression {!r}'
                         .format(code.co_names, s))
    return eval(code)

This function safer_eval evaluates the expression passed in as argument s only when the string is a syntactically valid expression (otherwise, compile raises SyntaxError) and contains no names at all (otherwise, safer_eval explicitly raises ValueError). (This is similar to the standard library function ast.literal_eval, covered in “Standard Input”, but a bit more powerful, since it does allow the use of operators.)
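For example (repeating safer_eval’s definition so the snippet runs standalone):

```python
def safer_eval(s):
    code = compile(s, '<user-entered string>', 'eval')
    if code.co_names:
        raise ValueError('No names {!r} allowed in expression {!r}'
                         .format(code.co_names, s))
    return eval(code)

print(safer_eval('(2 + 3) * 4'))           # 20
try:
    safer_eval('__import__("os").getcwd()')
except ValueError:
    print('rejected: expression uses names')
```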

Knowing what names the code is about to access may sometimes help you optimize the preparation of the dictionary that you need to pass to exec or eval as the namespace. Since you need to provide values only for those names, you may save work by not preparing other entries. For example, say that your application dynamically accepts code from the user with the convention that variable names starting with data_ refer to files residing in the subdirectory data that user-written code doesn’t need to read explicitly. User-written code may in turn compute and leave results in global variables with names starting with result_, which your application writes back as files in subdirectory data. Thanks to this convention, you may later move the data elsewhere (e.g., to BLOBs in a database instead of files in a subdirectory), and user-written code won’t be affected. Here’s how you might implement these conventions efficiently (in v3; in v2, use exec user_code in datadict instead of exec(user_code, datadict)):

def exec_with_data(user_code_string):
    user_code = compile(user_code_string, '<user code>', 'exec')
    datadict = {}
    for name in user_code.co_names:
        if name.startswith('data_'):
            with open('data/{}'.format(name[5:]), 'rb') as datafile:
                datadict[name] = datafile.read()
    exec(user_code, datadict)
    for name in datadict:
        if name.startswith('result_'):
            with open('data/{}'.format(name[7:]), 'wb') as datafile:
                datafile.write(datadict[name])

Never exec or eval Untrusted Code

Old versions of Python tried to supply tools to ameliorate the risks of using exec and eval, under the heading of “restricted execution,” but those tools were never entirely secure against the ingenuity of able hackers, and current versions of Python have therefore dropped them. If you need to ward against such attacks, take advantage of your operating system’s protection mechanisms: run untrusted code in a separate process, with privileges as restricted as you can possibly make them (study the mechanisms that your OS supplies for the purpose, such as chroot, setuid, and jail), or run untrusted code in a separate, highly constrained virtual machine. To guard against “denial of service” attacks, have the main process monitor the separate one and terminate the latter if and when resource consumption becomes excessive. Processes are covered in “Running Other Programs”.

exec and eval are unsafe with untrusted code

The function exec_with_data is not at all safe against untrusted code: if you pass it, as the argument user_code_string, some string obtained in a way that you cannot entirely trust, there is essentially no limit to the amount of damage it might do. This is unfortunately true of just about any use of both exec and eval, except for those rare cases in which you can set very strict and checkable limits on the code to execute or evaluate, as was the case for the function safer_eval.

Internal Types

Some of the internal Python objects in this section are hard to use. Using such objects correctly and to good effect requires some study of your Python implementation’s own C (or Java, or C#) sources. Such black magic is rarely needed, except to build general-purpose development tools, and similar wizardly tasks. Once you do understand things in depth, Python empowers you to exert control if and when needed. Since Python exposes many kinds of internal objects to your Python code, you can exert that control by coding in Python, even when an understanding of C (or Java, or C#) is needed to read Python’s sources to understand what’s going on.

Type Objects

The built-in type named type acts as a callable factory, returning objects that are types. Type objects don’t have to support any special operations except equality comparison and representation as strings. However, most type objects are callable and return new instances of the type when called. In particular, built-in types such as int, float, list, str, tuple, set, and dict all work this way; specifically, when called without arguments, they return a new empty instance or, for numbers, one that equals 0. The attributes of the types module are the built-in types, each with one or more names. For example, in v2, types.DictType and types.DictionaryType both refer to type({}), also known as dict. In v3, types only supplies names for built-in types that don’t already have a built-in name, as covered in Chapter 7. Besides being callable to generate instances, many type objects are also useful because you can inherit from them, as covered in “Classes and Instances”.

The Code Object Type

Besides using the built-in function compile, you can get a code object via the __code__ attribute of a function or method object. (For the attributes of code objects, see “Compile and Code Objects”.) Code objects are not callable, but you can rebind the __code__ attribute of a function object with the right number of parameters in order to wrap a code object into callable form. For example:

def g(x): print('g', x)
code_object = g.__code__
def f(x): pass
f.__code__ = code_object
f(23)     # prints: g 23

Code objects that have no parameters can also be used with exec or eval. To create a new object, call the type object you want to instantiate. However, directly creating code objects requires many parameters; see Stack Overflow’s unofficial docs on how to do it, but, almost always, you’re better off calling compile instead.

The frame Type

The function _getframe in the module sys returns a frame object from Python’s call stack. A frame object has attributes giving information about the code executing in the frame and the execution state. The modules traceback and inspect help you access and display such information, particularly when an exception is being handled. Chapter 16 provides more information about frames and tracebacks, and covers the module inspect, which is the best way to perform such introspection.
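For example, a function can peek at its caller’s frame (a sketch; sys._getframe is CPython-specific):

```python
import sys

def whoami():
    # Frame 1, counting up from the current frame, is whoami's caller;
    # f_code is that frame's code object.
    return sys._getframe(1).f_code.co_name

def some_caller():
    return whoami()

print(some_caller())   # 'some_caller'
```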

Garbage Collection

Python’s garbage collection normally proceeds transparently and automatically, but you can choose to exert some direct control. The general principle is that Python collects each object x at some time after x becomes unreachable—that is, when no chain of references can reach x starting from a local variable of a function that is executing, nor from a global variable of a loaded module. Normally, an object x becomes unreachable when there are no references at all to x. In addition, a group of objects can be unreachable when they reference each other but no global or local variables reference any of them, even indirectly (such a situation is known as a mutual reference loop).

Classic Python keeps with each object x a count, known as a reference count, of how many references to x are outstanding. When x’s reference count drops to 0, CPython immediately collects x. The function getrefcount of the module sys accepts any object and returns its reference count (at least 1, since getrefcount itself has a reference to the object it’s examining). Other versions of Python, such as Jython or IronPython, rely on other garbage-collection mechanisms supplied by the platform they run on (e.g., the JVM or the MSCLR). The modules gc and weakref therefore apply only to CPython.

When Python garbage-collects x and there are no references at all to x, Python then finalizes x (i.e., calls x.__del__()) and makes the memory that x occupied available for other uses. If x held any references to other objects, Python removes the references, which in turn may make other objects collectable by leaving them unreachable.

The gc Module

The gc module exposes the functionality of Python’s garbage collector. gc deals only with unreachable objects that are part of mutual reference loops. In such a loop, each object in the loop refers to others, keeping the reference counts of all objects positive. However, no outside references to any one of the set of mutually referencing objects exist any longer. Therefore, the whole group, also known as cyclic garbage, is unreachable, and therefore garbage-collectable. Looking for such cyclic garbage loops takes time, which is why the module gc exists: to help you control whether and when your program spends that time. “Cyclic garbage collection” is enabled by default, with reasonable default parameters; however, by importing the gc module and calling its functions, you may choose to disable the functionality, change its parameters, and/or find out exactly what’s going on in this respect.
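For example, here is a mutual reference loop that only the cyclic collector can reclaim (a minimal sketch):

```python
import gc

class Node:
    def __init__(self):
        self.partner = None

a, b = Node(), Node()
a.partner, b.partner = b, a   # mutual reference loop
del a, b                      # reference counts stay positive: cyclic garbage

found = gc.collect()          # force a full collection run
print(found >= 2)             # True: the loop's objects were found and freed
```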

gc exposes functions you can use to help you keep cyclic garbage-collection times under control. These functions can sometimes let you track down a memory leak—objects that are not getting collected even though there should be no more references to them—by helping you discover what other objects are in fact holding on to references to them:

collect

collect()

Forces a full cyclic garbage collection run to happen immediately.

disable

disable()

Suspends automatic, periodic cyclic garbage collection.

enable

enable()

Reenables periodic cyclic garbage collection previously suspended with disable.

garbage

A read-only attribute that lists the unreachable but uncollectable objects. This happens when any object in a cyclic garbage loop has a __del__ special method, as there may be no demonstrably safe order for Python to finalize such objects.

get_debug

get_debug()

Returns an int bit string, the garbage-collection debug flags set with set_debug.

get_objects

get_objects()

Returns a list of all objects currently tracked by the cyclic garbage collector.

get_referrers

get_referrers(*objs)

Returns a list of all container objects, currently tracked by the cyclic garbage collector, that refer to any one or more of the arguments.

get_threshold

get_threshold()

Returns a three-item tuple (thresh0, thresh1, thresh2), the garbage-collection thresholds set with set_threshold.

isenabled

isenabled()

Returns True when cyclic garbage collection is currently enabled. Returns False when collection is currently disabled.

set_debug

set_debug(flags)

Sets debugging flags for garbage collection. flags is an int, interpreted as a bit string, built by ORing (with the bitwise-OR operator |) zero or more constants supplied by the module gc:

DEBUG_COLLECTABLE

Prints information on collectable objects found during collection

DEBUG_INSTANCES

If DEBUG_COLLECTABLE or DEBUG_UNCOLLECTABLE is also set, prints information on objects found during collection that are instances of old-style classes (v2 only)

DEBUG_LEAK

The set of debugging flags that make the garbage collector print all information that can help you diagnose memory leaks; same as the bitwise-OR of all other constants (except DEBUG_STATS, which serves a different purpose)

DEBUG_OBJECTS

If DEBUG_COLLECTABLE or DEBUG_UNCOLLECTABLE is also set, prints information on objects found during collection that are not instances of old-style classes (v2 only)

DEBUG_SAVEALL

Saves all collectable objects to the list gc.garbage (where uncollectable ones are also always saved) to help you diagnose leaks

DEBUG_STATS

Prints statistics during collection to help you tune the thresholds

DEBUG_UNCOLLECTABLE

Prints information on uncollectable objects found during collection

set_threshold

set_threshold(thresh0[,thresh1[,thresh2]])

Sets thresholds that control how often cyclic garbage-collection cycles run. A thresh0 of 0 disables garbage collection. Garbage collection is an advanced, specialized topic, and the details of the generational garbage-collection approach used in Python (and consequently the detailed meanings of these thresholds) are beyond the scope of this book; see the online docs for some details.
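For example, to read the current thresholds and make generation-0 collections half as frequent (a sketch; CPython’s historical defaults are (700, 10, 10)):

```python
import gc

t0, t1, t2 = gc.get_threshold()
gc.set_threshold(t0 * 2, t1, t2)   # collect generation 0 half as often
print(gc.get_threshold() == (t0 * 2, t1, t2))   # True
gc.set_threshold(t0, t1, t2)       # restore the previous settings
```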

When you know there are no cyclic garbage loops in your program, or when you can’t afford the delay of cyclic garbage collection at some crucial time, suspend automatic garbage collection by calling gc.disable(). You can enable collection again later by calling gc.enable(). You can test whether automatic collection is currently enabled by calling gc.isenabled(), which returns True or False. To control when time is spent collecting, you can call gc.collect() to force a full cyclic collection run to happen immediately. To wrap some time-critical code:

import gc
gc_was_enabled = gc.isenabled()
if gc_was_enabled:
    gc.collect()
    gc.disable()
# insert some time-critical code here
if gc_was_enabled:
    gc.enable()

Other functionality in the module gc is more advanced and rarely used, and can be grouped into two areas. The functions get_threshold and set_threshold and debug flag DEBUG_STATS help you fine-tune garbage collection to optimize your program’s performance. The rest of gc’s functionality can help you diagnose memory leaks in your program. While gc itself can automatically fix many leaks (as long as you avoid defining __del__ in your classes, since the existence of __del__ can block cyclic garbage collection), your program runs faster if it avoids creating cyclic garbage in the first place.

The weakref Module

Careful design can often avoid reference loops. However, at times you need objects to know about each other, and avoiding mutual references would distort and complicate your design. For example, a container has references to its items, yet it can often be useful for an object to know about a container holding it. The result is a reference loop: due to the mutual references, the container and items keep each other alive, even when all other objects forget about them. Weak references solve this problem by allowing objects to reference others without keeping them alive.

A weak reference is a special object w that refers to some other object x without incrementing x’s reference count. When x’s reference count goes down to 0, Python finalizes and collects x, then informs w of x’s demise. Weak reference w can now either disappear or get marked as invalid in a controlled way. At any time, a given w refers to either the same object x as when w was created, or to nothing at all; a weak reference is never retargeted. Not all types of objects support being the target x of a weak reference w, but classes, instances, and functions do.
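For example (a sketch; the immediate collection after del relies on CPython’s reference counting):

```python
import weakref

class Target:
    pass

t = Target()
w = weakref.ref(t)
print(w() is t)   # True: target still alive
del t             # last strong reference gone; CPython collects immediately
print(w())        # None: the weak reference now refers to nothing
```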

The weakref module exposes functions and types to create and manage weak references:

getweakrefcount

getweakrefcount(x)

Returns len(getweakrefs(x)).

getweakrefs

getweakrefs(x)

Returns a list of all weak references and proxies whose target is x.

proxy

proxy(x[,f])

Returns a weak proxy p of type ProxyType (CallableProxyType when x is callable) with object x as the target. In most contexts, using p is just like using x, except that when you use p after x has been deleted, Python raises ReferenceError. p is not hashable (thus, you cannot use p as a dictionary key), even when x is. When f is present, it must be callable with one argument and is the finalization callback for p (i.e., right before finalizing x, Python calls f(p)). f executes right after x is no longer reachable from p.

ref

ref(x[,f])

Returns a weak reference w of type ReferenceType with object x as the target. w is callable without arguments: calling w() returns x when x is still alive; otherwise, w() returns None. w is hashable when x is hashable. You can compare weak references for equality (==, !=), but not for order (<, >, <=, >=). Two weak references x and y are equal when their targets are alive and equal, or when x is y. When f is present, it must be callable with one argument and is the finalization callback for w (i.e., right before finalizing x, Python calls f(w)). f executes right after x is no longer reachable from w.

WeakKeyDictionary

class WeakKeyDictionary(adict={})

A WeakKeyDictionary d is a mapping weakly referencing its keys. When the reference count of a key k in d goes to 0, item d[k] disappears. adict is used to initialize the mapping.

WeakValueDictionary

class WeakValueDictionary(adict={})

A WeakValueDictionary d is a mapping weakly referencing its values. When the reference count of a value v in d goes to 0, all items of d such that d[k] is v disappear. adict is used to initialize the mapping.

WeakKeyDictionary lets you noninvasively associate additional data with some hashable objects, with no change to the objects. WeakValueDictionary lets you non-invasively record transient associations between objects, and build caches. In each case, use a weak mapping, rather than a dict, to ensure that an object that is otherwise garbage-collectable is not kept alive just by being used in a mapping.

A typical example is a class that keeps track of its instances, but does not keep them alive just in order to keep track of them:

import weakref
class Tracking(object):
    _instances_dict = weakref.WeakValueDictionary()
    def __init__(self):
        Tracking._instances_dict[id(self)] = self
    @classmethod
    def instances(cls): return cls._instances_dict.values()
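Using this class (repeating its definition so the snippet runs standalone; the immediate disappearance after del relies on CPython’s reference counting):

```python
import weakref

class Tracking(object):
    _instances_dict = weakref.WeakValueDictionary()
    def __init__(self):
        Tracking._instances_dict[id(self)] = self
    @classmethod
    def instances(cls): return cls._instances_dict.values()

a, b = Tracking(), Tracking()
print(len(list(Tracking.instances())))   # 2
del a                                    # instance collected, entry vanishes
print(len(list(Tracking.instances())))   # 1
```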