CPython runs on a portable, C-coded virtual machine. Python’s built-in objects (such as numbers, sequences, dictionaries, sets, and files) are coded in C, as are several modules in Python’s standard library. Modern platforms support dynamically loaded libraries, with file extensions such as .dll on Windows, .so on Linux, and .dylib on macOS: building Python produces such binary files. You can code your own extension modules for Python in C, using the Python C API covered in this chapter, to produce and deploy dynamic libraries that Python scripts and interactive sessions can later use with the import statement, covered in “The import Statement”.
Extending Python means building modules that Python code can import to access the features the modules supply. Embedding Python means executing Python code from an application coded in another language. For such execution to be useful, Python code must in turn be able to access some of your application’s functionality. In practice, therefore, embedding implies some extending, as well as a few embedding-specific operations. The three main reasons for wishing to extend Python can be summarized as follows:
Reimplementing some functionality (originally coded in Python) in a lower-level language, hoping to get better performance
Letting Python code access some existing functionality supplied by libraries coded in (or, at any rate, callable from) lower-level languages
Letting Python code access some existing functionality of an application that is in the process of embedding Python as the application’s scripting language
Embedding and extending are covered in Python’s online documentation; there, you can find an in-depth tutorial and an extensive reference manual. Many details are best studied in Python’s extensively documented C sources. Download Python’s source distribution and study the sources of Python’s core, C-coded extension modules, and the example extensions supplied for study purposes.
This chapter covers the basics of extending and embedding Python with C. It also mentions, but does not cover in depth, other ways to extend Python. Note that, as the online docs put it, several excellent third-party modules (such as Cython, covered in “Cython”; CFFI, mentioned at “Extending Python Without Python’s C API”; and Numba and SWIG, not covered in this book) “offer both simpler and more sophisticated approaches to creating C and C++ extensions for Python.”
To extend or embed Python with the C API (or with most of the above-mentioned third-party modules), you need to know the C programming language. We do not cover C in this book; there are many good resources to learn C on the web (just make sure not to confuse C with its close relative C++, which is a similar but distinct language). In the rest of this chapter, we assume that you know C.
A Python extension module named x resides in a dynamic library with the same filename (x.pyd on Windows; x.so on most Unix-like platforms) in an appropriate directory (often the site-packages subdirectory of the Python library directory). You generally build the x extension module from a C source file x.c (or, more conventionally, xmodule.c) whose overall structure is:
#include <Python.h>

/* omitted: the body of the x module */

PyMODINIT_FUNC
PyInit_x(void)
{
    /* omitted: the code that initializes the module named x */
}
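On current Python 3 installations the library's filename often carries an ABI tag between the module name and the extension, so the plain x.so or x.pyd form is only one of several names the import system accepts. The following sketch lists the candidates for the running interpreter; the helper name candidate_filenames is hypothetical, introduced only for illustration.

```python
import importlib.machinery
import sysconfig

# Suffixes this interpreter accepts for extension modules.
suffixes = importlib.machinery.EXTENSION_SUFFIXES

# The suffix build tools normally give a freshly built extension
# (may be None on unusual builds, so it is only printed, not relied on).
preferred = sysconfig.get_config_var("EXT_SUFFIX")


def candidate_filenames(module_name):
    """Hypothetical helper: file names the import system would accept."""
    return [module_name + suffix for suffix in suffixes]


print(candidate_filenames("x"))
print("preferred suffix:", preferred)
```

The output differs by platform and Python version, which is one reason to let the build tools choose the filename rather than naming the library by hand.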
In v2, the module initialization function’s name is initx; use C preprocessor constructs such as #if PY_MAJOR_VERSION >= 3 if you’re coding extensions meant to compile for both v2 and v3. For example, the module initialization function, for such a portable extension, might start:
PyMODINIT_FUNC
#if PY_MAJOR_VERSION >= 3
PyInit_x(void)
#else
initx(void)
#endif
{
When you have built and installed the extension module, a Python statement import x loads the dynamic library, then locates and calls the module initialization function, which must do all that is needed to initialize the module object named x.
To build and install a C-coded Python extension module, it’s simplest and most productive to use the distribution utilities, distutils (or its improved third-party variant, setuptools). In the same directory as x.c, place a file named setup.py that contains the following statements:
from setuptools import setup, Extension

setup(name='x',
      ext_modules=[Extension('x', sources=['x.c'])])
From a shell prompt in this directory, you can now run:
C:> python setup.py install
to build the module and install it so that it becomes usable in your Python installation. (You’ll normally want to do this within a virtual environment, with venv, as covered in “Python Environments”, to avoid affecting the global state of your Python installation; however, for simplicity, we omit that step in this chapter.)
setuptools performs compilation and linking steps, with the right compiler and linker commands and flags, and copies the resulting dynamic library into the right directory, dependent on your Python installation (depending on that installation’s details, you may need to have administrator or superuser privileges for the installation; for example, on a Mac or Linux, you may run sudo python setup.py install, although using venv instead is more likely to be what serves you best). Your Python code (running in the appropriate virtual environment, if needed) can then access the resulting module with the statement import x.
To compile C-coded extensions to Python, you normally need the same C compiler used to build the Python version you want to extend. For most Linux platforms, this means the free gcc compiler that normally comes with your platform or can be freely downloaded for it. (You might consider clang, widely reputed to offer better error messages.) On macOS, gcc (actually a frontend to clang) comes with Apple’s free Xcode (AKA Developer Tools) integrated development environment (IDE), which you may download and install separately from Apple’s App Store.
For Windows, you ideally need the Microsoft product known as VS2015 for v3, VS2010 for v2. However, other versions of Microsoft Visual Studio may also work.
In general, a C-coded extension compiled to run with one version of Python is not guaranteed to run with another. For example, a version compiled for Python 3.4 is only certain to run with 3.4, not with 3.3 or 3.5. On Windows, you cannot even try to run an extension with a different version of Python; elsewhere, such as on Linux or macOS, a given extension may happen to work on more than one version of Python, but you may get a warning when the module is imported, and the prudent course is to heed the warning: recompile the extension appropriately. (Since Python 3.5, you should be able to compile extensions forward-compatibly.)
At a C-source level, on the other hand, compatibility is almost always preserved within a major version (though not between v2 and v3).
Your C function PyInit_x generally has the following overall structure (from now on, we chiefly cover v3; see the v2 tutorial and reference for slight differences):
PyMODINIT_FUNC
PyInit_x(void)
{
    PyObject* m = PyModule_Create(&x_module);
    // x_module is the instance of struct PyModuleDef describing the
    // module and in particular connecting to its methods (functions)

    // then: calls to PyModule_AddObject(m, "somename", someobj)
    // to add exceptions or other classes, and module constants.

    // And at last, when all done:
    return m;
}
More details are covered in “The Initialization Module”. x_module is a struct like:
static struct PyModuleDef x_module = {
    PyModuleDef_HEAD_INIT,
    "x",        /* the name of the module */
    x_doc,      /* the module's docstring, may be NULL */
    -1,         /* size of per-interpreter state of the module, or -1
                   if the module keeps state in global variables. */
    x_methods   /* array of the module's method definitions */
};
and, within it, x_methods is an array of PyMethodDef structs. Each PyMethodDef struct in the x_methods array describes a C function that your module x makes available to Python code that imports x. Each such C function has the following overall structure:
static PyObject*
func_with_named_args(PyObject* self, PyObject* args, PyObject* kwds)
{
    /* omitted: body of function, accessing arguments via the Python C
       API function PyArg_ParseTupleAndKeywords, returning a PyObject*
       result, NULL for errors */
}
or a slightly simpler variant:
static PyObject*
func_with_positional_args_only(PyObject* self, PyObject* args)
{
    /* omitted: body of function, accessing arguments via the Python C
       API function PyArg_ParseTuple, returning a PyObject* result,
       NULL for errors */
}
How C-coded functions access arguments passed by Python code is covered in “Accessing Arguments”. How such functions build Python objects is covered in “Creating Python Values”, and how they raise or propagate exceptions back to the Python code that called them is covered in Chapter 5. When your module defines new Python types, AKA classes, your C code defines one or more instances of the struct PyTypeObject. This subject is covered in “Defining New Types”.
A simple example using all these concepts is shown in “A Simple Extension Example”. A toy-level “Hello World” example module could be as simple as:
#include <Python.h>

static PyObject*
hello(PyObject* self)
{
    return Py_BuildValue("s", "Hello, Python extensions world!");
}

static char hello_docs[] =
    "hello(): return a popular greeting phrase\n";

static PyMethodDef hello_funcs[] = {
    /* the first field is the name Python code uses: hello.hello() */
    {"hello", (PyCFunction)hello, METH_NOARGS, hello_docs},
    {NULL}
};

static struct PyModuleDef hello_module = {
    PyModuleDef_HEAD_INIT,
    "hello",
    hello_docs,
    -1,
    hello_funcs
};

PyMODINIT_FUNC
PyInit_hello(void)
{
    return PyModule_Create(&hello_module);
}
The C string passed to Py_BuildValue is encoded in UTF-8, and the result is a Python str instance, which in v2 is also UTF-8 encoded. As previously mentioned, this is for v3. For the slight differences in module initialization in v2, see the online docs; for this trivial extension, all you need is to guard the whole definition of hello_module in a #if PY_MAJOR_VERSION >= 3 / #endif (since in v2 there is no such type as PyModuleDef), and change the module initialization function accordingly, to:
PyMODINIT_FUNC
#if PY_MAJOR_VERSION >= 3
PyInit_hello(void)
{
    return PyModule_Create(&hello_module);
#else
inithello(void)
{
    Py_InitModule3("hello", hello_funcs, hello_docs);
#endif
}
Save this as hello.c and build it through a setup.py script with setuptools, such as:
from setuptools import setup, Extension

setup(name='hello',
      ext_modules=[Extension('hello', sources=['hello.c'])])
After you have run python setup.py install, you can use the newly installed module, for example from a Python interactive session:
>>> import hello
>>> hello.hello()
'Hello, Python extensions world!'
>>>
Most functions in the Python C API return either an int or a PyObject*. Most functions returning int return 0 in case of success and -1 to indicate errors. Some functions return results that are true or false: these functions return 0 to indicate false and an integer not equal to 0 to indicate true, and never indicate errors. Functions returning PyObject* return NULL in case of errors. See Chapter 5 for more details on how C-coded functions handle and raise errors.
The PyInit_x function must contain, at a minimum, a call to the function PyModule_Create (or, since 3.5, PyModuleDef_Init), with, as the only parameter, the address of the struct PyModuleDef that defines the module’s details. In addition, it may have one or more calls to the functions listed in Table 24-1, all returning -1 on error, 0 on success.
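PyModule_AddIntConstant |
Adds to the module an attribute with the given name and integer value. |
PyModule_AddObject |
Adds to the module an attribute with the given name and the given object as its value, stealing a reference to that object. |
PyModule_AddStringConstant |
Adds to the module an attribute with the given name and string value. |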
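One way to try these functions without compiling anything is to call them through ctypes.pythonapi, which exposes the C API of the running interpreter. This is only a sketch for experimentation, not how an extension would normally be written; the module name demo and the attribute names are hypothetical.

```python
import ctypes
import types

api = ctypes.pythonapi

# Declare the C signatures so ctypes converts arguments correctly.
api.PyModule_AddIntConstant.argtypes = [
    ctypes.py_object, ctypes.c_char_p, ctypes.c_long]
api.PyModule_AddIntConstant.restype = ctypes.c_int
api.PyModule_AddStringConstant.argtypes = [
    ctypes.py_object, ctypes.c_char_p, ctypes.c_char_p]
api.PyModule_AddStringConstant.restype = ctypes.c_int

# A throwaway module object standing in for the one PyInit_x would create.
demo = types.ModuleType("demo")

# Both calls return 0 on success, -1 on error.
rc_int = api.PyModule_AddIntConstant(demo, b"ANSWER", 42)
rc_str = api.PyModule_AddStringConstant(demo, b"GREETING", b"hello")

print(rc_int, rc_str, demo.ANSWER, demo.GREETING)
```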
Sometimes, as part of the job of initializing your new module, you need to access something within another module. If you were coding in Python, you would just import othermodule, then access attributes of othermodule. Coding a Python extension in C, it can be almost as simple: call PyImport_Import for the other module, then PyModule_GetDict to get the other module’s __dict__.
PyImport_Import |
Imports the module named in the given Python string object and returns a new reference to the module object (NULL on error). |
PyModule_GetDict |
Returns a borrowed reference (see “Reference Counting”) to the dictionary of the given module. |
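The same two calls can be explored from Python through ctypes.pythonapi. This sketch assumes CPython; the result of PyModule_GetDict is deliberately kept as a raw address, because it is a borrowed reference and ctypes would otherwise treat it as one it owns.

```python
import ctypes
import math

api = ctypes.pythonapi

api.PyImport_Import.argtypes = [ctypes.py_object]
api.PyImport_Import.restype = ctypes.py_object   # new reference: ctypes owns it

api.PyModule_GetDict.argtypes = [ctypes.py_object]
api.PyModule_GetDict.restype = ctypes.c_void_p   # borrowed: keep as an address

# Equivalent of "import math" done through the C API.
math_module = api.PyImport_Import("math")

# Borrowed pointer to the module's __dict__ ...
dict_address = api.PyModule_GetDict(math_module)
# ... turned into a normal object; .value takes its own reference.
math_dict = ctypes.cast(dict_address, ctypes.py_object).value

pi = math_dict["pi"]
print(pi)
```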
To add functions to a module (or nonspecial methods to new types, as covered in “Defining New Types”), you must describe the functions or methods in an array of PyMethodDef structs, and terminate the array with a sentinel (i.e., a structure whose fields are all 0 or NULL). PyMethodDef is defined as follows:
typedef struct {
    char* ml_name;        /* Python name of function or method */
    PyCFunction ml_meth;  /* pointer to C function implementing it */
    int ml_flags;         /* flag describing how to pass arguments */
    char* ml_doc;         /* docstring for the function or method */
} PyMethodDef;
You must cast the second field to (PyCFunction) unless the C function’s signature is exactly PyObject* function(PyObject* self, PyObject* args), which is the typedef for PyCFunction. This signature is correct when ml_flags is METH_O, which indicates a function that accepts a single argument, or METH_VARARGS, which indicates a function that accepts positional arguments. For METH_O, args is the only argument. For METH_VARARGS, args is a tuple of all arguments, to be parsed with the C API function PyArg_ParseTuple. However, ml_flags can also be METH_NOARGS, which indicates a function that accepts no arguments, or METH_KEYWORDS (which v3 requires you to combine with METH_VARARGS, as METH_VARARGS | METH_KEYWORDS), which indicates a function that accepts both positional and named arguments. For METH_NOARGS, the signature is PyObject* function(PyObject* self), without further arguments (Python does pass a second argument, but it is always NULL, so the function can ignore it). For METH_KEYWORDS, the signature is:
PyObject* function(PyObject* self, PyObject* args, PyObject* kwds)
args is the tuple of positional arguments, and kwds is the dictionary of named arguments; both are parsed with the C API function PyArg_ParseTupleAndKeywords. In these cases, you do need to explicitly cast the second field to (PyCFunction).
When a C-coded function implements a module’s function, the self parameter of the C function is the module object in v3 (NULL in v2), for any value of the ml_flags field. When a C-coded function implements a nonspecial method of an extension type, the self parameter points to the instance on which the method is being called.
Python objects live on the heap, and C code sees them as pointers of the type PyObject*. Each PyObject counts how many references to itself are outstanding and destroys itself when the number of references goes down to 0. To make this possible, your code must use Python-supplied macros: Py_INCREF to add a reference to a Python object and Py_DECREF to abandon a reference to a Python object. The Py_XINCREF and Py_XDECREF macros are like Py_INCREF and Py_DECREF, but you may also use them innocuously on a null pointer. The test for a nonnull pointer is implicitly performed inside the Py_XINCREF and Py_XDECREF macros, saving you the little bother of writing out that test explicitly when you don’t know for sure whether the pointer might be null.
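The effect of these macros on an object's count can be observed from Python. The sketch below uses Py_IncRef and Py_DecRef, the exported function versions of Py_XINCREF and Py_XDECREF, through ctypes.pythonapi, and compares sys.getrefcount readings. It assumes a standard CPython build; the absolute counts are implementation details, so only the differences matter.

```python
import ctypes
import sys

api = ctypes.pythonapi
api.Py_IncRef.argtypes = [ctypes.py_object]
api.Py_IncRef.restype = None
api.Py_DecRef.argtypes = [ctypes.py_object]
api.Py_DecRef.restype = None

thing = object()
start = sys.getrefcount(thing)

api.Py_IncRef(thing)                    # one more outstanding reference
after_incref = sys.getrefcount(thing)

api.Py_DecRef(thing)                    # give that reference back
after_decref = sys.getrefcount(thing)

print(after_incref - start, after_decref - start)
```

Calling Py_DecRef without a matching earlier increment would destroy an object that Python code still uses, which is the kind of bug that makes ownership rules so important in C.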
A PyObject* p, which your code receives by calling or being called by other functions, is known as a new reference when the code that supplies p has already called Py_INCREF on your behalf. Otherwise, it is known as a borrowed reference. Your code is said to own new references it holds, but not borrowed ones. You can call Py_INCREF on a borrowed reference to make it into a reference that you own; you must do this when you need to use the reference across calls to code that might cause the count of the reference you borrowed to be decremented. You must always call Py_DECREF before abandoning or overwriting references that you own, but never on references you don’t own. Therefore, understanding which interactions transfer reference ownership and which ones rely on reference borrowing is absolutely crucial. For most functions in the C API, and for all functions that you write and Python calls, the following general rules apply:
PyObject* arguments are borrowed references.
A PyObject* returned as the function’s result transfers ownership.
For each of the two rules, there are a few exceptions for some functions in the C API. PyList_SetItem and PyTuple_SetItem steal a reference to the item they are setting (but not to the list or tuple object into which they’re setting it), meaning that they take ownership even though by general rules that item would be a borrowed reference. PyList_SET_ITEM and PyTuple_SET_ITEM, the C preprocessor macros, which implement faster versions of the item-setting functions, are also reference-stealers. So is PyModule_AddObject, covered in Table 24-1. There are no other exceptions to the first rule. The rationale for these exceptions, which may help you remember them, is that the object you just created will be owned by the list, tuple, or module, so the reference-stealing semantics save unnecessary use of Py_DECREF immediately afterward.
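Reference stealing can be watched from Python with ctypes.pythonapi. In this sketch (assuming a standard CPython build), the code first creates an extra reference with Py_IncRef, then hands it to PyList_SetItem; the count does not rise again, because the list takes over the reference instead of adding its own.

```python
import ctypes
import sys

api = ctypes.pythonapi
api.Py_IncRef.argtypes = [ctypes.py_object]
api.Py_IncRef.restype = None
api.PyList_SetItem.argtypes = [
    ctypes.py_object, ctypes.c_ssize_t, ctypes.py_object]
api.PyList_SetItem.restype = ctypes.c_int

item = object()
slots = [None]
start = sys.getrefcount(item)

api.Py_IncRef(item)                           # the reference we will give away
owned = sys.getrefcount(item)                 # start + 1

status = api.PyList_SetItem(slots, 0, item)   # steals that reference
stored = sys.getrefcount(item)                # unchanged: no extra increment
in_list = slots[0] is item

slots.clear()                                 # the list drops its reference
released = sys.getrefcount(item)              # back to start

print(owned - start, stored - start, released - start)
```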
The second rule has more exceptions than the first one. There are several cases in which the returned PyObject* is a borrowed reference rather than a new reference. The abstract functions (whose names begin with PyObject_, PySequence_, PyMapping_, and PyNumber_) return new references. This is because you can call them on objects of many types, and there might not be any other reference to the resulting object that they return (i.e., the returned object might have to be created on the fly). The concrete functions (whose names begin with PyList_, PyTuple_, PyDict_, and so on) return a borrowed reference when the semantics of the object they return ensure that there must be some other reference to the returned object somewhere.
In this chapter, we show all cases of exceptions to these rules (i.e., return of borrowed references and rare cases of reference stealing from arguments) regarding all functions we cover. When we don’t explicitly mention a function as being an exception, the function follows the rules: its PyObject* arguments, if any, are borrowed references, and its PyObject* result, if any, is a new reference.
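The difference between a borrowed result and a new one shows up in the reference count. This sketch, assuming a standard CPython build, calls the concrete PyList_GetItem (borrowed result) and the abstract PySequence_GetItem (new result) through ctypes.pythonapi. Results are kept as raw addresses so that ctypes does not adjust any counts itself.

```python
import ctypes
import sys

api = ctypes.pythonapi
api.PyList_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
api.PyList_GetItem.restype = ctypes.c_void_p       # raw address, borrowed
api.PySequence_GetItem.argtypes = [ctypes.py_object, ctypes.c_ssize_t]
api.PySequence_GetItem.restype = ctypes.c_void_p   # raw address, owned by us
api.Py_DecRef.argtypes = [ctypes.c_void_p]
api.Py_DecRef.restype = None

item = object()
items = [item]
base = sys.getrefcount(item)

borrowed = api.PyList_GetItem(items, 0)
after_borrowed = sys.getrefcount(item)   # unchanged: nothing to release

owned = api.PySequence_GetItem(items, 0)
after_new = sys.getrefcount(item)        # one higher: we now own a reference

api.Py_DecRef(owned)                     # release what we own
after_release = sys.getrefcount(item)

print(after_borrowed - base, after_new - base, after_release - base)
```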
A function that has ml_flags in its PyMethodDef set to METH_NOARGS is called from Python with no arguments. The corresponding C function has a signature with only one argument, self. When ml_flags is METH_O, Python code calls the function with exactly one argument. The C function’s second argument is a borrowed reference to the object that the Python caller passes as the argument’s value.
When ml_flags is METH_VARARGS, Python code calls the function with any number of positional arguments, which the Python interpreter implicitly collects into a tuple. The C function’s second argument is a borrowed reference to the tuple. Your C code then calls the PyArg_ParseTuple function:
PyArg_ParseTuple |
Returns
|
Code formats d to n (and rarely used other codes for unsigned chars and short ints) accept numeric arguments from Python. Python coerces the corresponding values. For example, in v2, a code of i can correspond to a Python float; the fractional part gets truncated, as if the built-in function int had been called. In v3, integer codes such as i reject a float argument with a TypeError. Py_Complex is a C struct with two fields named real and imag, both of type double.
O is the most general format code and accepts any argument, which you can later check and/or convert as needed. The variant O! corresponds to two arguments in the variable arguments: first the address of a Python type object, then the address of a PyObject*. O! checks that the corresponding value belongs to the given type (or any subtype of that type) before setting the PyObject* to point to the value; otherwise, it raises TypeError (the whole call fails, and the error is set to an appropriate TypeError instance, as covered in Chapter 5). The variant O& also corresponds to two arguments in the variable arguments: first the address of a converter function you coded, then a void* (i.e., any address). The converter function must have signature int convert(PyObject*, void*). Python calls your conversion function with the value passed from Python as the first argument and the void* from the variable arguments as the second argument. The conversion function must either return 0 and raise an exception (as covered in Chapter 5) to indicate an error, or return 1 and store whatever is appropriate via the void* it gets.
The code format s accepts a string from Python and the address of a char* (i.e., a char**) among the variable arguments. It changes the char* to point at the string’s buffer, which your C code must treat as a read-only, null-terminated array of chars (i.e., a typical C string; however, your code must not modify it). The Python string must contain no embedded null characters; in v3, the resulting encoding is UTF-8. s# is similar, but corresponds to two arguments among the variable arguments: first the address of a char*, then the address of an int, which gets set to the string’s length. The Python string can contain embedded nulls, and therefore so can the buffer to which the char* is set to point. u and u# are similar, but specifically accept a Unicode string (in both v2 and v3), and the C-side pointers must be Py_UNICODE* rather than char*. Py_UNICODE is a macro defined in Python.h, and corresponds to the type of a Python Unicode character in the implementation (this is often, but not always, a C wchar_t).
t# and w# are similar to s#, but the corresponding Python argument can be any object of a type respecting the buffer protocol, respectively read-only and read/write. Strings are a typical example of read-only buffers. mmap and array instances are typical examples of read/write buffers, and like all read/write buffers they are also acceptable where a read-only buffer is required (i.e., for a t#).
When one of the arguments is a Python sequence of known fixed length, you can use format codes for each of its items, and corresponding C addresses among the variable arguments, by grouping the format codes in parentheses. For example, code (ii) corresponds to a Python sequence of two numbers and, among the remaining arguments, corresponds to two addresses of ints.
The format string may include a vertical bar (|) to indicate that all following arguments are optional. In this case, you must initialize the C variables, whose addresses you pass among the variable arguments for later arguments, to suitable default values before you call PyArg_ParseTuple. PyArg_ParseTuple does not change the C variables corresponding to optional arguments that were not passed in a given call from Python to your C-coded function.
For example, here’s the start of a function to be called with one mandatory integer argument, optionally followed by another integer argument defaulting to 23 if absent (rather like def f(x, y=23): in Python, except that the function must be called with positional arguments only and the arguments must be numeric):
PyObject* f(PyObject* self, PyObject* args)
{
    int x, y=23;
    if(!PyArg_ParseTuple(args, "i|i", &x, &y))
        return NULL;
    /* rest of function snipped */
}
The format string may optionally end with :name to indicate that name must be used as the function name if any error messages result. Alternatively, the format string may end with ;text to indicate that text must be used as the entire error message if PyArg_ParseTuple detects errors (this form is rarely used).
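The behavior of a format string such as "i|i:f" can be tried interactively by calling PyArg_ParseTuple through ctypes.pythonapi. This is a sketch under assumptions: it relies on ctypes passing the extra pointer arguments of a variadic C function correctly on your platform (common desktop and server platforms do), and the Python-level wrapper f merely stands in for the C function it imitates.

```python
import ctypes

parse = ctypes.pythonapi.PyArg_ParseTuple
parse.argtypes = [ctypes.py_object, ctypes.c_char_p]  # fixed arguments only
parse.restype = ctypes.c_int


def f(*args):
    """Imitates a C function that parses 'i|i' with y defaulting to 23."""
    x = ctypes.c_int()
    y = ctypes.c_int(23)          # default must be set before parsing
    # On failure the C API sets an exception, which ctypes re-raises here.
    parse(args, b"i|i:f", ctypes.byref(x), ctypes.byref(y))
    return x.value, y.value


one_arg = f(5)
two_args = f(5, 7)

try:
    f()                           # the mandatory argument is missing
    missing_error = None
except TypeError as exc:
    missing_error = exc

try:
    f(1.5)                        # Python 3 rejects a float for code i
    float_error = None
except TypeError as exc:
    float_error = exc

print(one_arg, two_args)
print(missing_error)
print(float_error)
```

Because the format string ends with :f, the message for the missing argument names the function as f.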
A function that has ml_flags in its PyMethodDef set to METH_KEYWORDS accepts positional and named arguments. Python code calls the function with any number of positional arguments, which the Python interpreter collects into a tuple, and named arguments, which the Python interpreter collects into a dictionary. The C function’s second argument is a borrowed reference to the tuple, and the third one is a borrowed reference to the dictionary. Your C code then calls the PyArg_ParseTupleAndKeywords function.
C functions that communicate with Python must often build Python values, both to return as their PyObject* result and for other purposes, such as setting items and attributes. The simplest and handiest way to build a Python value is most often with the Py_BuildValue function:
Py_BuildValue |
|
The code O& corresponds to two arguments among the variable arguments: first the address of a converter function you code, then a void* (i.e., any address). The converter function must have signature PyObject* convert(void*). Python calls the conversion function with the void* from the variable arguments as the only argument. The conversion function must either return NULL and raise an exception (as covered in Chapter 5) to indicate an error, or return a new reference PyObject* built from data obtained through the void*.
The code {...} builds dictionaries from an even number of C values, alternately keys and values. For example, Py_BuildValue("{issi}",23,"zig","zag",42) returns a new PyObject* for {23:'zig','zag':42}.
Note the crucial difference between the codes N and O. N steals a reference from the corresponding PyObject* value among the variable arguments, so it’s convenient to build an object including a reference you own that you would otherwise have to Py_DECREF. O does no reference stealing, so it’s appropriate to build an object including a reference you don’t own, or a reference you must also keep elsewhere.
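A few Py_BuildValue format strings can be tried from Python through ctypes.pythonapi. As with any variadic C function called via ctypes, this sketch assumes your platform passes the extra int and char* arguments the way the C function expects, which holds on common platforms but is not guaranteed everywhere.

```python
import ctypes

build = ctypes.pythonapi.Py_BuildValue
build.argtypes = [ctypes.c_char_p]     # only the format string is fixed
build.restype = ctypes.py_object       # new reference, handed over to ctypes

# {issi}: a dict built from alternating keys and values.
mapping = build(b"{issi}", 23, b"zig", b"zag", 42)

# (is): a tuple holding an int and a str decoded from a C string.
pair = build(b"(is)", 7, b"seven")

# An empty format string builds None.
nothing = build(b"")

print(mapping, pair, nothing)
```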
To propagate exceptions raised from other functions you call, just return NULL as the PyObject* result from your C function. To raise your own exceptions, set the current-exception indicator, then return NULL. Python’s built-in exception classes (covered in “Standard Exception Classes”) are globally available, with names starting with PyExc_, such as PyExc_AttributeError, PyExc_KeyError, and so on. Your extension module can also supply and use its own exception classes. The most commonly used C API functions related to raising exceptions are the following:
PyErr_Format |
Raises an exception of class
|
PyErr_NewException |
Extends the exception class
|
PyErr_NoMemory |
Raises an out-of-memory error and returns
|
PyErr_SetObject |
Raises an exception of class |
PyErr_SetFromErrno |
Raises an exception of class
|
PyErr_SetFromErrnoWithFilename |
Like |
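Setting the error indicator and then returning is what Python code eventually sees as a raised exception. The sketch below calls PyErr_SetObject through ctypes.pythonapi; ctypes checks the indicator after each such call and raises the pending exception, much as the interpreter does when a C function returns NULL. The wrapper name fail_like_c_code is hypothetical.

```python
import ctypes

api = ctypes.pythonapi
api.PyErr_SetObject.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyErr_SetObject.restype = None


def fail_like_c_code(key):
    """Mimics C code doing: PyErr_SetObject(PyExc_KeyError, key); return NULL;"""
    api.PyErr_SetObject(KeyError, key)


try:
    fail_like_c_code("missing")
    caught = None
except KeyError as exc:
    caught = exc

print(repr(caught))
```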
Your C code may want to deal with an exception and continue, as a try/except statement would let you do in Python code. The most commonly used C API functions related to catching exceptions are the following:
PyErr_Clear |
Clears the error indicator. Innocuous if no error is pending. |
PyErr_ExceptionMatches |
Call only when an error is pending, or the whole program might crash. Returns a value |
PyErr_Occurred |
Returns |
PyErr_Print |
Call only when an error is pending, or the whole program might crash. Outputs a standard traceback to |
If you need to process errors in highly sophisticated ways, study other error-related functions of the C API, such as PyErr_Fetch, PyErr_NormalizeException, PyErr_GivenExceptionMatches, and PyErr_Restore in the online docs. This book does not cover those advanced and rarely needed possibilities.
The code for a C extension typically needs to use some Python functionality. For example, your code may need to examine or set attributes and items of Python objects, call Python-coded and Python built-in functions and methods, and so on. In most cases, the best approach is for your code to call functions from the abstract layer of Python’s C API. These are functions that you can call on any Python object (functions whose names start with PyObject_), or on any object within a wide category, such as mappings, numbers, or sequences (with names starting with PyMapping_, PyNumber_, and PySequence_, respectively).
Many of the functions callable on specifically typed objects within these categories duplicate functionality that is also available from PyObject_ functions. In these cases, you should almost invariably use the more general PyObject_ function instead. We don’t cover such almost-redundant functions in this book.
Functions in the abstract layer raise Python exceptions if you call them on objects to which they are not applicable. All of these functions accept borrowed references for PyObject* arguments and return a new reference (NULL for an exception) if they return a PyObject* result.
The most frequently used abstract-layer functions are listed in Table 24-4.
PyCallable_Check |
True when |
PyIter_Check |
True when |
PyIter_Next |
Returns the next item from iterator |
PyNumber_Check |
True when |
PyObject_Call |
Calls the callable Python object |
PyObject_CallObject |
Calls the callable Python object |
PyObject_CallFunction |
Calls the callable Python object |
PyObject_CallFunctionObjArgs |
Calls the callable Python object |
PyObject_CallMethod |
Calls the method named |
PyObject_CallMethodObjArgs |
Calls the method named |
PyObject_DelAttrString |
Deletes |
PyObject_DelItem |
Deletes |
PyObject_DelItemString |
Deletes |
PyObject_GetAttrString |
Returns |
PyObject_GetItem |
Returns |
PyObject_GetItemString |
Returns |
PyObject_GetIter |
Returns an iterator on |
PyObject_HasAttrString |
True if |
PyObject_IsTrue |
True if |
PyObject_Length |
Returns |
PyObject_Repr |
Returns |
PyObject_RichCompare |
Performs the comparison indicated by |
PyObject_RichCompareBool |
Like |
PyObject_SetAttrString |
Sets |
PyObject_SetItem |
Sets |
PyObject_SetItemString |
Sets |
PyObject_Str |
Returns |
PyObject_Type |
Returns |
PySequence_Contains |
True if |
PySequence_DelSlice |
Deletes |
PySequence_Fast |
Returns a new reference to a tuple with the same items as |
PySequence_Fast_GET_ITEM |
Returns the |
PySequence_Fast_GET_SIZE |
Returns the length of |
PySequence_GetSlice |
Returns |
PySequence_List |
Returns a new list object with the same items as |
PySequence_SetSlice |
Sets |
PySequence_Tuple |
Returns a new reference to a tuple with the same items as |
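Several of these abstract-layer functions can be exercised from Python through ctypes.pythonapi, which helps when checking what a call returns before writing the C code. The sketch assumes CPython and declares each C signature explicitly; functions returning PyObject* return new references, which ctypes takes over.

```python
import ctypes

api = ctypes.pythonapi

api.PyObject_GetAttrString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyObject_GetAttrString.restype = ctypes.py_object
api.PyObject_CallObject.argtypes = [ctypes.py_object, ctypes.py_object]
api.PyObject_CallObject.restype = ctypes.py_object
api.PyObject_Length.argtypes = [ctypes.py_object]
api.PyObject_Length.restype = ctypes.c_ssize_t
api.PySequence_Contains.argtypes = [ctypes.py_object, ctypes.py_object]
api.PySequence_Contains.restype = ctypes.c_int
api.PySequence_Tuple.argtypes = [ctypes.py_object]
api.PySequence_Tuple.restype = ctypes.py_object

words = ["spam", "eggs"]

upper = api.PyObject_GetAttrString("spam", b"upper")  # like "spam".upper
shouted = api.PyObject_CallObject(upper, ())          # like upper()
count = api.PyObject_Length(words)                    # like len(words)
has_eggs = api.PySequence_Contains(words, "eggs")     # like "eggs" in words
frozen = api.PySequence_Tuple(words)                  # like tuple(words)

print(shouted, count, has_eggs, frozen)
```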
Other functions, whose names start with PyNumber_, let you perform numeric operations. Unary PyNumber functions, which take one argument PyObject* x and return a PyObject*, are listed in Table 24-5 with their Python equivalents.
Function | Python equivalent |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Binary PyNumber functions, which take two PyObject* arguments x and y and return a PyObject*, are similarly listed in Table 24-6.
Function | Python equivalent |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
All the binary PyNumber functions have in-place equivalents whose names start with PyNumber_InPlace, such as PyNumber_InPlaceAdd and so on. The in-place versions try to modify the first argument in place, if possible, and in any case return a new reference to the result, be it the first argument (modified) or a new object. Python’s built-in numbers are immutable; therefore, when the first argument is a number of a built-in type, the in-place versions work just the same as the ordinary versions. The function PyNumber_Divmod returns a tuple with two items (the quotient and the remainder) and has no in-place equivalent.
There is one ternary PyNumber function, PyNumber_Power:
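The contrast between ordinary and in-place PyNumber functions can be checked from Python with ctypes.pythonapi. In this sketch (assuming CPython), PyNumber_InPlaceAdd returns the very same list it was given, while for an immutable int it returns a new object; PyNumber_Power takes None as its third argument when no modulus is wanted.

```python
import ctypes

api = ctypes.pythonapi

for name in ("PyNumber_Add", "PyNumber_Divmod", "PyNumber_InPlaceAdd"):
    func = getattr(api, name)
    func.argtypes = [ctypes.py_object, ctypes.py_object]
    func.restype = ctypes.py_object

api.PyNumber_Negative.argtypes = [ctypes.py_object]
api.PyNumber_Negative.restype = ctypes.py_object
api.PyNumber_Power.argtypes = [ctypes.py_object] * 3
api.PyNumber_Power.restype = ctypes.py_object

total = api.PyNumber_Add(3, 4)            # like 3 + 4
negated = api.PyNumber_Negative(5)        # like -5
quot_rem = api.PyNumber_Divmod(17, 5)     # like divmod(17, 5)
cube = api.PyNumber_Power(2, 3, None)     # like pow(2, 3)

numbers = [1, 2]
same_list = api.PyNumber_InPlaceAdd(numbers, [3])   # mutates numbers

n = 10
bumped = api.PyNumber_InPlaceAdd(n, 1)    # ints are immutable: new object

print(total, negated, quot_rem, cube, numbers, n, bumped)
```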
Each specific type of Python built-in object supplies concrete functions to operate on instances of that type, with names starting with Pytype_ (e.g., PyInt_ for functions related to Python ints). Most such functions duplicate the functionality of abstract-layer functions or auxiliary functions covered earlier in this chapter, such as Py_BuildValue, which can generate objects of many types. In this section, we cover just some frequently used functions from the concrete layer that provide unique functionality, or very substantial convenience or speed. For most types, you can check whether an object belongs to the type by calling Pytype_Check, which also accepts instances of subtypes, or Pytype_CheckExact, which accepts only instances of type, not of subtypes. Signatures are the same as for function PyIter_Check, covered in Table 24-4.
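The difference between the Check and CheckExact families has a direct Python analogue. As a rough sketch, assuming dict as the type of interest (the helper names check and check_exact are made up for this illustration):

```python
from collections import OrderedDict

plain = {"a": 1}
ordered = OrderedDict(a=1)   # OrderedDict is a subtype of dict

def check(obj):
    # Like PyDict_Check: true for dict and for any subtype of dict.
    return isinstance(obj, dict)

def check_exact(obj):
    # Like PyDict_CheckExact: true only when the type is exactly dict.
    return type(obj) is dict

print(check(plain), check_exact(plain))
print(check(ordered), check_exact(ordered))
```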
All functions whose names start with PyString are actually named PyBytes in v3 (and work on Python bytes objects, not str ones), but the names starting with PyString remain available (as synonyms implemented by C preprocessor macros) for convenience. Concrete-layer functions dealing with str in v3 and unicode in v2 have names starting with PyUnicode, but you’re usually better off using the corresponding abstract-layer functions instead, so we don’t cover the concrete-layer equivalents in this book.
Example 24-1 exposes the functionality of Python C API functions PyDict_Merge and PyDict_MergeFromSeq2 for Python use. The update method of dicts works like PyDict_Merge with override=1, but Example 24-1 is slightly more general.
    #include <Python.h>

    static PyObject*
    merge(PyObject* self, PyObject* args, PyObject* kwds)
    {
        static char* argnames[] = {"x", "y", "override", NULL};
        PyObject *x, *y;
        int override = 0;

        /* x must be a dict; y is any object; override is optional */
        if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!O|i", argnames,
                &PyDict_Type, &x, &y, &override))
            return NULL;
        if (-1 == PyDict_Merge(x, y, override)) {
            /* on TypeError, retry treating y as a sequence of pairs */
            if (!PyErr_ExceptionMatches(PyExc_TypeError))
                return NULL;
            PyErr_Clear();
            if (-1 == PyDict_MergeFromSeq2(x, y, override))
                return NULL;
        }
        return Py_BuildValue("");
    }

    static char merge_docs[] =
        "merge(x,y,override=False): merge into dict x the items of dict y (or\n"
        "the pairs that are the items of y, if y is a sequence), with\n"
        "optional override. Alters dict x directly, returns None.\n";

    static PyObject*
    mergenew(PyObject* self, PyObject* args, PyObject* kwds)
    {
        static char* argnames[] = {"x", "y", "override", NULL};
        PyObject *x, *y, *result;
        int override = 0;

        if (!PyArg_ParseTupleAndKeywords(args, kwds, "O!O|i", argnames,
                &PyDict_Type, &x, &y, &override))
            return NULL;
        /* work on a copy of x, so that x itself is left unaltered */
        result = PyObject_CallMethod(x, "copy", "");
        if (!result)
            return NULL;
        if (-1 == PyDict_Merge(result, y, override)) {
            if (!PyErr_ExceptionMatches(PyExc_TypeError))
                return NULL;
            PyErr_Clear();
            if (-1 == PyDict_MergeFromSeq2(result, y, override))
                return NULL;
        }
        return result;
    }

    static char mergenew_docs[] =
        "mergenew(x,y,override=False): merge into dict x the items of dict y\n"
        "(or the pairs that are the items of y, if y is a sequence), with\n"
        "optional override. Does NOT alter x, but rather returns the\n"
        "modified copy as the function's result.\n";

    /* v3 requires METH_VARARGS together with METH_KEYWORDS */
    static PyMethodDef merge_funcs[] = {
        {"merge", (PyCFunction)merge, METH_VARARGS | METH_KEYWORDS, merge_docs},
        {"mergenew", (PyCFunction)mergenew, METH_VARARGS | METH_KEYWORDS,
         mergenew_docs},
        {NULL}
    };

    static char merge_module_docs[] = "Example extension module";

    static struct PyModuleDef merge_module = {
        PyModuleDef_HEAD_INIT,
        "merge",
        merge_module_docs,
        -1,
        merge_funcs
    };

    PyMODINIT_FUNC
    PyInit_merge(void)
    {
        return PyModule_Create(&merge_module);
    }
This example declares as static every function and global variable in the C source file except PyInit_merge (in v3; it would be named initmerge in v2), which must be visible from the outside so Python can call it. Since the functions and variables are exposed to Python via PyMethodDef structures, Python does not need to see their names directly. Therefore, declaring them static is best: this ensures that their names don’t accidentally end up in the whole program’s global namespace, as might otherwise happen on some platforms, possibly causing conflicts and errors.
The format string "O!O|i" passed to PyArg_ParseTupleAndKeywords indicates that the function merge accepts three arguments from Python: an object with a type constraint, a generic object, and an optional integer. At the same time, the format string indicates that the variable part of PyArg_ParseTupleAndKeywords’s arguments must contain four addresses in the following order: the address of a Python type object, two addresses of PyObject* variables, and the address of an int variable. The int variable must be previously initialized to its intended default value, since the corresponding Python argument is optional.
And indeed, after the argnames argument, the code passes &PyDict_Type (i.e., the address of the dictionary type object). Then it passes the addresses of the two PyObject* variables. Finally, it passes the address of the variable override, an int that was previously initialized to 0, since the default, when the override argument isn’t explicitly passed from Python, is “no overriding.” If the return value of PyArg_ParseTupleAndKeywords is 0, then the code immediately returns NULL to propagate the exception; this automatically diagnoses most cases where Python code passes wrong arguments to our new function merge.
When the arguments appear to be okay, it tries PyDict_Merge, which succeeds if y is a dictionary. When PyDict_Merge raises a TypeError, indicating that y is not a dictionary, the code clears the error and tries again, this time with PyDict_MergeFromSeq2, which succeeds when y is a sequence of pairs. If that also fails, it returns NULL to propagate the exception. Otherwise, it returns None in the simplest way (i.e., with return Py_BuildValue("")) to indicate success.
The mergenew function basically duplicates merge’s functionality; however, mergenew does not alter its arguments, but rather builds and returns a new dictionary as the function’s result. The C API function PyObject_CallMethod lets mergenew call the copy method of its first Python-passed argument, a dictionary object, and obtain a new dictionary object that it then alters (with exactly the same logic as the merge function). It then returns the altered dictionary as the function result (thus, there’s no need to call Py_BuildValue in this case).
The code of Example 24-1 must reside in a source file named merge.c. In the same directory, create the following script named setup.py:
    from setuptools import setup, Extension

    setup(name='merge',
          ext_modules=[Extension('merge', sources=['merge.c'])])
Now, run python setup.py install at a shell prompt in this directory (with a user ID having appropriate privileges to write into your Python installation, or use sudo on Unix-like systems if necessary; best of all, use a virtual environment!). This command builds the dynamically loaded library for the merge extension module, and copies it to the appropriate directory for your Python installation. Now Python code (in the appropriate virtual environment, if you have, as recommended, used venv) can use the module. For example:
    import merge
    x = {'a': 1, 'b': 2}
    merge.merge(x, [['b', 3], ['c', 4]])
    print(x)    # prints: {'a': 1, 'b': 2, 'c': 4}
    print(merge.mergenew(x, {'a': 5, 'd': 6}, override=1))
                # prints: {'a': 5, 'b': 2, 'c': 4, 'd': 6}
    print(x)    # prints: {'a': 1, 'b': 2, 'c': 4}
This example shows the difference between merge (which alters its first argument) and mergenew (which returns a new object and does not alter its argument). It also shows that the second argument can be either a dictionary or a sequence of two-item subsequences. It also demonstrates the default operation (where keys that are already in the first argument are left alone) versus the override option (where keys coming from the second argument take precedence, as in Python dictionaries’ update method).
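For comparison, the intended behavior of the two functions can be approximated in pure Python. This is only a sketch of the semantics described above, not a translation of the C code: it decides between the mapping and sequence-of-pairs cases by looking for a keys method (an assumption made here for simplicity) rather than by catching an exception and retrying.

```python
def merge(x, y, override=False):
    """Merge y (a mapping, or an iterable of pairs) into dict x in place."""
    if not isinstance(x, dict):
        raise TypeError("x must be a dict")
    # Mappings are read through keys(); anything else is treated as pairs.
    pairs = ((k, y[k]) for k in y.keys()) if hasattr(y, "keys") else y
    for key, value in pairs:
        # Without override, keys already present in x are left alone.
        if override or key not in x:
            x[key] = value
    return None

def mergenew(x, y, override=False):
    """Like merge, but leave x alone and return a merged copy."""
    result = x.copy()
    merge(result, y, override)
    return result

x = {"a": 1, "b": 2}
merge(x, [["b", 3], ["c", 4]])
merged = mergenew(x, {"a": 5, "d": 6}, override=True)
print(x)
print(merged)
```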
In your extension modules, you often want to define new types and make them available to Python. A type’s definition is held in a struct named PyTypeObject. Most of the fields of PyTypeObject are pointers to functions. Some fields point to other structs, which in turn are blocks of pointers to functions. PyTypeObject also includes a few fields that give the type’s name, size, and behavior details (option flags). You can leave almost all fields of PyTypeObject set to NULL if you do not supply the related functionality. You can point some fields to functions in the Python C API in order to supply fundamental object functionality in standard ways.
The best way to implement a type is to copy from the Python sources one of three files in the directory Modules, which Python supplies exactly for such didactical purposes, and edit it. The files are named xxlimited.c (v3 only), xxmodule.c, and xxsubtype.c (the latter focused on subclassing built-in types, with two types, one each subclassing from list and dict, respectively).
See the online docs for detailed documentation on PyTypeObject and other related structs. The file Include/object.h in the Python sources contains the declarations of these types, as well as several important comments that you would do well to study.
To represent each instance of your type, declare a C struct that starts, right after the opening brace, with the macro PyObject_HEAD. The macro expands into the data fields that your struct must begin with in order to be a Python object. These fields include the reference count and a pointer to the instance’s type. Any pointer to your structure can be correctly cast to a PyObject*. You can choose to look at this practice as a kind of C-level implementation of a (single) inheritance mechanism.
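The "common header first" layout can be imitated from Python with ctypes structures, purely as an illustration of why the cast is safe. In this sketch, Head and Pair are hypothetical stand-ins (they are not the real PyObject layout), and the field names are invented for the example.

```python
import ctypes

class Head(ctypes.Structure):
    # Hypothetical stand-in for the fields that PyObject_HEAD supplies.
    _fields_ = [("refcnt", ctypes.c_ssize_t), ("type_id", ctypes.c_int)]

class Pair(ctypes.Structure):
    # The "derived" struct starts with the common header, then adds data.
    _fields_ = [("head", Head),
                ("first", ctypes.c_int),
                ("second", ctypes.c_int)]

pair = Pair(Head(1, 42), 10, 20)

# Because Head is the first field, a pointer to Pair is also a valid
# pointer to Head; this mirrors a (PyObject*) cast in C.
as_head = ctypes.cast(ctypes.pointer(pair), ctypes.POINTER(Head)).contents
print(as_head.refcnt, as_head.type_id)

# The cast aliases the same memory rather than copying it.
as_head.refcnt += 1
print(pair.head.refcnt)
```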
The PyTypeObject struct defining your type’s characteristics and behavior must contain the size of your per-instance struct, as well as pointers to the C functions you write to operate on your structure. Thus, you normally place the PyTypeObject toward the end of your C-coded module’s source code, after the definitions of the per-instance struct, and of all the functions operating on instances of the per-instance struct. Each x pointing to a struct starting with PyObject_HEAD, and in particular each PyObject* x, has a field x->ob_type that is the address of the PyTypeObject structure that is x’s Python type object.
Given a per-instance struct such as:
    typedef struct {
        PyObject_HEAD
        /* other data needed by instances of this type, omitted */
    } mytype;
the related PyTypeObject struct, almost invariably, begins in a way similar to:
    static PyTypeObject t_mytype = {
        /* tp_head */       PyObject_HEAD_INIT(NULL)    /* use NULL for MSVC++ */
        /* tp_internal */   0,                          /* must be 0 */
        /* tp_name */       "mymodule.mytype",          /* type name, including module */
        /* tp_basicsize */  sizeof(mytype),
        /* tp_itemsize */   0,                          /* 0 except variable-size type */
        /* tp_dealloc */    (destructor)mytype_dealloc,
        /* tp_print */      0,                          /* usually 0, use str instead */
        /* tp_getattr */    0,                          /* usually 0 (see getattro) */
        /* tp_setattr */    0,                          /* usually 0 (see setattro) */
        /* tp_compare */    0,                          /* see also richcompare */
        /* tp_repr */       (reprfunc)mytype_str,       /* like Python's __repr__ */
        /* rest of struct omitted */
For portability to Microsoft Visual C++, the PyObject_HEAD_INIT macro at the start of the PyTypeObject must have an argument of NULL. During module initialization, you must call PyType_Ready(&t_mytype), which, among other tasks, inserts in t_mytype the address of its type (the type of a type is also known as a metatype), normally &PyType_Type. Another slot in PyTypeObject pointing to another type object is tp_base, which comes later in the structure. In the structure definition itself, you must have a tp_base of NULL, again for compatibility with Microsoft Visual C++. However, before you invoke PyType_Ready(&t_mytype), you can optionally set t_mytype.tp_base to the address of another type object. When you do so, your type inherits from the other type, just as a class coded in Python can optionally inherit from a built-in type. For a Python type coded in C, “inheriting” means that, for most fields of PyTypeObject, if you set the field to NULL, PyType_Ready copies the corresponding field from the base type. A type must explicitly assert in its field tp_flags that it’s usable as a base type; otherwise, no type can inherit from it.
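Both points, the metatype and the "usable as a base type" assertion, are visible from Python with built-in types. A small sketch (MyInt and MyBool are throwaway names used only here):

```python
# The type of a type is its metatype, normally `type` itself.
metatype_is_type = type(int) is type

# int declares itself usable as a base type, so subclassing works.
class MyInt(int):
    pass

# bool does not, so any attempt to inherit from it is rejected.
try:
    class MyBool(bool):
        pass
except TypeError as exc:
    rejected = str(exc)
else:
    rejected = None

print(metatype_is_type, issubclass(MyInt, int), rejected)
```

A C-coded type that leaves the base-type flag out of tp_flags behaves like bool in this respect.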
The tp_itemsize field is of interest only for types that, like tuples, have instances of different sizes, and can determine instance size once and forever at creation time. Most types just set tp_itemsize to 0. The fields tp_getattr and tp_setattr are generally set to NULL because they exist only for backward compatibility; modern types use the fields tp_getattro and tp_setattro instead. The tp_repr field is typical of most of the following fields, which are omitted here: the field holds the address of a function, which corresponds directly to a Python special method (here, __repr__). You can set the field to NULL, indicating that your type does not supply the special method, or else set the field to point to a function with the needed functionality. If you set the field to NULL but also point to a base type from the tp_base slot, you inherit the special method, if any, from your base type. You often need to cast your functions to the specific typedef type that a field needs (here, the reprfunc type for the tp_repr field) because the typedef has a first argument PyObject* self, while your functions, being specific to your type, normally use more specific pointers. For example:
    static PyObject* mytype_str(mytype* self) { ... /* rest omitted */
Alternatively, you can declare mytype_str with a PyObject* self, then use a cast (mytype*)self in the function’s body. Either alternative is acceptable style, but it’s more common to locate the casts in the PyTypeObject declaration.
The task of finalizing your instances is split among two functions. The tp_dealloc slot must never be NULL, except for immortal types (i.e., types whose instances are never deallocated). Python calls x->ob_type->tp_dealloc(x) on each instance x whose reference count decreases to 0, and the function thus called must release any resource held by object x, including x’s memory. When an instance of mytype holds no other resources that must be released (in particular, no owned references to other Python objects that you would have to DECREF), mytype’s destructor can be extremely simple:
    static void mytype_dealloc(PyObject *x)
    {
        x->ob_type->tp_free((PyObject*)x);
    }
The function in the tp_free slot has the specific task of freeing x’s memory. Often, you can just put in slot tp_free the address of the C API function _PyObject_Del.
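The moment at which deallocation happens can be observed from Python. The sketch below relies on CPython's reference counting (other implementations may delay the callback), and uses weakref.finalize as a stand-in for the work a tp_dealloc function would do; Resource is a hypothetical class used only for this illustration.

```python
import weakref

class Resource:
    """Stand-in for an extension type whose instances own some resource."""

released = []
obj = Resource()
# The callback runs when obj is deallocated, much as tp_dealloc would.
weakref.finalize(obj, released.append, "freed")

alias = obj                      # second reference: obj must stay alive
del obj
still_alive = (released == [])   # one reference remains, nothing freed yet

del alias                        # last reference gone: freed at once
print(still_alive, released)
```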
The task of initializing your instances is split among three functions. To allocate memory for new instances of your type, put in slot tp_alloc the C API function PyType_GenericAlloc, which does absolutely minimal initialization, clearing the newly allocated memory bytes to 0 except for the type pointer and reference count. Similarly, you can often set field tp_new to the C API function PyType_GenericNew. In this case, you can perform all per-instance initialization in the function you put in slot tp_init, which has the signature:
    int init_name(PyObject *self, PyObject *args, PyObject *kwds)
The positional and named arguments to the function in slot tp_init are those passed when calling the type to create the new instance, just as, in Python, the positional and named arguments to __init__ are those passed when calling the class. Again, as for types (classes) defined in Python, the general rule is to do as little initialization as feasible in tp_new and do as much as possible in tp_init. Using PyType_GenericNew for tp_new accomplishes this. However, you can choose to define your own tp_new for special types, such as ones that have immutable instances, where initialization must happen earlier. The signature is:
    PyObject* new_name(PyObject *subtype, PyObject *args, PyObject *kwds)
The function in tp_new returns the newly created instance, normally an instance of subtype (which may be a subtype of yours). The function in tp_init, on the other hand, must return 0 for success, or -1 to indicate an exception.
If your type is subclassable, it’s important that any instance invariants be established before the function in tp_new returns. For example, if it must be guaranteed that a certain field of the instance is never NULL, that field must be set to a non-NULL value by the function in tp_new. Subtypes of your type might fail to call your tp_init function; therefore, such indispensable initializations, needed to establish type invariants, should always be in tp_new for subclassable types.
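The same division of labor exists between __new__ and __init__ in Python-coded classes, which makes the rule easy to try out. In this sketch, Celsius, Account, and SloppyAccount are hypothetical classes invented for the illustration.

```python
class Celsius(float):
    """Immutable instances: the value must be fixed in __new__ (tp_new)."""
    def __new__(cls, degrees):
        # float is immutable, so __init__ would be too late to set the value.
        return super().__new__(cls, round(degrees, 1))

class Account:
    """The invariant (history is always a list) is established in __new__."""
    def __new__(cls, *args, **kwds):
        self = super().__new__(cls)
        self.history = []
        return self

    def __init__(self, owner):
        self.owner = owner

class SloppyAccount(Account):
    def __init__(self, owner):
        # Forgets to call Account.__init__, yet the invariant still holds.
        pass

temp = Celsius(21.456)
sloppy = SloppyAccount("ann")
print(temp, sloppy.history, hasattr(sloppy, "owner"))
```

Had Account set history in __init__ instead, the SloppyAccount instance would have been created without it.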
Access to attributes of your instances, including methods (as covered in “Attribute Reference Basics”), goes through the functions in slots tp_getattro and tp_setattro of your PyTypeObject struct. Normally, you use the standard C API functions PyObject_GenericGetAttr and PyObject_GenericSetAttr, which implement standard semantics. Specifically, these API functions access your type’s methods via the slot tp_methods, pointing to a sentinel-terminated array of PyMethodDef structs, and your instances’ members via the slot tp_members, a sentinel-terminated array of PyMemberDef structs:
    typedef struct {
        char* name;     /* Python-visible name of the member */
        int type;       /* code defining the data-type of the member */
        int offset;     /* member's offset in the per-instance struct */
        int flags;      /* READONLY for a read-only member */
        char* doc;      /* docstring for the member */
    } PyMemberDef;
As an exception to the general rule that including Python.h gets you all the declarations you need, you have to include structmember.h explicitly in order to have your C source see the declaration of PyMemberDef.
type is generally T_OBJECT for members that are PyObject*, but many other type codes are defined in Include/structmember.h for members that your instances hold as C-native data (e.g., T_DOUBLE for double or T_STRING for char*). For example, say that your per-instance struct is something like this:
    typedef struct {
        PyObject_HEAD
        double datum;
        char* name;
    } mytype;
Expose to Python the per-instance attributes datum (read/write) and name (read-only) by defining the following array and pointing your PyTypeObject’s tp_members to it:
    static PyMemberDef mytype_members[] = {
        {"datum", T_DOUBLE, offsetof(mytype, datum), 0, "Current datum"},
        {"name", T_STRING, offsetof(mytype, name), READONLY, "Datum name"},
        {NULL}    /* sentinel */
    };
Using PyObject_GenericGetAttr and PyObject_GenericSetAttr for tp_getattro and tp_setattro also provides further possibilities, which we do not cover in detail in this book. tp_getset points to a sentinel-terminated array of PyGetSetDef structs, the equivalent of having property instances in a Python-coded class. If your PyTypeObject’s field tp_dictoffset is not equal to 0, the field’s value must be the offset, within the per-instance struct, of a PyObject* that points to a Python dictionary. In this case, the generic attribute access API functions use that dictionary to allow Python code to set arbitrary attributes on your type’s instances, just like for instances of Python-coded classes.
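These three mechanisms have close Python-level counterparts: __slots__ entries behave like members, property instances like getset entries, and a per-instance __dict__ is what a nonzero dictionary offset provides. A rough sketch on CPython (Point and Bag are hypothetical classes for this illustration):

```python
class Point:
    __slots__ = ("x", "y")      # fixed per-instance storage, no __dict__

    def __init__(self, x, y):
        self.x, self.y = x, y

    @property
    def norm1(self):            # computed attribute, like a getset entry
        return abs(self.x) + abs(self.y)

class Bag:
    pass                        # instances get a per-instance __dict__

p = Point(3, -4)
slot_kind = type(Point.__dict__["x"]).__name__
try:
    p.color = "red"             # no __dict__, so no arbitrary attributes
except AttributeError:
    arbitrary_ok = False
else:
    arbitrary_ok = True

b = Bag()
b.color = "red"                 # stored in b.__dict__
print(slot_kind, p.norm1, arbitrary_ok, b.__dict__)
```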
Another dictionary is per-type, not per-instance: the PyObject* for the per-type dictionary is slot tp_dict of your PyTypeObject struct. You can set slot tp_dict to NULL, and then PyType_Ready initializes the dictionary appropriately. Alternatively, you can set tp_dict to a dictionary of type attributes, and then PyType_Ready adds other entries to that same dictionary, in addition to the type attributes you set. It’s generally easier to start with tp_dict set to NULL, call PyType_Ready to create and initialize the per-type dictionary, and then, if need be, add any further entries to the dictionary via explicit C code.
Field tp_flags is a long whose bits determine your type struct’s exact layout, mostly for backward compatibility. Set this field to Py_TPFLAGS_DEFAULT to indicate that you are defining a normal, modern type. Set tp_flags to Py_TPFLAGS_DEFAULT|Py_TPFLAGS_HAVE_GC if your type supports cyclic garbage collection. Your type should support cyclic garbage collection if instances of the type contain PyObject* fields that might point to arbitrary objects and form part of a reference loop. To support cyclic garbage collection, it’s not enough to add Py_TPFLAGS_HAVE_GC to field tp_flags; you also have to supply appropriate functions, indicated by the slots tp_traverse and tp_clear, and register and unregister your instances appropriately with the cyclic garbage collector. Supporting cyclic garbage collection is an advanced subject, and we don’t cover it further in this book; see the online docs. Similarly, we don’t cover the advanced subject of supporting weak references, also well covered online.
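The situation that cyclic garbage collection exists to handle can be reproduced with the standard library's gc module. This is a sketch on CPython; Node is a hypothetical class whose instances can point to arbitrary objects, which is exactly the case described above.

```python
import gc

class Node:
    def __init__(self):
        self.other = None    # may point to any object, so loops are possible

# Containers that can take part in loops are tracked; plain ints are not.
tracked_node = gc.is_tracked(Node())
tracked_int = gc.is_tracked(42)

gc.disable()                 # keep automatic collection out of the way
gc.collect()                 # start from a clean slate
a, b = Node(), Node()
a.other, b.other = b, a      # reference loop: refcounts never reach 0
del a, b
found = gc.collect()         # the cyclic collector finds the unreachable loop
gc.enable()

print(tracked_node, tracked_int, found >= 2)
```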
The field tp_doc, a char*, is a null-terminated character string that is your type’s docstring. Other fields point to structs (whose fields point to functions); you can set each such field to NULL to indicate that you support none of those functions. The fields pointing to such blocks of functions are tp_as_number, for special methods typically supplied by numbers; tp_as_sequence, for special methods typically supplied by sequences; tp_as_mapping, for special methods typically supplied by mappings; and tp_as_buffer, for the special methods of the buffer protocol.
For example, objects that are not sequences can still support one or a few of the methods listed in the block to which tp_as_sequence points, and in this case the PyTypeObject must have a non-NULL field tp_as_sequence, even if the block of function pointers it points to is in turn mostly full of NULLs. For example, dictionaries supply a __contains__ special method so that you can check if x in d when d is a dictionary. At the C code level, the method is a function pointed to by the field sq_contains, which is part of the PySequenceMethods struct to which the field tp_as_sequence points. Therefore, the PyTypeObject struct for the dict type, named PyDict_Type, has a non-NULL value for tp_as_sequence, even though a dictionary supplies no other field in PySequenceMethods except sq_contains, and therefore all other fields in *(PyDict_Type.tp_as_sequence) are NULL.
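The visible effect from Python is that a dictionary supports the in operator without otherwise behaving like a sequence. A brief sketch:

```python
from collections.abc import Sequence

d = {"x": 1}

has_x = "x" in d                    # membership test works on a dict
is_sequence = isinstance(d, Sequence)

# Sequence-style operations that dict leaves unimplemented fail cleanly.
try:
    d + d                           # concatenation is not defined for dict
except TypeError:
    concat_supported = False
else:
    concat_supported = True

print(has_x, is_sequence, concat_supported)
```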
Example 24-2 is a complete Python extension module that defines the very simple type intpair, each instance of which holds two integers named first and second.
    #include "Python.h"
    #include "structmember.h"

    /* per-instance data structure */
    typedef struct {
        PyObject_HEAD
        int first, second;
    } intpair;

    static int
    intpair_init(PyObject *self, PyObject *args, PyObject *kwds)
    {
        static char *nams[] = {"first", "second", NULL};
        int first, second;
        if (!PyArg_ParseTupleAndKeywords(args, kwds, "ii", nams,
                &first, &second))
            return -1;
        ((intpair*)self)->first = first;
        ((intpair*)self)->second = second;
        return 0;
    }

    static void
    intpair_dealloc(PyObject *self)
    {
        self->ob_type->tp_free(self);
    }

    static PyObject*
    intpair_str(PyObject *self)
    {
        /* v3: tp_repr must return a str, so build it with PyUnicode_ */
        return PyUnicode_FromFormat("intpair(%d,%d)",
            ((intpair*)self)->first, ((intpair*)self)->second);
    }

    static PyMemberDef intpair_members[] = {
        {"first", T_INT, offsetof(intpair, first), 0, "first item"},
        {"second", T_INT, offsetof(intpair, second), 0, "second item"},
        {NULL}
    };

    static PyTypeObject t_intpair = {
        /* v3: one macro initializes both tp_head and tp_internal */
        PyVarObject_HEAD_INIT(NULL, 0)  /* tp_head, tp_internal */
        "intpair.intpair",              /* tp_name */
        sizeof(intpair),                /* tp_basicsize */
        0,                              /* tp_itemsize */
        intpair_dealloc,                /* tp_dealloc */
        0,                              /* tp_print */
        0,                              /* tp_getattr */
        0,                              /* tp_setattr */
        0,                              /* tp_compare */
        intpair_str,                    /* tp_repr */
        0,                              /* tp_as_number */
        0,                              /* tp_as_sequence */
        0,                              /* tp_as_mapping */
        0,                              /* tp_hash */
        0,                              /* tp_call */
        0,                              /* tp_str */
        PyObject_GenericGetAttr,        /* tp_getattro */
        PyObject_GenericSetAttr,        /* tp_setattro */
        0,                              /* tp_as_buffer */
        Py_TPFLAGS_DEFAULT,             /* tp_flags */
        "two ints (first,second)",      /* tp_doc */
        0,                              /* tp_traverse */
        0,                              /* tp_clear */
        0,                              /* tp_richcompare */
        0,                              /* tp_weaklistoffset */
        0,                              /* tp_iter */
        0,                              /* tp_iternext */
        0,                              /* tp_methods */
        intpair_members,                /* tp_members */
        0,                              /* tp_getset */
        0,                              /* tp_base */
        0,                              /* tp_dict */
        0,                              /* tp_descr_get */
        0,                              /* tp_descr_set */
        0,                              /* tp_dictoffset */
        intpair_init,                   /* tp_init */
        PyType_GenericAlloc,            /* tp_alloc */
        PyType_GenericNew,              /* tp_new */
        _PyObject_Del,                  /* tp_free */
    };

    static PyMethodDef no_methods[] = { {NULL} };

    static char intpair_docs[] =
        "intpair: data type with int members .first, .second\n";

    static struct PyModuleDef intpair_module = {
        PyModuleDef_HEAD_INIT,
        "intpair",
        intpair_docs,
        -1,
        no_methods
    };

    PyMODINIT_FUNC
    PyInit_intpair(void)
    {
        PyObject* this_module;
        /* finish setting up the type before exposing it */
        if (PyType_Ready(&t_intpair) < 0)
            return NULL;
        this_module = PyModule_Create(&intpair_module);
        if (this_module == NULL)
            return NULL;
        PyObject_SetAttrString(this_module, "intpair", (PyObject*)&t_intpair);
        return this_module;
    }
The intpair type defined in Example 24-2 gives just about no substantial benefits when compared to an equivalent definition in Python, such as:
    class intpair(object):
        __slots__ = 'first', 'second'
        def __init__(self, first, second):
            self.first = first
            self.second = second
        def __repr__(self):
            return 'intpair(%s,%s)' % (self.first, self.second)
The C-coded version does, however, ensure that the two attributes are integers, truncating float or complex number arguments as needed (this truncation is v2 behavior; in v3, the "i" format code rejects float arguments with a TypeError). In Python, you could approximate that functionality by passing the arguments through int, but it still wouldn’t be quite the same thing, as Python would then also accept argument values such as string '23', while the C version wouldn’t. For example:
    import intpair
    x = intpair.intpair(1.2, 3.4)    # in v2, x is: intpair(1,3)
The C-coded version of intpair occupies a little less memory than the Python version. However, the purpose of Example 24-2 is purely didactic: to present a C-coded Python extension that defines a simple new type.
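The Python approximation mentioned above, with both arguments passed through int, might look like the following sketch. The class name IntPair is chosen here only to keep it distinct from the C-coded type; note how it accepts a string, which the C version's "ii" format would reject.

```python
class IntPair:
    """Python approximation that coerces both items through int()."""
    __slots__ = ("first", "second")

    def __init__(self, first, second):
        self.first = int(first)     # int() truncates floats toward zero
        self.second = int(second)

    def __repr__(self):
        return "intpair(%s,%s)" % (self.first, self.second)

truncated = IntPair(1.2, 3.4)
from_text = IntPair("23", 4)        # accepted here, unlike in the C version
print(truncated, from_text)
```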
You can code Python extensions in other classic compiled languages besides C. For Fortran, we recommend Pearu Peterson’s F2PY. It is now part of NumPy, since Fortran is often used for numeric processing. If, as recommended, you have installed numpy, then you don’t need to also separately install f2py.
For C++, you can of course use the same approaches as for C (adding extern "C" where needed). Many C++-specific alternatives exist, but, out of them all, SIP appears to be the only one to be still actively maintained and supported.
A popular alternative is Cython—a “Python dialect” focused on generating C code from Python-like syntax, with a few additions centered on precise specification of the C-level side of things. We highly recommend it, and cover it in “Cython”.
Yet another alternative is CFFI, which originated in the PyPy project and is perfect for it, but also fully supports CPython. The “ABI online” mode of CFFI matches the standard Python library module ctypes, mentioned in the next section, but CFFI has vast added value, especially the ability to generate and compile C code (“offline,” in CFFI’s terminology) as well as interfacing at the stabler API, rather than ABI, level.
The ctypes module of the Python standard library lets you load existing C-coded dynamic libraries, and call, in your Python code, functions defined in those libraries. The module is rather popular, because it allows handy “hacks” without needing access to a C compiler or even a third-party Python extension. However, the resulting program can be quite fragile, very platform-dependent, and hard to port to other versions of the same libraries, as it relies on details of their binary interfaces.
We recommend avoiding ctypes in production code, and using instead one of the excellent alternatives covered or mentioned in this chapter, particularly CFFI or, even better, Cython.
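To make the trade-off concrete, here is a small ctypes sketch that calls two of the C API functions covered earlier in this chapter through ctypes.pythonapi, the handle that ctypes provides for the running CPython interpreter's own library. It assumes CPython; the hand-written argtypes and restype declarations are exactly the fragile part, since ctypes cannot check them against any header.

```python
import ctypes

# ctypes.pythonapi exposes the C API of the running CPython interpreter.
add = ctypes.pythonapi.PyNumber_Add
# The signature must be declared by hand; a wrong declaration here
# would typically crash the process rather than raise an exception.
add.argtypes = [ctypes.py_object, ctypes.py_object]
add.restype = ctypes.py_object

inplace_add = ctypes.pythonapi.PyNumber_InPlaceAdd
inplace_add.argtypes = [ctypes.py_object, ctypes.py_object]
inplace_add.restype = ctypes.py_object

total = add(2, 3)
items = [1, 2]
same = inplace_add(items, [3])   # lists are mutable: modified in place
print(total, items, same is items)
```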
The Cython language, a “friendly fork” of Greg Ewing’s Pyrex, is often the most convenient way to write extensions for Python. Cython is a nearly complete subset of Python (with a declared intention of eventually becoming a complete superset of it), with the addition of optional C-like types for variables: you can automatically compile Cython programs (source files with extension .pyx) into machine code (via an intermediate stage of generated C code), producing Python-importable extensions. See Cython’s online documentation for deep, thorough, excellent coverage of all details of Cython programming; in this section, we cover only a few essentials to help you get started with Cython.
Cython implements almost all of Python, with four marginal, minor differences that the Cython authors do not intend to fix, and a few more small differences that are considered bugs in current Cython and the authors plan to fix soon (it would be pointless to list them here, as it’s quite possible that by the time you read this they may be fixed).
So, Cython is a vast subset indeed of Python. More importantly, Cython adds to Python a few statements that allow C-like declarations, enabling generation of efficient machine code (via a C-code generation step). Here is a simple example; code it in source file hello.pyx in a new empty directory:
    def hello(char *name):
        return 'Hello, ' + name + '!'
This is almost exactly like Python, except that parameter name is preceded by a char*, declaring that its type must always be a C 0-terminated string (but, as you see from the body, in Cython, you can use its value as a normal Python string).
When you install Cython (by the download-then-python setup.py install route, or just pip install cython), you also gain a way to build Cython source files into Python dynamic-library extensions through the distutils approach. Code the following in file setup.py in the new directory:
    from setuptools import setup
    from Cython.Build import cythonize

    setup(name='hello', ext_modules=cythonize('hello.pyx'))
Now run python setup.py install in the new directory (ignore compilation warnings; they’re fully expected and benign). Now you can import and use the new module, for example from an interactive Python session:
    >>> import hello
    >>> hello.hello('Alex')
    'Hello, Alex!'
Given how we’ve coded this Cython, we must pass a string to hello.hello; passing no arguments, >1 argument, or a nonstring, raises an exception:
    >>> hello.hello()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: function takes exactly 1 argument (0 given)
    >>> hello.hello(23)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    TypeError: argument 1 must be string, not int
You can use the keyword cdef mostly as you would def, but cdef defines functions that are internal to the extension module and are not visible on the outside, while def functions can also be called by Python code that imports the module. cpdef functions can be called both internally (with speed very close to cdef ones) and by external Python (just like def ones), but are otherwise identical to cdef ones.
For any kind of function, parameters and return values with unspecified types (or, better, ones explicitly typed as object) become PyObject* pointers in the generated C code (with implicit and standard handling of reference incrementing and decrementing). cdef functions may also have parameters and return values of any other C type; def functions, in addition to untyped (or, equivalently, object) arguments, can only accept int, float, and char* types. For example, here’s a cdef function to specifically sum two integers:
    cdef int sum2i(int a, int b):
        return a + b
You can also use cdef to declare C variables (scalars, arrays, and pointers) like in C:
    cdef int x, y[23], *z
and struct, union, enum Pythonically (colon on head clause, then indent):
    cdef struct Ure:
        int x, y
        float z
(Afterward, refer to the new type by name only, e.g., Ure. Never use the keywords struct, union, and enum except in the cdef declaring the type.)
To interface with external C code, you can declare variables as cdef extern
, with the same effect that extern
has in the C language. More commonly, the C declarations of some library you want to use are in a .h C header file; to ensure that the Cython-generated C code includes that header file, use the following cdef
:
cdef
extern
from
"
someheader.h
"
:
and follow with a block of indented cdef-style declarations (without repeating the cdef in the block). Only declare functions and variables that you want to use in your Cython code. Cython does not read the C header file: it trusts your Cython declarations in the block and generates no C code for them. Cython implicitly uses the Python C API, covered at the start of this chapter, but you can also explicitly access any of its functions. For example, if your Cython file contains:
cdef extern from "Python.h":
    object PyString_FromStringAndSize(char *, int)
the following Cython code can use PyString_FromStringAndSize. This may come in handy, since, by default, C “strings” are deemed to be terminated by a zero character; with this function, you may instead explicitly specify a C string’s length and so also include any zero character(s) it may contain.
Conveniently, Cython lets you group such declarations in .pxd files (roughly analogous to C’s .h files, while .pyx Cython files are roughly analogous to C’s .c files). .pxd files can also include cdef inline declarations, to be inlined at compile time. A .pyx file can import the declarations in a .pxd one by using the keyword cimport, analogous to Python’s import.
Moreover, Cython comes with several prebuilt .pxd files in its directory Cython/includes. In particular, the .pxd file cpython already has all the cdef extern from "Python.h" declarations you might want: just cimport cpython to access them.
A cdef class statement lets you define a new Python type in Cython. It may include cdef declarations of attributes (which apply to every instance, not to the type as a whole), which are normally invisible from Python code; however, you can specifically declare attributes as cdef public to make them normal attributes from Python’s viewpoint, or cdef readonly to make them visible but read-only from Python (Python-visible attributes must be numbers, strings, or objects).
A cdef class supports special methods (with some caveats), properties (with a special syntax), and inheritance (single inheritance only). To declare a property, use the following within the body of the cdef class:
property name:
followed by indented def statements for methods __get__(self) and, optionally, __set__(self, value) and __del__(self).
A cdef class’s __new__ is different from that of a normal Python class: the first argument is self, the new instance, already allocated and with its memory filled with 0s. Cython always calls the special method __cinit__(self) right after the instance allocation, to allow further initialization; __init__, if defined, is called next. At object destruction time, Cython calls the special method __dealloc__(self) to let you undo whatever allocations __new__ and/or __cinit__ have done (cdef classes have no __del__ special method).
There are no righthand-side versions of arithmetic special methods, such as __radd__ to go with __add__, like in Python; rather, if (say) a+b can’t find or use type(a).__add__, it next calls type(b).__add__(a, b). Note the order of arguments (no swapping!). You may need to attempt some type checking to ensure that you perform the correct operation in all cases.
To make the instances of a cdef class into iterators, define a special method __next__(self), like in v3 (even when you’re using Cython with v2).
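For comparison, here is how the v3 iterator protocol that the text refers to looks in plain Python. This is only a minimal sketch: the class name Countdown is hypothetical, and the cdef class declaration and any C-typed attributes are left out.

```python
class Countdown:
    """Iterator yielding start, start-1, ..., 1 (hypothetical example class)."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Signal exhaustion by raising StopIteration, as in v3.
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


countdown_values = list(Countdown(3))
```

Here countdown_values ends up as [3, 2, 1]; a for loop over Countdown(3) would visit the same values.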
Here is a Cython equivalent of Example 24-2:
cdef class intpair:
    cdef public int first, second
    def __init__(self, first, second):
        self.first = first
        self.second = second
    def __repr__(self):
        return 'intpair(%s,%s)' % (self.first, self.second)
Like the C-coded extension in Example 24-2, this Cython-coded extension also offers no substantial advantage with respect to a Python-coded equivalent. However, the simplicity and conciseness of the Cython code is much closer to that of Python than to the verbosity and boilerplate needed in C, yet the machine code generated from this Cython file is very close to what gets generated from the C code in Example 24-2.
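To make the comparison concrete, a Python-coded equivalent might look roughly like the following sketch. It is an approximation under stated assumptions: the use of __slots__ and the int() conversions are illustrative choices that mimic, but do not enforce, the cdef public int declarations (plain Python attributes can later be rebound to any object).

```python
class intpair:
    """Pure-Python approximation of the Cython intpair extension type."""

    # Assumption: __slots__ stands in for the fixed set of C-level attributes.
    __slots__ = ("first", "second")

    def __init__(self, first, second):
        # int() only approximates the conversion to C int done by Cython.
        self.first = int(first)
        self.second = int(second)

    def __repr__(self):
        return 'intpair(%s,%s)' % (self.first, self.second)


pair = intpair(23, 42)
pair_text = repr(pair)
```

With these values, pair_text is the string 'intpair(23,42)'.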
In addition to the usual Python for statements, Cython has another form of for:
for variable from lower_expression <= variable < upper_expression:
This is the most common form, but you can use either < or <= on either side of the variable after the from keyword; alternatively, you can use > and/or >= to have a backward loop (you cannot mix a < or <= on one side and > or >= on the other).
The for...from statement is much faster than the usual Python for variable in range(...): when the variable and loop boundaries are C-kind ints. However, in modern Cython, for variable in range(...): is optimized to near equivalence to for...from, so the classic Pythonic for variable in range(...): can usually be chosen for simplicity and readability.
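As a rough guide to choosing the range arguments, the following plain-Python sketch shows the sequence of values each for...from form is understood to visit. The helper names are hypothetical, and the Cython syntax appears only in comments; the code itself is ordinary Python.

```python
def forward(lower, upper):
    # Cython: for i from lower <= i < upper:
    return list(range(lower, upper))


def forward_inclusive(lower, upper):
    # Cython: for i from lower <= i <= upper:
    return list(range(lower, upper + 1))


def backward(upper, lower):
    # Cython: for i from upper > i >= lower:
    return list(range(upper - 1, lower - 1, -1))


forward_values = forward(0, 4)
inclusive_values = forward_inclusive(0, 4)
backward_values = backward(4, 0)
```

The three lists are [0, 1, 2, 3], [0, 1, 2, 3, 4], and [3, 2, 1, 0], respectively.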
In addition to Python expression syntax, Cython can use some, but not all, of C’s additions to it. To take the address of variable var, use &var, like in C. To dereference a pointer p, however, use p[0]; the equivalent C syntax *p is not valid Cython. Where in C you would use p->q, use p.q in Cython. The null pointer uses the Cython keyword NULL. For char constants, use the syntax c'x'. For casts, use angle brackets, such as <int>somefloat where in C you would code (int)somefloat; also, use casts only on C values and onto C types, never with Python values and types (let Cython perform type conversion for you automatically when Python values or types occur).
Euclid’s algorithm for GCD (Greatest Common Divisor) of two numbers is quite simple to implement in pure Python:
def gcd(dividend, divisor):
    remainder = dividend % divisor
    while remainder:
        dividend = divisor
        divisor = remainder
        remainder = dividend % divisor
    return divisor
The Cython version is very similar:
def gcd(int dividend, int divisor):
    cdef int remainder
    remainder = dividend % divisor
    while remainder:
        dividend = divisor
        divisor = remainder
        remainder = dividend % divisor
    return divisor
On a MacBook Air laptop, gcd(454803,278255) takes about 1 microsecond in the Python version, while the Cython version takes 0.22 microseconds. A speed-up of four-plus times for so little effort can be well worth the bother (assuming, of course, that this function takes up a substantial fraction of your program’s execution time!), even though the pure Python version has practical advantages (it runs in Jython, IronPython, or PyPy just as well as in CPython; it works with longs just as well as with ints; it’s effortlessly cross-platform; and so on).
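One way to take such a measurement for the pure Python version is the standard library’s timeit module. The sketch below is only illustrative: the helper seconds_per_call and its default repetition count are assumptions, and the timings you get depend entirely on your machine and Python version.

```python
import math
import timeit


def gcd(dividend, divisor):
    # Euclid's algorithm, as in the pure Python version above.
    remainder = dividend % divisor
    while remainder:
        dividend = divisor
        divisor = remainder
        remainder = dividend % divisor
    return divisor


def seconds_per_call(number=100_000):
    # Hypothetical helper: average wall-clock seconds for one gcd call.
    total = timeit.timeit(lambda: gcd(454803, 278255), number=number)
    return total / number


# Sanity check against the standard library before trusting any timing.
assert gcd(454803, 278255) == math.gcd(454803, 278255) == 1919

if __name__ == "__main__":
    print("%.3f microseconds per call" % (seconds_per_call() * 1e6))
```

To time the Cython version the same way, you would import gcd from the compiled extension module instead of defining it in Python.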
If you have an application already written in C or C++ (or another classic compiled language), you may want to embed Python as your application’s scripting language. To embed Python in languages other than C, the other language must be able to call C functions (how you do that varies, not just by language, but by specific implementation of the language: what compiler, what linker, and so on). In the following, we cover the C view of things; other languages, as mentioned, vary widely regarding what you have to do in order to call C functions from them.
In order for Python scripts to communicate with your application, your application must supply extension modules with Python-accessible functions and classes that expose your application’s functionality. When, as is normal, these modules are linked with your application (rather than residing in dynamic libraries that Python can load when necessary), register your modules with Python as additional built-in modules by calling the PyImport_AppendInittab C API function:
PyImport_AppendInittab |
|
You may want to set the program name and arguments, which Python scripts can access as sys.argv, by calling either or both of the following C API functions:
Py_SetProgramName |
Sets the program name, which Python scripts can access as |
PySys_SetArgv |
Sets the program arguments, which Python scripts can access as |
After installing extra built-in modules and optionally setting the program name, your application initializes Python. At the end, when Python is no longer needed, your application finalizes Python. The relevant functions in the C API are as follows:
Py_Finalize |
Frees all memory and other resources that Python is able to free. You should not make any other Python C API call after calling this function. |
Py_Initialize |
Initializes the Python environment. Make no other Python C API call before this one, except |
Your application can run Python source code from a character string or from a file. To run or compile Python source code, choose the mode of execution as one of the following three constants defined in Python.h:
Py_eval_input
The code is an expression to evaluate (like passing 'eval' to the Python built-in function compile).
Py_file_input
The code is a block of one or more statements to execute (like 'exec' for compile; just like in that case, a trailing '\n' must close compound statements).
Py_single_input
The code is a single statement for interactive execution (like 'single' for compile; implicitly outputs the results of expression statements).
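Since each constant corresponds to a mode string of the built-in compile, the three behaviors can be previewed from Python itself. The following is a minimal sketch of that correspondence, not of the C API calls; the filenames such as "<expr>" are arbitrary labels chosen for illustration.

```python
import contextlib
import io

# 'eval' (cf. Py_eval_input): an expression whose value is returned.
expr_code = compile("6 * 7", "<expr>", "eval")
expr_value = eval(expr_code)

# 'exec' (cf. Py_file_input): a block of statements, run for its effects.
block_source = "x = 1\nif x:\n    y = x + 1\n"
block_namespace = {}
exec(compile(block_source, "<block>", "exec"), block_namespace)

# 'single' (cf. Py_single_input): one interactive statement; the value of
# an expression statement is echoed through sys.displayhook.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    exec(compile("6 * 7", "<single>", "single"), {})
echoed_text = captured.getvalue()
```

After this runs, expr_value is 42, block_namespace["y"] is 2, and echoed_text holds the text "42" followed by a newline.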
Running Python source code is similar to passing a source code string to Python’s exec or eval, or, in v2, a source code file to the built-in function execfile. Two general functions you can use for this task are the following:
PyRun_File |
|
PyRun_String |
Like |
The dictionaries locals and globals are often new, empty dicts (conveniently built by Py_BuildValue("{}")) or the dictionary of a module. PyImport_Import is a convenient way to get an existing module object; PyModule_GetDict gets a module’s dictionary.
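The same two choices of namespace can be tried out at the Python level, where exec and eval accept globals and locals dictionaries much as the C functions do. This sketch is only an analogy: importlib.import_module and vars play the roles that PyImport_Import and PyModule_GetDict play in C, and the choice of the math module is purely illustrative.

```python
import importlib

# A new, empty dict as the namespace: names the code binds end up in it.
fresh_namespace = {}
exec("total = sum(range(5))", fresh_namespace)

# The dictionary of an existing module, used here as the locals mapping,
# so that the evaluated expression can see the module's names.
math_module = importlib.import_module("math")
math_namespace = vars(math_module)
root = eval("sqrt(16)", {}, math_namespace)
```

Here fresh_namespace["total"] is 10 and root is 4.0. Passing the module dictionary as locals (with a separate empty globals) is a design choice that keeps eval from adding a __builtins__ entry to the module’s own dictionary.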
When you want to create a new module object on the fly, often in order to populate it with PyRun_ calls, use the PyModule_New C API function:
PyModule_New |
Returns a new, empty module object for a module named
After this code runs, the module object |
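At the Python level, the closest analog is to create a module object with types.ModuleType and populate its dictionary with exec. The sketch below illustrates the idea only; it does not reproduce the C calls, and the module name scratch and the function double are hypothetical.

```python
import types

# Create a new, empty module object (not registered in sys.modules).
scratch = types.ModuleType("scratch")

# Populate the module by running source code in its dictionary.
source = "def double(x):\n    return 2 * x\n"
exec(source, vars(scratch))

doubled = scratch.double(21)
```

After this runs, doubled is 42 and scratch.__name__ is 'scratch'; an import statement would still not find the module unless you also registered it in sys.modules.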
To run Python code repeatedly, and to separate the diagnosis of syntax errors from that of runtime exceptions raised by the code when it runs, you can compile the Python source to a code object, then keep the code object and run it repeatedly. This is just as true when using the C API as when dynamically executing in Python, as covered in “Dynamic Execution and exec”. Two C API functions you can use for this task are the following:
Py_CompileString |
|
PyEval_EvalCode |
|
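The Python-level counterpart of this compile-once, run-many pattern uses the built-ins compile and eval. The following is a minimal sketch under that analogy; the helper compile_snippet and the "<snippet>" filename label are hypothetical names introduced for illustration.

```python
def compile_snippet(source):
    # Hypothetical helper: returns a code object, or None on a syntax error,
    # so that syntax problems are diagnosed before anything runs.
    try:
        return compile(source, "<snippet>", "eval")
    except SyntaxError:
        return None


# Compile once, then run the same code object repeatedly.
square_plus_one = compile_snippet("x * x + 1")
results = [eval(square_plus_one, {"x": x}) for x in range(4)]

# A syntax error is caught at compile time...
broken = compile_snippet("x +")

# ...while a runtime exception surfaces only when the code object runs.
reciprocal = compile_snippet("1 / x")
try:
    eval(reciprocal, {"x": 0})
    runtime_error = None
except ZeroDivisionError as exc:
    runtime_error = exc
```

Here results is [1, 2, 5, 10], broken is None, and runtime_error is a ZeroDivisionError instance.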
1 Mostly v2 by default, but you can run Cython with the -3 switch, or include # cython: language_level=3 at the top of the source, to have Cython support v3 instead (strings being Unicode, print being a function, and so forth).