CHAPTER 20

Extending and Embedding Perl

Sometimes it is more convenient to reach outside Perl for a feature than to implement it in Perl code. A classic example is system libraries that we want to make use of from within Perl. As these libraries are not themselves Perl, we need a way to interact with them. Another common case is algorithms that run faster implemented in C than anything we could manage in pure Perl. Modules that make use of external code this way are known as extensions. Whatever the reason for using C, C++, or another language, we need some glue to bind the compiled code into our Perl. The glue is called XSUBs, or XS for short, and Perl provides plenty of support to help us write extensions using it.

In this chapter, we will see how to integrate a Perl interpreter into a C or C++ program and how to use C or C++ code from within Perl. Both subjects spend a lot of their conceptual time at the border between Perl and C, and so readers interested in either subject will find value in both sections. In particular, Perl provides an extensive library of C macros and routines for reading, writing, creating, and manipulating Perl data structures in C, as well as manipulating the stack of the Perl interpreter. Whichever direction we are interested in proceeding, managing Perl data from C is more than likely to become important.

While binding C into Perl using XS is relatively easy, it is easier still with the Inline module. Not only can this versatile module automate most of the work necessary to glue Perl and C together, but also it can manage C++, Java, Python, and a number of other languages with equal aplomb with the appropriate supporting module from CPAN. Even with the help of Inline, some knowledge of XS concepts and Perl's C interface will undoubtedly prove useful, however.

We will also spend some time in this chapter looking at various ways in which we can compile Perl programs into C executables and Perl bytecode, along with some of the drawbacks and caveats inherent to the process.

Using Perl from C or C++

Embedding Perl into a C or C++ program involves linking the program code against the library that implements the Perl interpreter (for which the perl executable is merely a frontend). However, the interpreter library was built a certain way, depending on our build-time choices, and will not necessarily work correctly when linked with code compiled a different way.

Therefore, the first thing we need to do is make sure we are building Perl the way we want it to be used as an embedded interpreter. This means that we don't build a threaded interpreter if we want to use it within a program that must run on a nonthreaded platform, for example. Assuming we have a satisfactory interpreter library ready to go, we can now set up our source code to build in a way that is compatible with it.

Setting Up to Build Against Perl

In order to make use of our interpreter, we must compile source code that uses it with the same definitions and compiler flags that Perl was built with, so that when we include header files from the Perl distribution, they provide the same definitions to us as they did when we built the interpreter library.

Luckily, Perl is fully aware of how it was built and can answer any question about it through the -V option or the Config.pm module. For example, to find out what external libraries Perl was linked with, we can execute the following:

> perl -V:libs

This will generate something like


libs='-lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc';

This indicates that this is a threaded Perl interpreter, among other things. We can similarly extract values for the compiler flags with -V:ccflags, the linker flags with -V:ldflags, the actual name of the interpreter library with -V:libperl, and so on. Using the -V option without a qualifying name will dump out all of the compiler and linker flags, so we can see what is available. As presented, these values work as makefile macros, but we can extract just the value and lose the name=...; by adding colons, -V::libs:, for example.

While we could go about generating build commands with these, there is a better way. Since embedding a Perl interpreter is a common task and always requires these flags, the ExtUtils::Embed module is available to do all the hard work for us. Here is how we get the compiler options:

> perl -MExtUtils::Embed -e ccopts

The ccopts subroutine extracts all of the values related to compiler options and returns something like (depending on the platform and where we installed Perl) this:


-D_REENTRANT -D_GNU_SOURCE -DTHREADS_HAVE_PIDS -fno-strict-aliasing -pipe
-I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
-I/installed/perl/lib/5.8.5/i686-linux-thread-multi/CORE

To get the linker flags, we substitute ccopts with ldopts:

> perl -MExtUtils::Embed -e ldopts

which returns a result along the lines of


-Wl,-E  -L/usr/local/lib
/installed/perl/lib/5.8.5/i686-linux-thread-multi/auto/DynaLoader/DynaLoader.a
-L/installed/perl/lib/5.8.5/i686-linux-thread-multi/CORE -lperl -lnsl -ldl
-lm -lcrypt -lutil -lpthread -lc

We can also extract the compiler name with

> perl -V::cc:

The extra colons suppress the leading and trailing characters and leave us with just the value in quotes. We can use this directly or in a makefile. In fact, generating makefile macros automatically is a common requirement, and we can generally do something like the following to achieve it (this particular example is specific to gmake, but most make tools provide a similar feature):

# gmake makefile for embed.c

CC=$(shell perl -V::cc:)
CCFLAGS=$(shell perl -MExtUtils::Embed -e ccopts)
LD=$(shell perl -V::ld:)
LDFLAGS=$(shell perl -MExtUtils::Embed -e ldopts)

all: embed

embed.o: embed.c
        $(CC) $(CCFLAGS) -o $@ -c $?

embed: embed.o
        $(LD) -o $@ $? $(LDFLAGS)

For Windows, Perl ships with a utility called genmake, which generates an nmake-compatible makefile. To use it, we just pass it the names of the source file or files that comprise our program:

> perl genmake embed.c

Either way, we now have a makefile that will build a C program with an embedded Perl interpreter for us. Now we just need to write one.

Creating and Invoking a Perl Interpreter

To embed Perl, we need to create an interpreter instance and then invoke it to execute the code we want it to run. We need to perform some steps first to ensure that the interpreter is initialized correctly, and once we are done with the interpreter, we need to cleanly dispose of it. Fortunately, this can all be done with a few lines of code, as all the hard work has already been done for us by the EXTERN.h and perl.h header files that come with every Perl installation.

To demonstrate, the following short C program creates and invokes a Perl interpreter to evaluate and print out the current time using the Perl built-in function localtime:

/* embed.c */
#include <EXTERN.h>
#include <perl.h>

PerlInterpreter *my_perl;

int main(int argc, char **argv, char **env)
{
    /* initialize */
    PERL_SYS_INIT3(&argc,&argv,&env);

    /* create the interpreter */
    my_perl = perl_alloc();
    perl_construct(my_perl);
    PL_exit_flags |= PERL_EXIT_DESTRUCT_END;

    /* invoke perl with arguments */
    int perl_argc = 3;
    char *code = "print scalar(localtime).\"\\n\"";
    char *perl_argv[] = {argv[0], "-e", code};
    perl_parse(my_perl, NULL, perl_argc, perl_argv, env);
    perl_run(my_perl);

    /* clean up */
    perl_destruct(my_perl);
    perl_free(my_perl);

    /* finish */
    PERL_SYS_TERM();
}

The PERL_SYS_INIT3 and PERL_SYS_TERM macros perform some essential startup and shutdown tasks that are necessary to create and dispose of a Perl interpreter cleanly. We should always use them at the start and end of our program.

We create and destroy an individual interpreter with perl_alloc, perl_construct, perl_destruct, and perl_free. In order to have Perl execute END blocks when the interpreter is destroyed, we bitwise-OR PERL_EXIT_DESTRUCT_END (a special macro defined in the Perl headers) into PL_exit_flags. This is important because in a normal Perl interpreter this step is taken care of when the program terminates, a safe assumption for the Perl executable, but almost certainly not true for our embedded interpreter. Note that the use of my_perl as the pointer variable is not arbitrary; unless we redefine it, this is the name that macros like PL_exit_flags expect to work with.

In the middle of the program, we actually invoke the interpreter using perl_parse and perl_run. The perl_parse function carries out the task of setting up the interpreter using command-line options. This makes a very simple and convenient interface to set the interpreter up because it is identical to invoking Perl in normal use. Here we initialize a completely new list of arguments using -e to pass in some arbitrary code for the interpreter to execute for us. We can just as easily use -M to load in (pure Perl) modules or any other options that we desire. The second argument to perl_parse, currently NULL, is used to pass in a function pointer for managing modules that are not pure Perl—we will come back to it later.
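For instance, assuming the same setup as embed.c above, we might preload a pure-Perl module with -M exactly as we would on the command line (File::Basename is just an arbitrary pure-Perl example here):

```c
/* preload a pure-Perl module with -M, then run a one-liner that uses it */
char *code = "print basename('/usr/local/bin/perl'),\"\\n\"";
char *perl_argv[] = {argv[0], "-MFile::Basename", "-e", code};
perl_parse(my_perl, NULL, 4, perl_argv, env);
perl_run(my_perl);
```

Since File::Basename has no C component, no bootstrapping function is needed and the second argument to perl_parse can remain NULL.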

Before going further, it is worth taking a quick look at the perlclib manual page. This describes many Perl-supplied C functions that should be used in place of the standard C library equivalents that we may be more familiar with. The reason for using these functions is that they are guaranteed to work consistently across all platforms, and so minimize portability problems that may arise from using native functions. For example, do not use malloc and free, but safemalloc and safefree, when talking to the interpreter. Instead of strdup, use savepv. And so forth—see perldoc perlclib for details.
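As a brief sketch of a couple of these replacements in use—safemalloc and safefree allocate and release memory through the interpreter's own allocator, and savepv is the perlclib stand-in for strdup:

```c
/* allocate a buffer through Perl's allocator rather than raw malloc */
char *buffer = (char *)safemalloc(64);
strcpy(buffer, "managed by Perl's allocator");
safefree(buffer);

/* savepv duplicates a string; dispose of the copy with safefree */
char *copy = savepv("a copy to dispose of later");
safefree(copy);
```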

Evaluating Expressions

We typically want to extract values from the interpreter after having it evaluate code rather than have Perl print them out. We can achieve this with a collection of other macros that are described at length in the perlguts and perlapi manual pages. There are very many of these macros, but perhaps the most useful are eval_pv, which evaluates a string expression, and get_sv, which retrieves a scalar variable.

Here is an expanded example that shows eval_pv and get_sv in action. To keep our manipulations separate from the interpreter startup and shutdown code, it's been split off into the separate subroutine do_stuff:

/* embedeval.c */
#include <EXTERN.h>
#include <perl.h>

PerlInterpreter *my_perl;

void do_stuff(void);

int main(int argc, char *argv[], char *env[])
{

    char *perl_argv[] = {argv[0], "-e","0"};

    /* initialize */
    PERL_SYS_INIT3(&argc,&argv,&env);

    /* create the interpreter */
    my_perl = perl_alloc();
    perl_construct(my_perl);
    PL_exit_flags |= PERL_EXIT_DESTRUCT_END;

    /* invoke perl with arguments */
    perl_parse(my_perl, NULL, 3, perl_argv, env);
    perl_run(my_perl);

    do_stuff();

    /* clean up */
    perl_destruct(my_perl);
    perl_free(my_perl);

    /* finish */
    PERL_SYS_TERM();
}

void do_stuff(void)
{
    SV *intscalar, *strscalar;
    int intval;
    char *strval;

    /* evaluate an expression */
    eval_pv("($int,$str)=(6,'Number 6')",TRUE);

    /* get the result */
    intscalar=get_sv("int",FALSE); /* get $int */
    intval=SvIV(intscalar);        /* extract integer slot */
    strscalar=get_sv("str",FALSE); /* get $str */
    strval=SvPV(strscalar,PL_na);  /* extract string slot */
    printf("The answer is %d (%s)\n",intval,strval);
}

In this example, we initialize the Perl interpreter with the arguments -e and 0. This simply makes the interpreter ready to evaluate arbitrary expressions, since without it the interpreter will attempt to read code from standard input—just as typing perl on the command line with no arguments does.

The eval_pv function evaluates arbitrary Perl code for us, and it is the equivalent of Perl's eval function. We can use it as many times as we like, with cumulative effect, so we could also have written the following:

eval_pv("$int=5",TRUE);
eval_pv("$int++",TRUE);
eval_pv("$str='Number '.$int",TRUE);

If the second argument is given as the macro TRUE, the interpreter will die if it encounters a fatal error, just as Perl would in normal (that is, non-evaled) code. Rather than letting the interpreter abort, we can choose to handle the error ourselves in C if the second argument of eval_pv is changed to FALSE. This makes eval_pv behave exactly like Perl's eval, and we can check the value of the special variable $@ via the macro ERRSV:

eval_pv("This is not Perl!",FALSE);
if (SvTRUE(ERRSV)) {
    printf("Error from eval: %s\n", SvPV(ERRSV, PL_na));
}

The Perl eval function returns a value, although we are not obliged to use it. Since eval_pv is eval in C, it also returns a value, which allows us to write

strscalar = eval_pv("'Number '.$int", FALSE);
if (! SvTRUE(ERRSV)) {
    strval = SvPV(strscalar,PL_na);
}

eval_pv is really a wrapper for the more generic eval_sv, which evaluates the string part of a scalar value. In other words, it is equivalent to

eval_sv(newSVpv("$int=5",0), FALSE);

The means by which we get values back from the interpreter varies depending on what kind of data type we are looking for. Here we are dealing with scalars and used the SvIV and SvPV macros to extract C data types from the Perl scalar. We will take a closer look at working with scalars next and then go on to consider arrays, hashes, and complex data structures.

Working with Scalars

The get_sv function extracts a scalar variable from the symbol table of the interpreter. As scalars are composite values, the return type is a pointer to an SV—short for scalar value—the C data type of a Perl scalar. We can only extract package variables this way, so if we had declared $int and $str with my, we would not be able to extract them.

From the returned SV pointer we can extract any of the value slots such as integer, floating-point number, or string using an appropriate macro. SvIV extracts the integer value as we have already seen, while SvNV extracts the floating-point value. SvPV is a little different, since it returns a pointer to a string. We pass the special value PL_na to tell Perl we don't care how long the string is. We could also use SvPV_nolen instead of SvPV to similar effect:

strval = SvPV(strscalar, PL_na);
strval = SvPV_nolen(strscalar);

The second FALSE argument to get_sv indicates that no space should be allocated for this variable. If we used TRUE, we could create the scalar in the symbol table at the same time. Specifically, it will create a scalar if not already present and return a pointer to it:

newscalar = get_sv("doesnotexistyet", TRUE);

We would more likely want to use newSVpv, newSViv or one of their many variants to give the new variable a value at the same time. Note that unlike the preceding, the following statements create the data structure but do not add it to the symbol table, so the interpreter will not be able to see them yet:

intscalar = newSViv(6);                  /* integer */
fltscalar = newSVnv(3.14159);         /* floating point */
strscalar = newSVpv("Number 6", 0);   /* 0=calculate length */
strscalar = newSVpvf("Number %d", 6); /* printf-style */

There are many more variants on these available in the Perl API. The newSVpvf macro is particularly handy to know about, however, because it makes constructing strings easy using C's sprintf function. The regular newSVpv requires a length argument, but the macro will automatically calculate the string length if we supply a zero as we did here. As we saw earlier, we can use SvTRUE to test a scalar for "truth" in the Perl sense of that word. We can also call macros like SvCUR to retrieve the length of the string.
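As a small sketch (assuming the usual interpreter setup), we might construct a scalar with newSVpvf and then inspect it with SvTRUE and SvCUR:

```c
SV *greeting = newSVpvf("Number %d", 6); /* sprintf-style construction */
if (SvTRUE(greeting)) {                  /* true in the Perl sense */
    STRLEN len = SvCUR(greeting);        /* current length of the string slot */
    printf("'%s' is true and %d characters long\n",
           SvPV_nolen(greeting), (int)len);
}
SvREFCNT_dec(greeting);                  /* dispose of our scalar */
```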

What about references? From the perspective of C, a reference is simply a scalar whose value is another SV. The "outer" SV is a wrapper for the SV inside. We can dereference it to get the "inner" scalar with SvRV (which returns NULL if the scalar isn't a reference) and create a reference to a scalar with newSVrv or newRV_inc. The abbreviation for references is therefore RV, but there is no RV data type in C. Instead, references are SVs, just like regular scalars, and we test an SV to see if it is an RV or not with macros like SvROK. We will return to them later in this chapter.
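A short sketch of these macros in action—wrapping a scalar in a reference and then unwrapping it again:

```c
SV *inner = newSViv(42);
SV *ref = newRV_noinc(inner);  /* take over inner's reference count */

if (SvROK(ref)) {              /* is this scalar a reference? */
    SV *target = SvRV(ref);    /* dereference it */
    printf("reference points at %d\n", (int)SvIV(target));
}
SvREFCNT_dec(ref);             /* releasing the reference frees inner too */
```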

Working with Arrays

Handling Perl arrays is done with get_av and a range of functions like av_fetch, av_pop, av_push, av_unshift, and av_delete, among others. We can also create arrays in C using newAV and populate them with av_store or av_make.

Following is an example of how we can use some of these functions. Since the interpreter startup and shutdown code remains the same, we will just swap out the do_stuff subroutine with a new one:

void do_stuff (void) {
    AV *thetime;              /* AV is the C type for arrays */
    SV *isdst_sv;             /* array elements are SVs */
    I32 thehour,theminute;    /* explicitly 32-bit ints */

    /* create an array */
    eval_pv("@time=localtime",TRUE);
    thetime = get_av("time",FALSE); /* the array */
    theminute = SvIV(* av_fetch(thetime, 1, FALSE));
    thehour = SvIV(* av_fetch(thetime, 2, FALSE));
    printf("It is %d:%d\n",thehour,theminute);
    printf("@time has %d elements\n",av_len(thetime)+1);
    isdst_sv=av_pop(thetime); /* pop a value */
    printf("@time has %d elements\n",av_len(thetime)+1);
}

Notice that most of the code in this subroutine works with arrays directly from C—only the first statement actually runs any Perl code. It also does not mention the interpreter instance by name, because the macros reference my_perl implicitly—this is why we created it as a global variable in the first place.

The av_fetch function does the job of accessing array elements by index. We can also pop and unshift them and, of course, push and shift new SV* values too. If the last argument to av_fetch is a TRUE rather than FALSE as here, then it also serves to create the array element if the array does not yet hold it (just as it would in Perl).

We can also use av_store to set an SV* into the array. The return value is of type SV**, a pointer to the SV pointer passed as the last argument, or NULL if the store failed for any reason. For example, to replace the SV at index 2 of the array, we would write

if (av_store(thetime,2,new_hour_sv)==NULL) {
      /* failed! */
}

The av_fetch function also returns NULL if the array element does not exist. We took a shortcut here, since we passed TRUE as the last argument to eval_pv so we know the evaluation must have succeeded and @time must exist. In more complex code, we would be better advised to check the return value rather than dereference the return value directly as was done here.

We can also loop through an array, which turns out to be nothing more complicated than finding the array length and then iterating through it with a normal C for loop. The next iteration of do_stuff that follows works with @INC. First we add and remove a new element at a high index, then we loop through the array and print out the paths from it:

void do_stuff(void) {
    AV *inc = get_av("INC",TRUE); /* get @INC array */
    SV **valuep;

    /* store value at index 1000 */
    if (av_store(inc, 1000, newSVpv("newvalue",8))==NULL) {
        exit(99); /* NULL=failed store */
    }

    /* test for and retrieve value at index 1000 */
    if (av_exists(inc, 1000)) {
        valuep = av_fetch(inc, 1000, FALSE);
        printf("1000: %s\n", SvPV(*valuep, PL_na));
        printf("length of @INC is %d ---\n", av_len(inc));
        av_delete(inc, 1000, FALSE); /* flag arg needed but ignored */
    }

    /* loop over array values and print them out */
    printf("length of @INC is now %d ---\n", av_len(inc));
    for (int index=0; index<=av_len(inc); index++) {
        valuep = av_fetch(inc, index, FALSE);
        if (valuep != NULL) {
            printf("%d: %s\n", index, SvPV(*valuep, PL_na));
        }
    }
}

In this code, we test for a failed store or fetch by testing for a return value of NULL. The return value is of type SV**, so assuming it is not NULL we must dereference the pointer to get a value that we can pass to macros like SvPV. Interestingly, the first printout of the array length will report 1000, but the second, after we delete our added element, is something like 5 (depending on how many paths @INC normally has).

Working with Hashes

Working with hashes and references is very similar to arrays and scalars, so now that we know the basics of handling scalar and array values, we can easily extend them to these data types too.

We retrieve a hash from the symbol table with get_hv, store and fetch hash keys with hv_store and hv_fetch, test them with hv_exists, delete them with hv_delete, and create a new hash in C with newHV. We also have a collection of hv_iter* routines to manage the iterator with. The following subroutine manipulates the %ENV hash, first by adding and then removing a new key, then by printing out the contents using the iterator:

void do_stuff(void) {
    HV *env = get_hv("ENV",TRUE); /* get %ENV hash */
    SV *value, **valuep; /* key values */
    I32 length;          /* key length */
    char *key;           /* key name */

    /* store, retrieve, and delete a new key */
    hv_store(env, "newkey", 6, newSVpv("newvalue",8), 0);
    if (hv_exists(env, "newkey", 6)) {
        valuep = hv_fetch(env, "newkey", 6, FALSE);
        printf("newkey => %s ---\n", SvPV(*valuep, PL_na));
        hv_delete(env, "newkey", 6, G_DISCARD);
    }

    /* iterate over keys and print them out */
    hv_iterinit(env);
    while ((value = hv_iternextsv(env, &key, &length))) {
        printf("%-20s => %s\n", key, SvPV(value, PL_na));
    }
}

The hv_iternextsv function is actually a convenient combination of several other steps, hv_iternext, hv_iterkey, and hv_iterval. There are many, many other functions and macros that we can use to manipulate and test Perl data from C, but this provides a taste of what is possible. The G_DISCARD flag tells Perl to throw away the value and return NULL; otherwise, we get a pointer to the SV value of the key we deleted (if it was present in the hash).
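If we want access to the individual steps—for example, to work with the hash entry itself—we can spell the same loop out by hand. A sketch of the equivalent, iterating over the env hash from the example above:

```c
HE *entry;    /* HE is the C type for a hash entry */
I32 keylen;

hv_iterinit(env);
while ((entry = hv_iternext(env))) {
    char *key = hv_iterkey(entry, &keylen);   /* key and its length */
    SV *val = hv_iterval(env, entry);         /* the value SV */
    printf("%-20s => %s\n", key, SvPV(val, PL_na));
}
```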

Working with Complex Data Structures

It is relatively simple to create complex data structures, though we may have to keep an eye on our reference counts as we mentioned when we talked about arrays.

Since an array or hash held as a value of another array or hash is really stored through a reference, nesting arrays and hashes is simply a case of wrapping an AV* or HV* value with a scalar RV. To do that, we use newRV_inc or newRV_noinc and recast the pointer to an SV*, as illustrated in this version of do_stuff, which stores an array in an element of a second array and in the value of a hash key:

void do_stuff(void)
{
    SV *scalar,*reference;
    AV *array,*array2;
    HV *hash;
    SV **valuep;

    /* create an array of 10 elements */
    array = newAV();
    for (int i=0; i<10; i++) {
        scalar = newSVpvf("value %d",i+1);
        av_push(array, scalar);
    }

    /* create reference for array. Ref count is already
       1 from creation so we do not increment it again */
    reference = newRV_noinc((SV *)array);

    /* add reference to a new array */
    if ((array2 = newAV())) {
        av_push(array2, reference);
    }

    /* add reference to hash. As the array also holds the
       reference, we increment the ref count, undoing the
       increment if the store fails */
    if ((hash = newHV())) {
        SvREFCNT_inc(reference);
        if (hv_store(hash, "array", 5, reference, 0)==NULL) {
            SvREFCNT_dec(reference);
        }
    }

    /* extract array from hash */
    if ((valuep = hv_fetch(hash, "array", 5, FALSE))) {
        SV *svref = *valuep;
        if (SvOK(svref) && SvROK(svref)) {
            SV *sv = SvRV(svref);
            if (SvTYPE(sv) == SVt_PVAV) {
                AV *av = (AV *) sv; /* recast */
                SV **svp;
                if ((svp = av_fetch(av, 2, FALSE))) {
                    printf("Got '%s' from index 2 of array\n",
                           SvPV(*svp,PL_na));
                }
            }
        }
    }
}

This example manages the reference counts, allowing for the fact that newly constructed data types have a default reference count of 1 to start with, and takes care not to increment them unnecessarily. When we add the array reference to the hash as well as the second array, we do need to increment the count, however.

It also extracts the array reference from the hash, dereferences it, and prints out the string value of one of the scalars stored in it. We happen to know how we created the data structure, but since we should usually check what we are doing, we use SvOK and SvROK to verify that the scalar is valid and is indeed a reference. The SvTYPE macro tells us what kind of thing the reference was pointing to. We compare it to SVt_PVAV to check for an array—to check for a hash we would instead use SVt_PVHV. Once we know we have an array, we can recast back to an AV* and access it as normal.

Using C Using Perl Using C

If we want to use Perl modules that themselves make use of underlying C code (also called extensions), we have to go back and adjust the workings of our interpreter a little. In order to know how to load in the C library part of a Perl extension, the interpreter needs some additional help, which we provide by supplying a function pointer as the second argument to perl_parse. The function performs the job of bootstrapping the external C part of any extensions we want to use. Typically, we use DynaLoader to dynamically load any modules on demand, so this is the only module we need to handle for most cases.

Following is a new version of the embedded interpreter, incorporating code using the Scalar::Util and Socket modules. We use the dualvar function to create a scalar variable with divergent integer and string values within Perl and getservbyname to look up a service. Of course, both of these tasks could be done from C directly, and in the second case without involving Perl at all—but they serve our purposes for this simple example.

Note the EXTERN_C declarations—these are the new code we added—the macro is present so it can be defined to nothing for C and to extern "C" for C++. The xs_init function defined here is passed to perl_parse to carry out the required initialization. The main routine is unchanged apart from the addition of xs_init as the second argument:

/* embedeval.c */
#include <EXTERN.h>
#include <perl.h>

PerlInterpreter *my_perl;

void do_stuff(void);

EXTERN_C void boot_DynaLoader (pTHX_ CV* cv);
EXTERN_C void xs_init(pTHX)
{
    char *file = __FILE__;
    dXSUB_SYS;

    /* DynaLoader is a special case */
    newXS("DynaLoader::boot_DynaLoader", boot_DynaLoader, file);
}

int main(int argc, char **argv, char **env)
{
    /* initialize */
    PERL_SYS_INIT3(&argc,&argv,&env);

    /* create the interpreter */
    my_perl = perl_alloc();
    perl_construct(my_perl);
    PL_exit_flags |= PERL_EXIT_DESTRUCT_END;

    /* invoke perl with arguments */
    char *perl_argv[] = {argv[0], "-e","0"};
    perl_parse(my_perl, xs_init, 3, perl_argv, env);
    perl_run(my_perl);

    do_stuff();

    /* clean up */
    perl_destruct(my_perl);
    perl_free(my_perl);

    /* finish */
    PERL_SYS_TERM();
}

void do_stuff(void)
{
    SV *scalar;
    int intval;
    char *strval;

    /* load a module with eval_pv */
    eval_pv("use Scalar::Util 'dualvar'",TRUE);
    eval_pv("use Socket",TRUE);
    /* evaluate an expression */
    eval_pv("$value=dualvar(6,'Number 6')",TRUE);
    eval_pv("$ssh_service=getservbyname('ssh','tcp')",TRUE);
    /* get the result */
    scalar=get_sv("value",FALSE); /*get $value*/

    intval=SvIV(scalar);          /* extract integer slot */
    strval=SvPV(scalar,PL_na);    /* extract string slot */
    printf("The answer is %d (%s)\n",intval,strval);
    printf("SSH port is %d\n",(int)SvIV(get_sv("ssh_service",FALSE)));
}

The ExtUtils::Embed module allows us to generate the necessary extra glue for extensions using the -e xsinit option. By default it writes the glue code to a file called perlxsi.c, which we can choose to maintain separately from our other source if we wish or to embed it into the interpreter code as we did previously. This is the simplest way we can use it:

> perl -MExtUtils::Embed -e xsinit

Without a specific list of modules, this just generates code to initialize DynaLoader plus any statically linked extensions (which can be none at all, as in the preceding example). If we have specific dynamically loaded modules in mind, we can instead generate code to handle them directly and skip DynaLoader. For example, to bind the Socket module and an extension of our own called My::Extension, we could use this:

> perl -MExtUtils::Embed -e xsinit -o init_xs.c Socket My::Extension

Here we also used the -o option to change the output file name. This command generates a file called init_xs.c containing

#include <EXTERN.h>
#include <perl.h>

EXTERN_C void xs_init (pTHX);

EXTERN_C void boot_Socket (pTHX_ CV* cv);
EXTERN_C void boot_My__Extension (pTHX_ CV* cv);

EXTERN_C void
xs_init(pTHX)
{
        char *file = __FILE__;
        dXSUB_SYS;

        newXS("Socket::bootstrap", boot_Socket, file);
        newXS("My::Extension::bootstrap", boot_My__Extension, file);
}

While this is convenient, bear in mind that the code is generated without regard to whether the named modules have a dynamic library component to them or indeed actually exist.

Calling Subroutines

We can call Perl subroutines from C too. To do so, we need to manage the interpreter's stack, since C does not understand Perl's calling conventions or concepts like void, scalar, or list context. As usual, macros are available to help:

void do_stuff(void)
{
    dSP;                          /* declare local stack pointer (SP) */
    int count;                    /* number of return arguments */

    /* Perl sub to split a string and return first N parts, reversed */
    eval_pv("sub rsplit ($$;$) { return reverse split $_[0],$_[1],$_[2] }",TRUE);

    ENTER;                        /* prepare to call sub */
    SAVETMPS;                     /* note existing temporaries */

    PUSHMARK(SP);                 /* note where stack pointer started */
    XPUSHs(sv_2mortal(newSVpv(",",1)));
    XPUSHs(sv_2mortal(newSVpv("one,two,three",13)));
    XPUSHs(sv_2mortal(newSViv(2)));
    PUTBACK;                      /* update global stack pointer */

    count = call_pv("rsplit", G_ARRAY); /* call, get number of return values */
    printf("Got %d results\n",count);

    SPAGAIN;                      /* refresh local stack pointer */
    for (int i=0; i<count; i++) {
        char *part = POPp;        /* pop (string) results */
        printf("result %d is '%s'\n",i,part);
    }
    PUTBACK;                      /* clean up global stack */

    FREETMPS;                     /* free any temporaries created by us */
    LEAVE;                        /* finish up */
}

Quite a lot is going on here, but a lot of it is boilerplate code—we write it and forget it. The dSP, ENTER, and SAVETMPS macros carry out essential setup work, creating a local stack pointer and noting existing temporary (that is, mortal) values. The PUSHMARK(SP) macro starts a counter to keep track of how many arguments we push onto the stack. The PUTBACK macro uses that count to update the real stack pointer from our locally modified copy.

Between PUSHMARK(SP) and the first PUTBACK we supply our arguments. There are many macros available to do this, depending on what we are trying to do. XPUSHs simply puts a scalar value onto the stack. The X means "extend the stack" and differentiates this macro from PUSHs, which assumes the stack has already been extended (we could alternatively choose to extend the stack several places then fill in the blanks, so to speak). We push three mortal SVs using the newSVpv and newSViv macros to generate them from C data types.
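To illustrate the alternative that the X in XPUSHs saves us from, here is the same argument setup written with an explicit EXTEND followed by plain PUSHs—a sketch with identical behavior to the version above:

```c
PUSHMARK(SP);
EXTEND(SP, 3);   /* grow the stack by three slots up front */
PUSHs(sv_2mortal(newSVpv(",",1)));
PUSHs(sv_2mortal(newSVpv("one,two,three",13)));
PUSHs(sv_2mortal(newSViv(2)));
PUTBACK;
```

Extending once and pushing without checks can be marginally faster when we know the argument count in advance.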

That takes care of the buildup—now we need to look at the call to the subroutine and the handling of the return values. Calling a named subroutine is done with call_pv. It takes the subroutine name as its first argument and a set of flags as its second. In this case, we need to tell the interpreter that we want to call the subroutine in array context, so we set G_ARRAY. We can also use G_SCALAR or G_VOID—the last of which is important to stop Perl generating values we don't need, which wastes time. The return value from this call is the number of return values waiting for us on the stack.

There are many ways to read values back too. Here we use one of the simpler techniques. First we use SPAGAIN to refresh the local stack pointer so that it points at the returned values. POPp is one of several macros that extract a scalar and convert it to a C data type, in this case a pointer value (that is, a string). We could also use POPi for an integer, POPn for a floating-point number, POPs to get the SV itself, or POPl for a long integer. Note that the pointer returned by POPp points into the buffer of a mortal SV, so we must not free it ourselves; the interpreter reclaims it when we call FREETMPS. When we do allocate memory of our own, we should use safemalloc and safefree, Perl's versions of malloc and free, in preference to the native functions. See perldoc perlclib for more on these and other preferred Perl versions of standard functions.

We now need to clean up. First we use PUTBACK a second time to reset the global stack pointer from our local copy. FREETMPS cleans up the temporary values that we created and marked as mortal (and which we can tell from others that may exist because we used SAVETMPS to note where ours started). Finally, we use LEAVE to restore the stack pointer to its original condition—the idea being that our call should leave the stack in the same state it was in beforehand.

There are many permutations on this basic theme, far too many to elaborate on in detail. One useful alternative is call_sv, which allows us to call a scalar containing a code reference rather than a named subroutine. We only need to change two lines of code to do this. First:

eval_pv("sub rsplit ($$;$) { return reverse split $_[0],$_[1],$_[2] }",TRUE);

becomes this:

SV *cref=eval_pv("sub ($$;$) { return reverse split $_[0],$_[1],$_[2] }",TRUE);

And secondly:

count = call_pv("rsplit", G_ARRAY);

becomes this:

count = call_sv(cref, G_ARRAY);

This allows us to call anonymous subroutines, which if we create them from C do not even get entered into the interpreter's symbol table. For more advanced examples, see the perlcall manual page and perlapi for the complete list of available macros.

Using C or C++ from Perl

Perl provides extensive support for building modules to interface to an underlying C or C++ library. Such modules are known as extensions, and while this term can also apply to pure Perl modules, in practice it is used almost exclusively to describe a Perl module that depends on an underlying library.

The glue that binds Perl to C or C++ is the XSUB, which stands for extension subroutine. XSUBs are written in a special language called XS, in files with (usually) an .xs extension, and are compiled into C code with the XSUB preprocessor xsubpp that comes standard with Perl. XS is made up of a few declarative section headers and a lot of macros that are actually defined by Perl's header files. XSUBs can be simple declarations that map to an existing C or C++ subroutine or contain compiled code that implements functionality directly. This C or C++ code has access to the Perl interpreter's data, so we can use all the techniques we saw in the previous section to extract Perl data or write it back from C within the XSUB code, including the macros listed in perlapi and the Perl equivalents of C library functions detailed in perlclib.

While XSUBs are the fundamental link between Perl and C, we do not always need to write them ourselves. The h2xs utility, in collaboration with xsubpp and the C::Scan module, can often extract function signatures from C headers and generate XSUB definitions to map them into Perl without us needing to write a line of code ourselves. More interestingly, the Inline module provides an interface to inline foreign language code directly into Perl source files. It natively supports C and is essentially a wrapper for the XSUB autogeneration approach just mentioned, but it also has supporting modules for Java, Tcl, Python, and a host of other languages available from CPAN. We will return to Inline in "Inlining C and Other Languages into Perl" at the end of this section.

Creating Extensions with h2xs

The primary tool for setting up to build Perl extensions is h2xs. This script, which we first saw in Chapter 10, is designed to automatically create the correct directory structure and supporting files to create a Perl module. Previously we used the -X option to suppress the XS support that is the primary function of h2xs and used it to create the basic directory structure for a distributable module:

> h2xs -X -n My::PurePerl::Module

To make h2xs generate the additional support for XSUBs, we drop the -X. If we give the tool some actual C headers and source files to work with, it will automatically generate files based on what it finds. But we can also build our own XSUBs from scratch, which is often required in any case since what h2xs deduces from C source is not always all that close to what we actually require.

For example, say we want to create an extension called Heavy::Fraction, which does not bind against any existing C code. We can set up the module structure with

> h2xs -b 5.6.0 -n Heavy::Fraction

The -n option provides the module name, which is necessary if we are not converting an existing header from which a name can be derived. The -b option tells h2xs the earliest version of Perl we intend to support; here we chose 5.6.0. This should create all the appropriate files and generate the following output:


Writing Heavy-Fraction/ppport.h
Writing Heavy-Fraction/lib/Heavy/Fraction.pm
Writing Heavy-Fraction/Fraction.xs
Writing Heavy-Fraction/fallback/const-c.inc
Writing Heavy-Fraction/fallback/const-xs.inc
Writing Heavy-Fraction/Makefile.PL
Writing Heavy-Fraction/README
Writing Heavy-Fraction/t/Heavy-Fraction.t
Writing Heavy-Fraction/Changes
Writing Heavy-Fraction/MANIFEST

The only files we need to immediately concern ourselves with here are Fraction.pm, where we can place any additional Perl code our module will be providing, Fraction.xs, the XS file where our C code and XSUB definitions and declarations will go, and Makefile.PL. This is the file from which we generate our makefile:

> perl Makefile.PL

The makefile generated by this command contains targets for the generated C code that use xsubpp to regenerate them from the XS file. We can then build the extension with

> make
> make test

If we have sufficient privileges or are installing to a location under our control (as described in Chapter 10), we can also use

> make install

The contents of Makefile.PL look like this:

use 5.006;
use ExtUtils::MakeMaker;
# See lib/ExtUtils/MakeMaker.pm for details of how to influence
# the contents of the Makefile that is written.
WriteMakefile(
    NAME             => 'Heavy::Fraction',
    VERSION_FROM     => 'lib/Heavy/Fraction.pm', # finds $VERSION
    PREREQ_PM        => {}, # e.g., Module::Name => 1.1
    ($] >= 5.005 ?    ## Add these new keywords supported since 5.005
      (ABSTRACT_FROM  => 'lib/Heavy/Fraction.pm', # retrieve abstract from module
       AUTHOR        => 'Peter Wainwright <[email protected]>') : ()),
    LIBS             => [''], # e.g., '-lm'
    DEFINE           => '', # e.g., '-DHAVE_SOMETHING'
    INC              => '-I.', # e.g., '-I. -I/usr/include/other'
    # Un-comment this if you add C files to link with later:

    # OBJECT             => '$(O_FILES)', # link all the C files too
);
if (eval {require ExtUtils::Constant; 1}) {
  # If you edit these definitions to change the constants used by this module,
  # you will need to use the generated const-c.inc and const-xs.inc
  # files to replace their "fallback" counterparts before distributing your
  # changes.
  my @names = (qw());
  ExtUtils::Constant::WriteConstants(
      NAME             => 'Heavy::Fraction',
      NAMES         => \@names,
      DEFAULT_TYPE  => 'IV',
      C_FILE        => 'const-c.inc',
      XS_FILE          => 'const-xs.inc',
  );
} else {
    use File::Copy;
    use File::Spec;
    foreach my $file ('const-c.inc', 'const-xs.inc') {
        my $fallback = File::Spec->catfile('fallback', $file);
        copy ($fallback, $file) or die "Can't copy $fallback to $file: $!";
    }
}

We can enable or disable the generation of various parts of this file and the accompanying files in the module distribution with flags like -A (disable autoloader), -C (disable Changes file), and --skip-ppport (do not generate ppport.h or use Devel::PPPort).

For C++, we need to override the default compiler and linker with the C++ compiler and tell xsubpp to generate C++-compliant code. This is easy enough—the following additions to the argument list of WriteMakefile will do the trick (assuming our compiler is g++, of course):

WriteMakefile(
    ...
    'CC'              => "g++",     # define 'CC' macro
    'LD'              => '$(CC)',   # derive 'LD' from 'CC'
    'XSOPT'           => "-C++",    # define 'XSOPT' macro
);

Note Other xsubpp options that may prove useful for C++ are -hiertype, which recognizes and translates nested C++ types (that is, types with :: in their name), and -except, which adds exception handling stubs to the generated code.


It is valid to insert makefile macros into these definitions; for example, we base LD on CC, so if we change CC, we also change LD. If our make tool defines a standard macro for the C++ compiler, such as CXX, we would also write

'CC'          => '$(CXX)', # derive 'CC' from 'CXX'

Additional typemap files (should we have any) can be specified with a TYPEMAPS argument and an associated array reference of typemap file names. We will come back to these later.

If we have extra C sources we want built along with the code generated from the XS file, we can add them by uncommenting and editing the OBJECT line to contain the object file to which the source file is compiled. The default make rules will do the rest for us.

Converting C Constants to Perl Constants

The h2xs script is capable of parsing #define directives into Perl constants, but without help, not much more. If we take a header file defines.h with lines like these:

#define ONE 1
#define TWO 2
#define THREE 3

and run a command like this:

> h2xs -n My::Defines defines.h otherheader.h

we will get a simple module that, once compiled, defines three Perl constants, ONE, TWO, and THREE. The Defines.xs file inside the directory My-Defines contains the directive

#include <defines.h>

Obviously, this header is required for the generated C code to compile, so we copy it into the My-Defines directory. Glancing at Makefile.PL, we see that the @names array under the ExtUtils::Constant section now reads

my @names = (qw(ONE THREE TWO));

This tells the extension which #define constants we want to map into Perl constants. We can now use these constants in our code:

use My::Defines;
print "2 = ",TWO,"\n";

In fact, if we look in the test script My-Defines/t/My-Defines.t, we will find that a test case has already been set up to test the constants are defined correctly.

Converting Function Prototypes to XSUBs

If we install the C::Scan module and the Data::Flow module on which it depends, we can upgrade h2xs into a tool that can not only convert #define directives but also scan the source for enums and function prototypes. To enable this functionality once the extra modules are installed, we just need to add the -x argument:

> h2xs -x -n My::Functions functions.h

Given a functions.h with these contents:

void subone(int, int);
int subtwo(char *input);

we end up with a file My-Functions/Functions.xs containing XSUB declarations suitable for mapping onto the implementations of subone and subtwo, presumably in functions.c or another implementation file:

void
subone(arg0, arg1)
        int     arg0
        int     arg1

int
subtwo(input)
        char *  input

The more complete the function prototype, the more information the XSUB will contain. While a function prototype does not need to name its arguments, the XSUB will use the names if it finds them; otherwise, it will use arg0, arg1, and so on.

In order to use these functions, we can just compile the module. We need to add the separate implementation file to Makefile.PL by uncommenting and completing the OBJECT argument:

OBJECT        => 'functions.o', # link all the C files too

A default rule will be used to build this object file from functions.c. If we need to customize the build options, we can do so either by adding the appropriate definitions to Makefile.PL or by supplying them as arguments to h2xs. We can add any compiler and linker options, include paths, libraries, and library search paths we need this way. To add extra compiler flags, use -F or --cpp-flags. To add extra libraries, use -l. For example:

> h2xs -n My::Functions -F "-I/other/include -std=c99" \
    -L/other/lib -lm -lnsl -lsocket source/*.h

We can now compile the module and use the C functions from our Perl code like normal Perl subroutines, for example:

use My::Functions;
my $my_number = subtwo("my_string");

To find out what these definitions mean, we need to learn about how to define an XSUB.

Defining XSUBs

Rather than pregenerating an XS file, we can create our own XSUB definition file by following a few simple rules. An XS file contains four main sections:

  • Mandatory include directives for Perl headers
  • Optional C or C++ code, including more include directives if required
  • An XS MODULE and PACKAGE declaration to mark the start of XS code
  • XSUB declarations

Let's examine each of these in turn.

Mandatory Headers

The mandatory headers should look familiar if we have already looked at calling Perl from C. They are the same two headers we have seen before, accompanied by a third header, XSUB.h, that is required when writing extensions:

#include <EXTERN.h>
#include <perl.h>
#include <XSUB.h>

These three lines should be at the start of every XS file. For C++, since Perl's headers are C and not C++, we modify this slightly to

#ifdef __cplusplus
extern "C" {
#endif
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
#ifdef __cplusplus
}
#endif

If we added a definition for XSOPT of -C++ in our Makefile.PL, this will be generated for us automatically.

Optional C or C++ Code

Following this we can add any conventional C (or C++, if applicable) code that we like. For binding to a precompiled library, we would include appropriate headers to import the function prototypes so that we can check our XSUB declarations against them. For a simple example, it is easiest to embed the code directly, so here is a subroutine to calculate the highest number of times one number will fit into another:

int heavyfraction (int num1, int num2) {
    int result;

    if (num1 > num2) {
        result = num1 / num2;
    } else {
        result = num2 / num1;
    }

    return result;
}

Start of XS Code

The end of the C section and the start of the XSUB section is marked by a MODULE and PACKAGE declaration. This tells xsubpp in what module distribution the resulting extension is packaged and in what Perl namespace it belongs, as determined by a Perl package declaration. Typically, the module corresponds to the root namespace of the packages defined, and in many cases where there is only one package, they are simply the same. For example:

MODULE = Heavy::Fraction    PACKAGE = Heavy::Fraction

We can place XSUBs into more than one Perl package by redeclaring PACKAGE with different values, but each time we must include the MODULE keyword first. Since the module name never changes, we often see things like this:

MODULE = Heavy::Fraction  PACKAGE = Heavy::Fraction

... XSUBs ...

MODULE = Heavy::Fraction  PACKAGE = Heavy::Fraction::Heavy

... XSUBS ...

MODULE = Heavy::Fraction  PACKAGE = Heavy::Fraction::Utility

... XSUBS ...

XSUB Definitions

Once we have at least one MODULE and PACKAGE declaration, we can actually write an XSUB. The simplest XSUB declarations simply provide the definitions to bind the C function to a Perl subroutine. To bind heavyfraction, we would write

int
heavyfraction(num1, num2)
    int num1
    int num2

The XS file format is quite precise about this layout—we must put the return type on a separate line and specify the argument list without types, which instead appear indented after it. Other than that, there are no semicolon terminators; this is essentially a K&R-style C function prototype. We can put all of the preceding together and build the Heavy::Fraction module. As a finishing touch, we can add these lines to the test script t/Heavy-Fraction.t and upgrade the plan to two tests:

my $result=Heavy::Fraction::heavyfraction(10,2);
ok($result==5);

If all goes well, we should be able to build the extension and run the test driver successfully. The curious might like to look at the generated Fraction.c file to see what xsubpp actually did with the Fraction.xs file.

Although int is a C data type and not a Perl integer, xsubpp automatically knows how to convert all the basic C data types to and from Perl scalars, so we do not need to add any logic to do it ourselves. Any standard C data type will work here—double, float, short, long, char *, and so on are all mapped transparently. We can also write C functions that take and return Perl types like SV *.

For more complex types, we need to add some extra glue in a typemap file, which we cover in "Mapping Basic C Types into Perl" later in the chapter. For now, we can rely on xsubpp to know how to handle our C types for us.

Ignoring the C Function Return Value

Since XSUBs map to a C function prototype, the return type is automatically mapped and converted to a Perl scalar in the generated code. Although it is relatively rare, we might occasionally want to disregard the return value of a C function that we are binding to (presumably not under our control or we could just rewrite it) on the grounds that it serves no useful purpose and therefore wastes Perl's time reading it back.

Since we cannot simply declare the subroutine void, we instead use NO_OUTPUT to tell xsubpp not to build the code to manage the return value:

NO_OUTPUT int
returns_a_useless_value()

This example XSUB also takes no arguments, so it has no list of argument declarations either. At a bare minimum, therefore, a C subroutine that takes no arguments and returns no values can be bound to in just two lines of XS code. Of course, most useful subroutines will both take arguments and return values. We can manage fixed and variable numbers of both input arguments and output values, but we can only do so much with a simple XSUB declaration—for more advanced uses we will need to add some code.
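
For instance, a C function with the signature void do_nothing(void) (a hypothetical name, purely for illustration) would need only this complete XSUB declaration:

```
void
do_nothing()
```

xsubpp generates all of the argument and return-value handling (in this case, none) around it.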

Adding Code to XSUBs

An XSUB declaration allows us to map a Perl subroutine call onto a preexisting C subroutine of the same name. However, sometimes we do not want to make a direct mapping, either because we need to wrap the C routine in some additional logic before providing it to Perl or because we want to improve on C's calling semantics. C, after all, can only return a single value, while Perl is happy to handle a list.

We can also forego the C subroutine altogether and embed the C code directly into the XSUB definition. The following example embeds some simple C code that returns the number of times the lower of the two numbers passed will fit into the larger:

int
heavyfraction(num1, num2)
    int num1
    int num2
CODE:
    if (num1 > num2) {
        RETVAL = num1 / num2;
    } else {
        RETVAL = num2 / num1;
    }
OUTPUT:
    RETVAL

The code in the CODE section is all standard C, apart from the special variable RETVAL. This variable is automatically declared with the same type as the function's return type, in this case int, and after our code completes it is automatically placed into a scalar and pushed onto Perl's stack so that Perl receives the computed value. The OUTPUT block here declares what the XSUB returns to Perl. Here it is technically not needed, since RETVAL is automatically returned for any XSUB with a nonvoid return type. It is good form to include it though, because it helps to underline that our subroutine is returning a value.

In this example, we can make use of just the argument variables and RETVAL to perform the calculation. If we need to create any intermediate values, which is highly likely, we will need to declare them. We can simply declare a local variable at the top of the CODE block, but we are better advised to use PREINIT, so that the generated code places them at the start of the generated function correctly. The next example replaces RETVAL with an explicit and more meaningfully named variable, to illustrate both use of PREINIT and returning something other than RETVAL:

int
difference(num1, num2)
    int num1
    int num2
PREINIT:
    int delta;
CODE:
    delta =  abs(num1 - num2);
OUTPUT:
    delta

We can place simple expressions into the OUTPUT section too, which allows us to create this alternative implementation with no CODE section at all:

int
difference(num1, num2)
    int num1
    int num2
OUTPUT:
    abs(num1-num2);

Returning Multiple Values

CODE and OUTPUT work well when we only have one return value to pass back. The generated C code will manage the job of postprocessing the return value into a suitable form to be placed on the Perl interpreter stack, based on the declared return value of the XSUB. If we have more than one value to pass back though, we cannot use the C function prototype. Instead, we replace CODE with PPCODE, delete the OUTPUT section, and manage the stack ourselves:

int
countdown_list(message, from=10, to=0)
    char *message
    int from
    int to
PREINIT:
    int delta, i;
PPCODE:
    if (to < from) {
        delta = from - to + 1;
        EXTEND(SP, delta);
        for (i=from; i>=to; i--) {
            XST_mIV(from-i, i);
        }
        XSRETURN(delta);
    } else {
        XSRETURN_UNDEF;
    }

Since we want to return several values to the Perl stack, we need to make room for them, which the EXTEND macro does for us. To actually place the return values on the stack, we use XST_mIV, which wraps up an integer value in a scalar SV, marks it as mortal, and stores it at the given position on the stack, all in one operation. Finally, we use XSRETURN to tell Perl how many values are coming back to it.

In this example, we also use XSRETURN_UNDEF to return undef if the input parameters are not to our liking. We can return any single value this way from either a PPCODE or CODE section. It has the effect of immediately returning just like an explicit return keyword. We could modify the preceding example to handle the from and to values being equal this way:

PPCODE:
    if (to == from) XSRETURN_IV(to);
    if (to < from) {
        ...

Handling Output Arguments

Sometimes a function argument is used for output rather than input. For example, an int * is passed to provide an integer in which the length of a buffer is returned. There is no need for Perl to follow this idiom and pass a variable to be written to, since we can return values as SVs or return multiple values instead.

We can change the default mapping to tell xsubpp not to treat such arguments as input arguments with the NO_INIT keyword.

For example, take a C function with a void return type and two arguments: an integer for input and a pointer to an integer for output:

void convert_int(int input, int *output);

We can turn this into a conventional subroutine that returns the integer rather than requiring Perl to pass in a second scalar variable parameter with

void
convert_int(input, output)
    int input
    int &output = NO_INIT
OUTPUT:
    output

Now we can call the function in Perl with

my $output=convert_int($input);

Detecting Calling Context

In a Perl subroutine, we can use wantarray to detect the calling context. When we call Perl from C, we can communicate this context through call_pv or call_sv by setting one of the G_SCALAR, G_ARRAY, or G_VOID flags. To detect the calling context in a call from C to Perl, we use the GIMME_V macro, which returns the appropriate G_ flag.

In Perl, we frequently use wantarray to return an array reference instead of a list in scalar context rather than have the list counted, as would otherwise be the case:

return wantarray ? @result : \@result;

We can adapt our countdown_list XSUB to make the same determination like this:

int
countdown_list(message, from=10, to=0)
    char *message
    int from
    int to
PREINIT:
    int delta, i;
PPCODE:
    if (GIMME_V == G_VOID) XSRETURN_UNDEF;

    if (to < from) {
        delta = from - to + 1;
        if (GIMME_V == G_ARRAY) {
            EXTEND(SP, delta);
            for (i=from; i>=to; i--) {
                XST_mIV(from-i, i);
            }
            XSRETURN(delta);
        } else { /* G_SCALAR */
            AV* array=newAV();
            for (i=from; i>=to; i--) {
                av_push(array, newSViv(i));
            }
            XPUSHs(sv_2mortal(newRV_noinc((SV*)array)));
            XSRETURN(1);
        }
    } else {
        XSRETURN_UNDEF;
    }

In void context, we simply return undef without bothering to carry out any computation at all. In array context, we build up the stack as before. In scalar context, we create a Perl array in C, then add scalar values to it. Once we have finished constructing the array, we create a reference to it and push the reference onto the stack. As we only have one value, we use XPUSHs to do the job, but we could also have used EXTEND(SP, 1) followed by PUSHs. There are a bewildering number of macros available to manipulate the stack, so these are only some of the ways we can achieve our ends.

Correctly counting references is important when returning Perl data types back to the interpreter. A new data type like the array created in this example automatically has a reference count of 1, even though it is not (yet) referred to by anything the interpreter knows about. When we create the reference for it, we have the choice of incrementing the array's reference count by one with newRV_inc, or leaving it alone with newRV_noinc. It would be incorrect to increment the count to 2, since only the reference knows about the array, so we use newRV_noinc in this case. As a technical note, the scalar reference also has a reference count of 1 on creation, so we do not need to increment its count either.

Assigning Default Input Values

We can define default values for some of the input arguments so that they do not have to be supplied in Perl. This turns out to be very simple. For example:

int
count_down(message, from=10, to=0)
    char *message
    int from
    int to
PREINIT:
    int i;
CODE:
    if (to < from) {
        for (i=from; i>=to; i--) {
            printf("%d... ",i);
        }
        printf("Liftoff!\n");
        RETVAL = 0; /* launched ok */
    } else {
        RETVAL = 1; /* abort, can't count */
    }
OUTPUT:
    RETVAL

This gives us the ability to call an XSUB with a variable number of arguments, so long as we supply at least the minimum and no more than the maximum. The only proviso with default input values is that we must place the defaulted arguments after any mandatory ones.

Defining Variable Arguments and Prototypes

Perl subroutines accept unlimited numbers of arguments. C is normally quite a different story, but fortunately we can make use of the varargs feature of C to create an XSUB that takes variable arguments just like Perl. In C, we declare a subroutine to use variable arguments with an ellipsis (...), and the same convention is understood by XS too. Here is a simple example that calculates the average value of an arbitrary list of supplied integers:

int
average(...)
PREINIT:
    int argno;
CODE:
    RETVAL = 0;
    for (argno=0; argno<items; argno++) {
        RETVAL += SvIV(ST(argno));
    }
    RETVAL = RETVAL / items;
OUTPUT:
    RETVAL

The trick to this code is the items variable. When xsubpp sees the ellipsis in the declaration, it automatically sets up items with the number of arguments that we passed from Perl. These arguments are on the Perl stack as we enter the function, so we use the ST macro to retrieve them, passing it the position of the argument we are interested in. Here we simply loop through them.

Because the types of the arguments are not known in advance, the generated C code cannot automatically map Perl types to C ones as it can for fixed arguments. The arguments on the stack are therefore scalar values of type SV* (even a list is passed as scalars, and an array reference is a scalar anyway). To perform the numeric calculation we are interested in, we need to extract the integer value of each with SvIV.

If we want to require some fixed arguments as well as a variable list, we can do so by putting them first on the line, just like a C varargs declaration:

int
fixed_and_varargs(nargs, returnval, ...)
    int nargs
    char &returnval = NO_INIT
...

The fixed arguments are handled separately, so we can continue to use items and ST to access the variable arguments as before. Alternatively, if we just have a fixed number of optional parameters, we can use a default value of NO_INIT for the optional parameters:

int
fixed_and_fixedoptionalargs(mandatory, optional = NO_INIT)
    int mandatory
    int optional

The value of items will be 1 if only the mandatory argument was passed and 2 if the optional one was passed too.

Regular Perl subroutines can have optional prototypes, and so can XSUBs. In fact, prototypes are automatically provided for XSUBs unless we choose to disable them with

PROTOTYPES: DISABLED

Any XSUB below this line will not get a prototype. We can use ENABLED to reenable prototypes later on in the same XS file. We can also give a specific prototype to a particular XSUB with a one-line PROTOTYPE: section, with the prototype immediately following. For example:

PROTOTYPE: $;$$$

This says pass one mandatory scalar, then up to three optional scalar arguments. This is different from the default prototype for an ellipsis, which does not place a limit on the number of passed arguments. Within the XSUB, we can find out how many arguments we actually got with items, as before. We could also generate the same prototype by declaring three trailing arguments with default values.

If we really want to map a return value through supplied arguments (rather than via RETVAL or through stack manipulation), we can do that too. Since we have to have a variable passed in to be able to assign to it, we can define the prototype appropriately, for example:

PROTOTYPE: $$;$$$

We can require that a passed argument be an array or hash reference, just as with a regular prototype. In this case, we need to either define the XSUB with a corresponding argument of type SV * and dereference it in C or provide a typemap to convert from the array reference to something else. We give an example of that later on in the chapter. Intriguingly, we can even pass callbacks this way by prototyping a code reference and then having the XSUB call it with eval_sv.

Using Perl Types as Arguments and Return Values

While xsubpp understands a wide range of C types automatically, it's also possible to use Perl's own C data types. For instance, we can define an XSUB that takes and returns an SV *. The advantage of defining an XSUB that uses Perl types is that we can take advantage of Perl concepts such as undef that do not translate well into C.

For instance, we can create a routine that normally returns an integer, but in the case of an error returns undef. Returning an SV * rather than an int makes this easy. For completeness, this very simple example also takes an SV * as input, though an int would have been equally convenient:

SV *
positive_or_undef(in)
    SV *in
CODE:
    if (SvIV(in) < 0) {
          RETVAL = &PL_sv_undef;
    } else {
          RETVAL = in;
    }
OUTPUT:
    RETVAL

Here PL_sv_undef is a predefined SV that contains an undef value, as seen from Perl. We assign its address to RETVAL, an SV *, to return undef to Perl.

There is nothing to stop us receiving or returning array, hash, or code references, so long as we write the XSUB to handle the Perl data structures from C. But if we want to do many conversions of the same types, then we might be better off implementing a typemap to handle the conversion. To see how to do that, it is useful to first look at how xsubpp handles the simpler C types we have been using so far.

Mapping Basic C Types into Perl

The xsubpp script is able to convert between C data types such as int, double, or char * and Perl scalars not through some innate magical knowledge, but because it makes use of a list of standard conversions held in a file called typemap in the ExtUtils directory under the standard library. This file contains all the definitions necessary to convert C types to and from Perl.

We can add our own supplementary typemap file by adding a TYPEMAP argument to the WriteMakefile call in our Makefile.PL:

WriteMakefile(
    NAME                 => 'Heavy::Fraction',
    ...
    TYPEMAP              => ['heavytypemap','lighttypemap']
);

The value of the TYPEMAP argument is an array reference of typemap files. Without path information, these files are looked for in the root directory of the distribution, that is, in the top Heavy-Fraction directory we created with h2xs for our example module earlier in the chapter. Note that it is perfectly acceptable to create a new typemap file for each type we want to handle, containing just the definitions to handle that one type.

Each typemapping consists of an equivalency statement, with the C type on the left and the Perl type on the right, an INPUT definition to convert the Perl type to the C type, and an OUTPUT definition to go in the other direction. These three sections are traditionally gathered together into separate parts of the typemap file, with the equivalencies at the top, the input conversions in the middle, and the output conversions at the bottom.

In order to see how types are mapped, we will take a look at some examples from the default typemap file.

Stating Type Equivalency

The statements of equivalency are all grouped together at the top, and if we look here we see, among other things, the following:

int                     T_IV
char *                  T_PV

This says that a C int is converted to and from a T_IV. A short and a long are also converted to this type. A char * is converted to a T_PV. Each of these Perl types is a conversion target whose meaning is defined by the input and output conversions that come afterward. Perl defines several types that we can use for the right-hand side of these statements, so if we have a C type we can convert to an already defined Perl type, we can just add an equivalency line. For example, a percentage type might allow an integer from 0 to 100, so we convert it to an unsigned integer with

percentage                 T_UV

With this line added to an included typemap, we can use percentage as an input parameter or a return type and xsubpp will be able to handle it transparently, just as it already does for integers. We do not need to add anything else, as the default typemap already provides input and output conversions for T_UV.

If none of the standard definitions will do, we will have to provide our own conversion logic. All we have to do is provide the means to get the C types we want to handle into a C type defined by Perl, and vice versa, and the xsubpp tool will be able to do the rest.

Input Conversion

The input section of the typemap file is marked by the keyword INPUT. After this, we find a definition for each of the Perl types on the right-hand side of the equivalency statements in the first section. Each conversion maps the Perl type to all possible C types that it can be related to. Here is where we find the conversions to turn a Perl scalar into an integer:

T_IV
        $var = ($type)SvIV($arg)

And a Perl scalar into a pointer value:

T_PV
        $var = ($type)SvPV_nolen($arg)

The Perl type appears at the start of a line, on its own. This defines the start of the conversion definition. What follows is perfectly ordinary C code (it can even include if statements and for loops) and is defined using normal C syntax except for the "scalar variables." These are not, in fact, scalar variables at all, but macros that are expanded into suitable forms by xsubpp. They are given this look to give them familiarity:

  • $var is replaced with the C-typed output variable. We need to assign a properly typed expression to it for the conversion to succeed.
  • $arg is the Perl input variable. Notice that it is always an SV*, though it might turn out to be a reference SV that is pointing to something more complex like an AV*, an HV*, or something more esoteric.
  • $type is the C type defined on the left-hand side of the equivalency statement. This is how a single INPUT definition can handle more than one conversion. For these conversions, it is simply used to perform an explicit recast.

Since SvPV_nolen is used to extract the string from the SV for the conversion from T_PV, it is implicit that any C type that is defined as equivalent to it understands a null-terminated string. If it doesn't, we need to create our own mapping. Similarly, anything that is associated with T_IV is by implication handled by simply recasting an integer value into that type. If this is not the case, we need a different mapping.

Armed with this information, we can create our own input conversion. As a simple example, we can take our percentage type and convert it into an integer with our own Perl type, using the exact same conversion but with our own mapping. First, we change the Perl type to one of our own devising. (Recall that the ones already in the default typemap have meaning only because they have INPUT and OUTPUT definitions.) For example, we can put this in our local typemap file:

percent    T_PERCENT

Now we just add

T_PERCENT
        $var = ($type)SvUV($arg);

This converts a Perl SV into a C variable of type percent. We can do a little better than this, though. A percentage can only legally be a value between 0 and 100, so we can add C code to check for this:

T_PERCENT
        {
            IV tmp_$var = SvIV($arg);
            if (tmp_$var >= 0 && tmp_$var <= 100) {
                $var = ($type)tmp_$var;
            } else {
                Perl_croak(aTHX_ "$var is not in range 0..100");
            }
        }

This conversion demonstrates several tricks that we can employ. First, we can declare local variables if we use an outer { ... } block. (We do not need to use this block, however, if we do not have any need to declare a variable. The if...else statement can exist just fine without it.) Second, we can base the name of a temporary variable on the name of the real variable $var—the substitution is made by xsubpp before the C compiler sees it. Third, we can throw an error with Perl_croak.

Output Conversion

The output section of the typemap file is marked by the keyword OUTPUT. After this, we again find a definition for each of the Perl types on the right-hand side of the equivalency statements in the first section. Each conversion maps all possible C types to the Perl type. Here is where we find the conversions to turn an integer into a Perl scalar:

T_IV
        sv_setiv($arg, (IV)$var);

And a pointer value into a Perl scalar:

T_PV
        sv_setpv((SV*)$arg, $var);

We find $var and $arg here too, but now we are converting the value in $var, the C type, to $arg, which is an SV*. Again, notice that the destination variable must always be a Perl scalar of type SV*, though it might be generated as a reference to an AV* and so on. We might do this, for example, to convert a char** into an array reference. The SV* variable that gets substituted for $arg is predefined and preallocated by the time our conversion code is seen, so we use sv_setiv and sv_setpv to simply set the appropriate slot of the scalar from our C data.

For our percent type, the conversion is identical to the normal unsigned integer conversion:

T_PERCENT
        sv_setuv($arg, (UV)$var);
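Assembled into one file, the complete local typemap for the percent type is just this (using the simple, unchecked input conversion):

```
percent    T_PERCENT

INPUT
T_PERCENT
        $var = ($type)SvUV($arg);

OUTPUT
T_PERCENT
        sv_setuv($arg, (UV)$var);
```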

Mapping C Structs into Perl

Now that we understand the basics of typemaps, we can look at taking more complex data types like structs and converting them to and from Perl. There are several modules on CPAN that will attempt to handle this transparently for us, notably Inline::Struct, but if we can't use a prepackaged solution, there are two basic approaches that we can take:

  • Create a custom Perl type and define INPUT and OUTPUT conversions for it.
  • Map the C type to T_PTROBJ, which stores a passed pointer as a void *, and manage all the conversion logic from within the CODE sections of XSUBs (which being C, can access the native C type directly).

The first approach assumes conversion to and from a Perl scalar, since that is how typemaps operate. The second approach allows us to be more flexible about how and what we convert, but we don't get transparent conversion just by naming the types in an XSUB.

Let's look at the first approach. Assume we have a C data type declared as follows:

typedef struct {
    int number;
    char *name;
} serial_t;

First, we add the equivalency statement to the first section of our local typemap file:

serial_t  T_SERIAL

Then we add an input conversion somewhere within the INPUT section. We have free choice of what the Perl equivalent of this structure is—it could be an array reference of two elements, a string of the form "name: number", or any other representation. We will choose a hash reference with keys of "name" and "number", and at the same time see how to handle a blessed hash reference too.

For this implementation, if either key is not present, we will convert to 0 and an empty string, respectively. We know that $arg should contain the reference to a hash and that $var is a variable of type serial_t:

T_SERIAL
        if (SvROK($arg) && SvTYPE(SvRV($arg)) == SVt_PVHV) {
            SV** tmpsvp;
            tmpsvp = hv_fetch((HV*)SvRV($arg),"name",4,FALSE);
            $var.name = (tmpsvp==NULL)
                 ? "" : SvPV_nolen(*tmpsvp);
            tmpsvp = hv_fetch((HV*)SvRV($arg),"number",6,FALSE);
            $var.number = (tmpsvp==NULL)
                 ? 0  : SvIV(*tmpsvp);
        } else {
            Perl_croak(aTHX_ "$var is not a hash reference");
        }

If we wanted to additionally ensure that the passed hash reference was blessed into a particular Perl class, or a subclass, we could replace the first line with

if (SvROK($arg) && sv_derived_from($arg, "Serial")) {

Similarly, the OUTPUT section should contain code to create a hash reference containing the appropriate keys:

T_SERIAL
        {
            HV *serial = newHV();
            hv_store(serial, "name",   4, newSVpv($var.name, 0), 0);
            hv_store(serial, "number", 6, newSViv($var.number), 0);
            sv_setsv($arg, sv_2mortal(newRV_noinc((SV*)serial)));
            sv_bless($arg, gv_stashpv("Serial", TRUE));
        }

The call to gv_stashpv looks up (and, given a true second argument, creates if necessary) the stash for the package Serial, and sv_bless then blesses the hash reference into it. If we omit the sv_bless line, we just get a regular unblessed hash reference.

So, how about the second approach? The standard typemap defines a Perl type T_PTROBJ, which simply assigns a pointer to any kind of structure to an SV using sv_setref_pv. The result of this mapping is that an XSUB gets to manipulate the native data type directly. For example, to create a new serial_t structure, we could write this into an XS file:

serial_t *
newserial(name, number)
    char *name;
    int number;
CODE:
    RETVAL = (serial_t *)safemalloc(sizeof(serial_t));
    RETVAL->name = savepv(name);
    RETVAL->number = number;
OUTPUT:
    RETVAL

And to access the number member:

int
number(serial)
    serial_t *serial;
CODE:
    RETVAL = serial->number;
OUTPUT:
    RETVAL

In Perl, we can now write code like this:

use Serial;
my $serial=newserial("Number 6" => 6);
my $number=$serial->number();

Converting this into a Serial object in Perl is simply a matter of writing a new subroutine that wraps newserial and blesses a reference to the returned scalar, to pick one of several possible approaches.

We are creating new C variables dynamically here, so we need to allocate memory. The safemalloc and savepv functions are supplied by Perl's API as equivalents of the standard C memory allocation functions that always play nicely with the Perl interpreter; savepv, for example, is the Perl-supplied version of strdup. We should generally use the functions provided by Perl for this purpose.
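Perl will not free this memory for us when the scalar holding the pointer goes away, so a companion XSUB (hypothetically named freeserial here) should hand it back using Perl's Safefree, the counterpart of safemalloc:

```
void
freeserial(serial)
    serial_t *serial;
CODE:
    Safefree(serial->name);
    Safefree(serial);
```

A wrapping Perl class would typically call this from its DESTROY method.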

Mapping C++ Object Classes

Dealing with C++ objects is not so different from C objects. Since C++ classes are by definition opaque values, we cannot reasonably make use of a typemap to do all the work for us, so we are left with the T_PTROBJ approach.

Since handling objects is a common requirement, we can make use of the perlobject.map file by Dean Roehrich. This is a useful aid to creating a Perl/C++ class extension that includes an O_OBJECT type, among other types, which automatically takes care of converting between a blessed Perl object of the Perl class and a pointer to a C++ object instance. It is nearly identical to this example typemapping adapted from the perlxs manual page (most of which should now be understandable to us):

INPUT
O_OBJECT
if (sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG)) {
    $var = ($type)SvIV((SV*)SvRV($arg));
} else {
    Perl_warn(aTHX_
        "${Package}::$func_name() -- $var is not a blessed SV reference" );
    XSRETURN_UNDEF;
}

OUTPUT
O_OBJECT
sv_setref_pv($arg, CLASS, (void*)$var);

The only mysterious entity here is SVt_PVMG, which is simply the return value of SvTYPE for a blessed Perl reference, though it is interesting to note the use of Perl_warn with an apparently interpolated string (it isn't, of course, but the text is expanded by xsubpp). This typemap will handle the casting of the blessed scalar to and from an object of the correct C++ type. We still have to supply the logic to manage the transformation from the opaque C++ reference to Perl object, though.

The equivalency statement for a C++ class MyClass would look like this:

MyClass *    O_OBJECT

Let's assume we also call the Perl extension MyClass.pm and create an example constructor, object method, and destructor to map to this C++ class. The key detail to add is to prefix the XSUB subroutine name with a class name and ::. This tells xsubpp that we are mapping onto a C++ class and causes two new macros to be defined:

  • THIS: The C++ object instance
  • CLASS: The C++ class type

Constructors

We can set up a constructor to be called by Perl with an XSUB called new, which tells xsubpp that this XSUB is a constructor. A typical constructor might be as follows:

MyClass *
MyClass::new(from = NO_INIT)
    MyClass *from
PROTOTYPE: $;$
CODE:
    if (items > 1) {
        RETVAL = new MyClass(from);
    } else {
        RETVAL = new MyClass();
    }
OUTPUT:
    RETVAL

This constructor takes an optional argument, which is of the same C++ class (that is, an object to clone) and calls the appropriate version of the C++ constructor accordingly. Note the default argument value of NO_INIT: this tells xsubpp to convert the passed Perl object into from if one was actually supplied, but to leave from uninitialized rather than fail if no argument was passed. The typemap takes care of converting the returned pointer into a blessed SV *.

Why do we test items against 1 rather than 0? This is one way to define an optional argument, as we saw earlier. However, because this is a C++ XSUB, there is an implicit first argument. For object methods, it defines an object pointer in THIS. For a constructor, it defines CLASS. However, we already took care of the class in the typemap, so we don't need to refer to it here.
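From the Perl side, assuming the generated MyClass.pm has been loaded, the two constructor paths look like this:

```perl
use MyClass;

my $obj  = MyClass->new();      # no argument: calls MyClass()
my $copy = MyClass->new($obj);  # object passed: calls MyClass(from)
```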

Methods

A method to call the C++ object from the blessed Perl reference that encapsulates it is simple now that we can create the object. Here is an example method that passes an integer and expects another integer back:

int
MyClass::convertint(in)
    int in
CODE:
    RETVAL = (int)(THIS->convert_an_int(in));
OUTPUT:
    RETVAL

The C++ object is accessed through THIS, which is the implicit first argument, and we call the object method as normal. If the method happens to return a long or similar integer-like type, we can always cast it, as we have in this example. It is worth noting that the Perl method, here convertint, does not have to line up with the C++ method, here convert_an_int.

How about C++ operator methods? These might seem to be a different problem altogether, but in fact we just wrap them like ordinary methods and then use Perl's overload module to map the Perl name for the operator to Perl's equivalent operator type, which we can do in the h2xs-generated Perl module that accompanies the XS file (MyClass.pm in this example).
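As a sketch, if we had exposed the comparison as an ordinary XSUB method named is_equal (a hypothetical name), the manual mapping in MyClass.pm might read:

```perl
package MyClass;
use overload
    '==' => \&is_equal,
    'eq' => \&is_equal;
```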

We can have xsubpp do this job for us by adding an OVERLOAD: line and adjusting the XSUB slightly, in particular adding a third argument to denote which side of the Perl expression we are on (should we care). For example:

bool_t
operator_isequal(lhs, rhs, onright)
    MyClass *lhs
    MyClass *rhs
    IV onright
OVERLOAD: == eq
CODE:
    RETVAL = (*lhs == *rhs);
OUTPUT:
    RETVAL

This handles both the numeric and string equality operators in Perl. The return type here is bool_t, which we are assuming is the return type of the operator== method in the C++ class and which happens to be predefined in the default typemap. We should adjust this accordingly if the return type is different (for example, int).

To handle the stringify operator, specify "" on the OVERLOAD: line. To enable automatic fallback operator generation, specify FALLBACK: TRUE just after the MODULE and PACKAGE declaration in the XS file.

Destructors

To map a Perl DESTROY block to the destructor of a C++ class is possibly the simplest XSUB of all. We just write

void
MyClass::DESTROY()

The xsubpp script will handle the rest.

Inlining C and Other Languages into Perl

The Inline module, which we mentioned briefly at the start of this chapter, is a convenient wrapper around the functionality of xsubpp and the C::Scan module that encapsulates the process of generating the XS file for us. Better still, it allows us to place the C code bodily within a Perl source file rather than in a separate file. Furthermore, we can embed almost any language if we install the appropriate support module. By default, Inline automatically provides Inline::C, but we can add Inline::CPP to inline C++, Inline::Java to inline Java, Inline::Python to inline Python, and so on. Each of these support modules implements the interface defined by the Inline module and provides support for a specific language.

The embedded code is compiled the first time the program is run, so we do not even need to have a "make" step to use it. Here is a simple script that demonstrates one way we can inline C code:

#!/usr/bin/perl
# inlinefraction.pl
use strict;
use warnings;

use Inline C => qq[
int heavyfraction (int num1, int num2) {
    int result;

    if (num1 > num2) {
        result = num1 / num2;
    } else {
        result = num2 / num1;
    }

    return result;
}
];

print heavyfraction(10,3);

The first time we run this script, there will be a pause as Inline extracts the embedded code, determines the XSUB mappings to be generated for it, and passes the resulting sources to the underlying C compiler. On subsequent invocations, the program will run at full speed, as the C has already been compiled.

Depending on how we want to organize the code, we can inline the C code in a variety of ways. First, there are HERE documents, essentially a syntactic variant on the previous example:

use Inline C => <<_END_OF_C;
int heavyfraction (int num1, int num2) {
    ...
}
_END_OF_C

Or, we can use an intermediate string variable, so long as we make sure it is defined at compile time, for example, by importing it from our own module designed for the purpose:

use My::C::Repository qw($codestring);
use Inline 'C' => $codestring;

Equivalently, but at run time instead of compile time, we can use the language-neutral bind method of the Inline module to the same effect. This approach is useful if we don't want to compile the code unless we intend to use it. Of course, this also means that there will be a delay the first time the use of this code is triggered.

my $codestring='int heavyfraction ...';
Inline->bind(C => $codestring);

Without any qualifying argument, Inline will look for code beneath a special marker named for the language being inlined, after the __END__ marker:

use Inline 'C';

print heavyfraction(10,3);

__END__
__C__
int heavyfraction (int num1, int num2) {
...

With the special keyword DATA, we can place code into a __DATA__ section before the __END__ marker, if it is present:

use Inline C => 'DATA';

print heavyfraction(10,3);

__DATA__
__C__
int heavyfraction (int num1, int num2) {
...

__END__

Finally, we can use the Inline::Files module to remove the need for the special tokens entirely. Implemented as a Perl source filter, it removes the inlined code from the Perl source before the interpreter gets to compile it, leaving Perl's __DATA__ and __END__ free for other uses:

use Inline::Files;
use Inline C;

__C__
int heavyfraction (int num1, int num2) {
...

__DATA__
Now we can put real data here again

__END__
Now we can put real end notes here again

The special token FILE or BELOW can also be used to explicitly request this augmented style of inlined section; we can also still use the DATA section or HERE documents as before.

Just to prove the versatility of Inline, here is a reimplementation of heavyfraction as embedded Python. For this to work, we need to install Inline::Python and, of course, Python itself, if it is not already available:

#!/usr/bin/perl
# inlinepython.pl
use strict;
use warnings;

use Inline Python => <<_END_OF_PYTHON;
def heavyfraction(x,y):

    if x > y:
        return x / y
    else:
        return y / x
_END_OF_PYTHON

print heavyfraction(10,3);

Configuring the Inline Module

The Inline module has several configuration options that we can specify for either all languages or an individual language. Here is how we can have Inline always rebuild C, while at the same time telling it to look in the __DATA__ section for the source:

use Inline C => 'DATA', FORCE_BUILD => 1;

We can configure Inline without invoking its compilation features using the Config keyword. The following is identical in operation to the previous example, but it splits the configuration and invocation into two separate statements:

use Inline C => 'Config', FORCE_BUILD => 1;
use Inline C => 'DATA';

To configure all languages at once, we just omit the language:

use Inline Config => FORCE_BUILD => 1;

The available configuration options are listed in Table 20-1.

Table 20-1. Inline Module Configuration Options

Option Description
DIRECTORY Location of temporary build directory for derived files. Defaults to _Inline.
FORCE_BUILD Force Inline to reextract and rebuild inlined sources. Defaults to 0.
INC Additional include paths, for example, -I/usr/include/extradir.
LIBS Additional libraries, for example, -llight -lheavy.
WARNINGS Enable warnings. Defaults to 1.

While we would not typically want to configure FORCE_BUILD within our code, we can preload and configure the module on the command line. For example:

> perl -MInline=Config,FORCE_BUILD,1 inlinefraction.pl

This will force a rebuild for all inlined language types. We can specify a specific language (should we have inlined several into the same application) as before and configure several at once by specifying -MInline=lang,... for each language in turn.

Collaborating with XSUBs

To configure Inline to collaborate with another module containing C or C++ extensions, we can use the special with token. This causes Inline to import the function signatures from the other module and make the underlying library routines available to the inlined code as well. For instance, we could write this into a Perl source file:

use Inline with => 'Heavy::Fraction';

This would import our example XS extension from earlier in the chapter and make the heavyfraction routine directly callable from within C (or C++) code inlined into this source file.

Creating Distributable Inlined Modules

To create a distributable module that uses inlined code, we need to set up Makefile.PL to compile the foreign language section up front. This is because normally the inlined code is only handled when an application is run. This clearly won't do for a module.

Fortunately, the solution is simple. We just replace ExtUtils::MakeMaker with Inline::MakeMaker, a subclass of the original module that integrates support for inlined code. Once this is done, we can use make and make dist as usual to create a distributable module archive.
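A minimal Makefile.PL for an inlined module might therefore look like this (the module name and version are illustrative):

```perl
use Inline::MakeMaker;

WriteMakefile(
    NAME    => 'Heavy::Fraction',
    VERSION => '0.01',
);
```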

See perldoc Inline-FAQ and Inline::C-Cookbook for more information and examples.

Compiling Perl

One of the most often-requested Perl tools is a Perl-to-C compiler, or at least a Perl-to-something-more-compact compiler. In fact, Perl comes with three, in the form of compiler backends in the B:: family of modules.

  • The B::C module performs a straight translation of a compiled tree of Perl opcodes and operands into C statements to construct the same tree from scratch. It then invokes an embedded Perl interpreter allocated and constructed in C to process the reconstructed tree.
    The advantage of this approach is that the resulting C program will do exactly what the original Perl program did and can run without Perl present. The drawback is that it will be much bigger than the Perl interpreter (a copy of which is linked in to the program) and might not even be any faster. B::C is not guaranteed to generate viable C code in every case, but it is successful most of the time.
  • The B::CC module starts the same way, from the compiled Perl code, but it generates low-level C code that mimics the operation of the interpreter directly.
    Like the B::C module, the resulting program should run without needing Perl, and because it does not need to link in the interpreter, it should be both smaller and faster. The drawback is that the module is still experimental and is quite likely to not generate code that functions entirely correctly.
  • The B::Bytecode module converts a compiled tree of opcodes and operands into bytecode: the opcode tree is translated into a compact binary form. This compact form can be regenerated back into the compiled tree with the ByteLoader module. This module is a translator (implemented using Filter::Util::Call) that simply converts from a compressed representation of Perl rather than the usual human-readable form.

We can invoke any of the backends through the O module, just as we did for the other B:: modules back in Chapter 17. For instance, this compiles the Perl script app.pl into an executable called app.exe:

> perl -MO=C,-oapp.exe app.pl

And this converts the script into bytecode:

> perl -MO=Bytecode,-oapp.plc app.pl

The .plc extension means "compiled Perl" and is traditional for a bytecode-encoded Perl program. To execute this compiled code, we can invoke the ByteLoader module like this:

> perl -MByteLoader app.plc

Alternatively, we can add a short snippet of Perl to the top of the file to do the job for us:

#!/usr/bin/perl
use ByteLoader 0.05;

All the backends are set up to understand an import list that consists of options and arguments like a command line. Here, -o determines the output file name (otherwise, we get the traditional a.out). However, it is more convenient to use the perlcc script, which conveniently encapsulates the process of generating the C code, compiling it, and even running it if we so desire. Here are the same compilations done via perlcc:

> perlcc -o app.exe app.pl
> perlcc -B -o app.plc app.pl

The perlcc script automatically adds the two-line prefix for the bytecode version, so we do not need to add it ourselves.

To use B::CC instead of B::C, we specify the -O option:

> perlcc -O -o app.exe app.pl

For the Perl-to-C compilers only, the intermediate C code can be preserved for the overly curious with the -S option:

> perlcc -o app.exe -S app.pl

Likewise, we can specify additional libraries to link against with the -L and -l options, which have the same meanings as they do to a regular linker. The -c option can be used to generate the intermediate C code without going on to compile it to an executable.

For all three backends, -e can be used to compile a one-liner, -v can be used to increase the output of the tool, and -log can be used to send that output to a log file. Adding -r causes the resulting program to be executed immediately after the compiler has finished its work (not, obviously, in conjunction with -c).

There is no intermediate code for the bytecode compiler (the bytecode essentially is the intermediate code), but we can convert it into a (slightly) more readable form with the B::Disassembler module or the disassemble front-end script (found in the B directory in the standard library alongside the compiler backend modules) that invokes it for us. The assemble script carries out the opposite conversion for those who feel compelled to tinker with Perl "assembly language" instead of just writing Perl like normal people. It also has the more practical application of allowing us to update bytecode generated by an older version of the B::Bytecode backend to work with a more modern Perl installation, without needing to go back to the original sources.

We can also, technically, compile Perl into Perl with the B::Deparse module. Although this might not sound interesting, it does have some useful applications. First, the Data::Dumper module can use it to encode code references in serialized data. Second, it can be a useful tool in debugging some uncooperative code, by transforming it into the code it really is, rather than what it looks like. See Chapter 17 for more details.

Naturally, several more sophisticated means of compiling Perl into C are available from CPAN. One in particular that may be worth investigating is PAR, a toolkit for archiving Perl code loosely inspired by Java's jar files. PAR also comes with a utility called pp that will create a self-contained executable in the manner of perlcc, but with superior results. PAR has many other useful tricks too, especially for creating module distributions.

Summary

In this chapter, we looked at how to integrate a Perl interpreter into C or C++ code and use it to evaluate Perl code, call Perl subroutines, and manipulate Perl data structures. We also looked at integrating C code into Perl by writing an extension module in XS, the language of extension subroutines, or XSUBs. In both cases, a good basic understanding of the macros and functions that Perl provides through its C header files is essential. While there are very many of these functions, in this chapter we have covered a good selection of the most common of them along with a range of their possible uses. We looked in particular at the h2xs program and the xsubpp XSUB compiler and how we can use these Perl-provided tools to automate some or all of the process of binding C and Perl together.

The Inline module is a very convenient wrapper for much of the preceding and understands how to drive xsubpp from within an ordinary Perl script. Using it, we can have our code automatically invoke a compiler to generate the bindings to external C libraries and compile embedded C code at the point of execution, notably without a makefile involved. Inline can also provide integration to many other languages including Java, Basic, PHP, and Python. We looked at a simple implementation of a Python-based extension by way of example.

Finally, we saw how to compile Perl programs into C executables or alternatively into Perl bytecode. Either approach makes use of a compiler backend module, either B::C or B::CC for C code, or B::Bytecode for binary bytecode that can be regenerated into a compiled Perl opcode tree with the ByteLoader module.
