There’s much more to Linux than simply using the system. One of the benefits of free software is that you can modify it to suit your needs. This applies equally to the many free applications available for Linux and to the Linux kernel itself.
Linux supports an advanced programming interface, using GNU compilers and tools, such as the gcc compiler, the gdb debugger, and so on. A number of other programming languages, including Perl, Python, and LISP, are also supported. Whatever your programming needs, Linux is a great choice for developing Unix applications. Because the complete source code for the libraries and Linux kernel is provided, programmers who need to delve into the system internals are able to do so.[44]
Linux is an ideal platform for developing software to run under the X Window System. The Linux X distribution, as described in Chapter 10, is a complete implementation with everything you need to develop and support X applications. Programming for X is portable across platforms, so the X-specific portions of your application should compile cleanly on other Unix systems.
In this chapter, we’ll explore the Linux programming environment and give you a five-cent tour of the many facilities it provides. Half of the trick to Unix programming is knowing what tools are available and how to use them effectively. Often the most useful features of these tools are not obvious to new users.
Since C programming has been the basis of most large projects (even though it is nowadays being replaced more and more by C++) and is the language common to most modern programmers — not only on Unix, but on many other systems as well — we’ll start out telling you what tools are available for that. The first few sections of the chapter assume you are already a C programmer.
But several other tools are emerging as important resources, especially for system administration. We’ll examine one in this chapter: Perl. Perl is a scripting language like the Unix shells, taking care of grunt work like memory allocation, so you can concentrate on your task. But Perl offers a degree of sophistication that makes it more powerful than shell scripts and, therefore, appropriate for many programming tasks.
Lots of programmers are excited about trying out Java™, the new language from Sun Microsystems. While most people associate Java with interactive programs (applets) on web pages, it is actually a general-purpose language with many potential Internet uses. In a later section, we’ll explore what Java offers above and beyond older programming languages, and how to get started.
The C programming language is by far the most often used in Unix software development. Perhaps this is because the Unix system was originally developed in C; it is the native tongue of Unix. Unix C compilers have traditionally defined the interface standards for other languages and tools, such as linkers, debuggers, and so on. Conventions set forth by the original C compilers have remained fairly consistent across the Unix programming board.
The GNU C compiler, gcc, is one of the most versatile and advanced compilers around. Unlike other C compilers (such as those shipped with the original AT&T or BSD distributions, or those available from various third-party vendors), gcc supports all the modern C standards currently in use — such as the ANSI C standard — as well as many extensions specific to gcc. Happily, however, gcc provides features to make it compatible with older C compilers and older styles of C programming. There is even a tool called protoize that can help you write function prototypes for old-style C programs.
gcc is also a C++ compiler. For those who prefer the more modern object-oriented environment, C++ is supported with all the bells and whistles, including most of the C++ features introduced when the C++ standard was released, such as method templates. Complete C++ class libraries are provided as well, such as the Standard Template Library (STL).
For those with a taste for the particularly esoteric, gcc also supports Objective-C, an object-oriented C spinoff that never gained much popularity but may see a second spring due to its usage in Mac OS X. And there is gcj, which compiles Java code to machine code. But the fun doesn’t stop there, as we’ll see.
In this section, we’re going to cover the use of gcc to compile and link programs under Linux. We assume you are familiar with programming in C/C++, but we don’t assume you’re accustomed to the Unix programming environment. That’s what we’ll introduce here.
The latest gcc version at the time of this writing is Version 3.0.4. However, the 3.0 series has proven to be still quite unstable, which is why Version 2.95.3 is still considered the official standard version. We suggest sticking with that one unless you know exactly what you are doing.
Before imparting all the gritty details of gcc, we’re going to present a simple example and walk through the steps of compiling a C program on a Unix system.
Let’s say you have the following bit of code, an encore of the much-overused “Hello, World!” program (not that it bears repeating):
#include <stdio.h>

int main(void)
{
    (void)printf("Hello, World!\n");
    return 0; /* Just to be nice */
}
Several steps are required to compile this program into a living, breathing executable. You can accomplish most of these steps through a single gcc command, but we’ve left the specifics for later in the chapter.
First, the gcc compiler must generate an object file from this source code. The object file is essentially the machine-code equivalent of the C source. It contains code to set up the main( ) calling stack, a call to the printf( ) function, and code to return the value of 0.
The next step is to link the object file to produce an executable. As you might guess, this is done by the linker. The job of the linker is to take object files, merge them with code from libraries, and spit out an executable. The object code from the previous source does not make a complete executable. First and foremost, the code for printf( ) must be linked in. Also, various initialization routines, invisible to the mortal programmer, must be appended to the executable.
Where does the code for printf( ) come from? Answer: the libraries. It is impossible to talk for long about gcc without mentioning them. A library is essentially a collection of many object files, including an index. When searching for the code for printf( ), the linker looks at the index for each library it’s been told to link against. It finds the object file containing the printf( ) function and extracts that object file (the entire object file, which may contain much more than just the printf( ) function) and links it to the executable.
In reality, things are more complicated than this. Linux supports two kinds of libraries: static and shared. What we have described in this example are static libraries: libraries where the actual code for called subroutines is appended to the executable. However, the code for subroutines such as printf( ) can be quite lengthy. Because many programs use common subroutines from the libraries, it doesn’t make sense for each executable to contain its own copy of the library code. That’s where shared libraries come in.[45]
With shared libraries, all the common subroutine code is contained in a single library “image file” on disk. When a program is linked with a shared library, stub code is appended to the executable, instead of actual subroutine code. This stub code tells the program loader where to find the library code on disk, in the image file, at runtime. Therefore, when our friendly “Hello, World!” program is executed, the program loader notices that the program has been linked against a shared library. It then finds the shared library image and loads code for library routines, such as printf( ), along with the code for the program itself. The stub code tells the loader where to find the code for printf( ) in the image file.
Even this is an oversimplification of what’s really going on. Linux shared libraries use jump tables that allow the libraries to be upgraded and their contents to be jumbled around, without requiring the executables using these libraries to be relinked. The stub code in the executable actually looks up another reference in the library itself — in the jump table. In this way, the library contents and the corresponding jump tables can be changed, but the executable stub code can remain the same.
Shared libraries also have another advantage: their upgradability. When someone fixes a bug in printf() (or worse, a security hole), you only need to upgrade the one library. You don’t have to relink every single program on your system.
But don’t allow yourself to be befuddled by all this abstract information. In time, we’ll approach a real-life example and show you how to compile, link, and debug your programs. It’s actually very simple; the gcc compiler takes care of most of the details for you. However, it helps to understand what’s going on behind the scenes.
gcc has more features than we could possibly enumerate here. The gcc manual page and Info document give an eyeful of interesting information about this compiler. Later in this section, we’ll give you a comprehensive overview of the most useful gcc features to get you started. This in hand, you should be able to figure out for yourself how to get the many other facilities to work to your advantage.
For starters, gcc supports the “standard” C syntax currently in use, specified for the most part by the ANSI C standard. The most important feature of this standard is function prototyping. That is, when defining a function foo( ), which returns an int and takes two arguments, a (of type char *) and b (of type double), the function may be defined like this:
int foo(char *a, double b) { /* your code here... */ }
This is in contrast to the older, nonprototype function definition syntax, which looks like this:
int foo(a, b) char *a; double b; { /* your code here... */ }
and which is also supported by gcc. Of course, ANSI C defines many other conventions, but this is the one most obvious to the new programmer. Anyone familiar with C programming style in modern books, such as the second edition of Kernighan and Ritchie’s The C Programming Language (Prentice Hall), can program using gcc with no problem.
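To see why prototypes matter in practice, consider the following minimal sketch. The function names half and half_of_three are made up for illustration. Because the prototype tells the compiler that half( ) takes a double, the int constant passed at the call site is converted automatically; with an old-style declaration that gives no parameter types, the int would be passed unconverted and the result would be undefined.

```c
/* Prototype: the compiler now knows the parameter type. */
double half(double x);

/* The int constant 3 is converted to 3.0 because of the prototype above. */
double half_of_three(void)
{
    return half(3);
}

/* Definition in prototype (ANSI) style. */
double half(double x)
{
    return x / 2.0;
}
```

The listing contains no main( ), so it can be compiled on its own into an object file and called from any program that declares the two functions.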
The gcc compiler boasts quite an impressive optimizer. Whereas most C compilers allow you to use the single switch -O to specify optimization, gcc supports multiple levels of optimization. At the highest level, gcc pulls tricks out of its sleeve, such as allowing code and static data to be shared. That is, if you have a static string in your program such as Hello, World!, and the ASCII encoding of that string happens to coincide with a sequence of instruction code in your program, gcc allows the string data and the corresponding code to share the same storage. How clever is that!
Of course, gcc allows you to compile debugging information into object files, which aids a debugger (and hence, the programmer) in tracing through the program. The compiler inserts markers in the object file, allowing the debugger to locate specific lines, variables, and functions in the compiled program. Therefore, when using a debugger such as gdb (which we’ll talk about later in the chapter), you can step through the compiled program and view the original source text simultaneously.
Among the other tricks gcc offers is the ability to generate assembly code with the flick of a switch (literally). Instead of telling gcc to compile your source to machine code, you can ask it to stop at the assembly-language level, which is much easier for humans to comprehend. This happens to be a nice way to learn the intricacies of protected-mode assembly programming under Linux: write some C code, have gcc translate it into assembly language for you, and study that.
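As something small to experiment with, the function below (add_one is a name chosen for this sketch) can be saved in a file such as add_one.c and run through gcc -S add_one.c. The -S switch makes gcc stop after generating assembly and write the listing to add_one.s. The exact instructions you see depend on your processor and gcc version.

```c
/* A function small enough that its assembly translation is easy to follow. */
int add_one(int x)
{
    return x + 1;
}
```

Comparing the output with and without -O is a quick way to see what the optimizer does to even trivial code.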
gcc includes its own assembler (which can be used independently of gcc and is called gas), just in case you’re wondering how this assembly-language code might get assembled. In fact, you can include inline assembly code in your C source, in case you need to invoke some particularly nasty magic but don’t want to write exclusively in assembly.
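Here is a minimal sketch of inline assembly using gcc's extended asm syntax. It assumes an x86 processor for the assembly path and falls back to plain C elsewhere; the function name add_ints is hypothetical. It is meant only to show the shape of the construct, not as a recommended way to add two numbers.

```c
/* Adds two ints. On x86 the addition is done by an inline "addl"
   instruction; on other processors the sketch falls back to plain C. */
int add_ints(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    int sum = a;
    /* AT&T syntax: addl source, destination. %0 is sum, %1 is b. */
    __asm__ ("addl %1, %0"
             : "+r" (sum)   /* output operand, both read and written */
             : "r" (b)      /* input operand */
             : "cc");       /* the instruction changes the flags */
    return sum;
#else
    return a + b;
#endif
}
```

The operand lists after the colons tell gcc which C variables the instruction reads and writes, so the compiler can pick registers itself and still optimize the surrounding code.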
By now, you must be itching to know how to invoke all these wonderful features. It is important, especially to novice Unix and C programmers, to know how to use gcc effectively. Using a command-line compiler such as gcc is quite different from, say, using a development system such as Visual Studio or C++ Builder under Windows.[46] Even though the language syntax is similar, the methods used to compile and link programs are not at all the same.
Let’s return to our innocent-looking “Hello, World!” example. How would you go about compiling and linking this program?
The first step, of course, is to enter the source code. You accomplish this with a text editor, such as Emacs or vi. The would-be programmer should enter the source code and save it in a file named something like hello.c. (As with most C compilers, gcc is picky about the filename extension; that is how it can distinguish C source from assembly source from object files, and so on. You should use the .c extension for standard C source.)
To compile and link the program to the executable hello, the programmer would use the command:
papaya$ gcc -o hello hello.c
and (barring any errors), in one fell swoop, gcc compiles the source into an object file, links against the appropriate libraries, and spits out the executable hello, ready to run. In fact, the wary programmer might want to test it:
papaya$ ./hello
Hello, World!
papaya$
As friendly as can be expected.
Obviously, quite a few things took place behind the scenes when executing this single gcc command. First of all, gcc had to compile your source file, hello.c, into an object file, hello.o. Next, it had to link hello.o against the standard libraries and produce an executable.
By default, gcc assumes that you want not only to compile the source files you specify, but also to have them linked together (with each other and with the standard libraries) to produce an executable. First, gcc compiles any source files into object files. Next, it automatically invokes the linker to glue all the object files and libraries into an executable. (That’s right, the linker is a separate program, called ld, not part of gcc itself — although it can be said that gcc and ld are close friends.) gcc also knows about the “standard” libraries used by most programs and tells ld to link against them. You can, of course, override these defaults in various ways.
You can pass multiple filenames in one gcc command, but on large projects you’ll find it more natural to compile a few files at a time and keep the .o object files around. If you want only to compile a source file into an object file and forgo the linking process, use the -c switch with gcc, as in:
papaya$ gcc -c hello.c
This produces the object file hello.o and nothing else.
By default, the linker produces an executable named, of all things, a.out. This is just a bit of left-over gunk from early implementations of Unix, and nothing to write home about. By using the -o switch with gcc, you can force the resulting executable to be named something different, in this case, hello.
The next step on your path to gcc enlightenment is to understand how to compile programs using multiple source files. Let’s say you have a program consisting of two source files, foo.c and bar.c. Naturally, you would use one or more header files (such as foo.h) containing function declarations shared between the two source files. In this way, code in foo.c knows about functions in bar.c, and vice versa.
To compile these two source files and link them together (along with the libraries, of course) to produce the executable baz, you’d use the command:
papaya$ gcc -o baz foo.c bar.c
This is roughly equivalent to the three commands:
papaya$ gcc -c foo.c
papaya$ gcc -c bar.c
papaya$ gcc -o baz foo.o bar.o
gcc acts as a nice frontend to the linker and other “hidden” utilities invoked during compilation.
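To make the two-file layout concrete, here is a minimal sketch in which each source file uses a function defined in the other. All function names (twice, twice_plus_one, twice_via_bar) are hypothetical, and the three files are shown in a single listing only so that it compiles as is. main( ) is left out for brevity and would go in either source file.

```c
/* ---- foo.h: declarations shared by foo.c and bar.c ---- */
#ifndef FOO_H
#define FOO_H

int twice(int x);           /* defined in foo.c, used by bar.c */
int twice_plus_one(int x);  /* defined in bar.c, used by foo.c */

#endif /* FOO_H */

/* ---- foo.c (would begin with: #include "foo.h") ---- */
int twice(int x)
{
    return 2 * x;
}

/* Calls into bar.c; the declaration in foo.h makes this possible. */
int twice_via_bar(int x)
{
    return twice_plus_one(x) - 1;
}

/* ---- bar.c (would begin with: #include "foo.h") ---- */
int twice_plus_one(int x)
{
    return twice(x) + 1;
}
```

Split into real files, each .c file would include foo.h, and the compiler would check every call against the shared declarations while compiling the files separately.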
Of course, compiling a program using multiple source files in one command can be time-consuming. If you had, say, five or more source files in your program, the gcc command in the previous example would recompile each source file in turn before linking the executable. This can be a large waste of time, especially if you only made modifications to a single source file since the last compilation. There would be no reason to recompile the other source files, as their up-to-date object files are still intact.
The answer to this problem is to use a project manager such as make. We’ll talk about make later in the chapter, in Section 13.2.
Telling gcc to optimize your code as it compiles is a simple matter; just use the -O switch on the gcc command line:
papaya$ gcc -O -o fishsticks fishsticks.c
As we mentioned not long ago, gcc supports different levels of optimization. Using -O2 instead of -O will turn on several “expensive” optimizations that may cause compilation to run more slowly but will (hopefully) greatly enhance performance of your code.
You may notice in your dealings with Linux that a number of programs are compiled using the switch -O6 (the Linux kernel being a good example). The current version of gcc does not support optimization up to -O6, so this defaults to (presently) the equivalent of -O2. However, -O6 is sometimes used for compatibility with future versions of gcc to ensure that the greatest level of optimization is used.
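If you want to check from inside a program whether optimization was switched on, gcc predefines the macro __OPTIMIZE__ in every optimizing compilation. The sketch below relies on that macro; the function name build_kind is made up for this example.

```c
/* Reports how this file was compiled. gcc defines __OPTIMIZE__
   whenever an optimization switch such as -O or -O2 is in effect. */
const char *build_kind(void)
{
#ifdef __OPTIMIZE__
    return "optimized";
#else
    return "not optimized";
#endif
}
```

Compiling the file once with -O and once without, and printing the returned string from a small test program, shows the switch taking effect.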
The -g switch to gcc turns on debugging code in your compiled object files. That is, extra information is added to the object file, as well as the resulting executable, allowing the program to be traced with a debugger such as gdb. The downside to using debugging code is that it greatly increases the size of the resulting object files. It’s usually best to use -g only while developing and testing your programs and to leave it out for the “final” compilation.
Happily, debug-enabled code is not incompatible with code optimization. This means that you can safely use the command:
papaya$ gcc -O -g -o mumble mumble.c
However, certain optimizations enabled by -O or -O2 may cause the program to appear to behave erratically while running under a debugger. It is usually best to use either -O or -g, not both.
Before we leave the realm of gcc, a few words on linking and libraries are in order. For one thing, it’s easy for you to create your own libraries. If you have a set of routines you use often, you may wish to group them into a set of source files, compile each source file into an object file, and then create a library from the object files. This saves you from having to compile these routines individually for each program in which you use them.
Let’s say you have a set of source files containing oft-used routines, such as:
float square(float x)
{
    /* Code for square( )... */
}

int factorial(int x, int n)
{
    /* Code for factorial( )... */
}
and so on (of course, the gcc standard libraries provide analogs to these common routines, so don’t be misled by our choice of example). Furthermore, let’s say that the code for square( ) is in the file square.c and that the code for factorial( ) is in factorial.c. Simple enough, right?
To produce a library containing these routines, all you do is compile each source file, like so:
papaya$ gcc -c square.c factorial.c
which leaves you with square.o and factorial.o. Next, create a library from the object files. As it turns out, a library is just an archive file created using ar (a close counterpart to tar). Let’s call our library libstuff.a and create it this way:
papaya$ ar r libstuff.a square.o factorial.o
When updating a library such as this, you may need to delete the old libstuff.a, if it exists. The last step is to generate an index for the library, which enables the linker to find routines within the library. To do this, use the ranlib command, like so:
papaya$ ranlib libstuff.a
This command adds information to the library itself; no separate index file is created. You could also combine the two steps of running ar and ranlib by using the s command to ar:
papaya$ ar rs libstuff.a square.o factorial.o
Now you have libstuff.a, a static library containing your routines. Before you can link programs against it, you’ll need to create a header file describing the contents of the library. For example, we could create libstuff.h with the contents:
/* libstuff.h: routines in libstuff.a */
extern float square(float);
extern int factorial(int, int);
Every source file that uses routines from libstuff.a should contain an #include "libstuff.h" line, as you would do with standard header files.
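As a sketch of how the header and the library source fit together, the listing below shows libstuff.h with include guards, followed by one possible body for square( ). The guards and the macro name LIBSTUFF_H are our addition, a common precaution against including the header twice, and the function body is an assumption, since the text leaves it unspecified. In a real project the two parts would live in libstuff.h and square.c; they are combined here only so that the listing compiles as is.

```c
/* ---- libstuff.h: routines in libstuff.a ---- */
#ifndef LIBSTUFF_H   /* guard name is an assumption, not from the text */
#define LIBSTUFF_H

extern float square(float);
extern int factorial(int, int);

#endif /* LIBSTUFF_H */

/* ---- square.c (would begin with: #include "libstuff.h") ---- */
/* One possible implementation; the original leaves the body out. */
float square(float x)
{
    return x * x;
}
```

Including the header in square.c itself lets the compiler verify that the definition matches the declaration that client programs will see.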
Now that we have our library and header file, how do we compile programs to use them? First of all, we need to put the library and header file someplace where the compiler can find them. Many users place personal libraries in the directory lib in their home directory, and personal include files under include. Assuming we have done so, we can compile the mythical program wibble.c using the command:
papaya$ gcc -I../include -L../lib -o wibble wibble.c -lstuff
The -I option tells gcc to add the directory ../include to the include path it uses to search for include files. -L is similar, in that it tells gcc to add the directory ../lib to the library path.
The last argument on the command line is -lstuff, which tells the linker to link against the library libstuff.a (wherever it may be along the library path). The lib at the beginning of the filename is assumed for libraries.
Any time you wish to link against libraries other than the standard ones, you should use the -l switch on the gcc command line. For example, if you wish to use math routines (specified in math.h), you should add -lm to the end of the gcc command, which links against libm. Note, however, that the order of -l options is significant. For example, if our libstuff library used routines found in libm, you must include -lm after -lstuff on the command line:
papaya$ gcc -I../include -L../lib -o wibble wibble.c -lstuff -lm
This forces the linker to link libm after libstuff, allowing those unresolved references in libstuff to be taken care of.
Where does gcc look for libraries? By default, libraries are searched for in a number of locations, the most important of which is /usr/lib. If you take a glance at the contents of /usr/lib, you’ll notice it contains many library files, some of which have filenames ending in .a, others ending in .so.version. The .a files are static libraries, as is the case with our libstuff.a. The .so files are shared libraries, which contain code to be linked at runtime, as well as the stub code required for the runtime linker (ld.so) to locate the shared library.
At runtime, the program loader looks for shared library images in several places, including /lib. If you look at /lib, you’ll see files such as libc.so.6. This is the image file containing the code for the libc shared library (one of the standard libraries, which most programs are linked against).
By default, the linker attempts to link against shared libraries. However, static libraries are used in several cases, for example when there are no shared libraries with the specified name anywhere in the library search path. You can also specify that static libraries should be linked by using the -static switch with gcc.
Now that you know how to create and use static libraries, it’s very easy to take the step to shared libraries. Shared libraries have a number of advantages. They reduce memory consumption if used by more than one process, and they reduce the size of the executable. Furthermore, they make developing easier: when you use shared libraries and change some things in a library, you do not need to recompile and relink your application each time. You need to recompile only if you make incompatible changes, such as adding arguments to a call or changing the size of a struct.
Before you start doing all your development work with shared libraries, though, be warned that debugging with them is slightly more difficult than with static libraries because the debugger usually used on Linux, gdb, has some problems with shared libraries.
Code that goes into a shared library needs to be position-independent. This is just a convention for object code that makes it possible to use the code in shared libraries. You make gcc emit position-independent code by passing it one of the command-line switches -fpic or -fPIC. The former is preferred, unless the modules have grown so large that the relocatable code table is simply too small, in which case the compiler will emit an error message and you have to use -fPIC. To repeat our example from the last section:
papaya$ gcc -c -fpic square.c factorial.c
This being done, it is just a simple step to generate a shared library:[47]
papaya$ gcc -shared -o libstuff.so square.o factorial.o
Note the compiler switch -shared. There is no indexing step as with static libraries.
Using our newly created shared library is even simpler. The shared library doesn’t require any change to the compile command:
papaya$ gcc -I../include -L../lib -o wibble wibble.c -lstuff -lm
You might wonder what the linker does if a shared library libstuff.so and a static library libstuff.a are available. In this case, the linker always picks the shared library. To make it use the static one, you will have to name it explicitly on the command line:
papaya$ gcc -I../include -L../lib -o wibble wibble.c libstuff.a -lm
Another very useful tool for working with shared libraries is ldd. It tells you which shared libraries an executable program uses. Here’s an example:
papaya$ ldd wibble
libstuff.so => libstuff.so (0x400af000)
libm.so.5 => /lib/libm.so.5 (0x400ba000)
libc.so.5 => /lib/libc.so.5 (0x400c3000)
The three fields in each line are the name of the library, the full path to the instance of the library that is used, and where in the virtual address space the library is mapped to.
If ldd outputs not found for a certain library, you are in trouble and won’t be able to run the program in question. You will have to search for a copy of that library. Perhaps it is a library shipped with your distribution that you opted not to install, or it is already on your hard disk, but the loader (the part of the system that loads every executable program) cannot find it.
In the latter situation, try locating the libraries yourself and find out whether they’re in a nonstandard directory. By default, the loader looks only in /lib and /usr/lib. If you have libraries in another directory, create an environment variable LD_LIBRARY_PATH and add the directories separated by colons. If you believe that everything is set up correctly, and the library in question still cannot be found, run the command ldconfig as root, which refreshes the linker system cache.
If you prefer object-oriented programming, gcc provides complete support for C++ as well as Objective-C. There are only a few considerations you need to be aware of when doing C++ programming with gcc.
First of all, C++ source filenames should end in the extension .cpp (most often used), .C, or .cc. This distinguishes them from regular C source filenames, which end in .c.
Second, you should use the g++ shell script in lieu of gcc when compiling C++ code. g++ is simply a shell script that invokes gcc with a number of additional arguments, specifying a link against the C++ standard libraries, for example. g++ takes the same arguments and options as gcc.
If you do not use g++, you’ll need to be sure to link against the C++ libraries in order to use any of the basic C++ classes, such as the cout and cin I/O objects. Also be sure you have actually installed the C++ libraries and include files. Some distributions contain only the standard C libraries. gcc will be able to compile your C++ programs fine, but without the C++ libraries, you’ll end up with linker errors whenever you attempt to use standard objects.
[44] On a variety of Unix systems, the authors have repeatedly found available documentation to be insufficient. With Linux, you can explore the very source code for the kernel, libraries, and system utilities. Having access to source code is more important than most programmers think.
[45] It should be noted that some very knowledgeable programmers consider shared libraries harmful, for reasons too involved to be explained here. They say that we shouldn’t need to bother in a time when most computers ship with 20 GB hard disks and at least 128 MB of memory preinstalled.
[46] A number of IDEs are available for Linux now. These include both commercial ones like Kylix, the Linux version of Delphi, and open source ones like KDevelop, which we will mention in the next chapter.
[47] In the ancient days of Linux, creating a shared library was a daunting task of which even wizards were afraid. The advent of the ELF object-file format a few years ago has reduced this task to picking the right compiler switch. Things sure have improved!