The Language for the Job

Before implementation can start, an important decision needs to be made: Which programming language is actually going to be used? The choice of programming language will affect both performance and footprint. All programming languages have their own characteristics, their pros and cons, so choosing the language that best fits the goals of your project is anything but trivial. Although this book is essentially about optimizing C and C++, this section provides an overview of considerations to make when choosing between the programming languages that are currently most popular. The best optimization to make in C or C++ is perhaps not using it when another language will do better.

Assembly, C, C++, or…?

Before weighing specific language characteristics, it is important to analyze the project and its goals. It is impossible to say which programming language characteristics are positive or negative before you know what you want to do with the language. For instance, it makes no sense to say that C++ is better than Java simply because it allows the programmer more control over what happens inside the software. The statement holds true when talking about writing driver software or large and complicated applications, but quickly setting up an Internet tool which can be run on all hardware platforms without recompilation and which is never allowed to crash is a different story entirely. The following list contains questions that need to be asked to determine which qualities a project expects from a programming language. It also takes into account the environment in which the project is set up.

  • Approximate size of the software to be developed. Is this going to be a small tool, a full-fledged application, or a large, multimodular project? The larger a program gets, the more complicated maintenance work and updates will be. A large project therefore needs a language that allows for readable lines of code and good levels of abstraction. The design should be transparent in the program. For small applications or tools, this is less important as implementers can oversee a much greater proportion of the project.

  • Available programming language knowledge. This is a budget consideration because knowledge comes at a price. When none of the project team members has experience with a certain language, it might prove cheaper to choose a better-known language than to send everybody to courses or hire external implementers.

  • Available software resources. In the same vein as the previous point, you should consider not only knowledge gained on programming languages, but also the built-up software library. Do you have a lot of software parts you can use from other/previous projects? If so, probably little cost will be incurred when using those. When this is compared to rewriting the existing parts in the new language, a different cost picture emerges. You also need to consider development tools (compilers, debuggers, simulation software, and hardware) and the knowledge of using them, as well as existing documentation.

  • Available development time. Some languages simply take up more development time than others. Think of printing a string onscreen either with BASIC or with Assembly. This involves not only the number of code lines that need to be typed to implement a certain action, but also the complexity of performing certain actions in a specific language. This extra time is incurred not only in thinking up and implementing a solution, but also during debugging and testing. Another time consideration is the amount of help and ready-made solutions that come with the different languages. BASIC and C++ come with a range of functional libraries, whereas Assembly comes with whatever you can find around you.

  • Target platform. When this decision has already been made, one might find that the choice of programming language is perhaps artificially limited. This can be because of the availability of quality compilers for that hardware platform or even complexity of the hardware itself. Some systems simply do not invite you to write closely hardware-related solutions.

  • Future project development. When you expect to upgrade the system and make new software versions in the future, it is inadvisable to choose a programming language that is already becoming unpopular. When support for a programming language is decreasing (in the shape of compilers, third-party solutions, technical documentation, and so on), it is going to be increasingly difficult to maintain development in the future. Similarly, it is also inadvisable to bet on a new language for which there is not yet a lot of support, one that has not proven itself in the field, or one for which the syntax is not even completely standardized. Another advantage of using a more popular language is the increased chance of finding programmers in the future to replace or strengthen the current development team. An entirely different, but no less important, future development issue is syntax readability. When you expect to be working on a system for an extended period of time, you want to choose a language that is easily readable and thus extendible. These risks lessen as the need for future development decreases. For one-time projects, it matters only whether the current compiler and the available documentation and human resources suffice.

  • Desired level of portability. The question here is whether you are writing software for one specific target platform (be it embedded software or not) or whether you expect different hardware configurations to have to execute the software. With most programming languages, you have to recompile the sources to get an executable that runs on a system with a different microprocessor. This does not have to be a problem. Only the system-specific parts used might need to be reinvented or bought for new platforms. Other languages are not so easily ported. Trying to port the source might even be similar to rewriting the whole program. Clearly you have to decide in advance whether the language will have the characteristics desired at porting time.

Table 4.1 shows how the characteristics of the programming languages C, C++, Pascal, (Visual) Basic, Java, and Assembly compare to each other:

Table 4.1. Programming Language Characteristics
Characteristics               C     C++   Pasc.  VB    Java  Asm.
Portability                   ++    ++    +      -     +++   -
Speed of development          ++    ++    ++     +++   +++   -
Ease of update/enhancement    +     ++    +      ++    +++   -
Availability of knowledge     ++    ++    +      ++    ++    -
Standard solutions            +     ++    +      ++    +++   -
Code readability              ++    +++   ++     +++   +++   -
Possibility complex design    ++    +++   +      +     +     -
Execution speed               ++    ++    +      -     -     +++
Execution footprint           +++   +     +      -     -     +++
Ease of optimization          ++    ++    +      -     -     +++
Time to learn language        ++    +     ++     ++    ++    -

From this table you can draw the following conclusions on when to use which language:

  • C

    The C programming language seems to be doing quite well in most categories. Only in the categories "complexity of design" and "ease of update/enhancement" does it look like there are better choices around these days. This is because C was basically developed before the object-oriented (OO) approach became so popular. This means that C is still an accessible, universally applicable language, as long as the projects do not become too large or too complex. When large numbers of developers need to update big chunks of each other's code, C can, in fact, easily become a mess.

    This is why C is a good choice for most kinds of applications, as long as the programs do not become too large or complex.

  • C++

    The C++ programming language has characteristics very similar to those of C, unsurprisingly, with the exception of being more readable and maintainable when the programs become larger or the development process is more dynamic. The price that is paid for this is the fact that C++ generally has a larger footprint and takes longer for programmers to learn, especially when they are expected to use it well.

    This is why C++ is a good choice for all kinds of applications, as long as the footprint is not too tight.

  • Pascal

    The Pascal programming language is easy to learn, reasonably fast, and quite strict in runtime type checking. This programming language does have a few strange quirks that might have to be programmed around though (strings are limited to 255 characters by default, for example). Also, there are quite a few flavors of Pascal around, which makes portability less than obvious. Pascal resembles C and C++ in its characteristics, but seems to be less popular at the moment.

    This is why Pascal is seen as a good all-arounder, but watch out for future support.

  • (Visual) Basic

    The Visual Basic programming language comes with a plethora of preimplemented functionality, and it is easy to learn, very stable in use, and has low development times. It is not too fast in execution, however, and the footprint can become somewhat large because of the extra software needed to run an executable (libraries, interpreter).

    This is why (Visual) Basic is a good choice for quickly building programs, user interfaces, and prototypes that aren't time critical (execution slowness is even a plus when writing those first prototypes).

  • Java

    The Java programming language is easy to learn, has a lot of standard solutions, is robust in usage, has a relatively fast development time and, most importantly, is portable, even without recompilation. The price paid for this is that it is not that fast nor is it optimal in footprint.

    Conclusion: Non–time-critical, portable applications; Internet software, downloadable executables, prototypes.

  • Assembly

    The Assembly programming language is eminently fast, small, and easily optimizable, if used expertly. Development times, like learning times, are quite high, and this language does not lend itself to writing large programs with a complex design.

    Conclusion: Small, time-critical programs or program parts. Drivers, interrupt routines, close hardware interaction programs.

Two final remarks on choosing a programming language:

Remember that developers work better with languages they like. A developer using his favorite language will enjoy his work more and is prone to putting extra effort into producing quality software.

In some cases, it is possible, even advisable, to try to mix different programming languages. The next section discusses this in detail.

Mixing Languages

This section describes ways of mixing programming languages, allowing the choice of beneficial language characteristics to be made on a lower or more modular level: per library or program module, for instance. Before unnecessarily complicating matters, however, determine whether you actually need the different language segments to communicate within a single executable file. If not, you can opt for easier, external communication between components created from different languages. Think of using files, pipes, COM/CORBA, or network communication protocols (TCP/IP can also be used between two programs or processes that run on the same computer). For example, a program generated from Turbo Pascal could write data to a file that, in turn, is read by a program running on a Java engine. Similarly, as in Listing 4.14, a C program could route its output to STDOUT, which is captured by a C++ program reading from STDIN.

Code Listing 4.14. Routing Data from a C-Compiled Source to a C++-Compiled Source using an External File
/*
 C:
*/
#include <stdio.h>

void Send(char * buffer)
{
    printf("%s", buffer);
}

//
// C++:
//
#include <iostream>

void Communicator::Read(char & buffer)
{
    std::cin >> buffer;
}

Using this kind of external communication does away with the need for more complex, in-executable communication. There are some cons to external communication to keep in mind, though:

  • It is considerably slower than in-executable communication.

  • It is dependent on the performance and availability of the external medium (disk, hard drive, network drive, and so on).

  • Locking problems for read/write actions can be quite invisible and difficult to solve.
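The receiving side of the STDOUT-to-STDIN route described above could be sketched roughly as follows. The helper name ReadAll and the choice to read whole lines are assumptions made for illustration; they are not part of Listing 4.14.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: collects everything available on an input stream.
// Taking std::istream& means the same code can read from std::cin (when the
// program is fed through a pipe) or from a std::istringstream (when testing).
std::string ReadAll(std::istream& in)
{
    std::string line;
    std::string result;
    while (std::getline(in, line)) {
        result += line;    // getline strips the newline...
        result += '\n';    // ...so put it back to preserve line structure
    }
    return result;
}
```

A main function in the receiving program would call ReadAll(std::cin), and the shell would connect the two programs with a pipe between the sending and the receiving executable. Neither program needs to know which language the other was written in.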

For some problem domains, then, it is necessary to look for in-executable solutions. This section looks at two mixing techniques closely related to optimizing C and C++: mixing C and C++ and mixing C (or C++) with Assembly.

Mixing C and C++

This example of mixing C with C++ takes a C object containing a single function and links it with a C++ program. The C++ program contains a main function that calls the linked C function. The C object looks like the code in Listing 4.15.

Code Listing 4.15. A Simple C Program my_c.c
#include <stdio.h>

void my_c_fn()
{
   printf("It's 'C'that we're talking!\n");
}

As this object does not have a main function, there is no point in trying to create an executable. Therefore, create an object file using the command

gcc -c my_c.c

The result of this command is the file my_c.o.

Listing 4.16 shows the C++ program that will use the C object.

Code Listing 4.16. Invoking C Functions from C++—my_cpp.cpp
#include <iostream>

extern "C" void my_c_fn();            // declare our C function

int main(int argc, char *argv[])
{
    std::cout << "We're in C++ now!" << std::endl;
    my_c_fn();
    return 0;
}

Create an executable containing both the C++ and C functionality by compiling the file with the command

g++ my_cpp.cpp my_c.o -o my_cpp

Note that both files need to be located in the same directory for the build to work.

When you run the generated executable (my_cpp) the following output will be displayed:

We're in C++ now!
It's 'C'that we're talking!

It is possible to incorporate C functions in C++ by declaring them as external. Moreover, you do not have to declare every single function as external. Instead, you can declare a whole (source) file external in one go, as in Listing 4.17.

Code Listing 4.17. A File of C Functions for C++
#ifdef __cplusplus
extern "C" {
#endif

/* The C-functions here... */


#ifdef __cplusplus
}
#endif
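As a rough sketch of how such a guarded block might be filled in, the fragment below declares two functions inside the extern "C" wrapper and then defines them. The function names c_add and c_is_positive are hypothetical. In a real project the definitions would normally live in a separate .c file built by the C compiler; they are kept in one file here only to keep the example self-contained.

```cpp
// Declarations as they might appear in a header shared by C and C++ sources.
#ifdef __cplusplus
extern "C" {
#endif

int c_add(int a, int b);        /* hypothetical C function */
int c_is_positive(int value);   /* hypothetical C function */

#ifdef __cplusplus
}
#endif

// Definitions. Because each function was first declared with C linkage,
// these definitions keep C linkage and their names are not mangled.
int c_add(int a, int b)
{
    return a + b;
}

int c_is_positive(int value)
{
    return value > 0 ? 1 : 0;
}
```

When a C compiler processes the header, __cplusplus is not defined and the extern "C" lines disappear, so the same file serves both languages.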

Now let's look at how to tackle Assembly.

Mixing C/C++ and Assembly

Some C compilers (Aztec C, Turbo C, GNU C, and so on) allow the use of Assembly statements in the source code, effectively mixing C and Assembly within a single source file. This use of Assembly statements within other programming languages is called inline Assembly. To notify the compiler that it should switch to parsing Assembly, a keyword is used. Depending on the compiler, the keyword will be something like asm, #asm, or asm(code). It is possible that the compiler also needs an extra option to ensure it will recognize the Assembly statements.

Turbo C example of mixing C and Assembly:

int mul(int a, int b)
{
    asm    mov ax, word ptr 4[bp]
    asm    imul word ptr 6[bp]
}

gcc example of mixing C and Assembly:

void do_nothing()
{
    asm("nop");
}

gcc allows only a single Assembly statement per line so a function with multiple lines looks like Listing 4.18.

Code Listing 4.18. Multiple Inline Assembly Lines Within a C++ Source for gcc
#include <stdio.h>

void change_xyz (int x, int y)
{
   int z = 3;

   printf("x.y.z %d %d %d\n", x, y, z);
   asm(
       "movl $4,8(%ebp)\n\t"
       "movl $5,12(%ebp)\n\t"
       "movl $6,12(%esp)"
   );
   printf("x.y.z %d %d %d\n", x, y, z);
}

int main(int argc, char *argv[])
{
   change_xyz(1,2);
}

This program can be compiled with the command:

gcc asm_examp.cpp -o asm_examp

Although small, this example does a number of interesting things. The function change_xyz() will change the value of x and y via the "base pointer," and the value of z (created locally) via the "stack pointer." Note that the third movl statement, movl $6,12(%esp), can be replaced by movl $6,-4(%ebp), which does the same. The result of executing this program is the output

x.y.z 1 2 3
x.y.z 4 5 6

But what does all that %ebp and %esp stuff actually mean? It all has to do with the way of accessing the variables in Assembly. As the variables x and y are passed to the function by value, they reside on the stack in the order in which they are declared—from left to right, first x and then y. You get to the stack via the base pointer, which points to the memory address eight bytes below the first variable. Where you say x in C, you have to say 8(%ebp). And y in this example becomes 12(%ebp). Note that the variables are placed four bytes apart as the integer size is four bytes.

The variable z is a local variable and is placed under the base pointer, at -4(%ebp). If there had been a second local variable (int z, q), this second variable would be placed directly after z, so q would be at -8(%ebp). Refer to Chapter 8, "Functions," for more detail.

Now let's look at what happens to variables which are passed by reference. The reference variables are, in fact, placed on the stack exactly the same way as the variables that are passed by value. So the following example

void TestObject::PassByRef(int & a){ ..}

locates a reference to a at 8(%ebp). However, because this is still a reference, you need to read the value of this address to find the address of the actual object (a in this case). The statement a = 5 thus becomes

asm(
    "movl 8(%ebp), %eax\n\t"
    "movl $5, (%eax)"
);

%esp has not been mentioned yet though. Similar to the base pointer used in the explanations, %esp is a pointer to the stack. The only difference is that esp and ebp point to different positions within the stack. It is left as an exercise to the reader to determine the difference between esp and ebp; you already know that -4(%ebp) is equal to 12(%esp).
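The pass-by-reference case discussed above can also be pictured at the source level. The sketch below assumes, as the Assembly suggests, that the compiler implements a reference parameter as an address; the two function names are hypothetical and chosen only for this illustration.

```cpp
// Pass by reference: the caller's variable is modified directly.
void SetByReference(int& a)
{
    a = 5;
}

// Pass by pointer: the explicit form of what the reference version does.
// The parameter holds an address, and the store goes through that address,
// much like loading 8(%ebp) into %eax and then writing to (%eax).
void SetByPointer(int* a)
{
    *a = 5;
}
```

Calling SetByReference(x) or SetByPointer(&x) leaves x holding 5 in both cases. Only the syntax at the call site differs.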

The Assembly statements used as inline Assembly do not have to be written completely from scratch. Instead, you can ask the compiler to generate an Assembly listing for you (gcc -S). This way, you can write a first setup of a function in C or C++, have the compiler translate it, and reuse the generated Assembly in the source files. This process allows you to make optimizations to generated Assembly with the added advantage that the optimizations will not be lost the next time you compile your sources!

Remember, though, that the use of inline Assembly will make your sources less portable because the Assembly statements are written for one specific processor. They are not necessarily tied to one specific operating system; that depends on whether you use OS calls. The Assembly used in this chapter, for example, will run only on 80x86 compatible processors.
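One way to limit the portability cost is to confine the Assembly to the compilers and processors it was written for and to provide a plain C++ fallback elsewhere. The following is a minimal sketch under that assumption. The function name AddNumbers is hypothetical, and the Assembly path uses GCC's extended asm syntax with operands rather than the hard-coded stack offsets shown earlier.

```cpp
// Hypothetical helper: adds two integers, using inline Assembly only where
// the compiler and processor are known to accept this exact syntax.
int AddNumbers(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    // GCC extended asm: %0 is 'a' (read and written), %1 is 'b' (read only).
    // The compiler chooses the registers, so no stack offsets are hard-coded.
    asm("addl %1, %0"
        : "+r"(a)
        : "r"(b)
        : "cc");
    return a;
#else
    // Portable fallback for every other compiler or processor.
    return a + b;
#endif
}
```

Because the variables are named as operands, this form does not depend on where the compiler happens to place them on the stack, which makes it less fragile than addressing them through %ebp or %esp directly.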

Inline Assembly in Developer Studio

As the examples are slightly different when using Microsoft's Developer Studio C++ compiler, this section briefly highlights the main differences.

The inline Assembly keyword in Developer Studio is __asm. Assembly statements can be written individually when preceded by the keyword:

__asm mov eax,2 ; move the number 2
__asm add eax,2 ; add the number 2

or even

__asm mov eax,2     __asm add eax,2

Assembly statements can also be grouped with a single keyword:

__asm
{
        mov eax,2 ; move the number 2
        add eax,2 ; add the number 2
}

Referring to function parameters is child's play as the symbolic names can simply be used inside the Assembly code as in Listing 4.19.

Code Listing 4.19. Referring to Function Parameters from In-lined Assembly Within a C++ Source for DeveloperStudio
int DoWop(int dow, int op)
{
        __asm
          {
                mov eax, dow    ; retrieve dow
                add eax, op     ; add op
                ; leave result in eax to be returned.
          }
}

The C/C++ call int a = DoWop(1,2); will now result in the freshly created variable a receiving the value 3.
