30
Developing Cross-Platform and Cross-Language Applications

C++ programs can be compiled to run on a variety of computing platforms, and the language has been rigorously defined to ensure that programming in C++ for one platform is very similar to programming in C++ for another. Yet, despite the standardization of the language, platform differences eventually come into play when writing professional-quality programs in C++. Even when development is limited to a particular platform, small differences in compilers can elicit major programming headaches. This chapter examines the necessary complication of programming in a world with multiple platforms and multiple programming languages.

The first part of this chapter surveys the platform-related issues that C++ programmers encounter. A platform is the collection of all of the details that make up your development and/or run-time system. For example, your platform may be the Microsoft Visual C++ 2017 compiler running on Windows 10 on an Intel Core i7 processor. Alternatively, your platform might be the GCC 7.2 compiler running on Linux on a PowerPC processor. Both of these platforms are able to compile and run C++ programs, but there are significant differences between them.

The second part of this chapter looks at how C++ can interact with other programming languages. While C++ is a general-purpose language, it may not always be the right tool for the job. Through a variety of mechanisms, you can integrate C++ with other languages that may better serve your needs.

CROSS-PLATFORM DEVELOPMENT

There are several reasons why the C++ language encounters platform issues. C++ is a high-level language, and the standard does not specify certain low-level details. For example, the layout of an object in memory is undefined by the standard and left to the compiler. Different compilers can use different memory layouts for objects. C++ also faces the challenge of providing a standard language and a Standard Library without a standard implementation. Varying interpretations of the specification among C++ compiler and library vendors can lead to trouble when moving from one system to another. Finally, C++ is selective in what the language provides as standard. Despite the presence of a Standard Library, programs often need functionality that is not provided by the language or the Standard Library. This functionality generally comes from third-party libraries or the platform, and can vary greatly.

Architecture Issues

The term architecture generally refers to the processor, or family of processors, on which a program runs. A standard PC running Windows or Linux generally runs on the x86 or x64 architecture, and older versions of Mac OS were usually found on the PowerPC architecture. As a high-level language, C++ shields you from the differences between these architectures. For example, a Core i7 processor may have a single instruction that performs the same functionality as six PowerPC instructions. As a C++ programmer, you don’t need to know what this difference is or even that it exists. One advantage to using a high-level language is that the compiler takes care of converting your code into the processor’s native assembly code format.

However, processor differences do sometimes rise up to the level of C++ code. The first one discussed, the size of integers, is very important if you are writing cross-platform code. The others you won’t face often, unless you are doing particularly low-level work, but still, you should be aware that they exist.

Size of Integers

The C++ standard does not define the exact size of integer types. The standard just says the following:

There are five standard signed integer types: signed char, short int, int, long int, and long long int. In this list, each type provides at least as much storage as those preceding it in the list.

The standard does give a few additional hints for the size of these types, but never an exact size. The actual size is compiler-dependent. Thus, if you want to write cross-platform code, you cannot really rely on these types.

Besides these standard integer types, the C++ standard does define a number of types that have clearly specified sizes, all defined in the <cstdint> header file, although some of the types are optional. Here is an overview:

TYPE | DESCRIPTION
int8_t, int16_t, int32_t, int64_t | Signed integers whose size is exactly 8, 16, 32, or 64 bits. These types are defined by the standard as being optional, although most compilers support them.
int_fast8_t, int_fast16_t, int_fast32_t, int_fast64_t | Signed integers with sizes of at least 8, 16, 32, or 64 bits. For these, the compiler should use the fastest integer type it has that satisfies the requirements.
int_least8_t, int_least16_t, int_least32_t, int_least64_t | Signed integers with sizes of at least 8, 16, 32, or 64 bits. For these, the compiler should use the smallest integer type it has that satisfies the requirements.
intmax_t | An integer type with the maximum size supported by the compiler.
intptr_t | An integer type big enough to store a pointer. This type is also optional, but most compilers support it.

There are also unsigned versions available, such as uint8_t, uint_fast8_t, and so on.

If you want to write cross-platform code, I recommend that you use these <cstdint> types instead of the basic integer types.
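As a minimal sketch of this recommendation, the following code declares a hypothetical PacketHeader structure (the name and its fields are assumptions made up for illustration) using <cstdint> types, and uses static_assert to document the size guarantees at compile time. The bitsOf() helper is also hypothetical.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical record; the name and fields are for illustration only.
struct PacketHeader
{
    std::uint16_t version;      // Exactly 16 bits wherever uint16_t exists
    std::uint32_t payloadSize;  // Exactly 32 bits
    std::int64_t  timestamp;    // Exactly 64 bits
};

// Exact-width types are only defined on platforms that can provide them,
// so these checks hold wherever this code compiles.
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "int32_t is 32 bits");
static_assert(sizeof(std::int64_t) * CHAR_BIT == 64, "int64_t is 64 bits");

// The "least" variants always exist, but only promise a minimum width.
static_assert(sizeof(std::int_least16_t) * CHAR_BIT >= 16, "at least 16 bits");

// Returns the number of bits that type T occupies on the current platform.
template <typename T>
constexpr std::size_t bitsOf()
{
    return sizeof(T) * CHAR_BIT;
}
```

Note that fixed-width fields do not fix the layout of the structure as a whole; padding between the fields is still up to the compiler.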

Binary Compatibility

As you probably already know, you cannot take a program written and compiled for a Core i7 computer and run it on a PowerPC-based Mac. These two platforms are not binary compatible because their processors do not support the same set of instructions. When you compile a C++ program, your source code is turned into binary instructions that the computer executes. That binary format is defined by the platform, not by the C++ language.

One solution to support platforms that are not binary compatible is to build each version separately with a compiler on each target platform.

Another solution is cross-compiling. When you are using platform X for your development, but you want your program to run on platforms Y and Z, you can use a cross-compiler on your platform X that generates binary code for platforms Y and Z.

You can also make your program open source. When you make your source available to the end user, she can compile it natively on her system and build a version of the program that is in the correct binary format for her machine. As discussed in Chapter 4, open-source software has become increasingly popular. One of the major reasons is that it allows programmers to collaboratively develop software and increase the number of platforms on which it can run.

Address Sizes

When someone describes an architecture as 32-bit, they most likely mean that the address size is 32 bits, or 4 bytes. In general, a system with a larger address size can handle more memory and might operate more quickly on complex programs.

Because pointers are memory addresses, they are inherently tied to address sizes. Many programmers are taught that pointers are always 4 bytes, but this is wrong. For example, consider the following code snippet, which outputs the size of a pointer:

int *ptr = nullptr;
cout << "ptr size is " << sizeof(ptr) << " bytes" << endl;

If this program is compiled and run on a 32-bit x86 system, the output will be as follows:

ptr size is 4 bytes

If you compile it with a 64-bit compiler and run it on an x64 system, the output will be as follows:

ptr size is 8 bytes

From a programmer’s point of view, the upshot of varying pointer sizes is that you cannot equate a pointer with 4 bytes. More generally, you need to be aware that most sizes are not prescribed by the C++ standard. The standard only says that a short integer has as much, or less, space than an integer, which has as much, or less, space than a long integer.

The size of a pointer is also not necessarily the same as the size of an integer. For example, on a 64-bit platform, pointers are 64 bit, but integers could be 32 bit. Casting a 64-bit pointer to a 32-bit integer will result in losing 32 critical bits! The standard does define an std::intptr_t integer type in <cstdint> which is an integer type at least big enough to hold a pointer. The definition of this type is optional according to the standard, but virtually all compilers support it.
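To illustrate, here is a small sketch that round-trips a pointer through std::intptr_t. The helper names pointerToInteger() and integerToPointer() are made up for this example, and the code assumes that your platform provides the optional std::intptr_t type.

```cpp
#include <cstdint>

// Documents the assumption that intptr_t is wide enough for a pointer.
static_assert(sizeof(std::intptr_t) >= sizeof(void*),
    "intptr_t must be able to hold a pointer");

// Converts a pointer to an integer without losing any bits.
inline std::intptr_t pointerToInteger(void* p)
{
    return reinterpret_cast<std::intptr_t>(p);
}

// Converts the integer back to the original pointer value.
inline void* integerToPointer(std::intptr_t value)
{
    return reinterpret_cast<void*>(value);
}
```

Had the intermediate type been a 32-bit int on a 64-bit platform, the upper half of the address would have been lost in the first conversion.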

Byte Order

All modern computers store numbers in a binary representation, but the representation of the same number on two platforms may not be identical. This sounds contradictory, but as you’ll see, there are two approaches to reading numbers that both make sense.

A single slot in your computer’s memory is usually a byte because most computers are byte addressable. Number types in C++ are usually multiple bytes. For example, a short may be 2 bytes. Imagine that your program contains the following line:

short myShort = 513;

In binary, the number 513 is 0000 0010 0000 0001. This number contains 16 ones and zeros, or 16 bits. Because there are 8 bits in a byte, the computer needs 2 bytes to store the number. Because each individual memory address contains 1 byte, the computer needs to split the number up into multiple bytes. Assuming that a short is 2 bytes, the number is split into two even parts. The higher part of the number is put into the high-order byte and the lower part of the number is put into the low-order byte. In this case, the high-order byte is 0000 0010 and the low-order byte is 0000 0001.

Now that the number has been split up into memory-sized parts, the only question that remains is how to store them in memory. Two bytes are needed, but the order of the bytes is unclear and, in fact, depends on the architecture of the system in question.

One way to represent the number is to put the high-order byte first in memory and the low-order byte next. This strategy is called big-endian ordering because the bigger part of the number comes first. PowerPC and SPARC processors use a big-endian approach. Some other processors, such as x86, arrange the bytes in the opposite order, putting the low-order byte first in memory. This approach is called little-endian ordering because the smaller part of the number comes first. An architecture may choose one approach or the other, usually based on backward compatibility. For the curious, the terms “big-endian” and “little-endian” predate modern computers by several hundred years. Jonathan Swift coined the terms in his eighteenth-century novel Gulliver’s Travels to describe the opposing camps of a debate about the proper end on which to break an egg.
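If you are curious which ordering your own machine uses, you can look at the individual bytes of a value. The following sketch reuses the number 513 from the earlier example; the function names are hypothetical, and std::uint16_t is used instead of short so that the value is known to be 2 bytes.

```cpp
#include <cstdint>
#include <cstring>

// Returns the high-order byte of a 16-bit value, regardless of memory layout.
inline std::uint8_t highByte(std::uint16_t value)
{
    return static_cast<std::uint8_t>(value >> 8);
}

// Returns the low-order byte of a 16-bit value, regardless of memory layout.
inline std::uint8_t lowByte(std::uint16_t value)
{
    return static_cast<std::uint8_t>(value & 0xFF);
}

// Returns true if this machine stores the low-order byte first in memory.
inline bool isLittleEndian()
{
    const std::uint16_t value = 513;          // 0000 0010 0000 0001
    unsigned char bytes[sizeof(value)];
    std::memcpy(bytes, &value, sizeof(value)); // Copy the in-memory layout
    return bytes[0] == lowByte(value);
}
```

The shift and mask operations work on the value itself, so highByte() and lowByte() give the same answer on every architecture. Only isLittleEndian(), which inspects memory, depends on the platform.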

Regardless of the ordering a particular architecture uses, your programs can continue to use numerical values without paying any attention to whether the machine uses big-endian ordering or little-endian ordering. The ordering only comes into play when data moves between architectures. For example, if you are sending binary data across a network, you may need to consider the ordering of the other system. One solution is to use the standard network byte ordering, which is always big-endian. So, before sending data across a network, you convert it to big-endian, and whenever you receive data from a network, you convert it from big-endian to the byte ordering of your system.

Similarly, if you are writing binary data to a file, you may need to consider what will happen when that file is opened on a system with opposite byte ordering.
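One possible way to handle both the network case and the file case is to build the byte sequence explicitly with shifts, so that the result does not depend on the byte ordering of the machine running the code. The following is a sketch for 32-bit values; the names toBigEndian() and fromBigEndian() are assumptions for this example, not functions from any library.

```cpp
#include <array>
#include <cstdint>

// Encodes value as 4 bytes in big-endian (network) order.
inline std::array<std::uint8_t, 4> toBigEndian(std::uint32_t value)
{
    return {{
        static_cast<std::uint8_t>((value >> 24) & 0xFF),  // High-order byte first
        static_cast<std::uint8_t>((value >> 16) & 0xFF),
        static_cast<std::uint8_t>((value >> 8) & 0xFF),
        static_cast<std::uint8_t>(value & 0xFF)           // Low-order byte last
    }};
}

// Decodes 4 big-endian bytes back into a value in the host's representation.
inline std::uint32_t fromBigEndian(const std::array<std::uint8_t, 4>& bytes)
{
    return (static_cast<std::uint32_t>(bytes[0]) << 24)
         | (static_cast<std::uint32_t>(bytes[1]) << 16)
         | (static_cast<std::uint32_t>(bytes[2]) << 8)
         |  static_cast<std::uint32_t>(bytes[3]);
}
```

You would write the bytes returned by toBigEndian() to the socket or file, and call fromBigEndian() on the bytes you read back.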

Implementation Issues

When a C++ compiler is written, it is designed by a human being who attempts to adhere to the C++ standard. Unfortunately, the C++ standard is more than a thousand pages long and written in a combination of prose, language grammars, and examples. Two human beings implementing a compiler according to such a standard are unlikely to interpret every piece of prescribed information in the exact same way or to catch every single edge case. As a result, compilers will have bugs.

Compiler Quirks and Extensions

There is no simple rule for finding or avoiding compiler bugs. The best you can do is to stay up to speed on compiler updates and perhaps subscribe to a mailing list or newsgroup for your compiler. If you suspect that you have encountered a compiler bug, a simple web search for the error message or condition you have witnessed could uncover a workaround or patch.

One area that compilers are notorious for having trouble with is language features added by recent updates to the standard, although in recent years vendors of major compilers have been quick to add support for the latest features.

Another issue to be aware of is that compilers often include their own language extensions without making it obvious to the programmer. For example, variable-length stack-based arrays (VLAs) are not part of the C++ language; however, they are part of the C language. Some compilers support both the C and the C++ standard, and can allow the use of VLAs in C++ code. One such compiler is g++. The following compiles and runs as expected with the g++ compiler:

int i = 4;
char myStackArray[i];  // Not a standard language feature!

Some compiler extensions may be useful, but if there is a chance that you will switch compilers at some point, you should see if your compiler has a strict mode where it avoids using such extensions. For example, compiling the previous code with the -pedantic flag passed to g++ yields the following warning:

warning: ISO C++ forbids variable length array 'myStackArray' [-Wvla]
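If you need an array whose size is only known at run time, a standard-conforming alternative to a VLA is std::vector. The following sketch shows one way to do it; the makeBuffer() name is made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Returns a zero-initialized buffer whose size is chosen at run time.
// Unlike a VLA, this is standard C++ and compiles cleanly with -pedantic.
inline std::vector<char> makeBuffer(std::size_t size)
{
    return std::vector<char>(size);
}
```

A call such as makeBuffer(i) gives you the same number of elements as myStackArray, but the storage is allocated on the heap rather than on the stack.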

The C++ specification allows for a certain type of compiler-defined language extension through the #pragma mechanism. #pragma is a preprocessor directive whose behavior is defined by the implementation. If the implementation does not understand the directive, it ignores it. For example, some compilers allow the programmer to turn compiler warnings off temporarily with #pragma.
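As a sketch of what this can look like, the following code temporarily disables the "unused variable" warning around one function, using the pragma spelling of Microsoft Visual C++ and the one shared by GCC and Clang. The function itself is hypothetical, and the exact warning numbers and names are compiler-specific, so check your compiler's documentation before relying on them.

```cpp
#if defined(_MSC_VER)
    #pragma warning(push)
    #pragma warning(disable : 4101)   // C4101: unreferenced local variable
#elif defined(__GNUC__)
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wunused-variable"
#endif

// Hypothetical function with an unused local that would normally be reported.
inline int answer()
{
    int unused;
    return 42;
}

#if defined(_MSC_VER)
    #pragma warning(pop)              // Restore the previous warning state
#elif defined(__GNUC__)
    #pragma GCC diagnostic pop
#endif
```

Because each compiler has its own pragmas, the conditional compilation is needed to keep the code portable. A compiler that matches neither branch simply sees the function without any pragmas.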

Library Implementations

Most likely, your compiler includes an implementation of the C++ Standard Library. Because the Standard Library is written in C++, however, you aren’t required to use the implementation that came bundled with your compiler. You could use a third-party Standard Library that, for example, has been optimized for speed, or you could even write your own.

Of course, Standard Library implementers face the same problems that compiler writers face: the standard is subject to interpretation. In addition, certain implementations may make tradeoffs that are incompatible with your needs. For example, one implementation may optimize for speed, while another implementation may focus on using as little memory as possible for containers.

When working with a Standard Library implementation, or indeed any third-party library, it is important to consider the tradeoffs that the designers made during the development. Chapter 4 contains a more detailed discussion of the issues involved in using libraries.

Platform-Specific Features

C++ is a great general-purpose language. With the addition of the Standard Library, the language is packed full of so many features that a casual programmer could happily code in C++ for years without going beyond what is built in. However, professional programs require facilities that C++ does not provide. This section lists several important features that are provided by the platform or third-party libraries, not by the C++ language or the C++ Standard Library.

  • Graphical user interfaces: Most commercial programs today run on an operating system that has a graphical user interface, containing such elements as clickable buttons, movable windows, and hierarchical menus. C++, like the C language, has no notion of these elements. To write a graphical application in C++, you can use platform-specific libraries that allow you to draw windows, accept input through the mouse, and perform other graphical tasks. A better option is to use a third-party library, such as wxWidgets or Qt, that provides an abstraction layer for building graphical applications. These libraries often provide support for many different target platforms.
  • Networking: The Internet has changed the way we write applications. These days, most applications check for updates through the web, and games provide a networked multiplayer mode. C++ does not provide a standard mechanism for networking yet, though several widely used networking libraries exist. The most common means of writing networking software is through an abstraction called sockets. A socket library implementation can be found on most platforms, and it provides a simple procedure-oriented way to transfer data over a network. Some platforms support a stream-based networking system that operates like I/O streams in C++. There are also third-party networking libraries available that provide a networking abstraction layer. These libraries often support many different target platforms. IPv6, the successor to IPv4, is gaining traction. Therefore, choosing a networking library that is independent of the IP version would be a better choice than choosing one that only supports IPv4.
  • OS events and application interaction: In pure C++ code, there is little interaction with the surrounding operating system and other applications. The command-line arguments are about all you get in a standard C++ program without platform extensions. For example, operations such as copy and paste are not directly supported in C++. You can either use platform-provided libraries, or use third-party libraries that support multiple platforms. For example, both wxWidgets and Qt are examples of libraries that abstract the copy and paste operations and support multiple platforms.
  • Low-level files: Chapter 13 explains standard I/O in C++, including reading and writing files. Many operating systems provide their own file APIs, which are usually incompatible with the standard file classes in C++. These libraries often provide OS-specific file tools, such as a mechanism to get the home directory of the current user.
  • Threads: Concurrent threads of execution within a single program were not directly supported in C++03 or earlier. Since C++11, a threading support library has been included with the Standard Library, as explained in Chapter 23, and C++17 has added parallel algorithms, as discussed in Chapter 18. If you need more powerful threading functionality besides what the Standard Library provides, then you need to use third-party libraries. Examples are the Intel Threading Building Blocks (TBB), and The STE||AR Group High Performance ParalleX (HPX) library.

CROSS-LANGUAGE DEVELOPMENT

For certain types of programs, C++ may not be the best tool for the job. For example, if your Unix program needs to interact closely with the shell environment, you may be better off writing a shell script than a C++ program. If your program performs heavy text processing, you may decide that the Perl language is the way to go. If you need a lot of database interaction, then C# or Java might be a better choice. C# and the WPF framework might be better suited to write modern GUI applications, and so on. Still, if you do decide to use another language, you sometimes might want to be able to call into C++ code, for example, to perform some computationally expensive operations. Fortunately, there are some techniques you can use to get the best of both worlds: the unique specialty of another language combined with the power and flexibility of C++.

Mixing C and C++

As you already know, the C++ language is almost a superset of the C language. That means that almost all C programs will compile and run in C++. There are a few exceptions. Some have to do with the fact that a handful of C features are not supported by C++; for example, C supports variable-length arrays (VLAs), while C++ does not. Other exceptions usually have to do with reserved words. In C, for example, the term class has no particular meaning. Thus, it could be used as a variable name, as in the following C code:

int class = 1; // Compiles in C, not C++
printf("class is %d\n", class);

This program compiles and runs in C, but yields an error when compiled as C++ code. When you translate, or port, a program from C to C++, these are the types of errors you will face. Fortunately, the fixes are usually quite simple. In this case, rename the class variable to classID and the code will compile. The other types of errors you’ll face are the handful of C features that are not supported by C++, but these are usually rare.

The ease of incorporating C code in a C++ program comes in handy when you encounter a useful library or legacy code that was written in C. Functions and classes, as you’ve seen many times in this book, work just fine together. A class method can call a function, and a function can make use of objects.

Shifting Paradigms

One of the dangers of mixing C and C++ is that your program may start to lose its object-oriented properties. For example, if your object-oriented web browser is implemented with a procedural networking library, the program will be mixing these two paradigms. Given the importance and quantity of networking tasks in such an application, you might consider writing an object-oriented wrapper around the procedural library. A typical design pattern that can be used for this is called the façade.

For example, imagine that you are writing a web browser in C++, but you are using a C networking library that contains the functions declared in the following code. Note that the HostHandle and ConnectionHandle data structures have been omitted for brevity.

// netwrklib.h
#include "HostHandle.h"
#include "ConnectionHandle.h"

// Gets the host record for a particular Internet host given
// its hostname (i.e. www.host.com)
HostHandle* lookupHostByName(char* hostName);

// Frees the given HostHandle
void freeHostHandle(HostHandle* host);

// Connects to the given host
ConnectionHandle* connectToHost(HostHandle* host);

// Closes the given connection
void closeConnection(ConnectionHandle* connection);

// Retrieves a web page from an already-opened connection
char* retrieveWebPage(ConnectionHandle* connection, char* page);

// Frees the memory pointed to by page
void freeWebPage(char* page);

The netwrklib.h interface is fairly simple and straightforward. However, it is not object-oriented, and a C++ programmer who uses such a library is bound to feel icky, to use a technical term. This library isn’t organized into a cohesive class and it isn’t even const-correct. Of course, a talented C programmer could have written a better interface, but as the user of a library, you have to accept what you are given. Writing a wrapper is your opportunity to customize the interface.

Before you build an object-oriented wrapper for this library, take a look at how it might be used as-is to gain an understanding of its actual usage. In the following program, the netwrklib library is used to retrieve the web page at www.wrox.com/index.html:

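// The library takes non-const char*, so string literals cannot be passed
// directly in standard C++; use modifiable arrays instead.
char hostName[] = "www.wrox.com";
char pageName[] = "/index.html";

HostHandle* myHost = lookupHostByName(hostName);
ConnectionHandle* myConnection = connectToHost(myHost);
char* result = retrieveWebPage(myConnection, pageName);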

cout << "The result is " << result << endl;

freeWebPage(result);
closeConnection(myConnection);
freeHostHandle(myHost);

A possible way to make the library more object-oriented is to provide a single abstraction that recognizes the links between looking up a host, connecting to the host, and retrieving a web page. A good object-oriented wrapper hides the needless complexity of the HostHandle and ConnectionHandle types.

This example follows the design principles described in Chapters 5 and 6: the new class should capture the common use case for the library. The previous example shows the most frequently used pattern: first a host is looked up, then a connection is established, and finally a page is retrieved. It is also likely that subsequent pages will be retrieved from the same host, so a good design will accommodate that mode of use as well.

To start, the HostRecord class wraps the functionality of looking up a host. It’s an RAII class. Its constructor uses lookupHostByName() to perform the lookup, and its destructor automatically frees the retrieved HostHandle. Here is the code:

class HostRecord
{
    public:
        // Looks up the host record for the given host
        explicit HostRecord(std::string_view host);
        // Frees the host record
        virtual ~HostRecord();
        // Returns the underlying handle.
        HostHandle* get() const noexcept;
    private:
        HostHandle* mHostHandle = nullptr;
};

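HostRecord::HostRecord(std::string_view host)
{
    // A string_view is not guaranteed to be null-terminated, so copy it first.
    // This assumes lookupHostByName() does not keep the pointer after returning.
    std::string hostName(host);
    mHostHandle = lookupHostByName(hostName.data());
}

HostRecord::~HostRecord()
{
    if (mHostHandle)
        freeHostHandle(mHostHandle);
}

HostHandle* HostRecord::get() const noexcept
{
    return mHostHandle;
}

Because the HostRecord class deals with C++ string_views instead of C-style strings, and a string_view is not guaranteed to be null-terminated, the constructor first copies host into an std::string. Since C++17, calling data() on a non-const std::string returns a null-terminated char*, which also makes up for netwrklib’s const-incorrectness without requiring a const_cast().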

Next, a WebHost class can be implemented that uses the HostRecord class. The WebHost class creates a connection to a given host and supports retrieving webpages. It’s also an RAII class. When the WebHost object is destroyed, it automatically closes the connection to the host. Here is the code:

class WebHost
{
    public:
        // Connects to the given host
        explicit WebHost(std::string_view host);
        // Closes the connection to the host
        virtual ~WebHost();
        // Obtains the given page from this host
        std::string getPage(std::string_view page);
    private:
        ConnectionHandle* mConnection = nullptr;
};

WebHost::WebHost(std::string_view host)
{
    HostRecord hostRecord(host);
    if (hostRecord.get()) {
        mConnection = connectToHost(hostRecord.get());
    }
}

WebHost::~WebHost()
{
    if (mConnection)
        closeConnection(mConnection);
}

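std::string WebHost::getPage(std::string_view page)
{
    std::string resultAsString;
    if (mConnection) {
        // A string_view is not guaranteed to be null-terminated, so copy it
        // into an std::string before handing it to the C library.
        std::string pageName(page);
        char* result = retrieveWebPage(mConnection, pageName.data());
        if (result) {   // Guard in case the library returns nullptr.
            resultAsString = result;
            freeWebPage(result);
        }
    }
    return resultAsString;
}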

The WebHost class effectively encapsulates the behavior of a host and provides useful functionality without unnecessary calls and data structures. The implementation of the WebHost class makes extensive use of the netwrklib library without exposing any of its workings to the user. The constructor of WebHost uses a HostRecord RAII object for the specified host. The resulting HostRecord is used to set up a connection to the host, which is stored in the mConnection data member for later use. The HostRecord RAII object is automatically destroyed at the end of the constructor. The WebHost destructor closes the connection. The getPage() method uses retrieveWebPage() to retrieve a web page, converts it to an std::string, uses freeWebPage() to free memory, and returns the std::string.

The WebHost class makes the common case easy for the client programmer:

WebHost myHost("www.wrox.com");
string result = myHost.getPage("/index.html");
cout << "The result is " << result << endl;

As you can see, the WebHost class provides an object-oriented wrapper around the C library. By providing an abstraction, you can change the underlying implementation without affecting client code, and you can provide additional features. These features can include connection reference counting, automatically closing connections after a specific time to adhere to the HTTP specification and automatically reopening the connection on the next getPage() call, and so on.
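One thing to watch out for with wrappers like these is copying: as written, WebHost stores a raw ConnectionHandle*, so a copy of a WebHost object would close the same connection a second time in its destructor. One possible refinement, which is not part of the design shown above, is to store the handle in an std::unique_ptr with a custom deleter. The following sketch demonstrates the idea. To keep it self-contained, it uses stand-in functions (connectToHostStub() and closeConnectionStub()) and a global counter that are made up for this example; a real program would call the netwrklib functions instead.

```cpp
#include <memory>

// Stand-ins for the C library so that this sketch is self-contained.
// These names, and the counter, are hypothetical.
struct ConnectionHandle { };
int gOpenConnections = 0;

ConnectionHandle* connectToHostStub()
{
    ++gOpenConnections;
    return new ConnectionHandle();
}

void closeConnectionStub(ConnectionHandle* connection)
{
    --gOpenConnections;
    delete connection;
}

// Deleter that closes the connection when the unique_ptr is destroyed.
struct ConnectionCloser
{
    void operator()(ConnectionHandle* connection) const noexcept
    {
        closeConnectionStub(connection);
    }
};

// Move-only owner of a connection; copying is a compile-time error.
using ConnectionPtr = std::unique_ptr<ConnectionHandle, ConnectionCloser>;

ConnectionPtr openConnection()
{
    return ConnectionPtr(connectToHostStub());
}
```

A class that holds a ConnectionPtr data member no longer needs a hand-written destructor, and the compiler rejects accidental copies.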

Linking with C Code

The previous example assumed that you had the raw C code to work with. The example took advantage of the fact that most C code will successfully compile with a C++ compiler. If you only have compiled C code, perhaps in the form of a library, you can still use it in your C++ program, but you need to take a few extra steps.

Before you can start using compiled C code in your C++ programs, you first need to know about a concept called name mangling. In order to implement function overloading, the complex C++ namespace is “flattened.” For example, if you have a C++ program, it is legitimate to write the following:

void MyFunc(double);
void MyFunc(int);
void MyFunc(int, int);

However, this would mean that the linker would see several different functions, all named MyFunc, and would not know which one you want to call. Therefore, all C++ compilers perform an operation that is referred to as name mangling and is the logical equivalent of generating names, as follows:

MyFunc_double
MyFunc_int
MyFunc_int_int

To avoid conflicts with other names you might have defined, the generated names usually have some characters that are legal to the linker but not legal in C++ source code. For example, Microsoft VC++ generates names as follows:

?MyFunc@@YAXN@Z
?MyFunc@@YAXH@Z
?MyFunc@@YAXHH@Z

This encoding is complex and often vendor-specific. The C++ standard does not specify how function overloading should be implemented on a given platform, so there is no standard for name mangling algorithms.

In C, function overloading is not supported (the compiler will complain about duplicate definitions). So, names generated by the C compiler are quite simple, for example, _MyFunc.

Now, if you compile a simple program with the C++ compiler, even if it has only one instance of the MyFunc name, it still generates a request to link to a mangled name. However, when you link with the C library, it cannot find the desired mangled name, and the linker complains. Therefore, it is necessary to tell the C++ compiler to not mangle that name. This is done by using the extern "language" qualification both in the header file (to instruct the client code to create a name compatible with the specified language) and, if your library source is in C++, at the definition site (to instruct the library code to generate a name compatible with the specified language).

Here is the syntax of extern "language":

extern "language" declaration1();
extern "language" declaration2();

or it can also be like this:

extern "language" {
    declaration1();
    declaration2();
}

The C++ standard says that any language specification can be used, so in principle, the following could be supported by a compiler:

extern "C" void MyFunc(int i);
extern "Fortran" void MatrixInvert(Matrix* M);
extern "Pascal" void SomeLegacySubroutine(int n);
extern "Ada" void AimMissileDefense(double angle);

In practice, many compilers only support "C". Each compiler vendor will inform you which language designators they support.

For example, in the following code, the function prototype for doCFunction() is specified as an external C function:

extern "C" {
    void doCFunction(int i);
}

int main()
{
    doCFunction(8); // Calls the C function.
    return 0;
}

The actual definition for doCFunction() is provided in a compiled binary file attached in the link phase. The extern "C" specification informs the compiler that the linked-in code was compiled as C, so the compiler does not mangle the function name.

A more common pattern for using extern is at the header level. For example, if you are using a graphics library written in C, it probably came with an .h file for you to use. You can write another header file that wraps the original one in an extern block to specify that the entire header defines functions written in C. The wrapper .h file is often named with .hpp to distinguish it from the C version of the header:

// graphicslib.hpp
extern "C" {
    #include "graphicslib.h"
}

Another common model is to write a single header file, which is conditioned on whether it is being compiled for C or C++. A C++ compiler predefines the symbol __cplusplus if you are compiling for C++. The symbol is not defined for C compilations. So, you will often see header files in the following form:

#ifdef __cplusplus
    extern "C" {
#endif
        declaration1();
        declaration2();
#ifdef __cplusplus
    } // matches extern "C"
#endif

This means that declaration1() and declaration2() are functions that are in a library compiled by the C compiler. Using this technique, the same header file can be used in both C and C++ clients.
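To make the pattern concrete, here is a sketch of a complete header that combines the __cplusplus test with an include guard. The header name mathlib.h and the function mathlib_square() are assumptions made up for this illustration. The definition would normally live in a separate file compiled by a C compiler; it appears in the same listing only to keep the sketch self-contained.

```cpp
// mathlib.h (hypothetical header shared by C and C++ clients)
#ifndef MATHLIB_H
#define MATHLIB_H

#ifdef __cplusplus
extern "C" {
#endif

int mathlib_square(int x);   /* hypothetical library function */

#ifdef __cplusplus
} // matches extern "C"
#endif

#endif /* MATHLIB_H */

// mathlib.c would normally contain this definition. When it is compiled as
// C++ after the declaration above, it keeps the C linkage declared earlier.
int mathlib_square(int x)
{
    return x * x;
}
```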

Whether you are including C code in your C++ program or linking against a compiled C library, remember that even though C++ is almost a superset of C, they are different languages with different design goals. Adapting C code to work in C++ is quite common, but providing an object-oriented C++ wrapper around procedural C code is often much better.
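As one possible illustration of such a wrapper, the following sketch wraps the C standard I/O functions fopen(), fputs(), fgets(), and fclose() in a small class that closes the file automatically. The class name File, the helper FileCloser, and the fixed 256-character line buffer are assumptions chosen for brevity, not a recommended production design.

```cpp
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Closes a C FILE handle; used as the deleter of the unique_ptr below.
struct FileCloser
{
    void operator()(std::FILE* file) const { std::fclose(file); }
};

// Hypothetical object-oriented wrapper around the procedural C file API.
class File
{
public:
    File(const std::string& path, const std::string& mode)
        : mHandle(std::fopen(path.c_str(), mode.c_str()))
    {
        if (!mHandle) {
            throw std::runtime_error("Unable to open " + path);
        }
    }

    void write(const std::string& text)
    {
        if (std::fputs(text.c_str(), mHandle.get()) == EOF) {
            throw std::runtime_error("Write failed");
        }
    }

    // Returns the next line, including its newline, or an empty string
    // at end of file. Lines longer than the buffer are returned in pieces.
    std::string readLine()
    {
        char buffer[256];
        if (std::fgets(buffer, sizeof(buffer), mHandle.get()) == nullptr) {
            return "";
        }
        return buffer;
    }

private:
    std::unique_ptr<std::FILE, FileCloser> mHandle;
};
```

Clients of this class never call fclose() themselves, and errors surface as exceptions instead of return codes that are easy to ignore.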

Calling C++ Code from C#

Even though this is a C++ book, I won’t pretend that there aren’t snazzier languages out there. One example is C#. By using the Interop services from C#, it’s pretty easy to call C++ code from within your C# applications. An example scenario could be that you develop parts of your application, like the graphical user interface, in C#, but use C++ to implement certain performance-critical or computationally expensive components. To make Interop work, you need to write a library in C++, which can be called from C#. On Windows, the library will be in a .dll file. The following C++ example defines a FunctionInDLL() function that will be compiled into a library. The function accepts a Unicode string and returns an integer. The implementation writes the received string to the console and returns the value 42 to the caller:

#include <iostream>

using namespace std;

extern "C"
{
    __declspec(dllexport) int FunctionInDLL(const wchar_t* p)
    {
        wcout << L"The following string was received by C++:\n    '";
        wcout << p << L"'" << endl;
        return 42;    // Return some value…
    }
}

Keep in mind that you are implementing a function in a library, not writing a program, so you do not need a main() function. How you compile this code depends on your environment. If you are using Microsoft Visual C++, you need to go to the properties of your project and select “Dynamic Library (.dll)” as the configuration type. Note that the example uses __declspec(dllexport) to tell the linker that this function should be made available to clients of the library. This is the way you do it with Microsoft Visual C++. Other linkers might use a different mechanism to export functions.
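Because the export mechanism differs between toolchains, libraries that target several compilers often hide it behind a macro. The following is a sketch under stated assumptions: the macro name HELLO_API and the function CountCharacters() are hypothetical, and only two mechanisms are covered, __declspec(dllexport) for Windows toolchains and the visibility attribute for GCC-compatible compilers.

```cpp
#include <cwchar>

// HELLO_API is a hypothetical macro name chosen for this sketch.
#if defined(_WIN32)
    #define HELLO_API __declspec(dllexport)
#elif defined(__GNUC__)
    #define HELLO_API __attribute__((visibility("default")))
#else
    #define HELLO_API   // Unknown toolchain: rely on its default behavior.
#endif

// Hypothetical exported function: returns the number of characters in the
// given wide string, or -1 if no string was supplied.
extern "C" HELLO_API int CountCharacters(const wchar_t* p)
{
    if (p == nullptr) {
        return -1;
    }
    return static_cast<int>(std::wcslen(p));
}
```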

Once you have the library, you can call it from C# by using the Interop services. First, you need to include the Interop namespace:

using System.Runtime.InteropServices;

Next, you define the function prototype, and tell C# where it can find the implementation of the function. This is done with the following line, assuming you have compiled the library as HelloCpp.dll:

[DllImport("HelloCpp.dll", CharSet = CharSet.Unicode)]
public static extern int FunctionInDLL(String s);

The first part of this line is saying that C# should import this function from a library called HelloCpp.dll, and that it should use Unicode strings. The second part specifies the actual prototype of the function, which is a function accepting a string as parameter and returning an integer. The following code shows a complete example of how to use the C++ library from C#:

using System;
using System.Runtime.InteropServices;

namespace HelloCSharp
{
    class Program
    {
        [DllImport("HelloCpp.dll", CharSet = CharSet.Unicode)]
        public static extern int FunctionInDLL(String s);

        static void Main(string[] args)
        {
            Console.WriteLine("Written by C#.");
            int result = FunctionInDLL("Some string from C#.");
            Console.WriteLine("C++ returned the value " + result);
        }
    }
}

The output is as follows:

Written by C#.
The following string was received by C++:
    'Some string from C#.'
C++ returned the value 42

The details of the C# code are outside the scope of this C++ book, but the general idea should be clear with this example.

Calling C++ Code from Java with JNI

The Java Native Interface, or JNI, is a part of the Java language that allows programmers to access functionality that was not written in Java. Because Java is a cross-platform language, the original intent was to make it possible for Java programs to interact with the operating system. JNI also allows programmers to make use of libraries written in other languages, such as C++. Access to C++ libraries may be useful to a Java programmer who has a performance-critical or computationally expensive piece of code, or who needs to use legacy code.

JNI can also be used to execute Java code within a C++ program, but such a use is far less common. Because this is a C++ book, I do not include an introduction to the Java language. This section is recommended if you already know Java and want to incorporate C++ code into your Java code.

To begin your Java cross-language adventure, start with the Java program. For this example, the simplest of Java programs will suffice:

public class HelloCpp {
    public static void main(String[] args)
    {
        System.out.println("Hello from Java!");
    }
}

Next, you need to declare a Java method that will be written in another language. To do this, you use the native keyword and leave out the implementation:

public class HelloCpp {
    // This will be implemented in C++.
    public static native void callCpp();

    // Remainder omitted for brevity
}

The C++ code will eventually be compiled into a shared library that gets dynamically loaded into the Java program. You can load this library inside a Java static block so that it is loaded when the Java program begins executing. The library name passed to System.loadLibrary() can be whatever you want; Java maps it to a platform-specific file name, for example, libhellocpp.so on Linux systems, or hellocpp.dll on Windows systems.

public class HelloCpp {
    static {
        System.loadLibrary("hellocpp");
    }

    // Remainder omitted for brevity
}

Finally, you need to actually call the C++ code from within the Java program. The callCpp() Java method serves as a placeholder for the not-yet-written C++ code. Here is the complete Java program:

public class HelloCpp {
    static {
        System.loadLibrary("hellocpp");
    }

    // This will be implemented in C++.
    public static native void callCpp();

    public static void main(String[] args)
    {
        System.out.println("Hello from Java!");
        callCpp();
    }
}

That’s all for the Java side. Now, just compile the Java program as you normally would:

javac HelloCpp.java

Then use the javah program (I like to pronounce it as jav-AHH!) to create a header file for the native method:

javah HelloCpp

After running javah, you will find a file named HelloCpp.h, which is a fully working (if somewhat ugly) C/C++ header file. (The javah tool was removed in JDK 10; on newer JDKs, compiling with javac -h generates the same header.) Inside of that header file is a C function declaration for a function called Java_HelloCpp_callCpp(). Your C++ program will need to implement this function. The full prototype is as follows:

JNIEXPORT void JNICALL Java_HelloCpp_callCpp(JNIEnv*, jclass);
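The function name in this prototype follows the JNI naming convention: the prefix Java_, the class name, an underscore, and the method name. The following sketch reproduces that convention for simple cases. The helper names jniEscape() and jniFunctionName() are hypothetical, and the sketch deliberately handles only ASCII names; it ignores non-ASCII characters and the signature suffix that JNI adds for overloaded native methods.

```cpp
#include <string>

// Escapes one component of a JNI function name. Simplified: an underscore
// becomes "_1" and a package separator '.' becomes '_'; everything else is
// copied unchanged.
std::string jniEscape(const std::string& name)
{
    std::string result;
    for (char c : name) {
        if (c == '_') {
            result += "_1";
        } else if (c == '.') {
            result += '_';
        } else {
            result += c;
        }
    }
    return result;
}

// Builds the C function name for a native method of the given class.
std::string jniFunctionName(const std::string& className,
                            const std::string& methodName)
{
    return "Java_" + jniEscape(className) + "_" + jniEscape(methodName);
}
```

In practice, you rarely need to compute these names yourself, because the generated header already contains them; the sketch only shows where the name comes from.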

Your C++ implementation of this function can make full use of the C++ language. This example outputs some text from C++. First, you need to include the jni.h header file and the HelloCpp.h file that was created by javah. You also need to include any C++ headers that you intend to use:

#include <jni.h>
#include "HelloCpp.h"
#include <iostream>

The C++ function is written as normal. The parameters to the function allow interaction with the Java environment and, because callCpp() is a static method, with the class on which the native method was called. They are beyond the scope of this example.

JNIEXPORT void JNICALL Java_HelloCpp_callCpp(JNIEnv*, jclass)
{
    std::cout << "Hello from C++!" << std::endl;
}

How to compile this code into a library depends on your environment, but you will most likely need to tweak your compiler’s settings to include the JNI headers. Using the GCC compiler on Linux, your compile command might look like this:

g++ -shared -fPIC -I/usr/java/jdk/include/ -I/usr/java/jdk/include/linux \
    HelloCpp.cpp -o libhellocpp.so

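The output from the compiler is the library used by the Java program. As long as the shared library is in a directory on the Java library path (the java.library.path system property, which on Linux includes the directories listed in LD_LIBRARY_PATH), you can execute the Java program normally: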

java HelloCpp

You should see the following result:

Hello from Java!
Hello from C++!

Of course, this example just scratches the surface of what is possible through JNI. You could use JNI to interface with OS-specific features or hardware drivers. For complete coverage of JNI, you should consult a Java text.

Calling Scripts from C++ Code

The original Unix OS included a rather limited C library, which did not support certain common operations. Unix programmers therefore developed the habit of launching shell scripts from applications to accomplish tasks that should have had API or library support.

Today, many of these Unix programmers still insist on using scripts as a form of subroutine call. Usually, they execute the system() C library call with a string that is the script to execute. There are significant risks to this approach. For example, if there is an error in the script, the caller may or may not get a detailed error indication. The system() call is also exceptionally heavy-duty, because it has to create an entire new process to execute the script. This may ultimately be a serious performance bottleneck in your application.

Using system() to launch scripts is not further discussed in this text. In general, you should explore the features of C++ libraries to see if there are better ways to do something. There are platform-independent wrappers around many platform-specific libraries, for example, the Boost Asio library, which provides portable networking and other low-level I/O, including sockets, timers, serial ports, and so on. If you need to work with the filesystem, C++17 includes a platform-independent <filesystem> API, as discussed in Chapter 20. Launching a Perl script with system() to process some textual data may not be the best choice; the C++ regular expressions library (see Chapter 19) might better serve your string processing needs.
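As a small illustration of handling text inside the program rather than shelling out to a script, the following sketch uses the Standard Library's regular expressions to split a line of the form "key: value". The function name parseField() and the line format are assumptions made for this example.

```cpp
#include <regex>
#include <string>

// Splits a line such as "Login: bucky-bo" into a key and a value.
// Returns false, leaving key and value untouched, if the line does not
// have the assumed "key: value" form.
bool parseField(const std::string& line, std::string& key, std::string& value)
{
    // One or more word characters, a colon and a space, then the rest.
    static const std::regex pattern(R"((\w+): (.*))");
    std::smatch match;
    if (!std::regex_match(line, match, pattern)) {
        return false;
    }
    key = match[1].str();
    value = match[2].str();
    return true;
}
```

No extra process is created, and a malformed line is reported to the caller through the return value.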

Calling C++ Code from Scripts

C++ contains a built-in general-purpose mechanism to interface with other languages and environments. You’ve already used it many times, probably without paying much attention to it—it’s the arguments to and return value from the main() function.

C and C++ were designed with command-line interfaces in mind. The main() function receives the arguments from the command line, and returns a status code that can be interpreted by the caller. In a scripting environment, arguments to and status codes from your program can be a powerful mechanism that allows you to interface with the environment.

A Practical Example: Encrypting Passwords

Assume that you have a system that writes everything a user sees and types to a file for auditing purposes. The file can be read only by the system administrator so that she can figure out who to blame if something goes wrong. An excerpt of such a file might look like this:

Login: bucky-bo
Password: feldspar

bucky-bo> mail
bucky-bo has no mail
bucky-bo> exit

While the system administrator may want to keep a log of all user activity, she may also want to obscure everybody’s passwords in case the file is somehow obtained by a hacker. She decides to write a script that parses the log files and calls out to a C++ program to perform the actual encryption.

The following script uses the Perl language, though almost any scripting language could accomplish this task. Note also that these days, there are libraries available for Perl that perform encryption, but, for the sake of this example, let’s assume the encryption is done in C++. If you don’t know Perl, you will still be able to follow along. The most important element of the Perl syntax for this example is the ` character. The ` character instructs the Perl script to shell out to an external command. In this case, the script will shell out to a C++ program called encryptString.

The strategy for the script is to loop over every line of a file, userlog.txt, looking for lines that contain a password prompt. The script writes a new file, userlog.out, which contains the same text as the source file, except that all passwords are encrypted. The first step is to open the input file for reading and the output file for writing. Then, the script needs to loop over all the lines in the file. Each line in turn is placed in a variable called $line.

open (INPUT, "userlog.txt") or die "Couldn't open input file!";
open (OUTPUT, ">userlog.out") or die "Couldn't open output file!";
while ($line = <INPUT>) {

Next, the current line is checked against a regular expression to see if this particular line contains the Password: prompt. If it does, Perl stores the password in the variable $1.

    if ($line =~ m/^Password: (.*)/) {

If a match is found, the script calls the encryptString program with the detected password to obtain an encrypted version of it. The output of the program is stored in the $result variable, and the result status code from the program is stored in the variable $?. The script checks $? and quits immediately if there is a problem. If everything is okay, the password line is written to the output file with the encrypted password instead of the original one.

        $result = `./encryptString $1`;
        if ($? != 0) { exit(-1); }
        print OUTPUT "Password: $result\n";
    } else {

If the current line is not a password prompt, the script writes the line as-is to the output file. At the end of the loop, it closes both files and exits.

        print OUTPUT "$line";
    }
}
close (INPUT);
close (OUTPUT);

That’s it. The only other required piece is the actual C++ program. Implementation of a cryptographic algorithm is beyond the scope of this book. The important piece is the main() function because it accepts the string that should be encrypted as an argument.

Arguments are contained in the argv array of C-style strings. You should always check the argc parameter before accessing an element of argv. If argc is 1, there is one element in the argument list and it is accessible as argv[0]. The 0th element of the argv array is generally the name of the program, so actual parameters begin at argv[1].

Following is the main() function for a C++ program that encrypts the input string. Notice that the program returns 0 for success and non-0 for failure, as is standard in Linux.

int main(int argc, char* argv[])
{
    if (argc < 2) {
        cerr << "Usage: " << argv[0] << " string-to-be-encrypted" << endl;
        return -1;
    }
    cout << encrypt(argv[1]);
    return 0;
}

Now that you’ve seen how easily C++ programs can be incorporated into scripting languages, you can combine the strengths of the two languages for your own projects. You can use a scripting language to interact with the operating system and control the flow of the script, and a traditional programming language like C++ for the heavy lifting.

Calling Assembly Code from C++

C++ is considered a fast language, especially relative to other languages. Yet, in some rare cases, you might want to use raw assembly code when speed is absolutely critical. The compiler generates assembly code from your source files, and this generated assembly code is fast enough for virtually all purposes. Both the compiler and the linker (when it supports link time code generation) use optimization algorithms to make the generated assembly code as fast as possible. These optimizers are getting more and more powerful by using special processor instruction sets such as MMX, SSE, and AVX. These days, it’s very hard to write your own assembly code that outperforms the code generated by the compiler, unless you know all the little details of these enhanced instruction sets.

However, in case you do need it, the keyword asm can be used by a C++ compiler to allow the programmer to insert raw assembly code. The keyword is part of the C++ standard, but its meaning is implementation-defined. In some compilers, you can use asm to drop from C++ down to the level of assembly right in the middle of your program. Sometimes, the support for inline assembly depends on your target architecture. For example, Microsoft VC++ 2017 supports inline assembly (through its own __asm keyword) when compiling in 32-bit mode, but not when compiling in 64-bit mode.
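To give an impression of what inline assembly looks like, here is a sketch that uses the extended asm syntax of GCC and Clang on x86 processors. This syntax is a compiler extension, not standard C++, so the sketch falls back to plain C++ on any other compiler or processor. The function name addIntegers() is hypothetical.

```cpp
// Adds two integers. On GCC or Clang targeting x86 or x86-64, the addition
// is performed by an inline assembly instruction; elsewhere, plain C++ is
// used so that the function still compiles and behaves the same.
int addIntegers(int a, int b)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    // AT&T syntax: add operand %1 (b) to operand %0 (a); the sum ends up in a.
    asm("addl %1, %0"
        : "+r"(a)    // operand 0: read and written, kept in a register
        : "r"(b)     // operand 1: input, kept in a register
        : "cc");     // the instruction modifies the condition flags
    return a;
#else
    return a + b;
#endif
}
```

Even this tiny example needs a conditional compilation guard and a portable fallback, which hints at the maintenance cost discussed next.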

Assembly code can be useful in some applications, but I don’t recommend it for most programs. There are several reasons to avoid assembly code:

  • Your code is no longer portable to another processor once you start including raw assembly code for your platform.
  • Most programmers don’t know assembly languages and won’t be able to modify or maintain your code.
  • Assembly code is not known for its readability. It can hurt the readability and style of your program.
  • Most of the time, it is not necessary. If your program is slow, look for algorithmic problems, or consult some of the other performance suggestions from Chapter 25.

Practically, if you have a computationally expensive block of code, you should move it to its own C++ function. If you determine, using performance profiling (see Chapter 25), that this function is a performance bottleneck, and there is no way to write the code smaller and faster, you might use raw assembly code to try to increase its performance.

In such a case, one of the first things you want to do is declare the function extern "C" so the C++ name mangling is suppressed. Then, you can write a separate module in assembly code that performs the function more efficiently. The advantage of a separate module is that there is both a “reference implementation” in C++ that is platform-independent, and also a platform-specific high-performance implementation in raw assembly code. The use of extern "C" means that the assembly code can use a simple naming convention (otherwise, you have to reverse-engineer your compiler’s name mangling algorithm). Then, you can link with either the C++ version or the assembly code version.

You would write this module in assembly code and run it through an assembler, rather than using inline asm directives in C++. This is particularly true for compilers that do not support inline assembly for 64-bit x86 targets, such as Microsoft Visual C++.
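The C++ side of this arrangement could look like the following sketch. The function sumValues() is a hypothetical example of a routine you might later replace; the listing shows the shared declaration and the portable reference implementation. An assembly module would export the same unmangled symbol (some platforms prepend an underscore to it), and you would link in either one or the other.

```cpp
#include <cstddef>
#include <cstdint>

// Declaration shared by both implementations. C linkage keeps the symbol
// name simple enough to define by hand in an assembly module.
extern "C" std::uint32_t sumValues(const std::uint32_t* values, std::size_t count);

// Platform-independent reference implementation in C++. A hand-written
// assembly version with the same name and calling convention could be
// linked in instead of this function.
extern "C" std::uint32_t sumValues(const std::uint32_t* values, std::size_t count)
{
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

Keeping the reference implementation around also gives you something to test the assembly version against.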

However, you should only use raw assembly code if there are significant performance improvements. Increasing the performance by a factor of 2 might justify the effort. A factor of 10 is compelling. An improvement of 10 percent is not worth the effort.

SUMMARY

If you take away one point from this chapter, it should be that C++ is a flexible language. It exists in the sweet spot between languages that are too tied to a particular platform, and languages that are too high-level and generic. Rest assured that when you develop code in C++, you aren’t locking yourself into the language forever. C++ can be mixed with other technologies, and has a solid history and code base that will help guarantee its relevance in the future.

In Part V of this book, I discussed software engineering methods, writing efficient C++, testing and debugging techniques, design techniques and patterns, and cross-platform and cross-language application development. This is a terrific way to end your journey through Professional C++ programming because these topics help good C++ programmers become great C++ programmers. By thinking through your designs, experimenting with different approaches in object-oriented programming, selectively adding new techniques to your coding repertoire, and practicing testing and debugging techniques, you’ll be able to take your C++ skills to the professional level.
