Chapter 1. The Nature of the Beast

In this book we are referring to C​+​+ as a “beast.” This isn’t from any lack of love or understanding; it comes from a deep respect for the power, scope, and complexity of the language,1 the monstrous size of its installed base, number of users, existing lines of code, developed libraries, available tools, and shipping projects.

For us, C​+​+ is the language of choice for expressing our solutions in code. Still, we would be the first to admit that users need to mind the teeth and claws of this magnificent beast. Programming in C​+​+ requires a discipline and attention to detail that may not be required of kinder, gentler languages that are not as focused on performance or giving the programmer ultimate control over execution details. For example, many other languages allow programmers the opportunity to ignore issues surrounding acquiring and releasing memory. C​+​+ provides powerful and convenient tools for handling resources generally, but the responsibility for resource management ultimately rests with the programmer. An undisciplined approach can have disastrous consequences.

Is it necessary that the claws be so sharp and the teeth so bitey? In other popular modern languages like Java, C#, JavaScript, and Python, ease of programming and safety from some forms of programmer error are a high priority. But in C​+​+, these concerns take a back seat to expressive power and performance.

Programming makes for a great hobby, but C++ is not a hobbyist language.2 Software engineers don’t lose sight of programming ease of use and maintenance, but in the design of C++, nothing has stood or will stand in the way of the goal of creating a truly general-purpose programming language that can be used in the most demanding software engineering projects.

Whether the demanding requirements are high performance, low memory footprint, low-level hardware control, concurrency, high-level abstractions, robustness, or reliable response times, C​+​+ must be able to do the job with reasonable build times using industry-standard tool chains, without sacrificing portability across hardware and OS platforms, compatibility with existing libraries, or readability and maintainability.

Exposure to the teeth and claws is not just the price we pay for this power and performance—sometimes, sharp teeth are exactly what you need.

C​+​+: What’s It Good For?

C​+​+ is in use by millions3 of professional programmers working on millions of projects. We’ll explore some of the features and factors that have made C​+​+ the language of choice in so many situations. The most important feature of C​+​+ is that it is both low- and high-level. Due to that, it is able to support projects of all sizes, ensuring a small prototype can continue scaling to meet ever-increasing needs.

High-Level Abstractions at Low Cost

Well-chosen abstractions (algorithms, types, mechanisms, data structures, interfaces, etc.) greatly simplify reasoning about programs. They make programmers more productive, because programmers no longer get lost in the details and can treat user-defined types and libraries as well-understood and well-behaved building blocks. Using them, developers are able to conceive of and design projects of much greater scope and vision.

The difference in performance between code written using high-level abstractions and code that does the same thing but is written at a much lower level4 (at a greater burden for the programmer) is referred to as the “abstraction penalty.”

As an example: C​+​+ introduced an I/O model based on streams. The streams model offers an interface that is, in the common case, slightly slower than using native operating system calls. However, in most cases, it is fast enough that programmers choose the superior portability, flexibility, and type-safety of streams to faster but less-friendly native calls.
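As a minimal sketch of the type-safety point, the function below formats a label and a number through a string stream. The name format_reading is our own, chosen only for illustration. Because the compiler picks the matching operator<< for each argument, there is no format string that could disagree with the argument types.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: formats "label=value" using the streams interface.
// Each operator<< is selected at compile time from the argument's type.
std::string format_reading(const std::string& label, double value) {
    assert(!label.empty());   // an empty label would produce a confusing line
    std::ostringstream out;
    out << label << '=' << value;
    return out.str();
}
```

Calling format_reading("temp", 21.5) yields the text temp=21.5 under the default stream formatting.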

C​+​+ has features (user-defined types, type templates, algorithm templates, type aliases, type inference, compile-time introspection, runtime polymorphism, exceptions, deterministic destruction, etc.) that support high-level abstractions and a number of different high-level programming paradigms. It doesn’t force a specific programming paradigm on the user, but it does support procedural, object-based, object-oriented, generic, functional, and value-semantic programming paradigms and allows them to easily mix in the same project, facilitating a tailored approach for each part.

While C​+​+ is not the only language that offers this variety of approaches, the number of languages that were also designed to keep the abstraction penalty as low as possible is far smaller.5 Bjarne Stroustrup, the creator of C++, refers to his goal as “the zero-overhead principle,” which is to say, no abstraction penalty.

A key feature of C​+​+ is the ability of programmers to create their own types, called user-defined types (UDTs), which can have the power and expressiveness of built-in types or fundamentals. Almost anything that can be done with a fundamental type can also be done with a user-defined type. A programmer can define a type that functions as if it is a fundamental data type, an object pointer, or even as a function pointer.
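The sketch below illustrates two of these cases under our own assumptions: Meters is a hypothetical value type that can be added and compared like a built-in number, and Scale is a hypothetical function object that is called like a function pointer but carries its own state. None of these names come from the standard library.

```cpp
#include <cassert>

// Hypothetical value type that behaves much like a built-in number.
struct Meters {
    double value;
};

constexpr Meters operator+(Meters a, Meters b) { return Meters{a.value + b.value}; }
constexpr bool operator==(Meters a, Meters b) { return a.value == b.value; }

// Hypothetical function object: callable like a function pointer,
// but it also stores the factor it applies.
struct Scale {
    double factor;
    constexpr Meters operator()(Meters m) const { return Meters{m.value * factor}; }
};

// Adds two lengths with operator+ and then applies the callable object.
Meters total_length(Meters a, Meters b, const Scale& scale) {
    assert(scale.factor > 0.0);   // a non-positive factor makes no sense for a length
    return scale(a + b);
}
```

Code that uses Meters reads like ordinary arithmetic, yet the compiler will refuse to add a Meters value to an unrelated type by accident.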

C++ has so many features for making high-quality, easy-to-use libraries that it can be thought of as a language for building libraries. Libraries can be created that allow users to express themselves in a natural syntax and still be powerful, efficient, and safe. Libraries can be designed to apply type-specific optimizations and to clean up resources automatically, without explicit user calls.

It is possible to create libraries of generic algorithms and user-defined types that are just as efficient or almost as efficient as code that is not written generically.
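A small sketch of what this looks like in practice: the function template below (largest is a hypothetical name of ours) is written once, and the compiler generates a separate, fully typed version for each element type it is used with, much as if each version had been written by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical generic algorithm: works for any element type with operator<.
template <typename T>
const T& largest(const std::vector<T>& items) {
    assert(!items.empty());   // there is no largest element in an empty vector
    std::size_t best = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[best] < items[i]) best = i;
    }
    return items[best];
}
```

The same source serves std::vector<int>, std::vector<double>, or a vector of any user-defined type that provides operator<.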

The combination of powerful UDTs, generic programming facilities, and high-quality libraries with low abstraction penalties makes programming at a much higher level of abstraction possible even in programs that require every last bit of performance. This is a key strength of C++.

Low-Level Access When You Need It

C​+​+ is, among other things, a systems-programming language. It is capable of and designed for low-level hardware control, including responding to hardware interrupts. It can manipulate memory in arbitrary ways down to the bit level with efficiency on par with hand-written assembly code (and, if you really need it, allows inline assembly code). C​+​+, from its initial design, is a superset of C,6 which was designed to be a “portable assembler,” so it has the dexterity and memory efficiency to be used in OS kernels or device drivers.
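To make bit-level manipulation concrete, here is a minimal sketch that packs fields into a 32-bit word. The layout (bit 0 as an enable flag, bits 4 through 7 as a mode field) and all of the names are assumptions of ours, not any real device's register map.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register layout: bit 0 = enable flag, bits 4..7 = mode field.
constexpr std::uint32_t kEnableBit = 1u << 0;
constexpr std::uint32_t kModeShift = 4;
constexpr std::uint32_t kModeMask  = 0xFu << kModeShift;

// Returns reg with the mode field replaced and all other bits untouched.
std::uint32_t set_mode(std::uint32_t reg, std::uint32_t mode) {
    assert(mode <= 0xFu);   // the mode field is only four bits wide
    return (reg & ~kModeMask) | (mode << kModeShift);
}

// Extracts the four-bit mode field.
std::uint32_t get_mode(std::uint32_t reg) { return (reg & kModeMask) >> kModeShift; }

// Tests the enable flag.
bool is_enabled(std::uint32_t reg) { return (reg & kEnableBit) != 0; }
```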

One example of the kind of control offered by C​+​+ is the flexibility available for where user-defined types can be created. Most high-level languages create objects by running a construction function to initialize the object in memory allocated from the heap. C​+​+ offers that option, but also allows for objects to be created on the stack. Programmers have little control over the lifetime of objects created on the stack, but because their creation doesn’t require a call to the heap allocator, stack allocation is typically orders of magnitude faster. Due to its limitations, stack-based object allocation can’t be a general replacement for heap allocation, but in those cases where stack allocation is acceptable, C​+​+ programmers win by avoiding the allocator calls.
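The two options can be sketched side by side. Point and both function names are hypothetical. The first function creates its object in the stack frame, and the second asks the heap allocator for storage and uses std::unique_ptr to return it.

```cpp
#include <cassert>
#include <memory>

struct Point { int x; int y; };

// The object lives in this function's stack frame: no allocator call.
int sum_on_stack() {
    Point p{3, 4};
    return p.x + p.y;
}   // p ceases to exist here

// The object lives on the heap: creating it calls the allocator.
int sum_on_heap() {
    std::unique_ptr<Point> p(new Point{3, 4});
    assert(p);   // operator new throws on failure, so p is never null here
    return p->x + p->y;
}   // unique_ptr's destructor returns the memory to the allocator
```

Both functions compute the same result. The difference is where the Point lives and whether an allocator call is involved.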

In addition to supporting both heap allocation and stack allocation, C​+​+ allows programmers to construct objects at arbitrary locations in memory. This allows the programmer to allocate buffers in which many objects can be very efficiently created and destroyed with great flexibility over object lifetimes.
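One way to do this is the placement form of new, sketched below. Tracker and live_objects_during_placement are hypothetical names. The object is constructed inside a local byte buffer, and its lifetime is ended by an explicit destructor call rather than by delete.

```cpp
#include <cassert>
#include <new>

// Hypothetical type that counts how many instances are currently alive.
struct Tracker {
    explicit Tracker(int& counter) : counter_(counter) { ++counter_; }
    ~Tracker() { --counter_; }
    int& counter_;
};

// Constructs a Tracker inside a raw buffer and reports how many were alive
// while it existed.
int live_objects_during_placement() {
    int live = 0;
    alignas(Tracker) unsigned char buffer[sizeof(Tracker)];  // raw, suitably aligned storage
    Tracker* t = new (buffer) Tracker(live);  // construct at the address we chose
    int during = live;
    t->~Tracker();                            // end the lifetime explicitly; no delete
    assert(live == 0);                        // the destructor has run
    return during;
}
```

The buffer can be reused for another object after the destructor call, which is the basis for pools and arenas.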

Another example of having low-level control is in cache-aware coding. Modern processors have sophisticated caching characteristics, and subtle changes in the way the data is laid out in memory can have significant impact on performance due to such factors as look-ahead cache buffering and false sharing.7 C++ offers the kind of control over data memory layout that programmers can use to avoid cache line problems and best exploit the power of hardware. Managed languages typically do not offer the same kind of memory layout flexibility. Their containers usually hold references to objects rather than the objects themselves in contiguous memory, and so do not exploit look-ahead cache buffers as C++ arrays and vectors do.
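Two layout controls can be sketched briefly. All names are hypothetical, and the 64-byte cache line size is an assumption that should be checked for the target processor. The first function shows that vector elements sit back to back in memory. The second type is padded so that adjacent counters in an array fall on different cache lines, a common remedy for false sharing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Particle { float x, y, z; };

// Distance in bytes between two neighboring elements of the vector.
std::size_t stride_bytes(const std::vector<Particle>& particles) {
    assert(particles.size() >= 2);
    const char* first  = reinterpret_cast<const char*>(&particles[0]);
    const char* second = reinterpret_cast<const char*>(&particles[1]);
    return static_cast<std::size_t>(second - first);
}

// Assumption: cache lines are 64 bytes. Each counter is aligned to, and
// therefore occupies, a full line, so two threads updating neighboring
// counters do not contend for the same line.
struct alignas(64) PaddedCounter { long value; };

static_assert(alignof(PaddedCounter) == 64, "counter must start on a line boundary");
static_assert(sizeof(PaddedCounter) == 64, "counter must fill exactly one line");
```

The stride equals sizeof(Particle), so a linear scan touches memory in strictly increasing order, which is the access pattern hardware prefetchers handle best.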

Wide Range of Applicability

Software engineers are constantly seeking solutions that scale. This is no less true for languages than for algorithms. Engineers don’t want to find that the success of their project has caused it to outgrow its implementation language.

Very large applications and large development teams require languages that scale. C​+​+ has been used as the primary development language for projects with hundreds of engineers and scores of modules.8 Its support for separate compilation of modules makes it possible to create projects where analyzing and/or compiling all the project code at once would be impractical.

A large application can absorb the overhead of a language with a large runtime cost, either in startup time or memory usage. But to be useful in applications as diverse as device drivers, plug-ins, CGI modules, and mobile apps, it is necessary to have as little overhead as possible. C​+​+ has a guiding philosophy of “you only pay for what you use.” What that means is that if you are writing a device driver that doesn’t use many language features and must fit into a very small memory footprint, C​+​+ is a viable option, where a language with a large runtime requirement would be inappropriate.

Highly Portable

C​+​+ is designed with a specific hardware model in mind, and this model has minimalistic requirements. This has made it possible to port C​+​+ tools and code very broadly, as machines built today, from nanocomputers to number-crunching behemoths, are all designed to implement this hardware model.

There are one or more C​+​+ tool chains available on almost all computing platforms.9 C​+​+ is the only high-level language alternative available on all of the top mobile platforms.10

Not only are the tools available, but it is possible to write portable code that can be used on all these platforms without rewriting.

With the consideration of tool chains, we have moved from language features to factors outside of the language itself. But these factors are important engineering considerations. Even a language with perfect syntax and semantics wouldn’t have any practical value if we couldn’t build it for our target platform.

In order for an engineering organization to seriously consider significant adoption of a language, it needs to consider availability of tools (including analyzers and other non-build tools), experienced engineers, software libraries, books and instructional material, troubleshooting support, and training opportunities.

Extra-language factors, such as the installed user base and industry support, always favor C​+​+ when a systems language is required and tend to favor C​+​+ when choosing a language for building large-scale applications.

Better Resource Management

In the introduction to this chapter, we noted that other popular languages prioritize ease of programming and safety over performance and control. Nothing is a better example of the differences between these languages and C++ than their approaches to memory management.

Most popular modern languages implement a feature called garbage collection, or GC. With this approach to memory management, the programmer is not required to explicitly release allocated memory that is no longer needed. The language runtime determines when memory is “garbage” and recycles it for reuse. The advantages to this approach may be obvious. Programmers don’t need to track memory, and “leaks” and “double dispose” problems11 are a thing of the past.

But every design decision has trade-offs, and GC is no exception. One issue with it is that collectors don’t immediately recognize that memory has become garbage. The recognition that memory needs to be released will happen at some unspecified future time (and, for some implementations, may not happen at all if, for example, the application terminates before it needs to recycle memory).

Typically, the collector will run in the background and decide when to recycle memory outside of the programmer’s control. This can result in the foreground task “freezing” while the collector recycles. Since memory is not recycled as soon as it is no longer needed, it is necessary to have an extra cushion of memory so that new memory can be allocated while some unneeded memory has not yet been recycled. Sometimes the cushion size required for efficient operation is not trivial.

An additional objection to GC from a C​+​+ point of view is that memory is not the only resource that needs to be managed. Programmers need to manage file handles, network sockets, database connections, locks, and many other resources. Although we may not be in a big hurry to release memory (if no new memory is being requested), many of these other resources may be shared with other processes and need to be released as soon as they are no longer needed.

To deal with the need to manage all types of resources and to release them as soon as they can be released, best-practice C​+​+ code relies on a language feature called deterministic destruction.

In C​+​+, one way that objects are instantiated by users is to declare them in the scope of a function, causing the object to be allocated in the function’s stack frame. When the execution path leaves the function, either by a function return or by a thrown exception, the local objects are said to have gone out of scope.

When an object goes out of scope, the runtime “cleans up” the object. The definition of the language specifies that objects are cleaned up in exactly the reverse order of their creation (reverse order ensures that if one object depends on another, the dependent is removed first). Cleanup happens immediately, not at some unspecified future time.
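The ordering rule is easy to observe. In the sketch below, Tracer is a hypothetical type that appends a lowercase letter to a log when it is created and the matching uppercase letter when it is cleaned up.

```cpp
#include <cassert>
#include <string>

// Hypothetical type: logs a lowercase id on construction and the
// corresponding uppercase letter on destruction.
class Tracer {
public:
    Tracer(std::string& log, char id) : log_(log), id_(id) { log_ += id_; }
    ~Tracer() { log_ += static_cast<char>(id_ - 'a' + 'A'); }
    Tracer(const Tracer&) = delete;
    Tracer& operator=(const Tracer&) = delete;
private:
    std::string& log_;
    char id_;
};

// Creates two objects in an inner scope and returns the resulting log.
std::string trace_scope() {
    std::string log;
    {
        Tracer a(log, 'a');
        Tracer b(log, 'b');
        assert(log == "ab");   // both constructed, in declaration order
    }   // leaving the scope destroys b first, then a
    return log;
}
```

The returned log is abBA: creation in declaration order, cleanup in exactly the reverse order, at the closing brace.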

As we pointed out earlier, one of the key building blocks in C​+​+ is the user-defined type. One of the options programmers have when defining their own type is to specify exactly what should be done to “clean up” an object of the defined type when it is no longer needed. This can be (and in best practice is) used to release any resources held by the object. So if, for example, the object represents a file being read from or written to, the object’s cleanup code can automatically close the file when the object goes out of scope.

This ability to manage resources and avoid resource leaks leads to a programming idiom called RAII, or Resource Acquisition Is Initialization.12 The name is a mouthful, but what it means is that for any resource that our program needs to manage, from file handles to mutexes, we define a user type that acquires the resource when it is initialized and releases the resource when it is cleaned up.

To safely manage a particular resource, we just declare the appropriate RAII object in the local scope, initialized with the resource we need to manage. The resource is guaranteed to be cleaned up exactly once, exactly when the managing object goes out of scope, thus solving the problems of resource leaks, dangling pointers, double releases, and delays in recycling resources.
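A minimal sketch of the idiom follows. Pool stands in for any resource and Lease is the RAII type that manages it. Both are hypothetical names rather than library facilities. The point to notice is that work releases its slot on both exit paths without containing any cleanup code of its own.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical resource: a pool that counts how many slots are in use.
struct Pool {
    int in_use = 0;
    void acquire() { ++in_use; }
    void release() { assert(in_use > 0); --in_use; }
};

// RAII type: acquires in the constructor, releases in the destructor.
class Lease {
public:
    explicit Lease(Pool& pool) : pool_(pool) { pool_.acquire(); }
    ~Lease() { pool_.release(); }
    Lease(const Lease&) = delete;             // copying would release twice
    Lease& operator=(const Lease&) = delete;
private:
    Pool& pool_;
};

// Uses the resource and may leave early by throwing.
void work(Pool& pool, bool fail) {
    Lease lease(pool);
    if (fail) throw std::runtime_error("work failed");
}

// Runs both exit paths and reports how many slots are still in use.
int slots_left_after_work() {
    Pool pool;
    work(pool, false);                  // normal return
    try {
        work(pool, true);               // exit by exception
    } catch (const std::runtime_error&) {
        // stack unwinding has already run ~Lease
    }
    return pool.in_use;
}
```

Here slots_left_after_work returns 0: the slot is given back whether work returns normally or throws.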

Some languages address the problem of managing resources (other than memory) by allowing programmers to add a finally block to a scope. This block is executed whenever the path of execution leaves the function, whether by function return or by thrown exception. This is similar in intent to deterministic destruction, but with this approach, every function that uses an object of a particular resource managing type would need to have a finally block added to the function. Overlooking a single instance of this would result in a bug.

The C​+​+ approach, using RAII, has all the convenience and clarity of a garbage-collected system, but makes better use of resources, has greater performance and flexibility, and can be used to manage resources other than memory. Generalizing resource management instead of just handling memory is a strong advantage of this approach over garbage collection and is the reason that most C​+​+ programmers are not asking that GC be added to the language.

Industry Dominance

C​+​+ has emerged as the dominant language in a number of diverse product categories and industries.13 What these domains have in common is either a need for a powerful, portable systems-programming language or an application-programming language with uncompromising performance. Some domains where C​+​+ is dominant or near dominant include search engines, web browsers, game development, system software and embedded computing, automotive, aviation, aerospace and defense contracting, financial engineering, GPS systems, telecommunications, video/audio/image processing, networking, big science projects, and ISVs.14

1 When we refer to the C​+​+ language, we mean to include the accompanying standard library. When we mean to refer to just the language (without the library), we refer to it as the core language.

2 Though some C​+​+ hobbyists go beyond most professional programmers’ day-to-day usage.

3 http://www.stroustrup.com/bs_faq.html#number-of-C++-users

4 For instance, one can (and people do) use virtual functions in C, but few will contest that p->foo() is clearer than p->vtable->foo(p).

5 Notable peers are the D programming language, Rust, and, to a lesser extent, Google Go, albeit with a much smaller installed base.

6 Being a superset of C also enhances the ability of C​+​+ to interoperate with other languages. Because C’s string and array data structures have no memory overhead, C has become the “connecting” interface for all languages. Essentially all languages support interacting with a C interface and C​+​+ supports this as a native subset.

7 http://www.drdobbs.com/parallel/eliminate-false-sharing/217500206

8 For a small sample of applications and operating systems written in C​+​+: http://www.stroustrup.com/applications.html

9 “An incomplete list of C​+​+ compilers”: http://www.stroustrup.com/compilers.html

10 C​+​+ is supported on iOS, Android, Windows Mobile, and BlackBerry: http://visualstudiomagazine.com/articles/2013/02/12/future-c-plus-plus.aspx

11 It would be hard to over-emphasize how costly these problems have been in non-garbage collected languages.

12 It may also stand for Responsibility Acquisition Is Initialization when the concept is extended beyond just resource management.

13 http://www.lextrait.com/vincent/implementations.html

14 Independent software vendors: the people that sell commercial applications for money, such as the creators of Office, Quicken, and Photoshop.
