2 How to select an OS

In “the old days,” you would pick a CPU first and then discuss the operating system (OS)—after discussing whether one was really necessary at all. Today, it is more common to choose the OS first and then a CPU, or a CPU family. In reality, it is often an iterative process. You may, for example, decide to go for Linux, but this requires an MMU (memory management unit), and this may drive you toward too big and expensive CPUs. In that case, you may have to redo your original choice. Table 2.1 is a “kick-start” to this chapter. It shows an overall trend from simple to advanced.

Table 2.1: Task management—from simple to advanced.

OS/Kernel/Language    Type
Simple main           Strictly polling
Ruby                  Co-routines
Modula-2              Co-routines
Windows 3             Nonpreemptive scheduler
ARM mbed simple       Interrupts + main with FSM
OS-9                  Preemptive real-time kernel
Enea OSE              Preemptive real-time kernel
Windows CE            Preemptive real-time kernel
QNX Neutrino          Preemptive real-time kernel
SMX                   Preemptive real-time kernel
Windows NT            Preemptive OS
ARM mbed advanced     Preemptive scheduler
Linux                 Preemptive OS
RT-Linux              Preemptive real-time OS
VxWorks               Preemptive real-time OS

We will dive into the various types and degrees of operating systems and their pros and cons. Along the way, important parameters are introduced. The solutions are presented in the most reader-friendly order, progressing toward a full preemptive real-time operating system (RTOS).

2.1 No OS and strictly polling

The simplest embedded system has no operating system, leaving some low-level details to the programmer. If you are using “C,” there is a main() function from which your “official” program starts at power-up. Since there is no OS, this must be assured by configuring the compiler, linker, and locator.2 It is necessary to initially call a small assembly program that disables interrupts, copies code and initialized data to RAM (where needed), clears the zero-initialized data area, and prepares the stack and stack-pointer.

I once used a compiler package that did all of the above. Unfortunately, the vendor had forgotten the call to the code that performs all the global C-variable initializations, something you normally take for granted. So, after realizing this, I had the choice between completing the tool myself or remembering to initialize all global variables explicitly in a special “init” function called from main(). This is a typical example of the difference between programming in the embedded world and on a PC, where the tools are more “polished” than those for the smaller systems.

In an OS-less system, main() has an infinite loop that could look like this:

Listing 2.1: Round-robin scheduling
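
A minimal sketch of such a loop could look like this; JobA, JobB, and JobC are hypothetical job functions, with JobA deliberately called twice per round:

```c
/* A minimal sketch of a round-robin main loop; JobA, JobB, and JobC
   are hypothetical job functions. */
static void JobA(void) { /* poll input 1 and act on it */ }
static void JobB(void) { /* poll input 2 and act on it */ }
static void JobC(void) { /* poll input 3 and act on it */ }

int main(void)
{
    for (;;) {      /* loop forever - there is no OS to return to */
        JobA();
        JobB();
        JobA();     /* JobA is called twice per round, giving it */
        JobC();     /* access to the CPU at shorter intervals    */
    }
}
```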

This is a “round-robin” scheme with the slight enhancement that JobA receives more “attention” (not really priority) by getting access to the CPU at shorter intervals than the other jobs. Within each job, we read the relevant inputs from our code when we have the time. This is known as “polling.” We might even make a loop where we test an input again and again until it goes from one state to another. This is called “busy-waiting,” as the CPU does nothing but loop. Introducing such a loop in, say, JobB is a disaster for JobA and JobC—they will not be executed until this state-change occurs. And what if the state-change we are waiting for in this loop actually depends on JobA or JobC doing something? In such a scenario, we have a deadlock. Another problem with a busy-wait loop is that it wastes a lot of energy, as the CPU is not allowed to go into any form of power-saving. So busy-waiting in a loop may at times be okay, but not in a system as simple as this.

Another concept, still without any operating system whatsoever, but a lot more clever, is to introduce finite state machines (FSMs) where you read all inputs, decide what has changed, and take action as shown in Listing 2.2.

Listing 2.2: Main with finite state machine
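
A minimal sketch of this structure could look as follows; the port access and all names are illustrative:

```c
/* A sketch of main with a finite state machine: read all inputs,
   derive an event from what changed, and let the FSM take action.
   The port access is hypothetical. */
typedef enum { EV_NONE, EV_INPUT1, EV_INPUT2, EV_INPUT3 } event_t;

static unsigned last_inputs;

static unsigned read_port(void)
{
    return 0; /* hypothetical: read the physical input port here */
}

static event_t read_inputs(void)
{
    unsigned now     = read_port();
    unsigned changed = now ^ last_inputs;   /* which bits changed? */
    last_inputs = now;
    if (changed & 0x01) return EV_INPUT1;
    if (changed & 0x02) return EV_INPUT2;
    if (changed & 0x04) return EV_INPUT3;
    return EV_NONE;
}

static void fsm_handle(event_t ev)
{
    (void)ev;  /* table lookup on (current state, event) - see Listing 2.3 */
}

int main(void)
{
    for (;;) {
        event_t ev = read_inputs();
        if (ev != EV_NONE)
            fsm_handle(ev);
    }
}
```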

Listing 2.3 is one of three finite state machines that together control a TOE (TCP offload engine). The TOE implements the actual transmissions of TCP in hardware, while the rest is handled in the embedded software, via the FSMs. Later, when we look into sockets and TCP, it will be seen that the listing very directly represents a good part of Figure 7.8, which is a graphic representation of the TCP connection states. For now, it is more relevant to look into the concept of an FSM.

Each column is a state of a TCP socket; at any given time, one of these is the “current state.” Each row represents an event that may occur while in this state, for example, an ACK (acknowledge) has been received. Each element in the table contains the action to take, as well as the next state. In order to fit the table into this book, it has been split in two. In the real C-code, the part after “table continuing here” is placed to the right of the lines above, so that we have a table with 7 rows and 7 columns (it is coincidental that the number of states and events is the same). FSMs are not just practical in a simple OS-less system; they can be used anywhere. The FSM shown in Listing 2.3 was used in a Linux system. FSMs are very popular among hardware designers, but not used by many software designers, which is a pity. Nevertheless, many modern frameworks contain FSMs inside, offering “event-driven” models.

Listing 2.3: One of three finite state machines for a TOE
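
The full 7x7 table is too large to show here, but the pattern can be sketched with a toy 3x3 table; all names are illustrative, not the actual TOE code:

```c
/* A toy table-driven FSM in the style described above. Rows are
   events, columns are states; each cell holds an action and the next
   state. The same table serves all sockets; only the current state is
   kept per socket. */
#include <stdio.h>

typedef enum { ST_CLOSED, ST_OPEN, ST_ERROR, NUM_STATES } state_t;
typedef enum { EV_OPEN, EV_DATA, EV_CLOSE, NUM_EVENTS } event_t;

typedef struct {
    void   (*action)(int sock);
    state_t next;
} cell_t;

static void do_open (int s) { printf("sock %d: opening\n", s); }
static void do_read (int s) { printf("sock %d: reading\n", s); }
static void do_close(int s) { printf("sock %d: closing\n", s); }
static void do_nop  (int s) { (void)s; }

static const cell_t fsm[NUM_EVENTS][NUM_STATES] = {
    /*             ST_CLOSED               ST_OPEN                 ST_ERROR           */
    /* EV_OPEN  */ {{do_open,  ST_OPEN},   {do_nop,   ST_OPEN},   {do_nop, ST_ERROR}},
    /* EV_DATA  */ {{do_nop,   ST_ERROR},  {do_read,  ST_OPEN},   {do_nop, ST_ERROR}},
    /* EV_CLOSE */ {{do_close, ST_CLOSED}, {do_close, ST_CLOSED}, {do_nop, ST_ERROR}},
};

static state_t sock_state[4];       /* current state, kept per socket */

/* the two parameters mentioned in the text: socket-handle and event */
static void fsm_step(int sock, event_t ev)
{
    const cell_t *c = &fsm[ev][sock_state[sock]];
    c->action(sock);
    sock_state[sock] = c->next;
}

int main(void)
{
    fsm_step(0, EV_OPEN);
    fsm_step(0, EV_DATA);
    fsm_step(0, EV_CLOSE);
    return 0;
}
```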

One nice thing about FSMs is that they are both implementation and documentation, and they can typically fit into a single page on the screen. Compared to a pile of if-else or switch clauses, the use of FSMs is much “cleaner” and, therefore, much easier to create and keep error-free. With the good overview, it is easy to spot a missing combination of a state and an event. It is also a compact solution. In the example, the same FSM-code works for all sockets; we only need to keep the current state per socket, and the state machine is called with two parameters only: the socket-handle and the incoming event. Incidentally, when coding in C rather than C++, it is a common pattern that the first parameter is the “object you do not have.” Thus, if the C++ version is socket->open(a, b), this becomes open(socket, a, b) in C.

Figure 2.1: Finite state machine in main.

Figure 2.1 shows an OS-less system. The main advantage is simplicity. There is no third-party OS that you need to understand and keep updated. This can be very relevant if the system is to last many years. A part of this simplicity is that the application may read inputs and write to outputs directly. There is no “driver” concept. This can be practical in a small system with a single developer, but it is also the downside, as a simple error can lead to disasters. Figure 2.1 introduces a small setup that we will see a couple more times:

Input 1 – leading to some processing and eventually to a change in output 1.

Input 2 – leading to some processing and eventually to a change in output 2.

Input 3 – leading to some processing and eventually to a change in outputs 3 and 4.

2.2 Co-routines

Co-routines are not like the tasks (which we will get back to) of an OS, but they do have similar traits:

  1. There can be many instances of the same co-routine, typically one per resource, for example, an actor in a game or a cell in a bio-simulation.
  2. Each instance can pause at some point while the CPU executes something else. It keeps its state and can continue on from the given point.
  3. This pause must be invoked by the co-routine itself by “yielding” to another co-routine. There is, however, no caller and callee. Some languages support this; “C” does not, but, for example, Ruby and Modula-2 do.
Co-routines are mostly of academic interest in today's embedded world. They might come into fashion again—you never know.
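
C has no language-level co-routines, but the yield behavior can be emulated. The following sketch uses the POSIX ucontext calls (Linux/Unix only) purely for illustration:

```c
/* Emulating a co-routine "yield" in C with POSIX ucontext - a sketch
   for illustration only. */
#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static char co_stack[16384];

static void coroutine(void)
{
    puts("co: step 1");
    swapcontext(&co_ctx, &main_ctx);      /* yield back to main */
    puts("co: step 2 - local state was kept across the yield");
}

int main(void)
{
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp   = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link          = &main_ctx;  /* where to go when done */
    makecontext(&co_ctx, coroutine, 0);

    swapcontext(&main_ctx, &co_ctx);      /* run co-routine until it yields */
    puts("main: doing something else");
    swapcontext(&main_ctx, &co_ctx);      /* resume the co-routine */
    return 0;
}
```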

2.3 Interrupts

Instead of polling the various inputs, interrupts are generated when inputs change. One or more interrupt routines read the inputs and take action. An interrupt is what happens when an external event in hardware asynchronously triggers a change in the execution flow. Typically, a given pin on the CPU is mapped to an interrupt-number. In a fixed place in the memory-layout, you find the interrupt-vector, which is an array with a fixed number of bytes per interrupt—containing mainly the address that the CPU must jump to. This is the address of the interrupt service routine (ISR). When entering the ISR, most CPUs have all interrupts disabled.3

In such a “purely interrupt-controlled system,” the interrupt service routine may in principle do everything that is related to a given event. See Figure 2.2.

Figure 2.2: Purely interrupt controlled system.

There are many variants of such a system:

  1. Each input has its own interrupt service routine (ISR) and its own interrupt priority. Thus, one interrupt may interrupt the main program (stacking the registers it plans to use), and then this may in turn be interrupted by the next, higher-level interrupt, etc. This is known as nested interrupts, and it typically can only happen if the original ISR has reenabled interrupts. This can be done by the OS, if such exists, before giving control to the programmer, or by the programmer himself in our current OS-less case. Nested interrupts are very normal in the bigger systems, but if all actions on inputs are done inside the interrupt routines, it becomes very important that the nested interrupt “understands” the exact state of the system. This again depends on how far we got in the interrupted interrupt, which is almost impossible to know. This is one of the many reasons why you should defer as much of the action as possible to something that runs later, at a lower priority. But then it's no longer a purely interrupt-based system. Furthermore, many systems do not have enough interrupt levels to match all the inputs.
  2. As above, each input has its own interrupt service routine (ISR), and its own interrupt priority. In this system, however, nested interrupts are not allowed. This means that all other interrupts must wait until the first is done. This is really bad for the interrupt latency—the worst-case reaction time—on the other interrupts. Thus it normally becomes even more important to defer most work until later. This again is bad news for a purely interrupt-based system.
  3. Both of the above scenarios may have the option for many inputs to trigger the same interrupt. The first thing the ISR must do then is to find out which input actually changed state. This is daisy-chaining interrupts. The order in which you test for the various events becomes a “subpriority” so to speak.

From the above, it is clear that a purely interrupt-controlled system, with nothing deferred to low-priority handlers, has some huge challenges.

A particularly nasty problem with interrupts that I once experienced is related to the difference between “edge-triggered” and “level-triggered” interrupts. If an interrupt is level-triggered, you will keep getting the interrupt until the level is changed, either by the hardware itself or from code, typically in your ISR. An edge-triggered interrupt, on the other hand, only happens at the rising or falling edge of the pulse. If your interrupts are disabled in that short instant, you never get an interrupt, unless the edge is latched in CPU-hardware, which is not done in all CPUs.

The general rule in any good system with interrupts is like guerrilla warfare: “move fast in, do the job, and move fast out.” This is to achieve the best interrupt latency for the other interrupts. This means that the actual ISR only does the minimal stuff needed. This could, for example, be to set a flag or read a sample from an A/D converter before it is overwritten by the next sample. In the latter case, the ISR will save the sample in a buffer, which later will be read from a standard process or task. In such a system, the interrupt latency must be less than 1/fs—where fs is the sample frequency. A system like this can thus detect external events very fast, but it does not offer any help to the developer in terms of multitasking (a concept we will see shortly).
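
The A/D example could be sketched like this; the register address and all names are hypothetical:

```c
/* "Move fast in, do the job, move fast out": the ISR only grabs the
   sample; the main loop (or a task) processes it later. All hardware
   names are hypothetical. */
#define ADC_DATA_REG (*(volatile unsigned short *)0x40001000u)
#define BUF_SIZE 64                      /* power of two, so the     */
                                         /* index wrap-around is safe */
static volatile unsigned short buf[BUF_SIZE];
static volatile unsigned head;           /* written by the ISR only  */
static unsigned tail;                    /* read by the main loop    */

void adc_isr(void)                       /* hooked into the vector table */
{
    buf[head % BUF_SIZE] = ADC_DATA_REG; /* read before it is overwritten */
    head++;
}

static void process_sample(unsigned short s) { (void)s; /* filter etc. */ }

void poll_samples(void)                  /* called from the main loop */
{
    while (tail != head) {
        process_sample(buf[tail % BUF_SIZE]);
        tail++;
    }
}
```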

Figure 2.3: Interrupt system with finite state machine in main.

If, however, the main loop is broken up into small pieces of code, organized with the help of finite state machines, it is possible to react to the flags set in the ISR, as soon as one of the small pieces of code is done, and then decide (via the FSM) what the next piece of code is; see Figure 2.3.

This is exactly what ARM has done in the basic version of the free “mbed” OS. Here, the flags from the ISRs are called events. ARM mbed prioritizes the interrupts in the usual way and also offers priority on the “pseudo threads”—the small pieces of code. This simply means that if “pseudo threads” A and B are both waiting for an event from the same interrupt, the one with the highest priority is started first. Since all of these “pseudo threads” are started at a specific point and run to completion, there is no preemption. One task in the application code never takes over the CPU from another; it is only interrupted by ISRs, and these can use the single CPU stack for the specific registers they use. This saves a lot of RAM space and is very practical in a small system.

Thus mbed is tailored, for example, for the small 32-bit Cortex-M0 CPUs that have scarce resources (including no MMU). What makes mbed interesting is that it has a lot of the extras normally seen on larger OSes: TCP/IP stack, Bluetooth LE stack, etc. It also boasts a HAL (hardware abstraction layer), making the code the same for other CPUs in the ARM family. In this way, mbed is well positioned and does look very interesting.

Note that ARM mbed alternatively can be configured to use a preemptive scheduler as described in the next section. This takes up more space, but also makes mbed a member of a more serious club.

2.4 A small real-time kernel

Typically, the aforementioned concepts are only used in very small and simple systems. In anything bigger, it is really practical to separate the various tasks. A real-time kernel offers exactly that—a task concept. You can also say that a kernel is all about managing resources. The basic theory is that you set aside a task for each independent resource in the system. This could be a printer, a keyboard, a hard-drive, or a production-line “station” (or parts of this). It is not uncommon, though, to have more tasks if this makes your code more maintainable. It is, nevertheless, not a good idea simply to assign a task to each developer, as this will require more coordination among tasks. The less coordination needed between tasks, the better. In fact, almost all the quirks that can make the use of a kernel complex are related to interaction between tasks. Figure 2.4 shows a system with a preemptive scheduler (we will get back to this shortly).

Figure 2.4: OS with preemptive scheduling.

The states used in Figure 2.4 are:

Dormant
The task is not yet brought to life. This must be done explicitly by the application.

Ready
The task can run; it is only waiting for the current “running” task to get off the CPU.

Running
Actually executing code. There can only be one running task per CPU—or rather per CPU core. Many modern CPU chips contain several cores. This is really like several CPUs in one house. Intel’s hyperthreaded virtual cores also count here.

Blocked
The task is waiting for something to happen. This could, for example, be a socket in a recv() call, waiting for input data. When data arrives, the task becomes “Ready.” A socket will also block if you write to it with send() and the assigned OS transmit buffer is full.

Most kernels today support preemption. This means that application code in tasks is not only interrupted by ISRs. When a task has run for an allowed time—the so-called time-slice—the scheduler may stop it “in mid-air” and instead start another task. As no one knows which registers the current or the next task needs, all registers must be saved on a stack per task. This is the context switch. It differs from an interrupt routine, where you only need to save the registers used by the routine itself.4 A context switch can even occur before the time-slice is used up: if a higher-prioritized task has become ready, the scheduler may move the currently running, low-priority task to “Ready” (thus not executing anymore, but still willing to do so) to make room for the high-priority task.
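
As an illustration of what a typical kernel API looks like, here is a sketch using FreeRTOS-style calls (FreeRTOS is just one example of a small preemptive kernel; the kernels in Table 2.1 have similar primitives):

```c
/* Two tasks at different priorities under a preemptive scheduler -
   a sketch using FreeRTOS-style calls. */
#include "FreeRTOS.h"
#include "task.h"

static void high_prio_task(void *arg)
{
    (void)arg;
    for (;;) {
        /* react to something urgent here ... */
        vTaskDelay(pdMS_TO_TICKS(10));   /* block: scheduler runs others */
    }
}

static void low_prio_task(void *arg)
{
    (void)arg;
    for (;;) {
        /* background work - preempted whenever the task above is ready */
    }
}

int main(void)
{
    xTaskCreate(high_prio_task, "high", 256, NULL, 3, NULL);
    xTaskCreate(low_prio_task,  "low",  256, NULL, 1, NULL);
    vTaskStartScheduler();               /* never returns */
    return 0;
}
```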

More advanced kernels support priority inheritance, which counters the classic problem of priority inversion: a high-priority task is blocked, waiting for a low-priority task to do something that will unblock it. In this case, the low-priority task “inherits” the priority of the waiting task until the latter is unblocked.

Figure 2.5 shows our little system again—now full-blown with interrupts and tasks.

Figure 2.5: Tasks and interrupts.

In some ways, the tasks are now simpler, as each task is blocked until awakened by the OS, due to something that happened in the ISR. The figure shows how the three ISRs respectively use an x, y, or z data-structure, and that the three tasks each wait on one of these. There is not necessarily a 1:1 correspondence—all three tasks might, for example, have waited on “y.” The nature of x, y, and z differs from OS to OS. In Linux, the awaiting task calls wait_event_interruptible() while the ISR calls wake_up_interruptible(). Linux uses wait queues, so that several tasks may be awoken by the same event. The term “interruptible” in the call does not refer to the external interrupt, but to the fact that the call may also unblock on a “signal” such as, for example, CTRL-C from the keyboard. If the call returns nonzero, this is what has happened. In a normal embedded system, you are not so interested in the use of signals.
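
A sketch of the pattern just described, as it would appear in Linux driver code (names like my_isr are illustrative):

```c
/* The Linux wait-queue pattern: a task blocks until the ISR wakes it.
   Kernel code, so it only compiles against the kernel headers. */
#include <linux/wait.h>
#include <linux/interrupt.h>
#include <linux/errno.h>

static DECLARE_WAIT_QUEUE_HEAD(wq);
static int data_ready;

static int wait_for_data(void)          /* runs in process context */
{
    if (wait_event_interruptible(wq, data_ready))
        return -ERESTARTSYS;            /* woken by a signal, not data */
    data_ready = 0;
    /* consume the data here ... */
    return 0;
}

static irqreturn_t my_isr(int irq, void *dev)
{
    /* grab data from the hardware here ... */
    data_ready = 1;
    wake_up_interruptible(&wq);         /* unblock the waiting task */
    return IRQ_HANDLED;
}
```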

Tasks can communicate with each other using these low-level mechanisms, as well as semaphores, but often a better way is to use messages. C-structs or similar can be mapped to such messages, which are sent to a queue. The queue is sometimes referred to as a mailbox.

A highly recommended design is one where all tasks wait at their own specific queue and are awakened when there is “mail.” A task may have other queues it waits on in specific cases, or it may block waiting for data in or out, but it usually returns to the main queue for the next “job.” This is not much different from a manager, mainly driven by mails in his Outlook in-box. One specific advantage of using messages is that some kernels seamlessly extend them to work between cores—a local network, so to speak. Another advantage is that it allows you to debug on a higher level, as we will see later. “ZeroMQ” (zero message queue) is a platform-independent implementation supporting this.
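
A sketch of the queue-driven task loop, with POSIX message queues standing in for whatever mailbox calls the kernel at hand offers; the queue name and message layout are made up:

```c
/* A task driven by its own message queue - POSIX mqueue used as an
   example; RTOS kernels have equivalent mailbox calls. */
#include <fcntl.h>
#include <mqueue.h>

typedef struct { int cmd; int param; } msg_t;

void task_main(void)
{
    struct mq_attr attr = { .mq_maxmsg = 8, .mq_msgsize = sizeof(msg_t) };
    mqd_t q = mq_open("/task_a", O_CREAT | O_RDONLY, 0600, &attr);
    msg_t m;

    if (q == (mqd_t)-1)
        return;
    for (;;) {                           /* blocked here when idle */
        if (mq_receive(q, (char *)&m, sizeof m, NULL) == sizeof m) {
            /* dispatch on m.cmd - this is the task's next "job" */
        }
    }
}
```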

A word of caution: do not try to repair deadlocks by changing task priorities. The simple rule on mutexes, semaphores, etc., is that they must always be acquired in the same order by all tasks, and released in the opposite order. If one task “takes A then B” and another “takes B then A,” then one day the two tasks will have taken A and B, respectively; both will block and never unblock. Even though the rule is simple, it is not easy to abide by. Thus, it is preferable to keep resource-sharing to a minimum. This is especially true in systems with multiple cores that actually do execute in parallel. Coordinating resources in this scenario is inefficient, and copying data may be preferred in many cases.
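
The ordering rule in code, with pthreads standing in for any kernel's mutex API:

```c
/* Lock ordering: every task takes A before B, and releases in the
   opposite order - so the A-then-B deadlock cannot occur. */
#include <pthread.h>

static pthread_mutex_t A = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t B = PTHREAD_MUTEX_INITIALIZER;

void any_task_needing_both(void)
{
    pthread_mutex_lock(&A);       /* always A first ... */
    pthread_mutex_lock(&B);       /* ... then B         */
    /* use the shared resources here */
    pthread_mutex_unlock(&B);     /* release in opposite order */
    pthread_mutex_unlock(&A);
}
```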

2.5 A nonpreemptive operating system

We typically talk about an operating system when there is a user interface as well as a scheduler. Even though an OS in this way is “bigger” than a kernel, it may have a less advanced scheduler. A very good example here is Windows 3, released in May 1990. This was a huge and very complex operating system with a lot of new and exciting GUI stuff and tools. With Windows 3.1 (1992), we got “TrueType” fonts, which were a breakthrough in printing for most people.

However, from an RTOS (real-time operating system) point of view, Windows 3 was not so advanced. Windows 3 did support interrupts, and it had a task concept, but contrary to a small RTOS kernel, it did not support preemption. Thus, Windows 3 was like the system in Figure 2.4 without the preemption action. Another thing missing in Windows 3—which all good OSs have today—was support for the MMU. The MMU was present in the Intel 80386 CPU, which was the standard at the time, but the software hadn't caught up with the hardware.

Anyway, an input could generate an interrupt as seen before. Even though this meant that a long-awaited resource was now ready, the scheduler could not move a low-priority task away from the CPU to make way for the now-ready high-priority task. A task could not get on the CPU until the currently running task yielded. This is not the same kind of “yield” as with co-routines; the Windows version is easy to implement in C. The way Windows 3 yields is that tasks perform specific OS calls, typically sleep(). Sleep takes as input the minimum number of seconds (or microseconds) the process wants to be taken off the CPU—thus making time for other tasks.

In Windows 3 code, you would often see sleep(0). This meant that the task could continue, but on the other hand was also prepared to leave the CPU at this point. Moreover, Windows 3 introduced a variant of Berkeley Sockets called WinSock. As we will see later, if you try to read data that hasn't arrived yet from a socket, your task will be blocked in a preemptive system.

In the days of Windows 3, this blocking behavior was standard in Unix, but Windows couldn't handle it—yet. Instead, Microsoft invented WinSock, where a socket could tell you that it “WOULDBLOCK” if only it could; you were then expected to write a kind of loop around it with a sleep(), so that you did not continue before there was data or the socket was closed.
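
In modern BSD-socket terms, the same pattern looks like the following sketch; recv() on a non-blocking socket returns EWOULDBLOCK instead of blocking, and the caller loops with a sleep (MSG_DONTWAIT is a Linux/BSD flag):

```c
/* The "would block, so loop and sleep" pattern on a non-blocking
   socket - a sketch of what WinSock forced upon the programmer. */
#include <errno.h>
#include <unistd.h>
#include <sys/socket.h>

ssize_t recv_polling(int sock, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = recv(sock, buf, len, MSG_DONTWAIT);
        if (n >= 0)
            return n;                /* data, or 0 on orderly close */
        if (errno != EWOULDBLOCK && errno != EAGAIN)
            return -1;               /* a real error */
        usleep(10000);               /* the sleep() from the text */
    }
}
```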

It would not be surprising if this was the kind of behavior that made Linus Torvalds start writing Linux. The lack of preemption support was also one of the main reasons why Microsoft developed Windows NT—which in newer versions is known as Windows XP, Vista, 7, 8, or 10—simply “Windows” to all modern people. This is not just an anecdote. You may still see kernels for very small systems that are nonpreemptive, like ARM mbed in its simple version.

It is important not only that the OS or kernel supports preemption, but also that the C-library code supports it well. There are two overlapping terms we need to consider here:

Reentrant
A reentrant function can be used recursively; in other words, by the same thread. In order for this to happen, it must not use static data; instead, it uses the stack. A classic example of a non-reentrant C-function is strtok(), which tokenizes a string very fast and efficiently, but keeps and modifies the original string while parsing it into tokens. It is called again and again until the original string is completely parsed. Should you want to start parsing another string before the first is completely done, you will overwrite what's left of the first original string. A small example follows after this list.

Thread-safe
A thread-safe function may be used from different threads of execution in parallel. This is accomplished by the use of OS locks or critical sections. If, for instance, the function increments a 16-bit number in external memory, things can go wrong. To increment the number, we need to read it into the CPU, add one, and write it back. If the CPU has a word size of 8 bits, it reads the low byte first, adds one to it, and writes it back. If this operation produced a carry, the next byte is incremented in the same way. Unfortunately, another thread, or interrupt service routine, might read the two bytes before the high byte is incremented. Even with a 16-bit word size this can happen if the data is not word-aligned; in this case, the CPU needs to read two 16-bit words. You want to make the whole operation “atomic.” This can be done by disabling interrupts for the duration of the complete read-modify-write operation, using the relevant OS-macros. Many modern CPUs have specific assembler instructions for such atomic operations. C11- and C++11-compliant compilers can use these—either explicitly, like std::atomic<>::fetch_add(), or by declaring variables like std::atomic<unsigned> counter and then simply writing counter++ in the code; see the sketches after this list.
More complex scenarios, like, for example, engine control, may require several operations to be performed without something coming in between them. In this case, the OS-macros or manual interrupt disable/enable are needed.
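
A small example of the reentrancy problem: strtok() cannot parse two strings at once, while the reentrant strtok_r() can, because its state is moved to a caller-owned variable:

```c
/* strtok() keeps hidden static state, so two parses cannot overlap.
   strtok_r() moves that state into a caller-owned variable, making
   it reentrant - the two string parses below run interleaved. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char s1[] = "a,b,c", s2[] = "1;2;3";
    char *save1, *save2;

    char *t1 = strtok_r(s1, ",", &save1);
    char *t2 = strtok_r(s2, ";", &save2);   /* safe: separate state */
    while (t1 || t2) {
        if (t1) { printf("s1: %s\n", t1); t1 = strtok_r(NULL, ",", &save1); }
        if (t2) { printf("s2: %s\n", t2); t2 = strtok_r(NULL, ";", &save2); }
    }
    return 0;
}
```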
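
And the atomic counter mentioned above, here in C11 form:

```c
/* With C11 <stdatomic.h>, counter++ becomes an atomic
   read-modify-write, so no thread or ISR can observe a
   half-updated value. */
#include <stdatomic.h>

static _Atomic unsigned counter;

void on_event(void)           /* safe to call from any thread */
{
    counter++;                /* same as atomic_fetch_add(&counter, 1) */
}
```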

The two terms are sometimes mixed up. What you need as an embedded programmer is typically “the full Monty”: you need library functions to be thread-safe as well as reentrant. Many kernels and operating systems supply two versions of their libraries: one for multitasking (thread-safe) and one not. The reason for having the latter at all is that it is smaller and executes faster. It makes sense in nonpreemptive systems, or in systems with no OS at all, as we saw at the beginning of this chapter.

It should be noted that there are now modern nonblocking sockets. In order to create servers with tens of thousands of connections, there are various solutions known as asynchronous I/O, I/O completion ports, and similar terms, creating an FSM within the OS—thus featuring an event-driven programming model. The basic IoT device will, however, not serve many clients directly. Typically, there will only be one or two clients in the cloud, with these servicing the many real clients. On top of this, we often see a local Bluetooth or Wi-Fi control. For this reason, and because classic sockets are universally implemented, we are focusing on the classic socket paradigm. In any case, the underlying TCP is the same.

2.6 Full OS

The small preemptive kernel gives you everything you need in order to multitask and handle interrupts. Gradually, kernels have added file systems and TCP/IP stacks to their repertoire. When a kernel comes with drivers for many different types of hardware, as well as tools that the user may run at the command-line prompt, and typically a graphical user interface (GUI)—then we have a full OS.

Today, Linux is the best known and most used full OS in the embedded world. Windows also comes in versions targeting the embedded world, but somehow Microsoft never really had its heart in it. Windows CE is dying out. Very few hardware vendors are supporting it, thus there is not the wealth of drivers that we are accustomed to with desktop Windows. The development environment, “Platform Builder,” can be very disappointing if you are used to Visual Studio. Microsoft marketed first Windows XP, later Windows 8, in a “fragmented” version for the embedded world, and is now marketing Windows 10. However, the embedded world typically demands operating systems that are maintained for longer than it takes Microsoft to declare something a “legacy,” and Windows is difficult to shrink down to small systems. If you can use a standard industrial PC with standard Windows in an application, then by all means do it. You can take advantage of Visual Studio with C# and all its bells and whistles. This is a fantastic and very productive environment.

Neither Linux nor Windows is what can be called a real-time system (except Windows CE). There are many definitions of the term “real-time,” but the most commonly used is that there must be a deterministic, known interrupt latency. In other words, you need to know the worst-case time from when something in the hardware changes state until the relevant ISR is executing its first instruction. Both Linux and Windows are designed for high throughput, not for deterministic interrupt latency. An example of a true real-time OS (RTOS) is VxWorks from Wind River.

If it’s not real-time, how come Linux is so popular in the embedded world? First and foremost, it is popular for its availability of drivers and libraries for almost anything. Your productivity as an embedded developer is much higher when you can draw on this massive availability. Secondly, there’s the community. Should you get stuck, there are many places to ask for help. In most cases, you only have to browse a little to find a similar question with a good answer. Finally, we have the open source, which we will discuss separately.

The fact remains that high throughput is generally very nice, and in reality there are typically not many hard real-time demands in a system. Reading the A/D converter sample before the next sample overwrites it is one such demand. There are several solutions to this problem:

Apply a real-time patch to Linux. In this way, Linux becomes a real-time system, but there is no such thing as a free lunch. In this case, the cost is that some standard drivers no longer work. As the wealth of drivers is one of the main reasons for choosing Linux, this can be a high price.

Add external hardware to handle the few hard real-time cases. This could, for example, be an FPGA collecting 100 samples from the A/D. Theoretically, Linux still might not make it, but in reality it’s not a problem with the right CPU.

Add internal hardware. Today, we see ARM SoCs containing two cores: one with a lot of horsepower, perfect for Linux, and another one that is small and well suited for handling interrupts. As the latter does nothing else, it can work without an OS, or with a very simple kernel. This CPU shares some memory space with the bigger CPU, and can thus place data in buffers, ready for the bigger brother. The DS-MDK environment from ARM/Keil actually supports such a concept, for development hosted on Windows as well as Linux. In the simple example with an A/D converter, many CPUs are however capable of buffering data directly from an I2S bus or similar.

Another problem with Linux is that it demands a memory management unit (MMU). The MMU is, in fact, a very nice component in the larger CPUs, cooperating with the OS. It guarantees that one task cannot in any way mess up another task, or even read its data. Tasks in such a system are usually called processes. A process is protected from other processes by the MMU, but this also means that there is no simple sharing of memory. When sharing is relevant, a process may spawn threads. Threads in the same process-space can share memory and thus are very much like tasks in a smaller system without an MMU. Processes are very relevant on a PC, and practical to have in an embedded system, but a really small system won't have an MMU. It is possible to compile Linux to work without the MMU; again, this may hurt software compatibility.

There is a lesson to learn from Ethernet. Ethernet is not perfect, and it cannot guarantee a deterministic delay like FireWire can. Still, FireWire has lost the battle, while Ethernet has survived since 1983 (albeit at various speeds and topologies). The cheap “good-enough” solution—in this case the non-real-time Linux—wins over the expensive perfect solution, if it can solve problems outside a small community.

2.7 Open source, GNU licensing, and Linux

It is a well-known fact that Linux brought the ways of Unix to the PC-world, mainly servers, and is now taking over the embedded world. Interestingly, Unix clones are old news to the embedded world. Many years before Linux was born, RTOSs such as OS-9, SMX, QNX, and many others used the Unix style. It was even standardized as “POSIX.” The idea was to make it feasible to switch from one to another. So why is Linux so successful? One explanation is the massive inertia it got via PC-hardware in embedded systems. Another explanation is that it is open source.

Kernels and OSs come in two major flavors: open or closed source. If you come from the Windows world you might wonder why so many embedded developers want open source. Surely a lot of people believe in the cause, the concept of not monopolizing knowledge. However, to many developers “open” literally means that you can open the lid and look inside.

Here are some reasons why this is so important:

  1. If you are debugging a problem, you may eventually end up in the kernel/OS. If you have the source, you may be able to find out what is wrong. Oftentimes you may be able to make a work-around in your own code. This would be pure guesswork with closed source.
  2. As above—but when there is no possible work-around, you need to change the OS. With open source you can actually do this. You should definitely try to get your change into the official code base, so that the next update contains your fix. There is nothing more frustrating than finding a bug, and then realizing that you have found it before. Also, the GNU General Public License (GPL) requires you to make the improvement public, so getting it back into the official kernel makes life easier, which is the whole point.
  3. As stated earlier, a lot of embedded code lives for many years. It's a pain if the OS doesn't, and if it is open source, you can actually maintain it yourself, even if no one else does.

If you come from the “small kernel” embedded world, you are probably used to compiling and linking one big “bin” or “exe” or similar. This keeps you in total control of what the user has on his or her device. You may have noticed that embedded Linux systems look much like a PC in the way that there are tons of files in a similar-looking file-system. This is a consequence of the open source licensing concept. If you are a commercial vendor, you charge for your system which includes a lot of open source code besides your application. This is okay, as long as the parts originating from open source are redistributed “as is.” This makes configuration control much more difficult, and you may want to create your own distribution, or “distro.” See Section 6.9 on Yocto.

GNU means “GNU's Not Unix.” It was created in the US university environment as a reaction to some lawsuits on the use of Unix. The purpose is to spread the code without prohibiting commercial use. GNU is very focused on not being “fenced in” by commercial interests. The basic GNU license allows you to use all the open source programs. However, you cannot merge them into your source code, nor even link to them, without being affected by the “copy-left” rule, which means that your program source must then also be made public.

Many sites claim that you are allowed to dynamically link without being affected by the copy-left clause. However, a FAQ on gnu.org raises and answers the question: Does the GPL have different requirements for statically versus dynamically linked modules with a covered work? No. Linking a GPL covered work statically or dynamically with other modules is making a combined work based on the GPL covered work. Thus the terms and conditions of the GNU general public license cover the whole combination.

This means that your code must call all this GPL code as executables. This is not a “work-around” but the intended way. This fact is probably responsible for keeping one of the really nice features from Unix: you can call programs on the command-line, and you can call them from your program or script—it works the same way. The Linux philosophy states that programs should be “lean and mean,” or in other words: do only one thing, but do it well. This, together with the fact that most programs use files, or rather stdin and stdout, allows you to really benefit from the GPL programs this way. This is very different from Windows, where command-line programs are rarely used from within applications; see Section 4.4.
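
From C, the classic way to call such a program and consume its output is popen(); a small sketch (ls is just an example of a GPL tool):

```c
/* Calling a command-line program as an executable and consuming its
   stdout - the intended way to combine proprietary code with GPL
   tools. */
#include <stdio.h>

int main(void)
{
    char line[256];
    FILE *p = popen("ls -l /tmp", "r");   /* run the external program */

    if (!p)
        return 1;
    while (fgets(line, sizeof line, p))
        fputs(line, stdout);              /* use its output */
    return pclose(p);
}
```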

But if you are not allowed to link to anything, then what about libraries? It would be impossible to create anything proprietary working with open source. This is where the GNU Lesser General Public License (LGPL) comes into the picture. The founders of GNU realized that it would inhibit the spread of the open source concept if this were not possible. All the system libraries are under this license, allowing linking in any form. However, if you statically link, you must distribute your object file (not the source), thus enabling other people to relink when newer libraries become available. This makes dynamic linking the preferred choice.

The GNU organization is very keen on not having too much code slip into the LGPL. There is even a concept called “sanitized headers.” These are typically headers for LGPL libraries that are shaved down and pre-approved by GNU for use in proprietary code. In order to use a library, you need the header files, and the fact that someone even thinks sanitizing these is needed shows how seriously the GPL is taken. The main rule is to keep things completely separate—never start a proprietary code module based on open source. There are alternatives to the GPL, such as the FreeBSD and MIT licenses, aiming to make it easier to make a living on products based on their code. Such libraries may also be used from proprietary code.

Still, Linux adheres to GNU. There is a much-debated case on LKMs (loadable kernel modules). As stated by the name, these are program parts dynamically loaded into the kernel. One vendor has made a proprietary LKM. I am not a lawyer, but I fail to see how this cannot violate the GPL. The way I understand it, this has been ignored, not accepted, by the GNU community.

2.8 OS constructs

Table 2.2 presents a short list and explanation of some OS constructs.

Table 2.2: OS primitives for scheduling.

atomic
A Linux macro assuring atomicity on variables that are not normally atomic, for example, a variable in external memory.

critical section
Generally, a block of code that must only be executed by one thread at a time, typically protected by a mutex. Specifically, on Windows, a critical section is a special, efficient mutex for threads in the same process.

event
An overloaded term. Kernel-wise, Windows uses events that other threads/processes may wait on—blocking or not—in, for example, WaitForMultipleObjects().

semaphore
Can handle access to n instances of a resource at the same time. The semaphore is initialized to “n.” When a process or thread wishes to access the protected data, the semaphore is decremented. If it reaches 0, the next requesting process/thread is blocked. When releasing the resource, the semaphore is incremented.

lock
A mutex can be said to implement a lock.

mutex
Like a semaphore initialized to 1. However, only the owner of the “lock” can “unlock.” The ownership facilitates priority inheritance.

signal
A Unix/Linux asynchronous event like CTRL-C or kill -9. A process can block until a signal arrives, but it may also be “interrupted” in its current flow to run a signal handler. Like interrupts, signals can be masked.

spinlock
A low-level mutex in Linux that does not sleep and thus can be used inside the kernel. It busy-waits, which is efficient for short waits. Used in multiprocessor systems to avoid concurrent access.

queue
A high-level construct for message passing.

2.9 Further reading

Andrew S. Tanenbaum: Modern Operating Systems
This is a kind of classic and very general in its description of operating systems. The latest edition is the 4th.

Jonathan Corbet and Alessandro Rubini: Linux Device Drivers
This is a central book on Linux drivers, and if you understand this, you understand everything about critical sections, mutexes, etc. The latest edition is the 3rd.

lxr.linux.no
This is a good place to start browsing the Linux source code. There are also git archives at www.kernel.org, but lxr.linux.no is very easy to simply jump around in. It is good when you just want to learn the workings of Linux, but is also good to have in a separate window when you are debugging.

Mark Russinovich et al.: Windows Internals Parts 1 and 2
To avoid it all being Linux, these books by the fabulous developers of “Sysinternals” are included. This was originally a website with some fantastic tools that were, and still are, extremely helpful for a Windows developer. These guys knew more about Windows than Microsoft did, until they “merged” with Microsoft.

Simon: An Embedded Software Primer
This book includes a small kernel called uC/OS and uses it in examples of how to set up and call tasks, ISRs, etc. It includes a description of some specific low-level HW circuits. The book uses Hungarian notation, which can be practical in samples, but is not recommended for daily use.

C. Hallinan: Embedded Linux Primer
This is a very thorough book on the Linux OS.

Michael Kerrisk: The Linux Programming Interface
This is a huge reference book—not something you read from cover to cover. Nevertheless, once you start reading, you may end up having read more than planned—simply because it is well written and filled with good examples.
