Popular Scheduling Implementations

We’ll now look at how all of this plays out in the implementation of the Java virtual machine on several popular platforms. In many ways, this is a section that we’d rather not have needed to write: Java is a platform-independent language, and to have to provide platform-specific details of its implementations certainly violates that precept. But we stress that there are very few times when these details actually matter.

On the other hand, one of the hallmarks of Java in the real world is that vendors of Java virtual machines are allowed to compete over the implementation of those virtual machines: which one is faster, which one can run more threads, and so on. As long as the implementation of the virtual machine follows the Java language specification and conforms to the Java Compatibility Kit for testing, then it is a valid Java virtual machine. Because of the flexibility that the specification allows for thread scheduling, all of the implementations we’ll discuss are certainly valid Java virtual machines (at least in the area of thread scheduling support).

Green Threads

The first model that we’ll look at is the simplest. In this model, the operating system doesn’t know anything about threads at all; it is up to the virtual machine to handle all the details of the threading API. From the perspective of the operating system, there is a single process and a single thread.

Each thread in this model is an abstraction within the virtual machine: the virtual machine must hold within the thread object all information related to that thread. This includes the thread’s stack, a program counter that indicates which Java instruction the thread is executing, and other bookkeeping information about the thread. The virtual machine functions by loading this information into memory and operating on it: it will run the instruction pointed to by the program counter, get the next instruction and run that instruction, and so on.

When it is time for the virtual machine to run another thread, it will do so by saving all of the information regarding the state of the current thread and then replacing that information with the state of the target thread. The target thread’s stack becomes the stack on which the virtual machine is operating, and the instruction that the virtual machine will next execute becomes the instruction pointed to by the program counter in the target thread.

Of course, this all happens at a logical level: the implementation of a particular virtual machine may be somewhat different. But the salient fact is that the operating system has no idea that the virtual machine has switched threads in this manner. As far as the operating system is concerned, the virtual machine is just executing arbitrary code; the fact that the code is emulating many different threads is unknown outside of the virtual machine.

This model is known in Java as the green-thread model. There is no particular significance to the term green—it does not mean that these threads are somehow unripe (that is, less robust or useful) than other thread models. In other circles, these threads are often called user-level threads, because they exist only within the user level of the application: no calls into the operating system are required to handle any of the thread details. In Solaris 1 (SunOS 4.1.3), this threading model was called lwp, but don’t confuse that with the Solaris 2 LWP model, which we’ll discuss later.

Because this model does not depend on the operating system to provide any thread-specific capabilities, green threads are fairly portable. In fact, the threading model itself is very portable, although it does require some code to be written in assembly language: for example, certain parts of the code must be able to execute an atomic test-and-set instruction on the CPU. Accessing this instruction is usually possible only in assembly code. But while the threading code itself is portable, use of green threads complicates other implementation details of the virtual machine: the virtual machine must handle all I/O in a nonblocking fashion, for example. This makes the virtual machine somewhat harder to write.
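
To picture the primitive involved, here is a small sketch of a spin lock built on an atomic test-and-set. The sketch uses AtomicBoolean.compareAndSet() from the much later java.util.concurrent.atomic package purely as an illustration; a green-thread library implements the same operation directly in assembly language.

import java.util.concurrent.atomic.AtomicBoolean;

// A spin lock: lock() atomically tests that the flag is false and sets
// it to true in a single step, retrying until it succeeds.
class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        while (!locked.compareAndSet(false, true))
            Thread.yield();  // let another thread run before retrying
    }

    void unlock() {
        locked.set(false);
    }
}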

Still, the green-thread model remains the standard model for the reference implementation of Java, simply because it is more portable than the other models that we will discuss. And in fact, porting this model to most operating systems is not that daunting a task. It’s often assumed that porting Java to Windows 3.1 is so difficult because of the lack of thread support in Windows 3.1. In fact, there are many user-level thread libraries available for Windows 3.1, and the green-thread library itself can easily run on that platform. Other problems, such as the lack of 32-bit support and porting the AWT, remain harder to overcome.

The green-thread model is common on most Unix platforms, although Unix platforms often also support a native-thread model. Java-enabled browsers on Unix platforms almost always use a green-thread model (although the Java plug-in may use either model).

Because threads in the green-thread model are unknown to the operating system, a Java virtual machine that is implemented with green threads can only run a single thread at a time, even on a machine that has many CPUs.

Scheduling of green threads

For the most part, green threads are scheduled exactly as we discussed earlier. In most implementations of green threads, there is no notion of round-robin scheduling, so green threads will not automatically timeslice. Scheduling is entirely the responsibility of the virtual machine, and the virtual machine usually changes the currently running thread only when another thread of higher priority becomes runnable (or when the currently running thread blocks). However, this is not a requirement of the Java specification, and a green-thread implementation could conceivably include a timer to do round-robin scheduling.

The reference implementation of the green-thread model uses priority inheritance, so that the priority of a thread will be temporarily raised if it is holding a lock on which a higher-priority thread is waiting.

Depending on the operating system, however, green threads may not exhibit a precise scheduling behavior: there may be a very small period of time during which a lower-priority thread is running even though a higher-priority thread may want to run.

As an example, consider what happens when a Java thread reads data from a socket that has no data. The expectation is that the thread will block—but the virtual machine itself cannot afford to block. Hence, the virtual machine must have some way of being notified when data is ready on the socket, and only then can it allow the thread to read the data on the socket. In most operating systems, that’s possible using asynchronous I/O: when data is available on the socket, the virtual machine will receive a signal that will interrupt what it is presently doing. In response to the signal, the virtual machine can run the thread that wants to read the data.

In some operating systems, however, there is no facility for asynchronous I/O. In those cases, the virtual machine must periodically poll the socket to see if data is available, perhaps every few milliseconds. When data is available, the virtual machine can then schedule the thread that wants to read it.
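
At the Java level, the same poll-then-read idea looks something like the following sketch. This is purely an illustration (the class and method names are ours): a real green-thread implementation performs the equivalent below the Java API, so that read() never blocks the whole process.

import java.io.InputStream;
import java.net.Socket;

// Illustration of polling: check whether data can be read without
// blocking, and sleep briefly between checks.
class PollingReader {
    static int readWhenReady(Socket socket, byte[] buffer)
            throws Exception {
        InputStream in = socket.getInputStream();
        while (in.available() == 0) {
            Thread.sleep(5);     // poll every few milliseconds
        }
        return in.read(buffer);  // data is present; this won't block
    }
}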

Now, say that a high-priority thread is waiting for data from the socket, and meanwhile, a lower-priority thread is performing some other calculation. If the virtual machine is dependent on polling the socket to see when data is ready, then there will be a very small window of time after data arrives on the socket and before the virtual machine next polls the socket. During this time, the lower-priority thread will still be executing, even though the higher-priority thread should be the one that is executing.

Situations like this almost never affect an actual program—the delay is very slight, and there are many other factors that may have influenced the program anyway: what if the pending data had been delayed even longer in coming across the network? What if another process on the machine prevented the Java application from running for a few microseconds? Java is not a real-time platform, so the scheduling anomaly that we’ve just described is unlikely to have any impact on a Java program.

However, be aware that this is one reason why, even if one thread in the Java virtual machine causes another thread to become unblocked, there may be a delay before the higher-priority thread actually runs. Say that we run the following code:

public class LockTest {
    // SomeObject stands in for the example's target object; it supplies
    // the methodA() and methodB() calls discussed in the text.
    static class SomeObject {
        void methodA() {
            System.out.println("methodA");
        }
        void methodB() {
            System.out.println("methodB");
        }
    }

    static SomeObject someObject = new SomeObject();

    static class ThreadA extends Thread {
        ThreadA() {
            setPriority(Thread.MAX_PRIORITY);
        }
        public void run() {
            synchronized(someObject) {
                try {
                    someObject.wait();
                } catch (InterruptedException e) {
                    return;
                }
            }
            someObject.methodA();
        }
    }

    static class ThreadB extends Thread {
        ThreadB() {
            setPriority(Thread.NORM_PRIORITY);
        }
        public void run() {
            synchronized(someObject) {
                someObject.notify();
            }
            someObject.methodB();
        }
    }

    public static void main(String[] args) {
        // Assumes ThreadA reaches wait() before ThreadB calls notify().
        new ThreadA().start();
        new ThreadB().start();
    }
}

In this example, we’re starting two threads: ThreadA, which has a priority of 10, and ThreadB, which has a priority of 5. Since we start ThreadA first, the expectation is that it will begin its run() method and block on the wait() method. Then ThreadB will start and notify ThreadA. In a strict priority model, ThreadA will wake up, preempt ThreadB, and execute the methodA() method. Then ThreadB will execute the methodB() method. However, we cannot assume that the priority scheduler of the green-thread model is strict enough to ensure that the methodA() method will be called before the methodB() method is called, even though that will happen in the vast majority of cases.
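
If a program actually requires that methodA() run before methodB(), it must enforce that ordering itself rather than rely on thread priorities. Here is a minimal sketch of one way to do so (the class and method names are ours): ThreadA would call markFirstDone() after methodA() returns, and ThreadB would call awaitFirst() before invoking methodB().

// Explicit ordering with wait/notify: the second action cannot proceed
// until the first action has been marked complete.
class OrderedHandoff {
    private boolean firstDone = false;

    synchronized void markFirstDone() {
        firstDone = true;
        notifyAll();
    }

    synchronized void awaitFirst() throws InterruptedException {
        while (!firstDone)
            wait();
    }
}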

Windows Native Threads

In the native-threading model used on Windows 95 and Windows NT (more generally, on any 32-bit Windows operating system), the operating system is fully cognizant of the multiple threads that the virtual machine uses, and there is a one-to-one mapping between Java threads and operating system threads. Therefore, the scheduling of Java threads is subject to the underlying scheduling of threads by the operating system.

This model is usually simple to understand because every thread can be thought of as a process. The operating system scheduler makes no real distinction in this case between a process and a thread: it treats each thread like a process. Of course, there are still other differences in the operating system between a thread and a process, but not as far as the scheduler is concerned.

Since there are no popular green-thread implementations of Java virtual machines on Windows operating systems, virtually all virtual machines on the Windows platform will use the native Windows thread model. Java-enabled browsers typically use this model as well. However, there are many vendors of Java virtual machines for Windows, and the specifics of each one may vary.

Because the operating system knows about threads, Windows native threads tend to be somewhat heavyweight. This can limit the number of concurrent threads that can run on the platform, since with too many threads, the operating system becomes unresponsive. We’ll show a pooling technique in the next chapter that helps us work around that problem.

But this implementation does allow multiple threads to run simultaneously on a machine with multiple CPUs. Each CPU—whether one or many—will select a currently running thread according to the guidelines that follow.

Scheduling of Windows native threads

In the Windows native threads model, the operating system takes an important role in thread scheduling. In particular, the operating system schedules threads just like it schedules processes. That means that threads are scheduled on a preemptive, priority-based mechanism, just as we’d hope, but there are a few complications added to the generic model we described earlier.

To begin, only seven priorities are recognized by the Windows operating system; these seven priorities must map onto the eleven Java thread priorities. Since one of these priorities is typically reserved for the internal 0-level Java thread priority, the end result is that the ten programmable Java thread priorities must map onto six Windows platform priorities. Different virtual machines will do this differently, but one common implementation performs the mapping listed in Table 6-1.

Table 6-1. Mapping of Java Thread Priorities on Win32 Platforms

Java Priority               Windows 95/NT Priority
0                           THREAD_PRIORITY_IDLE
1 (Thread.MIN_PRIORITY)     THREAD_PRIORITY_LOWEST
2                           THREAD_PRIORITY_LOWEST
3                           THREAD_PRIORITY_BELOW_NORMAL
4                           THREAD_PRIORITY_BELOW_NORMAL
5 (Thread.NORM_PRIORITY)    THREAD_PRIORITY_NORMAL
6                           THREAD_PRIORITY_ABOVE_NORMAL
7                           THREAD_PRIORITY_ABOVE_NORMAL
8                           THREAD_PRIORITY_HIGHEST
9                           THREAD_PRIORITY_HIGHEST
10 (Thread.MAX_PRIORITY)    THREAD_PRIORITY_TIME_CRITICAL

Under this mapping, having a thread of priority 3 and a thread of priority 4 is the same as having two threads with a priority of 4.
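
Expressed in code, the mapping in Table 6-1 is just a switch over the Java priority. In this sketch, the numeric values assigned to the constants are the Win32 definitions of the THREAD_PRIORITY_* constants; they are included only to make the example self-contained.

// A sketch of the Table 6-1 mapping as a virtual machine might encode it.
class Win32PriorityMap {
    static final int THREAD_PRIORITY_IDLE          = -15;
    static final int THREAD_PRIORITY_LOWEST        =  -2;
    static final int THREAD_PRIORITY_BELOW_NORMAL  =  -1;
    static final int THREAD_PRIORITY_NORMAL        =   0;
    static final int THREAD_PRIORITY_ABOVE_NORMAL  =   1;
    static final int THREAD_PRIORITY_HIGHEST       =   2;
    static final int THREAD_PRIORITY_TIME_CRITICAL =  15;

    static int toWin32(int javaPriority) {
        switch (javaPriority) {
            case 0:          return THREAD_PRIORITY_IDLE;
            case 1: case 2:  return THREAD_PRIORITY_LOWEST;
            case 3: case 4:  return THREAD_PRIORITY_BELOW_NORMAL;
            case 5:          return THREAD_PRIORITY_NORMAL;
            case 6: case 7:  return THREAD_PRIORITY_ABOVE_NORMAL;
            case 8: case 9:  return THREAD_PRIORITY_HIGHEST;
            default:         return THREAD_PRIORITY_TIME_CRITICAL;
        }
    }
}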

In addition to seven priority levels, the Windows operating system also has five scheduling classes, and a thread in Windows is actually scheduled as a combination of its priority and its scheduling class. However, scheduling classes for threads are not easy to change, so they do not factor into a system where priorities can be changed dynamically by the programmer.

A second complication arises on this platform because the priority that a programmer can assign to a thread (that is, one of the seven platform-specific priorities) is only one piece of the information that the operating system uses to determine the absolute priority of a thread. There are other things that can affect the priority of a thread:

  • Windows native threads are subject to priority inheritance.

  • The actual priority of the thread is based on its programmed (or inherited) priority minus a value that indicates how recently the thread has actually run. This value is subject to continual adjustment: the more time passes, the closer to zero the value becomes. This primarily distinguishes between threads of the same priority, and it leads to round-robin scheduling between threads of the same priority. Hence, on Windows platforms, equal-priority threads will timeslice: all things being equal, each thread of the same priority will receive approximately the same amount of CPU time.

  • On another level, a thread that has not run for a very long time is given a temporary priority boost. The value of this boost decays over time as the thread has a chance to run. This prevents threads from absolute starvation, while still giving preference to higher-priority threads over lower-priority threads. The effect of this priority boost depends on the original priority of the thread: when competing against a thread of priority 5, a thread of priority 3 will run more often than a thread of priority 1.

The upshot of all this is that it is very difficult to guarantee explicit ordering of thread execution on Windows platforms. However, because the operating system ensures that threads do not starve and that equal-priority threads timeslice, this is not usually a problem.

Solaris Native Threads

The last threading model that we’ll look at is the most complex. At an operating system level, Solaris native threads provide a very flexible threading model, but much of that flexibility is lost to the Java programmer, since there is no interface in the Java API to exploit it. A Java programmer can use native methods to call the underlying thread libraries to get at features that aren’t exposed by the Java API; while that’s something that we generally discourage, we’ll have no choice but to follow that path under certain circumstances.

Solaris native threads utilize a two-level model of programming: there are user-level threads, which are unknown to the operating system and are scheduled in the same manner as threads in the green-thread model that we looked at a little earlier. In addition, there are system-level threads (known as lightweight processes, or LWPs), which are known to the operating system and are scheduled in a manner very similar to the Windows thread model that we just discussed. The interplay between these two levels gives Solaris native threads their great flexibility (as well as their complexity).

The Solaris native-thread model is used by Sun’s production version of its virtual machine; beginning with Java 2, it is available in Sun’s reference version of the virtual machine as well. As of this writing, Netscape and Internet Explorer for Solaris (versions 4.0 and earlier) use the green-thread model; it is unknown if those browsers will use the native-thread version of the virtual machine in their Java 2-compatible releases. The HotJava browser can run with either thread model.

Because the operating system knows about these threads, Java programs using Solaris native threads can run multiple threads simultaneously on machines with multiple CPUs.

Scheduling of Solaris native threads

Scheduling of Solaris native threads is a little complex, but it follows the same principles that we’ve already seen. From the perspective of any particular LWP, the scheduling follows the green-thread model. At any point in time, an LWP will run the highest priority thread available. An LWP will not perform any timeslicing among the eligible threads: just as in the green-thread model, once an LWP selects a thread to run, it will continue to run that thread until a higher-priority thread becomes available. There is a difference, however, in how an LWP handles a thread that blocks; we’ll look into that a little further on. There is a one-to-one mapping between the threads that an LWP can run and threads in a Java program; hence, in a Java virtual machine that has a single LWP, scheduling is very much like the green-thread model.

However, programs written with Solaris native threads typically have multiple LWPs, each of which is scheduled by the operating system. Although LWPs themselves have a priority, this priority is unknown to the Java programmer and is unaffected by the priority of the thread that the LWP has selected to run. Hence, all LWPs in the virtual machine have essentially the same priority, and—just like on a Windows platform—this priority is subject to periodic adjustment based on how recently the LWP has run. Thus, LWPs will timeslice among themselves.

Therefore, when an LWP runs, it runs the thread with the highest priority. The LWP eventually loses its timeslice, and another LWP runs. This second LWP chooses the remaining thread with the highest priority and runs that thread. This process continues throughout the lifetime of the virtual machine.

Consider the effects of this process in a virtual machine that has created two threads when given two LWPs by the operating system. Assume that both threads have a priority of 5. When the first LWP comes along, it picks (seemingly arbitrarily) one of the priority 5 threads and starts running it. Eventually, the LWP loses its timeslice; the second LWP comes along and picks the remaining priority 5 thread and begins running that thread. When it loses its timeslice and the first LWP starts running again, the first thread is run. And so it goes—the two threads are timesliced, because each LWP is being given a timeslice by the operating system.

Now, say that there are three threads of equal priority and only two LWPs. In this case, the first two threads will timeslice, and the third thread will not get to run at all (at least, not until one of the first two threads blocks or exits). Again, this is consistent with the scheduling model that we learned about earlier: a thread is subject to starvation unless other threads of the same or higher priority will all eventually block. An LWP, once it has selected a thread, will run that thread until the thread blocks or until a higher-priority thread becomes available.

So, suppose we have two threads and two LWPs, but this time, one thread has a priority of 5 and one has a priority of 4. Interestingly enough, these threads will still timeslice, which means that there will be times when the priority 4 thread is running even though the priority 5 thread is not blocked. When the first LWP comes along, it selects the priority 5 thread and runs that thread. When the second LWP begins to run, it must select a thread; as far as it is concerned, the only available thread is the priority 4 thread, since the priority 5 thread has already been assigned to an LWP. Hence, this second LWP begins running the priority 4 thread, even though the priority 5 thread is not blocked. The two threads timeslice in this example.

We do not mean to imply here that because the priority 5 thread has been running on a particular LWP, it is forever bound to that LWP. Such a situation (known as bound threads) is possible in a Solaris program, but most implementations of the Java virtual machine use unbound threads. The priority 5 thread may be displaced from its LWP when a higher-priority thread becomes runnable, at which point the priority 5 thread goes back into the pool of waiting threads. When the LWP running the priority 4 thread runs again, it will then displace the priority 4 thread in favor of the priority 5 thread because the priority 5 thread is not presently assigned to any LWP.

The rule of thumb, then, is that the N LWPs in a Java virtual machine will be running (and timeslicing) the N highest-priority threads in the virtual machine (even if those threads have unequal priorities).

Finally, we note that Solaris native threads use priority inheritance.

LWPs in the virtual machine

How many LWPs does the virtual machine have? The answer varies according to the rules we will lay out here. To answer this question, we must delve into the details of the Solaris thread library a little more.

The Solaris thread model follows the Java threading API fairly well. In the thread library itself, there is the notion of thread priorities. This means that the actual scheduling of threads onto specific LWPs can be handled by the Solaris thread library itself, following the priority-based rules that the virtual machine expects. To the programmer, this scheduling is identical to the scheduling the virtual machine performs when it uses green threads.

The number of LWPs is controlled by the Solaris thread library according to the following guidelines:

  • The virtual machine begins with one LWP. As the virtual machine creates threads, these threads all run on this single LWP, which schedules the individual threads on a priority basis—the highest-priority thread will be the one that the LWP runs.

  • When the thread makes a system call, it becomes (temporarily) bound to the LWP. A system call is any call that must use the kernel to perform work on its behalf. In the Java world, this includes reading or writing most streams (except for streams based on strings) and creating sockets. A thread that is bound to an LWP is not subject to preemption.

  • If the system call returns immediately, then the LWP continues to execute the thread. But the thread at this point becomes unbound, meaning that if a higher-priority thread is created, the LWP will start running it instead.

  • If the system call blocks, then the LWP to which it is bound also blocks. If all LWPs become blocked and there are threads that are waiting to run, then the operating system will automatically create a new LWP, which will select the highest priority thread in the pool of waiting threads and run that thread.

The virtual machine thus never creates new LWPs; that is handled by the operating system and the Solaris thread library. That means that the number of LWPs that exist in the Java virtual machine will be equal to the maximum number of threads in the Java virtual machine that have ever been simultaneously blocked, plus one. In practice, this means that a typical Java application may start with five to seven LWPs, since during the initialization of the virtual machine, there may be four to six threads that are simultaneously blocked.

There are therefore usually at least two LWPs available to the Java program; the remaining LWPs tend to be blocked in the course of executing threads that are internal to the virtual machine itself. Hence, you can often rely on two threads of your program to be scheduled onto LWPs and to timeslice between them, even if they are not the same priority.

On a single-processor machine, this number of LWPs is usually sufficient. If it is insufficient—that is, if you create threads that block and tie up the existing LWPs—then new LWPs will be created on demand. If there are one or more unblocked threads in your Java program, there will always be at least one LWP to run those threads.

On a multiprocessor machine, however, this may not be a sufficient number of LWPs, especially if the threads in the Java program are CPU intensive and rarely block. If you have a machine with eight CPUs that’s behaving as a calculation server and you only have two LWPs available, you will not see the scaling that you desire, no matter how many threads you create. In order to get the most benefit from this machine, you need at least eight available LWPs—one for each CPU—so that you can run eight CPU-intensive threads at the same time.

In this case, you must call the operating system-specific library in order to tell it that you want eight (or however many you desire) LWPs. We’ll show an example of how to do that at the end of this section, but it involves using a native method to call the thr_setconcurrency() function of the Solaris thread library in order to create enough LWPs. Depending on your perspective, this can seem either very complicated or very cool.

This seems rather complicated. Do I really need to bother with it? Probably not. If you have a single-CPU machine, you definitely won’t see a benefit from it—in fact, you’ll slightly impede the performance of your machine, because you’ll have too many LWPs competing for the CPU. If you have multiple CPUs, you’ll only see a benefit if you have multiple threads that spend a significant amount of time executing code without blocking. If you write a chat server, then more LWPs isn’t going to help you: while you might have a thread for each client attached to the server, those threads spend most of their time blocked, waiting for input from the clients. In that case, the thread library will create enough LWPs for you: it will create a number sufficient to hold all simultaneously blocked threads, and when a thread unblocks, it will already be on an LWP so that it can process its request.

So you really need to worry about the number of available LWPs only when you write something that has multiple CPU-intensive threads.

This seems pretty cool, but how do I know how many LWPs I need? This is a hard question to answer. Assuming that you have enough threads to demand it, the answer is that you need as many LWPs as you have threads that will be blocked simultaneously, plus one LWP for each thread that you want to run simultaneously. You’ll see the best throughput when the number of running threads is equal to the number of CPUs on the machine. If there are fewer, there will be idle CPUs that could be doing more work. If there are more, the LWPs will compete for CPU time.

Of course, there may be other things happening on the machine, which may lead you to want fewer LWPs than CPUs so that other programs can get a sufficient amount of CPU time. But there’s never really an advantage to having more LWPs than there are CPUs—even if you have hundreds of threads that you want to timeslice, you can accomplish that better by introducing some scheduling techniques into your program rather than creating hundreds of LWPs. Despite their name, LWPs still require system resources, and they are much more resource-intensive than are threads. A Java virtual machine on Solaris can easily handle thousands of threads, but not necessarily thousands of LWPs.
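
Here, then, is the promised example. It is a sketch rather than a definitive implementation: the class name, the native library name, and the JNI wiring are all hypothetical. The only Solaris call involved is thr_setconcurrency() itself; the native implementation simply passes the argument through to that function.

public class LWPControl {
    static {
        // Hypothetical native library wrapping the Solaris thread library.
        // Its C implementation would be a one-line call:
        //     thr_setconcurrency((int) nLWPs);
        System.loadLibrary("lwpcontrol");
    }

    // Asks the Solaris thread library to maintain at least nLWPs LWPs.
    public static native void setConcurrency(int nLWPs);
}

A calculation server on an eight-CPU machine would call LWPControl.setConcurrency(8) once at startup, before creating its CPU-intensive threads.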
