Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Inner-Loop Threading

The issues that we have discussed so far do not change when the loops are nested: if you apply the techniques only to the inner loop, they will work. However, there are some other, very subtle issues that may apply to inner loops. Let’s return to our two dimensional SinTable. As mentioned, a loop interchange should allow the outer loop to be threaded. However, instead of the loop transformation, let’s try to thread the inner loop:

public float[][] getValues() {
    for (int i = 0; i < 360; i++) {
        lookupValues[0][i] = 0;
    }
    for (int j = 1; j < 1000; j++) {
        for (int i = 0; i < 360; i++) {
                        float sinValue = (float)Math.sin((i % 360)*Math.PI/180.0);
                        lookupValues[j][i] = sinValue * (float)i / 180.0f;
                        lookupValues[j][i] += lookupValues[j-1][i]*(float)j/180.0f;
                    }  
    }
    return lookupValues;
}

The first variable that we will classify is the outer-loop index variable, j. We must classify this variable since it is used inside the inner loop. In this case, j is classified as a read-only variable. At first glance, this does not make sense: how could an index variable be read-only? We must only look at the scope that we are attempting to thread. During the execution of the inner loop, the variable has a single value that does not change throughout the entire execution of the loop.

While the lookupValues array variable is a shared variable, the elements can be classified as loop private. Since each iteration of the loop accesses a different member of the array based on the loop index and the read-only variable j, its members may be considered loop private. The members of the lookupValues array are also considered as storeback variables. However, since we will not be creating a local copy of these variables, there is no need to store the variables back.

The last two variables—sinValue and i—are simply classified as loop-private variables, and separate copies are created for each thread. Neither of these variables is used after the loop has completed, so storeback handling is not necessary.

Choosing the loop scheduler is done by examining the algorithm inside the inner loop itself. In this case, there is nothing that should cause any iteration to execute longer than any other iteration. Choosing the default—static or chunk—scheduler is probably best. However, there should be no harm in choosing either the self- or guided self-scheduler.

Once these tasks are completed, threading the loop is done by using the loop handler as usual. However, there is a slight complication: compared with the outer loop, the inner loop will be executed many more times. This means a thread creation and destruction overhead is executed many more times. Furthermore, the loop handler is designed as a “one use” object. A new loop handler will have to be created for each iteration of the outer loop. Although using the loop handler will work without any problems, the overhead may be more significant than for threading a higher-level loop.

We can partially overcome this complication as follows:

public class PoolLoopHandler
             implements Runnable {
    protected class LoopRange {
        public int start, end;
    }
    protected ThreadPool poolThreads;
    protected int startLoop, endLoop, curLoop, numThreads;

    public PoolLoopHandler(int start, int end, int threads) {
        numThreads = threads;
        poolThreads = new ThreadPool(numThreads);
        setRange(start, end);
    }

    public synchronized void setRange(int start, int end) {
        startLoop = start;
        endLoop = end;
        reset();
    }

    public synchronized void reset() {
        curLoop = startLoop;
    }

    protected synchronized LoopRange loopGetRange() {
        if (curLoop >= endLoop)
            return null;
        LoopRange ret = new LoopRange();
        ret.start = curLoop;
        curLoop += (endLoop-startLoop)/numThreads+1;
        ret.end = (curLoop<endLoop)?curLoop:endLoop;
        return ret;
    }
 
    public void loopDoRange(int start, int end) {
    }
 
    public void loopProcess() {
                    reset();
                    for (int i = 0; i < numThreads; i++) {
                        poolThreads.addRequest(this);
                    }
                    try {
                            poolThreads.waitForAll();
                    } catch (InterruptedException iex) {}
                }
 
    public void run() {
        LoopRange str;
        while ((str = loopGetRange()) != null) {
            loopDoRange(str.start, str.end);
        }
    }
}

The fact that our original LoopHandler class can be used only once was merely a design flaw. The loop index can never be set back to the start of the loop, nor can the range of the loop be changed. To fix this, we simply add two new methods, reset() and setRange(), that will reset the index back to the start of the loop and specify new ranges for the loop. To avoid many thread creations and destructions, we will use the ThreadPool class that we implemented in Chapter 7. Instead of creating threads in the loopProcess() method, this method will now assign the tasks to the threads in a thread pool. We can then simply wait for all the threads in the pool to complete their assigned tasks. This all helps somewhat, but the synchronization that we have introduced into the calculation will have an effect on the ultimate acceleration of our program.

A Warning About Inner Loops

Prior to threading any loop, we should always examine that loop. There is no reason to thread the loop if the loop executes in a very short period of time. For these cases, the overhead in the setup and takedown of the threaded loop may be greater than any speed gained from threading the loop.

When moving from the outer loop to the inner loop, we must examine the inner loop. Just because the outer loop is a candidate for threading does not mean the inner loop is a candidate for threading. If the number of iterations in the outer loop is many times higher than the inner loop, the inner loop may execute only for a short period of time. There could also be method calls in the outer loop, and not in the inner loop that is taking a long period of time to execute.

We can implement other scheduling models in the pool handler quite easily:

public class PoolSelfLoopHandler
             extends PoolLoopHandler {
    private int groupSize;

    public PoolSelfLoopHandler(int start, int end,
                                   int size, int threads) {
        super(start, end, threads);
        setSize(size);
    }

    public synchronized void setSize(int size) {
                    groupSize = size;
                    reset();
                }

    protected synchronized LoopRange loopGetRange() {
        if (curLoop >= endLoop)
            return null;
        LoopRange ret = new LoopRange();
        ret.start = curLoop;
        curLoop += groupSize;
        ret.end = (curLoop<endLoop)?curLoop:endLoop;
        return ret;
    }
}

What’s interesting here is the similarity to our original SelfLoopHandler class. However, to be more configurable, we have modified the handler to allow the extra parameters, such as the chunk size, to be changed.

Here’s how we use our new handler:

public class SinTable extends PoolLoopHandler {
    private float lookupValues[][];
    private int j;

    public SinTable() {
        super(0, 360, 12);
        lookupValues = new float[1000][];
        for (int j = 0; j < 1000; j++) {
            lookupValues[j] = new float[360];
        }
    }    

    public void loopDoRange(int start, int end) {
        float sinValue = 0.0f;
        for (int i = start; i < end; i++) {
            sinValue = (float)Math.sin((i % 360)*Math.PI/180.0);
            lookupValues[j][i] = sinValue * (float)i / 180.0f;
            lookupValues[j][i] += lookupValues[j-1][i]*(float)j/180.0f;
        }
    }    

    public float[][] getValues() {
        for (int i = 0; i < 360; i++) {
            lookupValues[0][i] = 0;
        }
        for (j = 1; j < 1000; j++) {
            loopProcess();
        }
        return lookupValues;
    }
}

To implement the SinTable class, we place the code from the inner loop in the loopDoRange() method and then call the loopProcess() method to process the inner loop. Since the j index variable is a read-only shared variable, it is now an instance variable of the SinTable class.

Having a loop handler that can be used more than once is also very important. If we were using the earlier version of the loop handler, we would have had to create a new instance of the loop handler for each inner loop that we executed. This means that the code for the outer loop and the inner loop could not have been in the same class. Furthermore, we would have had to pass a reference to the j variable and lookupValues array to each instance, since these are shared between the different inner loop handlers.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Inner-Loop Threading

Create new playlist

Sign In

Sign Up

Inner-Loop Threading

Table of Contents for
Inner-Loop Threading