The issues that we have discussed so far do not change when the loops are nested: if you apply the techniques only to the inner loop, they will work. However, there are some other, very subtle issues that may apply to inner loops. Let’s return to our two dimensional SinTable. As mentioned, a loop interchange should allow the outer loop to be threaded. However, instead of the loop transformation, let’s try to thread the inner loop:
public float[][] getValues() { for (int i = 0; i < 360; i++) { lookupValues[0][i] = 0; } for (int j = 1; j < 1000; j++) { for (int i = 0; i < 360; i++) { float sinValue = (float)Math.sin((i % 360)*Math.PI/180.0); lookupValues[j][i] = sinValue * (float)i / 180.0f; lookupValues[j][i] += lookupValues[j-1][i]*(float)j/180.0f; } } return lookupValues; }
The first variable that we will classify is the outer-loop index
variable, j
. We must classify this variable
since it is used inside the inner loop. In this case,
j
is classified as a read-only variable. At
first glance, this does not make sense: how could an index variable
be read-only? We must only look at the scope that we are attempting
to thread. During the execution of the inner loop, the variable has a
single value that does not change throughout the entire execution of
the loop.
While the lookupValues
array variable is a
shared variable, the elements can be classified as loop private.
Since each iteration of the loop accesses a different member of the
array based on the loop index and the read-only variable
j
, its members may be considered loop private.
The members of the lookupValues
array are also
considered as storeback variables. However, since we will not be
creating a local copy of these variables, there is no need to store
the variables back.
The last two variables—sinValue
and
i
—are simply classified as loop-private
variables, and separate copies are created for each thread. Neither
of these variables is used after the loop has completed, so storeback
handling is not necessary.
Choosing the loop scheduler is done by examining the algorithm inside the inner loop itself. In this case, there is nothing that should cause any iteration to execute longer than any other iteration. Choosing the default—static or chunk—scheduler is probably best. However, there should be no harm in choosing either the self- or guided self-scheduler.
Once these tasks are completed, threading the loop is done by using the loop handler as usual. However, there is a slight complication: compared with the outer loop, the inner loop will be executed many more times. This means a thread creation and destruction overhead is executed many more times. Furthermore, the loop handler is designed as a “one use” object. A new loop handler will have to be created for each iteration of the outer loop. Although using the loop handler will work without any problems, the overhead may be more significant than for threading a higher-level loop.
We can partially overcome this complication as follows:
public class PoolLoopHandler implements Runnable { protected class LoopRange { public int start, end; } protected ThreadPool poolThreads; protected int startLoop, endLoop, curLoop, numThreads; public PoolLoopHandler(int start, int end, int threads) { numThreads = threads; poolThreads = new ThreadPool(numThreads); setRange(start, end); } public synchronized void setRange(int start, int end) { startLoop = start; endLoop = end; reset(); } public synchronized void reset() { curLoop = startLoop; } protected synchronized LoopRange loopGetRange() { if (curLoop >= endLoop) return null; LoopRange ret = new LoopRange(); ret.start = curLoop; curLoop += (endLoop-startLoop)/numThreads+1; ret.end = (curLoop<endLoop)?curLoop:endLoop; return ret; } public void loopDoRange(int start, int end) { } public void loopProcess() { reset(); for (int i = 0; i < numThreads; i++) { poolThreads.addRequest(this); } try { poolThreads.waitForAll(); } catch (InterruptedException iex) {} } public void run() { LoopRange str; while ((str = loopGetRange()) != null) { loopDoRange(str.start, str.end); } } }
The fact that our original LoopHandler class can be used only once
was merely a design flaw. The loop index can never be set back to the
start of the loop, nor can the range of the loop be changed. To fix
this, we simply add two new methods, reset()
and
setRange()
, that will reset the index back to
the start of the loop and specify new ranges for the loop. To avoid
many thread creations and destructions, we will use the ThreadPool
class that we implemented in Chapter 7. Instead of
creating threads in the loopProcess()
method,
this method will now assign the tasks to the threads in a thread
pool. We can then simply wait for all the threads in the pool to
complete their assigned tasks. This all helps somewhat, but the
synchronization that we have introduced into the calculation will
have an effect on the ultimate acceleration of our program.
We can implement other scheduling models in the pool handler quite easily:
public class PoolSelfLoopHandler extends PoolLoopHandler { private int groupSize; public PoolSelfLoopHandler(int start, int end, int size, int threads) { super(start, end, threads); setSize(size); } public synchronized void setSize(int size) { groupSize = size; reset(); } protected synchronized LoopRange loopGetRange() { if (curLoop >= endLoop) return null; LoopRange ret = new LoopRange(); ret.start = curLoop; curLoop += groupSize; ret.end = (curLoop<endLoop)?curLoop:endLoop; return ret; } }
What’s interesting here is the similarity to our original SelfLoopHandler class. However, to be more configurable, we have modified the handler to allow the extra parameters, such as the chunk size, to be changed.
Here’s how we use our new handler:
public class SinTable extends PoolLoopHandler { private float lookupValues[][]; private int j; public SinTable() { super(0, 360, 12); lookupValues = new float[1000][]; for (int j = 0; j < 1000; j++) { lookupValues[j] = new float[360]; } } public void loopDoRange(int start, int end) { float sinValue = 0.0f; for (int i = start; i < end; i++) { sinValue = (float)Math.sin((i % 360)*Math.PI/180.0); lookupValues[j][i] = sinValue * (float)i / 180.0f; lookupValues[j][i] += lookupValues[j-1][i]*(float)j/180.0f; } } public float[][] getValues() { for (int i = 0; i < 360; i++) { lookupValues[0][i] = 0; } for (j = 1; j < 1000; j++) { loopProcess(); } return lookupValues; } }
To implement the SinTable class, we place the code from the inner
loop in the loopDoRange()
method and then call
the loopProcess()
method to process the inner
loop. Since the j
index variable is a read-only
shared variable, it is now an instance variable of the SinTable
class.
Having a loop handler that can be used more than once is also very
important. If we were using the earlier version of the loop handler,
we would have had to create a new instance of the loop handler for
each inner loop that we executed. This means that the code for the
outer loop and the inner loop could not have been in the same class.
Furthermore, we would have had to pass a reference to the
j
variable and lookupValues
array to each instance, since these are shared between the different
inner loop
handlers.