15. Pipeline Optimization (6/6)

30 ms, and the speedup is a factor of three (30/10), resulting in 100 frames
per second.

Now, a parallel version of the same program would also divide the jobs
into three work packages, but these three packages will execute at the same
time on the three CPUs. This means that the latency will be 10 ms, and
the work for one frame will also take 10 ms, so the throughput is again
100 frames per second. The conclusion is that the latency is much shorter
when using parallel processing than when using a multiprocessor pipeline.
However, parallel processing often maps poorly onto hardware constraints.
For example, DirectX 10 and earlier allow only one thread to access the
GPU at a time, so parallel processing for the DRAW stage is more
difficult [1057].
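The comparison above can be checked with a few lines of arithmetic. The following Python sketch assumes the frame's work splits into perfectly balanced parts and that there is no synchronization or communication overhead. The function names are hypothetical and are introduced only for illustration.

```python
def pipelined(frame_work_ms, n_cpus):
    """Multiprocessor pipeline: each CPU runs one stage of every frame."""
    stage_ms = frame_work_ms / n_cpus   # time per stage (assumed balanced)
    latency_ms = stage_ms * n_cpus      # a frame passes through every stage in turn
    fps = 1000.0 / stage_ms             # one frame completes per stage time
    return latency_ms, fps


def parallel(frame_work_ms, n_cpus):
    """Parallel processing: all CPUs work on the same frame at once."""
    frame_ms = frame_work_ms / n_cpus   # work packages run simultaneously
    latency_ms = frame_ms               # the frame is done when the packages are done
    fps = 1000.0 / frame_ms
    return latency_ms, fps


if __name__ == "__main__":
    # 30 ms of work per frame on three CPUs, as in the text.
    print("pipeline:", pipelined(30, 3))
    print("parallel:", parallel(30, 3))
```

Under these idealized assumptions both approaches reach the same frame rate. Only the latency differs, which is the point of the comparison.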

EXAMPLE: VALVE'S GAME ENGINE PIPELINE. Valve reworked their engine
to take advantage of multicore systems. They found that a mix of coarse-
and fine-grained threading worked well. Coarse-grained threading is where
an entire subsystem, such as artificial intelligence or sound, is handled
by a core. Fine-grained threading uses parallel processing to split a
single task among a set of cores. For the tasks related to rendering,
their new pipeline acts as follows [1057]:

1. Construct scene rendering lists for multiple scenes in parallel (world,
   water reflections, and TV monitors).

2. Overlap graphics simulations (particles, ropes).

3. Compute character skeletal transformations for all characters in all
   scenes in parallel.

4. Compute shadows for all characters in parallel.

5. Allow multiple threads to draw in parallel (which required a rewrite
   of low-level graphics libraries).
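The difference between coarse- and fine-grained threading described above can be sketched with a thread pool. This is a structural illustration only, not Valve's implementation. The functions `update_ai`, `update_sound`, `skin_character`, and `run_frame` are hypothetical stand-ins. In standard CPython the global interpreter lock also prevents pure-Python code from running on several cores at once, so the sketch shows how the tasks are divided rather than an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def update_ai():
    # Hypothetical stand-in for an entire AI subsystem update.
    return "ai"


def update_sound():
    # Hypothetical stand-in for an entire sound subsystem update.
    return "sound"


def skin_character(character_id):
    # Hypothetical stand-in for one character's skeletal transformation.
    return character_id * 2


def run_frame(character_ids, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Coarse-grained: each whole subsystem is submitted as a single task.
        coarse = [pool.submit(update_ai), pool.submit(update_sound)]
        # Fine-grained: one task (skinning) is split per character
        # and spread across the pool's workers.
        skinned = list(pool.map(skin_character, character_ids))
        subsystems = [future.result() for future in coarse]
    return subsystems, skinned


if __name__ == "__main__":
    print(run_frame([1, 2, 3]))
```

Both kinds of work share one pool here for brevity. An engine might instead pin coarse-grained subsystems to dedicated threads and reserve a separate pool for fine-grained jobs.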