722 15. Pipeline Optimization
30 ms, and the speedup is a factor of three (30/10), resulting in 100 frames
per second.
Now, a parallel version of the same program would also divide the jobs
into three work packages, but these three packages will execute at the same
time on the three CPUs. This means that the latency will be 10 ms, and
the work for one frame will also take 10 ms. The conclusion is that the
latency is much shorter when using parallel processing than when using a
multiprocessor pipeline. However, parallel processing often does not map
to hardware constraints. For example, DirectX 10 and earlier allow only
one thread to access the GPU at a time, so parallel processing for the
DRAW stage is more difficult [1057].
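The arithmetic behind this comparison can be sketched in a few lines. This is only an illustration under idealized assumptions: the 30 ms frame divides perfectly into equal pieces, and there is no synchronization or communication overhead. The function names are hypothetical and chosen for this sketch.

```python
def pipeline_timing(frame_ms, n_cpus):
    """Idealized multiprocessor pipeline: one stage per CPU, equal stage times."""
    stage_ms = frame_ms / n_cpus       # time spent in each pipeline stage
    latency_ms = stage_ms * n_cpus     # a frame must pass through every stage
    fps = 1000.0 / stage_ms            # a finished frame leaves every stage_ms
    return latency_ms, fps


def parallel_timing(frame_ms, n_cpus):
    """Idealized parallel processing: all CPUs work on the same frame at once."""
    package_ms = frame_ms / n_cpus     # each CPU handles one work package
    latency_ms = package_ms            # the frame is done when the packages finish
    fps = 1000.0 / package_ms          # the next frame can start immediately
    return latency_ms, fps


if __name__ == "__main__":
    print("pipeline (latency ms, fps):", pipeline_timing(30.0, 3))
    print("parallel (latency ms, fps):", parallel_timing(30.0, 3))
```

Under these assumptions both schemes reach the same throughput, and only the latency differs: 30 ms for the pipeline versus 10 ms for parallel processing.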
EXAMPLE: VALVE’S GAME ENGINE PIPELINE. Valve reworked their engine
to take advantage of multicore systems. They found a mix of coarse and
fine-grained threading worked well. Coarse-grained threading is where an
entire subsystem, such as artificial intelligence or sound, is handled by a
core. Fine-grained threading uses parallel processing to split a single task among a
set of cores. For the tasks related to rendering, their new pipeline acts as
follows [1057]:
1. Construct scene rendering lists for multiple scenes in parallel (world,
water reflections, and TV monitors).
2. Overlap graphics simulations (particles, ropes).
3. Compute character skeletal transformations for all characters in all
scenes in parallel.
4. Compute shadows for all characters in parallel.
5. Allow multiple threads to draw in parallel (which required a rewrite
of low-level graphics libraries).
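The difference between coarse-grained and fine-grained threading can be illustrated with a small structural sketch. This is not Valve's code. The subsystem functions (`update_ai`, `update_sound`), the per-character function (`skin_character`), and `run_frame` are hypothetical stand-ins. Python threads are used only to show how the work is organized; in CPython they would not give a real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def update_ai():
    # Hypothetical stand-in for an entire AI subsystem update.
    return "ai done"


def update_sound():
    # Hypothetical stand-in for an entire sound subsystem update.
    return "sound done"


def skin_character(character_id):
    # Hypothetical stand-in for one character's skeletal transformation.
    return character_id * 2


def run_frame(character_ids, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Coarse-grained: each whole subsystem is submitted as a single task.
        ai_future = pool.submit(update_ai)
        sound_future = pool.submit(update_sound)
        # Fine-grained: one task (skinning) is split across the worker
        # threads, one character per work item. map() preserves input order.
        skinned = list(pool.map(skin_character, character_ids))
        return ai_future.result(), sound_future.result(), skinned


if __name__ == "__main__":
    print(run_frame([1, 2, 3]))
```

The coarse-grained tasks run alongside the fine-grained work items in the same pool, which mirrors the mix of the two approaches described above.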
Further Reading and Resources
Though a little dated, Cebenoyan’s article [164] gives an overview of how
to find the bottleneck and techniques to improve efficiency. NVIDIA’s
extensive guide [946] covers a wide range of topics, and ATI’s newer pre-
sentation [1400] provides a number of insights into some of the subtleties
of various architectures. Some of the better optimization guides for C++
are Fog’s [347] and Isensee’s [591], free on the web.
There are a number of tools that make the task of performance tuning
and debugging much simpler. These include Microsoft’s PIX, NVIDIA’s
NVPerfHUD, ATI’s GPU PerfStudio, and gDEBugger for OpenGL. See
http://www.realtimerendering.com for a current list. Zarge et al. [1401]
discuss GPU PerfStudio and optimizing performance specifically for
DirectX 10. Futuremark’s 3DMark benchmarking suites provide information
about the capabilities of your PC system, while also being entertaining to
watch.
Issues that arise in the design of a parallel graphics API are treated
by Igehy et al. [581]. See the book Parallel Computer Architecture: A
Hardware/Software Approach [212] for more information on parallel pro-
gramming.