702 15. Pipeline Optimization
is in place. A smaller screen is likely to also simplify the models displayed,
lessening the load on the geometry stage.
Another approach is the same as that taken with vertex shader programs:
add more instructions and observe the effect on execution speed. Again,
it is important to verify that these additional instructions were not
optimized away by the compiler.
15.3 Performance Measurements
Before delving into different ways of optimizing the performance of the
graphics pipeline, a brief presentation of performance measurements will
be given. One way to express the performance of the geometry stage is
in terms of vertices per second. Similarly, one way to express the fill
rate of the rasterizer stage is in terms of pixels per second (or
sometimes texels per second).
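As a rough illustration of these two rates, the following sketch converts
per-frame counts into throughput figures. The function names, the
depth-complexity (overdraw) factor, and the sample numbers in the usage
note are assumptions made only to show the arithmetic; they are not
measurements of any real hardware.

```python
def vertices_per_second(vertices_per_frame: int, frame_time_ms: float) -> float:
    """Geometry-stage throughput: vertices processed per second."""
    return vertices_per_frame * 1000.0 / frame_time_ms


def pixels_per_second(width: int, height: int, depth_complexity: float,
                      frame_time_ms: float) -> float:
    """Fill rate: pixels written per second.

    depth_complexity is the assumed average number of times each screen
    pixel is written in one frame (1.0 means no overdraw).
    """
    pixels_per_frame = width * height * depth_complexity
    return pixels_per_frame * 1000.0 / frame_time_ms
```

For instance, a hypothetical frame with one million vertices rendered in
20 ms corresponds to 50 million vertices per second, and a 1280 x 720
frame with an average depth complexity of 2.5 rendered in the same time
corresponds to about 115 million pixels per second. Such averages say
nothing about peak rates, which is part of why the next paragraph warns
against reading too much into them.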
Note that graphics hardware manufacturers often present peak rates,
which are at best hard to reach. Also, since we are dealing with a pipelined
system, true performance is not as simple as listing these kinds of numbers.
This is because the location of the bottleneck may move from one time to
another, and the different pipeline stages interact in different ways during
execution. It is educational to find out what the peak rates represent, and
try to duplicate them: Usually you will find the bus getting in the way,
but the exercise is guaranteed to generate insights into the architecture like
nothing else can.
When it comes to measuring performance for CPUs, the trend has been
to avoid IPS (instructions per second), FLOPS (floating point operations
per second), gigahertz, and simple short benchmarks. Instead, the pre-
ferred method is to use clock cycle counters [1], or to measure wall clock
times for a range of different, real programs [541], and then compare the
running times for these. Following this trend, most independent bench-
marks instead measure the actual frame rate in fps for a number of given
scenes, and for a number of different screen resolutions, antialiasing set-
tings, etc. Many graphics-heavy games include a benchmarking mode or
have one created by a third party, and these benchmarks are commonly
used in comparing GPUs.
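A frame-rate benchmark of this kind can be sketched as a small timing
harness. This is a minimal sketch, not how any particular benchmark is
implemented: benchmark_scene and its parameters are hypothetical names,
and render_frame is assumed to return only once the frame has actually
been completed (with OpenGL, for example, that would mean ending the
frame with a call such as glFinish).

```python
import time
from typing import Callable, Dict


def benchmark_scene(render_frame: Callable[[], None], frames: int = 100,
                    warmup: int = 10) -> Dict[str, float]:
    """Time `frames` calls of render_frame and report frame statistics."""
    # Untimed warm-up frames, so one-time setup costs do not skew results.
    for _ in range(warmup):
        render_frame()

    times = []
    for _ in range(frames):
        start = time.perf_counter()      # wall-clock timer
        render_frame()
        times.append(time.perf_counter() - start)

    total = sum(times)
    return {
        "mean_ms": 1000.0 * total / frames,
        "worst_ms": 1000.0 * max(times),
        "fps": frames / total if total > 0 else float("inf"),
    }
```

Running the same harness over several scenes, resolutions, and
antialiasing settings gives a table of frame rates that can be compared
across systems. Reporting the worst frame time alongside the mean is a
design choice made here because an average can hide occasional long
frames.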
To be able to see the potential effects of pipeline optimization, it is
important to measure the total rendering time per frame with double
buffering disabled, i.e., in single-buffer mode. This is because with double
buffering turned on, swapping of the buffers occurs only in synchroniza-
tion with the frequency of the monitor, as explained in the example in
Section 2.1. Other benchmarking tips and numerous optimizations are
provided in NVIDIA’s [946] and ATI’s guides [1400].
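The reason for measuring in single-buffer mode can be illustrated with a
simplified model of double buffering, in which a buffer swap waits for
the next vertical retrace. The function name and the 50 Hz refresh rate
in the usage note are assumptions chosen for round numbers; the model
ignores triple buffering and any driver-specific behavior.

```python
import math


def displayed_frame_time_ms(render_ms: float, refresh_hz: float) -> float:
    """Frame time observed when buffer swaps wait for the vertical retrace.

    The rendering time is rounded up to a whole number of refresh
    periods, because the swap cannot happen between retraces.
    """
    period_ms = 1000.0 / refresh_hz
    intervals = max(1, math.ceil(render_ms / period_ms))
    return intervals * period_ms
```

With a 50 Hz monitor the refresh period is 20 ms. Under this model a
frame that takes 25 ms to render is displayed every 40 ms, and so is a
frame optimized down to 21 ms: the 4 ms gain is invisible with double
buffering on, even though it would show up directly in single-buffer
timing. Only when the rendering time drops to 20 ms or below does the
displayed frame time fall to 20 ms.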