18.3. Architecture
host, i.e., on the CPU(s). The interface between the application and the
graphics hardware is called a driver.
The GPU is often thought of in different terms than the CPU. A GPU
is not a serial processor like the CPU, but is rather a dataflow or stream
processor. It is the appearance of data at the beginning of the pipeline
that causes computation to occur, and a limited set of inputs is needed to
perform the computation. This different processing model lends itself in
particular to problems where each datum is affected by only nearby data.
One active area of research is how to apply the GPU to non-rendering
problems of this type, such as fluid flow computation and collision de-
tection. This form of computation is called general purpose computation
on the GPU, abbreviated as GPGPU. AMD's CTM [994] and NVIDIA's
CUDA [211] are toolkits that aid the programmer in using the GPU for
non-graphical applications, as well as alternative rendering methods such
as ray tracing. Other architectures, such as Intel’s Larrabee [1221], aim
to provide graphics acceleration using CPU-centric processors with texture
samplers. The Larrabee architecture is expected to be able to attack a
wider range of problems that can benefit from parallelization.
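The locality that makes a problem suit the stream-processing model can be illustrated with a minimal sketch (plain Python running on the CPU, purely for illustration; the function name and the 1D diffusion example are ours, not from the text). Each output sample is computed from only its immediate neighbors, so every sample could in principle be evaluated in parallel, which is exactly the kind of data-driven computation GPGPU targets such as fluid flow exploit.

```python
def diffuse_step(u, alpha=0.25):
    """One explicit diffusion step on a 1D grid.

    Each output sample depends only on the sample itself and its two
    immediate neighbors, so all samples are independent of each other
    and could be computed in parallel by a stream processor.
    """
    n = len(u)
    out = [0.0] * n
    for i in range(n):
        # Clamp at the grid boundaries (replicate the edge sample).
        left = u[i - 1] if i > 0 else u[i]
        right = u[i + 1] if i < n - 1 else u[i]
        out[i] = u[i] + alpha * (left - 2.0 * u[i] + right)
    return out
```

On a GPU, the loop body would become the per-element kernel and the loop itself would disappear; the arrival of the input data is what triggers the computation.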
Pipelining and Parallelization
Two ways to achieve faster processing in graphics hardware are pipelining
and parallelism. These techniques are most often used in conjunction with
each other. See Section 2.1 for the basics of pipelining. When discussing
pipelining in this section, we do not mean the conceptual or the functional
stages, but rather the physical pipeline stages, i.e., the actual blocks built
in silicon that are executed in parallel. It is worth repeating that divid-
ing a certain function (for example, the lighting computation of a vertex)
into n pipeline stages ideally gives a performance increase of a factor of n.
In general,
graphics hardware for polygon rendering is much simpler to pipeline than
a CPU. For example, the Intel Pentium IV has 20 pipeline stages, which
should be compared to NVIDIA’s GeForce3, which has 600–800 stages (de-
pending on which path is taken). The reasons for this are many, including
that the CPU is operation-driven, while most of the GPU is data-driven,
that pixels are rendered independently of each other, and that some parts of
the GPU are highly specialized and fixed-function (not programmable).
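The ideal n-fold gain from pipelining can be checked with a little throughput arithmetic (a sketch under simplifying assumptions; the helper names and the uniform stage time are ours). Once the pipeline is full, one result emerges per stage time, so for a large stream of work items the speedup over executing the whole function serially per item approaches the stage count n.

```python
def pipelined_time(num_items, num_stages, stage_time):
    """Time to push num_items through a pipeline of num_stages, each
    taking stage_time: a fill latency of num_stages steps, then one
    completed item per step."""
    return (num_stages + num_items - 1) * stage_time

def unpipelined_time(num_items, num_stages, stage_time):
    """Time if each item must pass through all stages before the next
    item may start."""
    return num_items * num_stages * stage_time
```

With 20 stages and a million items, the ratio of the two times is just under 20; the fill latency is amortized away, which is why deep pipelines pay off for the long, uniform streams of vertices and pixels a GPU processes.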
The clock frequency of a CPU is often much higher than that of a graph-
ics accelerator, but this hardly touches the huge advantage that modern
accelerators have over the CPU when used correctly. CPUs typically have
much higher clock frequencies because, among other reasons, CPU design-
ers spend much more time on optimizing the blocks. One reason for this is
because CPUs have tens of pipeline stages, compared to hundreds or thou-
sands for GPUs. Also, for GPUs, the clock frequency is most often not the