18.4. Case Studies 875
rendered, and the Z-buffer is saved out to external memory. Now, in the
second pass, another 30,000 triangles are rendered. Before the per-pixel
processing of a tile starts, the Z-buffer, and possibly the color buffer (and
stencil buffer), for that tile are read from external memory back into the
on-chip tile buffer.
This multipass method uses more bandwidth and thus incurs a perfor-
mance hit. Performance could also drop in certain pathological cases. For
example, say many different long pixel shader programs must be executed
for small triangles inside a single tile. Switching between long shader pro-
grams often comes at a significant cost. Such situations seldom arise with
normal content.
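The save/restore behavior described above can be illustrated with a minimal Python sketch. The function name `render_pass`, the tiny 4 × 4 tile, and the fragment representation are all illustrative assumptions, not part of any real driver or the hardware described here; the point is only that a later pass must first read a tile's Z-buffer (and color buffer) back from external memory before it can correctly depth-test new fragments.

```python
TILE_SIZE = 4  # tiny tile for illustration; real hardware uses e.g. 16x16

# Stands in for external memory: per-tile buffers saved between passes.
external_memory = {}

def render_pass(tile_id, fragments, first_pass):
    """Render (x, y, depth, color) fragments into one tile.

    On the first pass the on-chip tile buffers start cleared. On later
    passes the tile's Z-buffer and color buffer are first read back from
    external memory, which is the extra bandwidth cost of multipass
    rendering on a tile-based architecture.
    """
    if first_pass:
        # Clear the on-chip tile buffers.
        zbuf = [[float("inf")] * TILE_SIZE for _ in range(TILE_SIZE)]
        cbuf = [[(0, 0, 0)] * TILE_SIZE for _ in range(TILE_SIZE)]
    else:
        # Restore the tile's state saved by the previous pass.
        zbuf, cbuf = external_memory[tile_id]

    for x, y, depth, color in fragments:
        if depth < zbuf[y][x]:  # per-pixel depth test
            zbuf[y][x] = depth
            cbuf[y][x] = color

    # Save the tile out so a later pass can resume where this one ended.
    external_memory[tile_id] = (zbuf, cbuf)
    return cbuf
```

In a quick check, a fragment at depth 0.5 drawn in the first pass correctly occludes a fragment at depth 0.9 drawn at the same pixel in the second pass, because the restored Z-buffer remembers the earlier depth.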
More and more mobile phones are being equipped with special-purpose
hardware for three-dimensional graphics. Energy-efficient architectures
such as the one described here will continue to be important, since even
with a “miracle” battery that lasts a very long time, the heat generated
must still dissipate through the cover of the phone, and too much heat is
inconvenient for the user.
This suggests that there is much research to be done for mobile graphics,
and that fundamental changes in the core architecture should be investi-
gated.
18.4.4 Other Architectures
There are many other three-dimensional graphics architectures that have
been proposed and built over the years. Two major systems, Pixel-Planes
and PixelFlow, have been designed and built by the graphics group at
the University of North Carolina, Chapel Hill. Pixel-Planes was built in
the late 1980s [368] and was a sort-middle architecture with tile-based
rendering. PixelFlow [328, 895, 897] was an example of a sort-last image
architecture with deferred shading and programmable shading. The group
at UNC also developed the WarpEngine [1024], a rendering
architecture for image-based rendering. The primitives used
are images with depth. Owens et al. describe a stream architecture for
polygon rendering, where the pipeline work is divided in time, and each
stage is run in sequence on a single processor [980]. This architecture
is software-programmable at all pipeline stages, giving high flexibility.
The SHARP architecture developed at Stanford uses ray tracing as its
rendering algorithm [10].
The REYES software rendering architecture [196] used in RenderMan
has used stochastic rasterization for a long time now. This technique is
more efficient for motion blur and depth-of-field rendering, among other
advantages. Stochastic rasterization is not trivial to move over to the
GPU, however; some research has been done on a new architecture that
performs this algorithm, and some of the problems have been solved [14]. See
Figure 18.22 for an example of motion blur rendering. One of the con-