i
i
i
i
i
i
i
i
12.4. Triangle Fans, Strips, and Meshes 555
is represented by sequential triangle strips. Furthermore, this implies that
triangle meshes (Section 12.4.4) can be, at most, twice as efficient as se-
quential triangle strips. At the limit, triangle meshes can store 0.5 vertices
per triangle.
Note that this analysis holds for only closed meshes. As soon as there
are boundary edges (edges not shared between two polygons), the ratio of
triangles to vertices falls. The Euler-Poincar´e formula still holds, but the
outer boundary of the mesh has to be considered a separate (unused) face
bordering all exterior edges.
This is the theory. In practice, vertices are transformed by the GPU and
put in a first-in, first-out (FIFO) cache. This FIFO cache is limited in size,
typically holding between 12 and 24 vertices, though it can range from 4 to
32. For example, the PLAYSTATION
R
3 system holds about 24 vertices,
depending on the number of bytes per vertex (see Section 18.4.2). If an
incoming vertex is already in this cache, the cached results can be used,
providing a significant performance increase. However, if the triangles in
a triangle mesh are sent down in random order, the cache is unlikely to
be useful. Basic triangle strip algorithms essentially optimize for a cache
size of two, i.e., the last two vertices used. Deering and Nelson [243] first
explored the idea of storing vertex data in a larger FIFO cache by using
additional data to determine which vertices to add to the cache.
Hoppe [564] gives an approach that optimizes the ordering of the trian-
gles and their vertices for a given cache size. The idea is to create triangle
strips that link together and so optimize cache reuse, then explore local
perturbations of the resulting order for greater efficiency.
This work introduces an important measurement of cache reuse, the
average cache miss ratio (ACMR). This is the average number of vertices
that need to be processed per triangle. It can range from 3 (every vertex
for every triangle has to be reprocessed each time) to 0.5 (perfect reuse
on a large closed mesh; no vertex is reprocessed). If the cache size is as
large as the mesh itself, the ACMR is identical to the vertex per triangle
computations described earlier. For a given cache size and mesh ordering,
the ACMR can be computed precisely, so describing the efficiency of any
given approach for that cache size.
A limitation of Hoppe’s method is that the cache size has to be known
in advance. If the assumed cache size is larger than the actual cache size,
the resulting mesh can have significantly less benefit.
Sander et al. [1109] use the idea of a minimum cache size to their advan-
tage. As discussed in Section 12.4.4, a vertex will have on average about
six neighboring vertices, for a total of vertices. Modern GPUs can hold
at least seven vertices in the cache at the same time. In other words, this
cluster of six triangles can be rendered in any order, since all the vertices
will fit in the cache. This freedom to choose the ordering allows them to