i
i
i
i
i
i
i
i
12.4. Triangle Fans, Strips, and Meshes 551
Figure 12.14. We woul d have to s end (v
0
, v
1
, v
2
, v
3
, v
2
, v
4
, v
5
, v
6
) to the graphics
pipeline to use these triangles as a strip. As can be seen, a swap has been implemented
by including v
2
twice in the list.
there is an actual command for doing a swap, but in OpenGL [969] and
DirectX, a swap command must be implemented by resending a vertex,
and thus there is a penalty of one vertex per swap. This implementation
ofaswapresultsinatrianglewithnoarea. Sincestartinganewtri-
angle strip costs two additional vertices, accepting the penalty imposed
by a swap is still better than restarting. A triangle strip wanting to
send the vertices (v
0
, v
1
, v
2
, v
3
, swap, v
4
, v
5
, v
6
) could be implemented as
(v
0
, v
1
, v
2
, v
3
, v
2
, v
4
, v
5
, v
6
), where the swap has been implemented by re-
sending vertex v
2
. This is shown in Figure 12.14. Starting in DirectX
10, a special 1” restart index is available. This has the same effect as
resending two vertices, but costs only one.
Even with the advantages of data sharing afforded by strips, it is impor-
tant to avoid the small batch problem [1054]. The nature of this problem
is described further in Section 15.4.2. The basic idea is that there is over-
head associated with each separate API draw call, so it is important to
Figure 12.15. Two triangle strips are joined by replicating the end vertex d and beginning
vertex p, forming four degenerate triangles. (Illustration after Huddy [573].)
i
i
i
i
i
i
i
i
552 12. Polygonal Techniques
minimize the number of calls overall. Putting as many triangles as possible
in a single buffer is recommended [261]. Normally, no single triangle strip
can hold all triangles. So to connect two strips, what is done is to double
the last vertex of one strip and the first vertex of the next strip. Doing
so has the effect of creating four degenerate triangles that are never seen
(except in wireframe). See Figure 12.15. Hardware is efficient at detecting
and deleting these degenerate triangles [351].
12.4.3 Creating Strips
Given an arbitrary triangle mesh, it can be useful to decompose it efficiently
into triangle strips. A visualization of the triangle strips of a model is shown
in Figure 12.16. There are free software packages for performing this task,
e.g., OpenSG [972] and NvTriStrip, but it is informative to understand the
principles behind these algorithms.
Obtaining optimal triangle strips has been shown to be an NP-complete
problem [33], and therefore, we have to be satisfied with heuristic methods
that come close to the lower bound on the number of strips. This section
will present one greedy method for constructing sequential triangles strips.
First, some background information is in order.
Every triangle strip creation algorithm starts by creating an adjacency
graph, with each triangle knowing its neighbors. The number of neighbors
of a polygon is called its degree and is an integer between zero and the
number of vertices of the polygon. A dual graph is a visualization of an
adjacency graph, where each triangle is treated as a node and each shared
edge defines a line segment connecting two nodes. See Figure 12.17. Using
the dual graph permits stripification to be thought of in terms of graph
traversal.
The goal of stripification algorithms is to minimize the total number of
strips generated, not to form the longest strips. For example, imagine a
Figure 12.16. Triangle strips formed on various surfaces. (Images from Martin Isenburg’s
triangle strip compression program.)
i
i
i
i
i
i
i
i
12.4. Triangle Fans, Strips, and Meshes 553
Figure 12.17. Left: a triangle mesh. Right: the dual graph (brown and blue lines) of the
mesh to the left. Connected brown lines indicate the four different triangle strips in this
example.
mesh consists of 100 triangles. One strip of 95 triangles with 5 separate
triangles left over gives a total of 112 vertices to transfer. Five strips of 20
triangles each gives 110 vertices total. For this reason, one useful strategy
is to begin with a low-degree triangle and grow its strip among the lower-
degree candidates [8, 1063]. The SGI algorithm was one of the first strip
creation algorithms, and we present it here because of its relative simplicity.
Improved SGI Stripping Algorithm
Greedy algorithms are optimization methods that make choices that are
locally optimal; they choose what looks best at the moment [201]. The
SGI algorithm [8] is greedy, in the sense that the beginning triangle it
chooses always has the lowest degree (lowest number of neighbors). With
some improvements [176], the SGI algorithm works like this:
1. Choose a starting triangle.
2. Build three different triangle strips: one for each edge of the triangle.
3. Extend these triangle strips in the opposite direction.
4. Choose the longest of these three strips; discard the others.
5. Repeat Step 1 until all triangles are included in a strip.
In Step 1, the original SGI algorithm picks any triangle of lowest degree
(greater than zero, since degree-zero triangles are unconnected to others).
If there is a tie—that is, if there is more than one triangle with the same
lowest degree—then the algorithm looks at the neighbors’ neighbors for
their degrees. If again there is no unambiguous way to make a selection,
a triangle is arbitrarily selected. Finally, the strip is extended as much as
possible in both its start and end directions, making each strip potentially
i
i
i
i
i
i
i
i
554 12. Polygonal Techniques
longer. The idea behind this method is to catch any low-degree triangles,
so that they do not become isolated.
A linear time algorithm can be implemented by using a priority queue
for finding the starting triangle of each new strip [8]. However, Chow
reports that choosing any arbitrary triangle works almost as well [176].
Gumhold and Straßer report the same for their related geometry compres-
sion algorithm [468]. Steps 2 through 4 guarantee that the longest strip in
the current mesh containing the starting triangle is found.
Xiang et al. [1393] present search algorithms of the dual graph, using dy-
namic programming. Reuter et al. [1063] present a variation of the STRIPE
algorithm [321]. By paying careful attention to data locality and neighbor
finding, their results were comparable in quality to software from Xiang et
al. [1393] and about three times as fast to compute. Lord and Brown [794]
provide a good overview of research to date, while also exploring genetic
algorithms for solving the problem.
12.4.4 Triangle Meshes
Triangle fans and strips have their uses, but the trend is to use triangle
meshes as much as possible. Strips and fans allow some data sharing, but
meshes allow full sharing.
The Euler-Poincar´eformulafor connected planar graphs [82], which is
shown in Equation 12.8, helps in determining the average number of vertices
that form a closed mesh:
v e + f +2g =2. (12.8)
Here v is the number of vertices, e is the number of edges, f is the number
of faces,
4
and g is the genus. The genus is the number of holes in the
object. As an example, a sphere has genus 0 and a torus has genus 1.
Since every edge has two faces, and every face has at least three edges
(exactly three for triangles), 2e 3f holds. Inserting this fact into the
formula and simplifying yields f 2v 4. If all faces are triangles, then
2e =3f f =2v 4.
For large closed triangle meshes, the rule of thumb then is that the
number of triangles is about equal to twice the number of vertices. Sim-
ilarly, each vertex is connected to an average of nearly six triangles (and,
therefore, six edges). What is interesting is that the actual network of
the mesh is irrelevant; a randomly connected triangle mesh has the same
average characteristics as a regular grid of triangles, though the variance
differs. Since the average number of vertices per triangle in a strip ap-
proaches one, every vertex has to be sent twice (on average) if a large mesh
4
Each face is assumed to have one loop. If faces can have multiple loops, the formula
becomes v e +2f l +2g =2,wherel is the number of loops.
i
i
i
i
i
i
i
i
12.4. Triangle Fans, Strips, and Meshes 555
is represented by sequential triangle strips. Furthermore, this implies that
triangle meshes (Section 12.4.4) can be, at most, twice as efficient as se-
quential triangle strips. At the limit, triangle meshes can store 0.5 vertices
per triangle.
Note that this analysis holds for only closed meshes. As soon as there
are boundary edges (edges not shared between two polygons), the ratio of
triangles to vertices falls. The Euler-Poincar´e formula still holds, but the
outer boundary of the mesh has to be considered a separate (unused) face
bordering all exterior edges.
This is the theory. In practice, vertices are transformed by the GPU and
put in a first-in, first-out (FIFO) cache. This FIFO cache is limited in size,
typically holding between 12 and 24 vertices, though it can range from 4 to
32. For example, the PLAYSTATION
R
3 system holds about 24 vertices,
depending on the number of bytes per vertex (see Section 18.4.2). If an
incoming vertex is already in this cache, the cached results can be used,
providing a significant performance increase. However, if the triangles in
a triangle mesh are sent down in random order, the cache is unlikely to
be useful. Basic triangle strip algorithms essentially optimize for a cache
size of two, i.e., the last two vertices used. Deering and Nelson [243] first
explored the idea of storing vertex data in a larger FIFO cache by using
additional data to determine which vertices to add to the cache.
Hoppe [564] gives an approach that optimizes the ordering of the trian-
gles and their vertices for a given cache size. The idea is to create triangle
strips that link together and so optimize cache reuse, then explore local
perturbations of the resulting order for greater efficiency.
This work introduces an important measurement of cache reuse, the
average cache miss ratio (ACMR). This is the average number of vertices
that need to be processed per triangle. It can range from 3 (every vertex
for every triangle has to be reprocessed each time) to 0.5 (perfect reuse
on a large closed mesh; no vertex is reprocessed). If the cache size is as
large as the mesh itself, the ACMR is identical to the vertex per triangle
computations described earlier. For a given cache size and mesh ordering,
the ACMR can be computed precisely, so describing the efficiency of any
given approach for that cache size.
A limitation of Hoppe’s method is that the cache size has to be known
in advance. If the assumed cache size is larger than the actual cache size,
the resulting mesh can have significantly less benefit.
Sander et al. [1109] use the idea of a minimum cache size to their advan-
tage. As discussed in Section 12.4.4, a vertex will have on average about
six neighboring vertices, for a total of vertices. Modern GPUs can hold
at least seven vertices in the cache at the same time. In other words, this
cluster of six triangles can be rendered in any order, since all the vertices
will fit in the cache. This freedom to choose the ordering allows them to
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset