i
i
i
i
i
i
i
i
556 12. Polygonal Techniques
Figure 12.18. An illustration of four generations of Hilbert curves. The thin lines rep-
resent the grid, and the bold lines the Hilbert curve. As can be seen, the curve visits
every grid cell, and it is self-similar at all scales. A nice property is that the Hilbert
curve visits every grid cell within 2 × 2 cells, then within 4 × 4 cells, then within 8 × 8
cells, and so on.
explore nearby triangles as the fan attached to a vertex is being output.
The algorithm is more involved than will be explained here, but is fairly
straightforward to implement and is linear time in execution. Another in-
teresting element of their algorithm is that they attempt to order polygons
essentially from the outside in. By doing so, the object draws faster, as
triangles drawn later are more likely to be hidden by those drawn earlier.
Cache-Oblivious Mesh Layouts
The ideal order for triangles in a mesh is one in which the vertex cache use
is maximized. Solving for different-sized caches may yield different optimal
orderings. For when the target cache size is unknown, cache-oblivious mesh
layout algorithms have been developed that yield orderings that work well,
regardless of size. Such an ordering is sometimes called a universal index
sequence.
An important idea behind this type of algorithm, as well as other related
algorithms, is the concept of a space-filling curve. A space-filling curve is
a single, continuous curve that, without intersecting itself, fills a square or
rectangle containing a uniform grid by visiting every grid cell once. One
example is the Hilbert curve, which is shown in Figure 12.18, and another
is the Peano curve. A space-filling curve can have good spatial coherence,
which means that when walking along the curve, you stay close to the
previously visited points, at all scales. The Hilbert curve is an example of
a curve with good spatial coherence. See Sagan’s book [1095] or Voorhies’
article [1309] for more information on this topic.
If a space-filling curve is used to visit the triangles on a triangle mesh,
then one can expect a high hit ratio in the vertex cache. Inspired by this,
Bogomjakov and Gotsman [125] propose two algorithms, recursive cuts
and minimum linear arrangement, for generating good index sequences for
triangle meshes. For cache sizes from about ten and up, these algorithms
consistently outperform Xiang’s triangle strip method [1393].
i
i
i
i
i
i
i
i
12.4. Triangle Fans, Strips, and Meshes 557
Yoon et al. [1395] develop a fast, approximate metric for whether a
local permutation of the triangle ordering would reduce the number of
cache misses. With this formula in hand, they use a multilevel algorithm
to optimize the mesh. They first coarsen the mesh by clustering groups of
five nearby vertices into a single node. The coarsened adjacency graph is
clustered again, recursively, until five or less vertices are produced. They
then evaluate all possible orderings on these vertices, choosing the best.
Coarsening is then reversed, with each subset of five vertices exhaustively
searched for the best local permutation. Their algorithm works well with
large, out-of-core meshes of hundreds of millions of triangles. There is also
an open source package called OpenCCL that computes cache-oblivious
mesh layouts using algorithms by Yoon et al.
Forsyth [356] and Lin and Yu [777] provide extremely rapid greedy al-
gorithms that use similar principles. In Forsyth’s method, a vertex-triangle
connectivity map is used throughout, which is faster and simpler to gen-
erate than an edge-based adjacency graph. Vertices are given scores based
on their positions in the cache and by the number of unprocessed triangles
attached to them. The triangle with the highest combined vertex score is
processed next. By scoring the three most recently used vertices a little
lower, the algorithm avoids simply making triangle strips and instead cre-
ates patterns similar to a Hilbert curve. By giving higher scores to vertices
with fewer triangles still attached, the algorithm tends to avoid leaving
isolated triangles behind. The average cache miss ratios achieved are com-
parable to those of more costly and complex algorithms. Lin and Yu’s
method is a little more complex but uses related ideas. For a cache size of
12, the average ACMR for a set of 30 unoptimized models was 1.522; after
optimization, the average dropped below 0.64. This average was lower than
those produced by either Bogomjakov & Gotsman’s or Hoppe’s algorithms.
Given that the best possible ACMR for an infinitely long triangle strip
is 1.0, universal rendering sequences can almost always outperform pure
triangle strip creation algorithms when it comes to vertex cache reuse.
However, there are other factors that can come into play when deciding on
which data format to use. The next section discusses the data structures
most commonly used on modern GPUs.
12.4.5 Vertex and Index Buffers/Arrays
One efficient way to provide a modern graphics accelerator with model
data is by using what OpenGL calls vertex buffer objects and DirectX calls
vertex buffers. We will go with the DirectX terminology in this section;
most of the concepts map to OpenGL.
The idea of a vertex buffer is to store model data in a contiguous chunk
of memory. A vertex buffer is an array of vertex data in a particular for-
i
i
i
i
i
i
i
i
558 12. Polygonal Techniques
mat. The format specifies whether a vertex contains diffuse or specular
colors, a normal, texture coordinates, etc. The size in bytes of a vertex is
called its stride. Alternately, a set of vertex streams can be used. For ex-
ample, one stream could hold an array of positions p
0
p
1
p
2
...and another
a separate array of normals n
0
n
1
n
2
.... In practice, a single buffer contain-
ing all data for each vertex is generally more efficient on modern GPUs,
but not so much that multiple streams should be avoided [1065]. Multiple
streams can help save storage and transfer time. For example, six house
models with different lighting due to their surroundings could be repre-
sented by six meshes with a different set of illumination colors at each mesh
vertex, merged with a single mesh with positions, normals, and texture
coordinates.
How the vertex buffer is accessed is up to the device’s DrawPrimitive
method. The data can be treated as:
1. A list of individual points.
2. A list of unconnected line segments, i.e., pairs of vertices.
3. A single polyline.
4. A triangle list, where each group of three vertices forms a triangle,
e.g., vertices [0, 1, 2] form one, [3, 4, 5] form the next, etc.
5. A triangle fan, where the first vertex forms a triangle with each suc-
cessive pair of vertices, e.g., [0, 1, 2], [0, 2, 3], [0, 3, 4].
6. A triangle strip, where every group of three contiguous vertices forms
a triangle, e.g., [0, 1, 2], [1, 2, 3], [2, 3, 4].
In DirectX 10, triangles and triangle strips can also include adjacent trian-
gle vertices, for use by the geometry shader (see Section 3.5).
The indices in an index buffer point to vertices in a vertex buffer. The
combination of an index buffer and vertex buffer is used to display the
same types of draw primitives as a raw” vertex buffer. The difference
is that each vertex in the index/vertex buffer combination needs to be
stored only once in its vertex buffer, versus repetition that can occur in a
vertex buffer without indexing. For example, the triangle mesh structure
can be represented by an index buffer; the first three indices stored in the
index buffer specify the first triangle, the next three the second, etc. See
Figure 12.19 for examples of vertex and index buffer structures.
Which structure to use is dictated by the primitives and the program.
Displaying a simple rectangle is easily done with a vertex buffer using four
vertices as a two-triangle tristrip. One advantage of the index buffer is data
sharing. In the previous section the post-transform cache was explained,
i
i
i
i
i
i
i
i
12.4. Triangle Fans, Strips, and Meshes 559
Figure 12.19. Different ways of defining primitives: separate triangles, or as a vertex
triangle list or triangle strip, or an indexed vertex list or strip.
i
i
i
i
i
i
i
i
560 12. Polygonal Techniques
and ways to best use it were described. This vertex cache is available only
when rendering index buffers. See Section 18.4.2 for more about vertex
caches.
Index buffers also avoid the small batch problem, where the API over-
head of processing a single mesh is considerably less than processing a
number of triangle strips. However, separate triangle strips can be stitched
together into larger vertex buffers by duplicating vertices, as discussed in
Section 12.4.
Lastly, the amount of data that needs to be transferred and stored on
the GPU is often smaller when an index buffer is used. The small overhead
of including an indexed array is far outweighed by the savings achieved
by sharing vertices. The index buffer can contain separate triangles or
tristrips stitched together by duplicating vertices. One strategy for saving
space is to find a cache-efficient ordering for the triangle set, then determine
which structure is more space efficient by examining how many tristrips the
ordering forms. The rule is that a set of tristrips with an average of two
triangles or more will take less storage than using an indexed triangle list.
With the DirectX 10 restart index form, this average drops to 1.5 triangles
per tristrip.
An index and vertex buffer provide a way of describing a polygonal
mesh. However, the data is typically stored with the goal of rendering ef-
ficiency, not compact storage. For example, one way to store a cube is to
save its eight corner locations in one array, and its six different normals in
another, along with the six four-index loops that define its faces. This com-
pact representation is used in many model file formats, such as Wavefront
OBJ and VRML. On the GPU, a vertex buffer would be expanded out
and store 24 different vertices, as each corner location has three separate
normals, depending on the face. The index buffer would store indices (as
triplets or in strips) defining the 12 triangles forming the surface. More
compact schemes are possible by storing the mesh in texture maps and
using the vertex shader’s texture fetch mechanism [151], but such usage is
not common.
There are higher-level methods for allocating and using vertex and index
buffers to achieve greater efficiency. For example, a buffer can be stored
on the GPU for use each frame, and multiple instances and variations of
an object can be generated from the same buffer. Section 15.4.2 discusses
such techniques in depth.
Shader Model 4.0 and its ability to output processed vertices to a new
buffer allow more malleable ways to treat vertex buffers. For example, a
vertex buffer describing a triangle mesh could be treated as a simple set
of points in an initial pass. The vertex shader could be used to perform
per-vertex computations as desired, with the results sent to a new vertex
buffer using stream out. On a subsequent pass, this new vertex buffer could
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset