12. Polygonal Techniques (6/9)

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

556 12. Polygonal Techniques

Figure 12.18. An illustration of four generations of Hilbert curves. The thin lines rep-

resent the grid, and the bold lines the Hilbert curve. As can be seen, the curve visits

every grid cell, and it is self-similar at all scales. A nice property is that the Hilbert

curve visits every grid cell within 2 × 2 cells, then within 4 × 4 cells, then within 8 × 8

cells, and so on.

explore nearby triangles as the fan attached to a vertex is being output.

The algorithm is more involved than will be explained here, but is fairly

straightforward to implement and is linear time in execution. Another in-

teresting element of their algorithm is that they attempt to order polygons

essentially from the outside in. By doing so, the object draws faster, as

triangles drawn later are more likely to be hidden by those drawn earlier.

Cache-Oblivious Mesh Layouts

The ideal order for triangles in a mesh is one in which the vertex cache use

is maximized. Solving for diﬀerent-sized caches may yield diﬀerent optimal

orderings. For when the target cache size is unknown, cache-oblivious mesh

layout algorithms have been developed that yield orderings that work well,

regardless of size. Such an ordering is sometimes called a universal index

sequence.

An important idea behind this type of algorithm, as well as other related

algorithms, is the concept of a space-ﬁlling curve. A space-ﬁlling curve is

a single, continuous curve that, without intersecting itself, ﬁlls a square or

rectangle containing a uniform grid by visiting every grid cell once. One

example is the Hilbert curve, which is shown in Figure 12.18, and another

is the Peano curve. A space-ﬁlling curve can have good spatial coherence,

which means that when walking along the curve, you stay close to the

previously visited points, at all scales. The Hilbert curve is an example of

a curve with good spatial coherence. See Sagan’s book [1095] or Voorhies’

article [1309] for more information on this topic.

If a space-ﬁlling curve is used to visit the triangles on a triangle mesh,

then one can expect a high hit ratio in the vertex cache. Inspired by this,

Bogomjakov and Gotsman [125] propose two algorithms, recursive cuts

and minimum linear arrangement, for generating good index sequences for

triangle meshes. For cache sizes from about ten and up, these algorithms

consistently outperform Xiang’s triangle strip method [1393].

12.4. Triangle Fans, Strips, and Meshes 557

Yoon et al. [1395] develop a fast, approximate metric for whether a

local permutation of the triangle ordering would reduce the number of

cache misses. With this formula in hand, they use a multilevel algorithm

to optimize the mesh. They ﬁrst coarsen the mesh by clustering groups of

ﬁve nearby vertices into a single node. The coarsened adjacency graph is

clustered again, recursively, until ﬁve or less vertices are produced. They

then evaluate all possible orderings on these vertices, choosing the best.

Coarsening is then reversed, with each subset of ﬁve vertices exhaustively

searched for the best local permutation. Their algorithm works well with

large, out-of-core meshes of hundreds of millions of triangles. There is also

an open source package called OpenCCL that computes cache-oblivious

mesh layouts using algorithms by Yoon et al.

Forsyth [356] and Lin and Yu [777] provide extremely rapid greedy al-

gorithms that use similar principles. In Forsyth’s method, a vertex-triangle

connectivity map is used throughout, which is faster and simpler to gen-

erate than an edge-based adjacency graph. Vertices are given scores based

on their positions in the cache and by the number of unprocessed triangles

attached to them. The triangle with the highest combined vertex score is

processed next. By scoring the three most recently used vertices a little

lower, the algorithm avoids simply making triangle strips and instead cre-

ates patterns similar to a Hilbert curve. By giving higher scores to vertices

with fewer triangles still attached, the algorithm tends to avoid leaving

isolated triangles behind. The average cache miss ratios achieved are com-

parable to those of more costly and complex algorithms. Lin and Yu’s

method is a little more complex but uses related ideas. For a cache size of

12, the average ACMR for a set of 30 unoptimized models was 1.522; after

optimization, the average dropped below 0.64. This average was lower than

those produced by either Bogomjakov & Gotsman’s or Hoppe’s algorithms.

Given that the best possible ACMR for an inﬁnitely long triangle strip

is 1.0, universal rendering sequences can almost always outperform pure

triangle strip creation algorithms when it comes to vertex cache reuse.

However, there are other factors that can come into play when deciding on

which data format to use. The next section discusses the data structures

most commonly used on modern GPUs.

12.4.5 Vertex and Index Buffers/Arrays

One eﬃcient way to provide a modern graphics accelerator with model

data is by using what OpenGL calls vertex buﬀer objects and DirectX calls

vertex buﬀers. We will go with the DirectX terminology in this section;

most of the concepts map to OpenGL.

The idea of a vertex buﬀer is to store model data in a contiguous chunk

of memory. A vertex buﬀer is an array of vertex data in a particular for-

558 12. Polygonal Techniques

mat. The format speciﬁes whether a vertex contains diﬀuse or specular

colors, a normal, texture coordinates, etc. The size in bytes of a vertex is

called its stride. Alternately, a set of vertex streams can be used. For ex-

ample, one stream could hold an array of positions p

...and another

a separate array of normals n

.... In practice, a single buﬀer contain-

ing all data for each vertex is generally more eﬃcient on modern GPUs,

but not so much that multiple streams should be avoided [1065]. Multiple

streams can help save storage and transfer time. For example, six house

models with diﬀerent lighting due to their surroundings could be repre-

sented by six meshes with a diﬀerent set of illumination colors at each mesh

vertex, merged with a single mesh with positions, normals, and texture

coordinates.

How the vertex buﬀer is accessed is up to the device’s DrawPrimitive

method. The data can be treated as:

1. A list of individual points.

2. A list of unconnected line segments, i.e., pairs of vertices.

3. A single polyline.

4. A triangle list, where each group of three vertices forms a triangle,

e.g., vertices [0, 1, 2] form one, [3, 4, 5] form the next, etc.

5. A triangle fan, where the ﬁrst vertex forms a triangle with each suc-

cessive pair of vertices, e.g., [0, 1, 2], [0, 2, 3], [0, 3, 4].

6. A triangle strip, where every group of three contiguous vertices forms

a triangle, e.g., [0, 1, 2], [1, 2, 3], [2, 3, 4].

In DirectX 10, triangles and triangle strips can also include adjacent trian-

gle vertices, for use by the geometry shader (see Section 3.5).

The indices in an index buﬀer point to vertices in a vertex buﬀer. The

combination of an index buﬀer and vertex buﬀer is used to display the

same types of draw primitives as a “raw” vertex buﬀer. The diﬀerence

is that each vertex in the index/vertex buﬀer combination needs to be

stored only once in its vertex buﬀer, versus repetition that can occur in a

vertex buﬀer without indexing. For example, the triangle mesh structure

can be represented by an index buﬀer; the ﬁrst three indices stored in the

index buﬀer specify the ﬁrst triangle, the next three the second, etc. See

Figure 12.19 for examples of vertex and index buﬀer structures.

Which structure to use is dictated by the primitives and the program.

Displaying a simple rectangle is easily done with a vertex buﬀer using four

vertices as a two-triangle tristrip. One advantage of the index buﬀer is data

sharing. In the previous section the post-transform cache was explained,

12.4. Triangle Fans, Strips, and Meshes 559

Figure 12.19. Diﬀerent ways of deﬁning primitives: separate triangles, or as a vertex

triangle list or triangle strip, or an indexed vertex list or strip.

560 12. Polygonal Techniques

and ways to best use it were described. This vertex cache is available only

when rendering index buﬀers. See Section 18.4.2 for more about vertex

caches.

Index buﬀers also avoid the small batch problem, where the API over-

head of processing a single mesh is considerably less than processing a

number of triangle strips. However, separate triangle strips can be stitched

together into larger vertex buﬀers by duplicating vertices, as discussed in

Section 12.4.

Lastly, the amount of data that needs to be transferred and stored on

the GPU is often smaller when an index buﬀer is used. The small overhead

of including an indexed array is far outweighed by the savings achieved

by sharing vertices. The index buﬀer can contain separate triangles or

tristrips stitched together by duplicating vertices. One strategy for saving

space is to ﬁnd a cache-eﬃcient ordering for the triangle set, then determine

which structure is more space eﬃcient by examining how many tristrips the

ordering forms. The rule is that a set of tristrips with an average of two

triangles or more will take less storage than using an indexed triangle list.

With the DirectX 10 restart index form, this average drops to 1.5 triangles

per tristrip.

An index and vertex buﬀer provide a way of describing a polygonal

mesh. However, the data is typically stored with the goal of rendering ef-

ﬁciency, not compact storage. For example, one way to store a cube is to

save its eight corner locations in one array, and its six diﬀerent normals in

another, along with the six four-index loops that deﬁne its faces. This com-

pact representation is used in many model ﬁle formats, such as Wavefront

OBJ and VRML. On the GPU, a vertex buﬀer would be expanded out

and store 24 diﬀerent vertices, as each corner location has three separate

normals, depending on the face. The index buﬀer would store indices (as

triplets or in strips) deﬁning the 12 triangles forming the surface. More

compact schemes are possible by storing the mesh in texture maps and

using the vertex shader’s texture fetch mechanism [151], but such usage is

not common.

There are higher-level methods for allocating and using vertex and index

buﬀers to achieve greater eﬃciency. For example, a buﬀer can be stored

on the GPU for use each frame, and multiple instances and variations of

an object can be generated from the same buﬀer. Section 15.4.2 discusses

such techniques in depth.

Shader Model 4.0 and its ability to output processed vertices to a new

buﬀer allow more malleable ways to treat vertex buﬀers. For example, a

vertex buﬀer describing a triangle mesh could be treated as a simple set

of points in an initial pass. The vertex shader could be used to perform

per-vertex computations as desired, with the results sent to a new vertex

buﬀer using stream out. On a subsequent pass, this new vertex buﬀer could

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 12. Polygonal Techniques (6/9)

Create new playlist

Sign In

Sign Up

Table of Contents for
12. Polygonal Techniques (6/9)