14. Acceleration Algorithms (7/11)

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

14.6. Occlusion Culling 675

for more on latency). Hence, this GPU-based occlusion culling method is

worthwhile when the bounding boxes contain a large number of objects

and a relatively large amount of occlusion is occurring.

On HP’s VISUALIZE fx hardware (circa 2000), bounding boxes are

created automatically and queried for suﬃciently long display lists [213].

It has been shown that using tighter bounding volumes can speed up ren-

dering. Bartz et al. [69] achieved a 50 percent increase in frame rate using

k-DOPs (26-DOPs) for mechanical CAD models (where the interiors of the

objects often are complex).

The GPU’s occlusion query has been used as the basic building block

for a number of algorithms. Meißner et al. [851] use occlusion queries in a

hierarchical setting. The scene is represented in a hierarchical data struc-

ture, such as a BVH or octree. First, view frustum culling is used to ﬁnd

nodes not fully outside the view frustum. These are sorted, based on a

node’s center point (for example, the center of a bounding box), in front-

to-back order. The nearest leaf node is rendered without occlusion testing

to the frame buﬀer. Using the occlusion query, the BVs of subsequent

objects are tested. If a BV is visible, its contents are tested recursively,

or rendered. Klosowski and Silva have developed a constant-frame-rate al-

gorithm using an occlusion culling algorithm, called the prioritized-layered

projection algorithm [674]. However, at ﬁrst this was not a conservative

algorithm, i.e., it sacriﬁced image quality in order to keep a constant frame

rate. Later, they developed a conservative version of the algorithm, using

occlusion queries [675].

The research just described was done with graphics hardware that had a

serious limitation: When an occlusion query was made (which was limited

to a boolean), the CPU’s execution was stalled until the query result was

returned. Modern GPUs have adopted a model in which the CPU can send

oﬀ any number of queries to the GPU, then periodically check to see if any

results are available. For its part, the GPU performs each query and puts

the result in a queue. The queue check by the CPU is extremely fast, and

theCPUcancontinuetosenddownqueries or actual renderable objects

without having to stall.

How to eﬀectively use this model of a queue of queries is an active area

of research. The problem is that a query must be done after some actual

geometry has been rendered to the screen, so that the bounding box tested

is more likely to be occluded, but not so late in the rendering process

that the CPU is left waiting for the results of the query. Among other

optimizations, Sekulic [1145] recommends taking advantage of temporal

coherence. Objects are tested for occlusion, but the results are not checked

until the next frame. If an object was found to be occluded in the previous

frame, it is not rendered this frame, but it is tested for occlusion again.

This gives an approximate occlusion test, since an object could be visible

676 14. Acceleration Algorithms

but not rendered for a frame, but with a high frame rate, the problem is

not that serious.

Objects that were visible in the previous frame are handled in a clever

way. First, the object itself is used for the query: The object is rendered

normally, while also issuing a query for visibility. This technique avoids

having to render the test bounding box itself. Next, visible objects do not

need to all issue queries every frame. By having a visible object tested for

occlusion every few frames, fewer occlusion queries need to be dealt with

each frame, at the small cost of occasionally rendering an object that could

have been found to be occluded.

Bittner and Wimmer [92, 1360] wish to avoid any errors from temporal

coherence, while reaping its beneﬁts. Their approach starts with travers-

ing a hierarchical eﬃciency structure, rendering objects in a front-to-back

order. Only leaf nodes hold objects in their scheme. Leaf nodes and previ-

ously hidden internal nodes are tested for occlusion. Visibility states from

frame to frame are compared, with changes potentially causing further hi-

erarchy updates and leaf node renderings. Despite the fact that internal

nodes are also sometimes queried using this approach, the authors’ ap-

proach guarantees that the total number of queries will never be greater

than the number of leaf nodes. The major beneﬁt of their technique is

that much rendering time can be saved when internal nodes are found to

be occluded.

The number of queries that are worth performing in a frame is limited,

and each query costs some time. As such, queries should be performed

on objects most likely to be occluded. Koval`e´ık and Sochor [694] collect

running statistics on queries over a number of frames for each object while

the application is running. The number of frames in which an object was

found to be hidden aﬀects how often it is tested for occlusion in the future.

That is, objects that are visible are likely to stay visible, and so can be

tested less frequently. Hidden objects get tested every frame, if possible,

since these objects are most likely to beneﬁt from occlusion queries. Guthe

et al. [470] likewise explore performing a running statistical analysis, and

their system adapts to the underlying GPU.

The schemes discussed here give a ﬂavor of the potential and problems

with occlusion culling methods. When to use occlusion queries, or most

occlusion schemes in general, is not often clear. If everything is visible, an

occlusion algorithm can only cost additional time, never save it. Rapidly

determining that the algorithm is not helping, and so cutting back on

its fruitless attempts to save time, is one challenge. Another problem is

deciding what set of objects to use as occluders. The ﬁrst objects rendered

that are inside the frustum must be visible, so spending queries on these is

wasteful. This problem of choosing occluders is a challenge for most of the

algorithms in this section.

14.6. Occlusion Culling 677

Nonetheless, hardware occlusion query systems have proven to be use-

ful, especially when depth complexity is high. Statistical methods track

whether and when occlusion testing is likely to be eﬀective, so they have

good adaptive properties. Systems based on this basic building block of

the occlusion query are by far the most commonly used occlusion culling

schemesusedwithmodernGPUs.Therest of this section will therefore be

limited to discussion of one algorithm that has been important in its eﬀect

on the GPU’s architecture, with a brief survey of the rest.

14.6.2 Hierarchical Z -Buffering

Hierarchical Z-buﬀering (HZB), an algorithm developed by Greene et al.,

[450, 452] has had a signiﬁcant inﬂuence on occlusion culling research.

Though the CPU-side form presented here is rarely used, the algorithm

is the basis for the GPU hardware method of Z-culling (Section 18.3.7).

The algorithm maintains the scene model in an octree, and a frame’s

Z-buﬀer as an image pyramid, which we call a Z-pyramid—the algorithm

thus operates in image space. The octree enables hierarchical culling of

occluded regions of the scene, and the Z-pyramid enables hierarchical Z-

buﬀering of individual primitives and bounding volumes. The Z-pyramid

is thus the occlusion representation of this algorithm. Examples of these

data structures are shown in Figure 14.19. Any method can be employed

for organizing scene primitives in an octree, although Greene et al. [450]

recommend a speciﬁc algorithm that avoids assigning small primitives to

large octree nodes.

Now we will describe how the Z-pyramid is maintained and how it is

used to accelerate culling. The ﬁnest (highest-resolution) level of the Z-

pyramid is simply a standard Z-buﬀer. At all other levels, each z-value is

the farthest z in the corresponding 2 ×2 window of the adjacent ﬁner level.

Therefore, each z-value represents the farthest z for a square region of the

screen. Whenever a z-value is overwritten in the Z-buﬀer, it is propagated

through the coarser levels of the Z-pyramid. This is done recursively until

the top of the image pyramid is reached, where only one z-value remains.

Pyramid formation is illustrated in Figure 14.20.

Hierarchical culling of octree nodes is done as follows. Traverse the oc-

tree nodes in a rough front-to-back order. A bounding box of the octree

is tested against the Z-pyramid using an extended occlusion query (Sec-

tion 14.6.1). We begin testing at the coarsest Z-pyramid cell that encloses

the box’s screen projection. The box’s nearest depth within the cell (z

near

)

is then compared to the Z-pyramid value, and if z

near

is farther, the box

is known to be occluded. This testing continues recursively down the Z-

pyramid until the box is found to be occluded, or until the bottom level of

the Z-pyramid is reached, at which point the box is known to be visible.

678 14. Acceleration Algorithms

Figure 14.19. Example of occlusion culling with the HZB algorithm [450, 452], showing

a complex scene (lower right) with the corresponding Z-pyramid (on the left), and

octree subdivision (upper right). By traversing the octree from front-to-back and culling

occluded octree nodes as they are encountered, this algorithm visits only visible octree

nodes and their children (the nodes portrayed at the upper right) and renders only the

polygons in visible boxes. In this example, culling of occluded octree nodes reduces the

depth complexity from 84 to 2.5. (Image courtesy of Ned Greene/Apple Computer.)

Figure 14.20. On the left, a 4 × 4 piece of the Z-buﬀer is shown. The numerical values

are the actual z-values. This is downsampled to a 2 × 2 region where each value is the

farthest (largest) of the four 2 × 2 regions on the left. Finally, the farthest value of the

remaining four z-values is computed. These three maps compose an image pyramid that

is called the hierarchical Z-buﬀer.

14.6. Occlusion Culling 679

For visible octree boxes, testing continues recursively down in the octree,

and ﬁnally potentially visible geometry is rendered into the hierarchical Z-

buﬀer. This is done so that subsequent tests can use the occluding power

of previously rendered objects.

14.6.3 Other Occlusion Culling Techniques

There has been a large body of work on occlusion culling techniques. Much

of it has fallen out of favor, due to the performance of GPUs outpacing

CPUs. As such, a brief tour of some schemes are presented here. These are

worth at least some passing knowledge because architectures are constantly

evolving. With the rise of multicore systems, the CPU-side has additional

resources that are diﬃcult to bring directly to bear on rendering itself.

Using one or more cores to perform cell-based visibility testing or other

schemes is certainly conceivable.

The hierarchical occlusion map (HOM) algorithm [1402, 1403], like hier-

archical Z-buﬀering, is a way of enabling hierarchical image-space culling.

It diﬀers from HZB in that it oﬀers the ability to use approximate occlu-

sion culling. Each frame a hierarchical depth buﬀer is built up for occlusion

testing. An opacity threshold is used at each level to determine if enough

of the object about to be drawn is visible. If only a small portion of the

object is potentially visible then the object is culled. As a CPU-driven

system, this algorithm has fallen out of favor.

The idea of occlusion horizons has been used in computer games since

at least 1995 [864]. Wonka and Schmalstieg [1369] and Downs et al. [277]

discuss two variants. This type of algorithm was often used for eﬃciently

rendering urban environments, that is, cities and villages. By rendering

such scenes front to back, the horizon drawn can be tracked. Any object

found to be both behind and below the current horizon can then be culled.

While its use for cities is dated, Fiedler [342] presents a horizon-based

occlusion culling scheme for mountainous scenes in a GPU-based terrain

rendering system.

The occlusion horizon algorithm is point-based. Cell-based visibility is

sometimes preferable, but is in general much more complex to compute

than point-based visibility. Occluder shrinking is a technique developed

by Wonka et al. [1370] that can use a point-based occlusion algorithm to

generate cell-based visibility. The idea is to extend the validity of point

visibility by shrinking all occluders in the scene by a given amount. They

also present frustum growing [1371, 1372], which is often used in conjunc-

tion with occluder shrinking. Given that the viewer can move and change

orientation at certain speeds, a grown frustum is created that includes all

possible changes in view position and orientation.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 14. Acceleration Algorithms (7/11)

Create new playlist

Sign In

Sign Up

Table of Contents for
14. Acceleration Algorithms (7/11)