i
i
i
i
i
i
i
i
14.6. Occlusion Culling 675
for more on latency). Hence, this GPU-based occlusion culling method is
worthwhile when the bounding boxes contain a large number of objects
and a relatively large amount of occlusion is occurring.
On HP’s VISUALIZE fx hardware (circa 2000), bounding boxes are
created automatically and queried for sufficiently long display lists [213].
It has been shown that using tighter bounding volumes can speed up ren-
dering. Bartz et al. [69] achieved a 50 percent increase in frame rate using
k-DOPs (26-DOPs) for mechanical CAD models (where the interiors of the
objects often are complex).
The GPU’s occlusion query has been used as the basic building block
for a number of algorithms. Meißner et al. [851] use occlusion queries in a
hierarchical setting. The scene is represented in a hierarchical data struc-
ture, such as a BVH or octree. First, view frustum culling is used to find
nodes not fully outside the view frustum. These are sorted, based on a
node’s center point (for example, the center of a bounding box), in front-
to-back order. The nearest leaf node is rendered without occlusion testing
to the frame buffer. Using the occlusion query, the BVs of subsequent
objects are tested. If a BV is visible, its contents are tested recursively,
or rendered. Klosowski and Silva have developed a constant-frame-rate al-
gorithm using an occlusion culling algorithm, called the prioritized-layered
projection algorithm [674]. However, at first this was not a conservative
algorithm, i.e., it sacrificed image quality in order to keep a constant frame
rate. Later, they developed a conservative version of the algorithm, using
occlusion queries [675].
The research just described was done with graphics hardware that had a
serious limitation: When an occlusion query was made (which was limited
to a boolean), the CPU’s execution was stalled until the query result was
returned. Modern GPUs have adopted a model in which the CPU can send
off any number of queries to the GPU, then periodically check to see if any
results are available. For its part, the GPU performs each query and puts
the result in a queue. The queue check by the CPU is extremely fast, and
theCPUcancontinuetosenddownqueries or actual renderable objects
without having to stall.
How to effectively use this model of a queue of queries is an active area
of research. The problem is that a query must be done after some actual
geometry has been rendered to the screen, so that the bounding box tested
is more likely to be occluded, but not so late in the rendering process
that the CPU is left waiting for the results of the query. Among other
optimizations, Sekulic [1145] recommends taking advantage of temporal
coherence. Objects are tested for occlusion, but the results are not checked
until the next frame. If an object was found to be occluded in the previous
frame, it is not rendered this frame, but it is tested for occlusion again.
This gives an approximate occlusion test, since an object could be visible
i
i
i
i
i
i
i
i
676 14. Acceleration Algorithms
but not rendered for a frame, but with a high frame rate, the problem is
not that serious.
Objects that were visible in the previous frame are handled in a clever
way. First, the object itself is used for the query: The object is rendered
normally, while also issuing a query for visibility. This technique avoids
having to render the test bounding box itself. Next, visible objects do not
need to all issue queries every frame. By having a visible object tested for
occlusion every few frames, fewer occlusion queries need to be dealt with
each frame, at the small cost of occasionally rendering an object that could
have been found to be occluded.
Bittner and Wimmer [92, 1360] wish to avoid any errors from temporal
coherence, while reaping its benefits. Their approach starts with travers-
ing a hierarchical efficiency structure, rendering objects in a front-to-back
order. Only leaf nodes hold objects in their scheme. Leaf nodes and previ-
ously hidden internal nodes are tested for occlusion. Visibility states from
frame to frame are compared, with changes potentially causing further hi-
erarchy updates and leaf node renderings. Despite the fact that internal
nodes are also sometimes queried using this approach, the authors’ ap-
proach guarantees that the total number of queries will never be greater
than the number of leaf nodes. The major benefit of their technique is
that much rendering time can be saved when internal nodes are found to
be occluded.
The number of queries that are worth performing in a frame is limited,
and each query costs some time. As such, queries should be performed
on objects most likely to be occluded. Koval`ık and Sochor [694] collect
running statistics on queries over a number of frames for each object while
the application is running. The number of frames in which an object was
found to be hidden affects how often it is tested for occlusion in the future.
That is, objects that are visible are likely to stay visible, and so can be
tested less frequently. Hidden objects get tested every frame, if possible,
since these objects are most likely to benefit from occlusion queries. Guthe
et al. [470] likewise explore performing a running statistical analysis, and
their system adapts to the underlying GPU.
The schemes discussed here give a flavor of the potential and problems
with occlusion culling methods. When to use occlusion queries, or most
occlusion schemes in general, is not often clear. If everything is visible, an
occlusion algorithm can only cost additional time, never save it. Rapidly
determining that the algorithm is not helping, and so cutting back on
its fruitless attempts to save time, is one challenge. Another problem is
deciding what set of objects to use as occluders. The first objects rendered
that are inside the frustum must be visible, so spending queries on these is
wasteful. This problem of choosing occluders is a challenge for most of the
algorithms in this section.
i
i
i
i
i
i
i
i
14.6. Occlusion Culling 677
Nonetheless, hardware occlusion query systems have proven to be use-
ful, especially when depth complexity is high. Statistical methods track
whether and when occlusion testing is likely to be effective, so they have
good adaptive properties. Systems based on this basic building block of
the occlusion query are by far the most commonly used occlusion culling
schemesusedwithmodernGPUs.Therest of this section will therefore be
limited to discussion of one algorithm that has been important in its effect
on the GPU’s architecture, with a brief survey of the rest.
14.6.2 Hierarchical Z -Buffering
Hierarchical Z-buffering (HZB), an algorithm developed by Greene et al.,
[450, 452] has had a significant influence on occlusion culling research.
Though the CPU-side form presented here is rarely used, the algorithm
is the basis for the GPU hardware method of Z-culling (Section 18.3.7).
The algorithm maintains the scene model in an octree, and a frame’s
Z-buffer as an image pyramid, which we call a Z-pyramid—the algorithm
thus operates in image space. The octree enables hierarchical culling of
occluded regions of the scene, and the Z-pyramid enables hierarchical Z-
buffering of individual primitives and bounding volumes. The Z-pyramid
is thus the occlusion representation of this algorithm. Examples of these
data structures are shown in Figure 14.19. Any method can be employed
for organizing scene primitives in an octree, although Greene et al. [450]
recommend a specific algorithm that avoids assigning small primitives to
large octree nodes.
Now we will describe how the Z-pyramid is maintained and how it is
used to accelerate culling. The finest (highest-resolution) level of the Z-
pyramid is simply a standard Z-buffer. At all other levels, each z-value is
the farthest z in the corresponding 2 ×2 window of the adjacent finer level.
Therefore, each z-value represents the farthest z for a square region of the
screen. Whenever a z-value is overwritten in the Z-buffer, it is propagated
through the coarser levels of the Z-pyramid. This is done recursively until
the top of the image pyramid is reached, where only one z-value remains.
Pyramid formation is illustrated in Figure 14.20.
Hierarchical culling of octree nodes is done as follows. Traverse the oc-
tree nodes in a rough front-to-back order. A bounding box of the octree
is tested against the Z-pyramid using an extended occlusion query (Sec-
tion 14.6.1). We begin testing at the coarsest Z-pyramid cell that encloses
the box’s screen projection. The box’s nearest depth within the cell (z
near
)
is then compared to the Z-pyramid value, and if z
near
is farther, the box
is known to be occluded. This testing continues recursively down the Z-
pyramid until the box is found to be occluded, or until the bottom level of
the Z-pyramid is reached, at which point the box is known to be visible.
i
i
i
i
i
i
i
i
678 14. Acceleration Algorithms
Figure 14.19. Example of occlusion culling with the HZB algorithm [450, 452], showing
a complex scene (lower right) with the corresponding Z-pyramid (on the left), and
octree subdivision (upper right). By traversing the octree from front-to-back and culling
occluded octree nodes as they are encountered, this algorithm visits only visible octree
nodes and their children (the nodes portrayed at the upper right) and renders only the
polygons in visible boxes. In this example, culling of occluded octree nodes reduces the
depth complexity from 84 to 2.5. (Image courtesy of Ned Greene/Apple Computer.)
Figure 14.20. On the left, a 4 × 4 piece of the Z-buffer is shown. The numerical values
are the actual z-values. This is downsampled to a 2 × 2 region where each value is the
farthest (largest) of the four 2 × 2 regions on the left. Finally, the farthest value of the
remaining four z-values is computed. These three maps compose an image pyramid that
is called the hierarchical Z-buffer.
i
i
i
i
i
i
i
i
14.6. Occlusion Culling 679
For visible octree boxes, testing continues recursively down in the octree,
and finally potentially visible geometry is rendered into the hierarchical Z-
buffer. This is done so that subsequent tests can use the occluding power
of previously rendered objects.
14.6.3 Other Occlusion Culling Techniques
There has been a large body of work on occlusion culling techniques. Much
of it has fallen out of favor, due to the performance of GPUs outpacing
CPUs. As such, a brief tour of some schemes are presented here. These are
worth at least some passing knowledge because architectures are constantly
evolving. With the rise of multicore systems, the CPU-side has additional
resources that are difficult to bring directly to bear on rendering itself.
Using one or more cores to perform cell-based visibility testing or other
schemes is certainly conceivable.
The hierarchical occlusion map (HOM) algorithm [1402, 1403], like hier-
archical Z-buffering, is a way of enabling hierarchical image-space culling.
It differs from HZB in that it offers the ability to use approximate occlu-
sion culling. Each frame a hierarchical depth buffer is built up for occlusion
testing. An opacity threshold is used at each level to determine if enough
of the object about to be drawn is visible. If only a small portion of the
object is potentially visible then the object is culled. As a CPU-driven
system, this algorithm has fallen out of favor.
The idea of occlusion horizons has been used in computer games since
at least 1995 [864]. Wonka and Schmalstieg [1369] and Downs et al. [277]
discuss two variants. This type of algorithm was often used for efficiently
rendering urban environments, that is, cities and villages. By rendering
such scenes front to back, the horizon drawn can be tracked. Any object
found to be both behind and below the current horizon can then be culled.
While its use for cities is dated, Fiedler [342] presents a horizon-based
occlusion culling scheme for mountainous scenes in a GPU-based terrain
rendering system.
The occlusion horizon algorithm is point-based. Cell-based visibility is
sometimes preferable, but is in general much more complex to compute
than point-based visibility. Occluder shrinking is a technique developed
by Wonka et al. [1370] that can use a point-based occlusion algorithm to
generate cell-based visibility. The idea is to extend the validity of point
visibility by shrinking all occluders in the scene by a given amount. They
also present frustum growing [1371, 1372], which is often used in conjunc-
tion with occluder shrinking. Given that the viewer can move and change
orientation at certain speeds, a grown frustum is created that includes all
possible changes in view position and orientation.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset