i
i
i
i
i
i
i
i
858 18. Graphics Hardware
Another technique with a similar name, and similar intent, is the “early
z pass” method. This is a software technique, the idea being to render
geometry once and establish the Z-buffer depths. See Section 7.9.2.
18.3.8 PCU: Programmable Culling Unit
As seen in the previous section, Z-culling has to be disabled by the hard-
ware if the fragment shader writes out a custom depth per pixel. Render-
ing algorithms such as relief mapping and evaluating the sphere equation
inside a quad are two examples where the shader modifies the z-depth.
Blythe [123] mentions that getting good performance relies on being able
to predict the shader output (on a tile basis). One reason why a fully
programmable output merger, i.e., blending, depth testing, stencil testing,
etc, was not included in DirectX 10 is that Z-culling is disabled when the
shader writes to the depth, for example.
The programmable culling unit (PCU) [513] solves this problem, but is
more general than that. The core idea is to run parts of the fragment shader
over an entire tile of pixels, and determine, for example, a lower bound on
the z-values, i.e., z
tri
min
(Section 18.3.7). In this example, the lower bound
can be used for Z-culling. Another example would be to determine if all
the pixels in a tile are completely in shadow, i.e., totally black. If this
is true, then the GPU can avoid executing large parts (or sometimes the
entire) fragment shader program for the current tile.
To compute these conservative bounds (e.g., z
tri
min
), the fragment shader
instructions are executed using interval arithmetic instead of floating-point
numbers. An interval, ˆa =[a
, a], is simply an interval of numbers, starting
from a
to a. An example is ˆa =[−1, 2], which is the entire set of numbers
from −1 to 2. Arithmetic operations can also be defined to operate on
intervals [900]. Addition of two intervals results in an interval including
all possible combinations from the two interval operands. This can be
expressed as: ˆa +
ˆ
b =[a
+ b, a + b]. As an example, [−1, 2] + [4, 5] = [3, 7].
If the fragment shader program contains a KIL instruction, all instruc-
tions that depend on this KIL are extracted into a cull shader program,
which will be executed once for the entire tile. All arithmetic operations
work on intervals in this cull shader, and all varying operands are intervals
as well. When the interval operand of the KIL instruction has been com-
puted, it is known whether all fragments in the tile can be killed without
executing the fragment shader for each fragment in the tile. Before start-
ing the execution of the cull shader, the varying input has to be converted
into intervals, and this is done using techniques similar to the one used for
computing z
tri
min
in the previous section.
As a simple example, imagine that the shader program computes diffuse
shading as d = l · n. It then tests this dot product to see if it is less than