tems [31]. The ability to access gradient information is a unique capability
of the pixel shader, not shared by any of the other programmable shader
stages.
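In HLSL, this gradient information is exposed through the ddx() and ddy()
intrinsics, which return the rate of change of a value between horizontally
and vertically adjacent pixels. As a minimal sketch (the shader name and
the use made of the gradients here are our own, not from any particular
system):

float4 grad_PS(float2 uv : TEXCOORD0) : COLOR
{
    float2 du = ddx(uv);    // change in uv across one pixel horizontally
    float2 dv = ddy(uv);    // change in uv across one pixel vertically
    // A rough measure of how much texture space the pixel covers, akin
    // to the quantity used to select a mipmap level.
    float footprint = length(du) + length(dv);
    return float4(footprint.xxx, 1.0);
}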
Pixel shader programs typically set the fragment color for merging in
the final merging stage. The depth value generated in the rasterization
stage can also be modified by the pixel shader. The stencil buffer value is
not modifiable, but rather is passed through to the merge stage. In SM 2.0
and on, a pixel shader can also discard incoming fragment data, i.e., gen-
erate no output. Such operations can cost performance, as optimizations
normally performed by the GPU cannot then be used. See Section 18.3.7
for details. Operations such as fog computation and alpha testing have
moved from being merge operations to being pixel shader computations in
SM 4.0 [123].
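For example, alpha testing in a DirectX 9 HLSL pixel shader amounts to
discarding fragments whose alpha falls below a threshold, via the clip()
intrinsic. A sketch (the texture, sampler, and threshold value are invented
for illustration):

texture DiffuseTex;
sampler DiffuseSamp = sampler_state { Texture = <DiffuseTex>; };

float4 alphaTest_PS(float2 uv : TEXCOORD0) : COLOR
{
    float4 c = tex2D(DiffuseSamp, uv);
    clip(c.a - 0.5);    // discard this fragment if its alpha is below 0.5
    return c;
}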
Current pixel shaders are capable of doing a huge amount of processing.
The ability to compute any number of values in a single rendering pass gave
rise to the idea of multiple render targets (MRT). Instead of saving results
of a pixel shader’s program to a single color buffer, multiple vectors could
be generated for each fragment and saved to different buffers. These buffers
must be the same dimensions, and some architectures require them each to
have the same bit depth (though with different formats, as needed). The
number of PS output registers in Table 3.1 refers to the number of separate
buffers accessible, i.e., 4 or 8. Any additional targets have limitations
that the displayable color buffer does not; for example, typically no
antialiasing can be performed on them. Even with these limitations, MRT
functionality is a powerful aid in performing rendering algorithms more
efficiently. If a number of intermediate result images are to be computed
from the same set of data, only a single rendering pass is needed, instead
of one pass per
output buffer. The other key capability associated with MRTs is the ability
to read from these resulting images as textures.
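As a sketch of how MRT output looks in DirectX 9 HLSL (the struct, shader
name, and choice of what to store in each buffer are invented for
illustration), a pixel shader simply returns one value per target:

struct mrtOutput {
    float4 Color  : COLOR0;    // routed to the first render target
    float4 Normal : COLOR1;    // routed to the second render target
};

mrtOutput mrt_PS(float3 WorldNormal : TEXCOORD2)
{
    mrtOutput OUT;
    float3 Nn = normalize(WorldNormal);
    OUT.Color  = float4(Nn * 0.5 + 0.5, 1.0);  // normal remapped to a color
    OUT.Normal = float4(Nn, 0.0);              // raw normal for a later pass
    return OUT;
}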
3.7 The Merging Stage
As discussed in Section 2.4.4, the merging stage is where the depths and
colors of the individual fragments (generated in the pixel shader) are com-
bined with the frame buffer. This stage is where stencil-buffer and Z-buffer
operations occur. Another operation that takes place in this stage is color
blending, which is most commonly used for transparency and compositing
operations (see Section 5.7).
The merging stage occupies an interesting middle point between the
fixed-function stages, such as clipping, and the fully programmable shader
stages. Although it is not programmable, its operation is highly config-
urable. Color blending in particular can be set up to perform a large
number of different operations. The most common are combinations of
multiplication, addition, and subtraction involving the color and alpha val-
ues, but other operations are possible, such as minimum and maximum,
as well as bitwise logic operations. DirectX 10 added the capability to
blend two colors from the pixel shader with the frame buffer color—this
capability is called dual-color blending.
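As a concrete illustration of this configurability, conventional "over"
alpha blending can be requested with a handful of render states inside a
DirectX 9 effect pass (a sketch; effect files themselves are introduced in
the next section):

pass pBlend {
    AlphaBlendEnable = true;        // turn on color blending in the merger
    SrcBlend         = SrcAlpha;    // scale the incoming color by its alpha
    DestBlend        = InvSrcAlpha; // scale the frame buffer color by 1 - alpha
    BlendOp          = Add;         // add the two scaled colors together
}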
If MRT functionality is employed, then blending can be performed on
multiple buffers. DirectX 10.1 introduced the capability to perform dif-
ferent blend operations on each MRT buffer. In previous versions, the
same blending operation was always performed on all buffers (note that
dual-color blending is incompatible with MRT).
3.8 Effects
This tour of the pipeline has focused so far on the various programmable
stages. While vertex, geometry, and pixel shader programs are necessary
to control these stages, they do not exist in a vacuum. First, an individual
shader program is not particularly useful in isolation: A vertex shader pro-
gram feeds its results to a pixel shader. Both programs must be loaded for
any work to be done. The programmer must perform some matching of the
outputs of the vertex shader to the inputs of the pixel shader. A particu-
lar rendering effect may be produced by any number of shader programs
executed over a few passes. Beyond the shader programs themselves, state
variables must sometimes be set in a particular configuration for these pro-
grams to work properly. For example, the renderer’s state includes whether
and how the Z-buffer and stencil buffer are each used, and how a fragment
affects the existing pixel value (e.g., replace, add, or blend).
For these reasons, various groups have developed effects languages, such
as HLSL FX, CgFX, and COLLADA FX. An effect file attempts to encap-
sulate all the relevant information needed to execute a particular rendering
algorithm [261, 974]. It typically defines some global arguments that can
be assigned by the application. For example, a single effect file might define
the vertex and pixel shaders needed to render a convincing plastic material.
It would expose arguments such as the plastic color and roughness so that
these could be changed for each model rendered, but using the same effect
file.
To show the flavor of an effect file, we will walk through a trimmed-
down example taken from NVIDIA’s FX Composer 2 effects system. This
DirectX 9 HLSL effect file implements a very simplified form of Gooch
shading [423]. One part of Gooch shading is to use the surface normal and
compare it to the light’s location. If the normal points toward the light, a
warm tone is used to color the surface; if it points away, a cool tone is used.
Figure 3.8. Gooch shading, varying from a warm orange to a cool blue. (Image produced
by FX Composer 2, courtesy of NVIDIA Corporation.)
Angles in between interpolate between these two user-defined colors. This
shading technique is a form of non-photorealistic rendering, the subject of
Chapter 11. An example of this effect in action is shown in Figure 3.8.
Effect variables are defined at the beginning of the effect file. The
first few variables are “untweakables,” parameters related to the camera
position that are automatically tracked for the effect:
float4x4 WorldXf : World;
float4x4 WorldITXf : WorldInverseTranspose;
float4x4 WvpXf : WorldViewProjection;
The syntax is type id : semantic. The type float4x4 is used for matrices,
the name is user defined, and the semantic is a built-in name. As the se-
mantic names imply, the WorldXf is the model-to-world transform matrix,
the WorldITXf is the inverse transpose of this matrix, and the WvpXf is
the matrix that transforms from model space to the camera’s clip space.
Values with recognized semantics such as these are expected to be provided
by the application, and so are not shown in the user interface.
Next, the user-defined variables are specified:
float3 Lamp0Pos : Position <
string Object = "PointLight0";
string UIName = "Lamp 0 Position";
string Space = "World";
> = {-0.5f, 2.0f, 1.25f};
float3 WarmColor <
string UIName = "Gooch Warm Tone";
string UIWidget = "Color";
> = {1.3f, 0.9f, 0.15f};
float3 CoolColor <
string UIName = "Gooch Cool Tone";
string UIWidget = "Color";
> = {0.05f, 0.05f, 0.6f};
Here some additional annotations are provided inside the angle brackets
<> and then default values are assigned. The annotations are application-
specific and have no meaning to the effect or to the shader compiler. Such
annotations can be queried by the application. In this case the annotations
describe how to expose these variables within the user interface.
Data structures for shader input and output are defined next:
struct appdata {
float3 Position : POSITION;
float3 Normal : NORMAL;
};
struct vertexOutput {
float4 HPosition : POSITION;
float3 LightVec : TEXCOORD1;
float3 WorldNormal : TEXCOORD2;
};
The appdata defines what data is at each vertex in the model and so
defines the input data for the vertex shader program. The vertexOutput
is what the vertex shader produces and the pixel shader consumes. The
use of TEXCOORD* as the output names is an artifact of the evolution of the
pipeline. At first, multiple textures could be attached to a surface, so these
additional data fields are called texture coordinates. In practice, these fields
hold any data that is passed from the vertex to the pixel shader.
Next, the various shader program code elements are defined. We have
only one vertex shader program:
vertexOutput std_VS(appdata IN) {
vertexOutput OUT;
float4 No = float4(IN.Normal,0);
OUT.WorldNormal = mul(No,WorldITXf).xyz;
float4 Po = float4(IN.Position,1);
float4 Pw = mul(Po,WorldXf);
OUT.LightVec = (Lamp0Pos - Pw.xyz);
OUT.HPosition = mul(Po,WvpXf);
return OUT;
}
This program first computes the surface’s normal in world space by using
a matrix multiplication. Transforms are the subject of the next chapter,
so we will not explain why the inverse transpose is used here. The position
in world space is also computed by applying the world transform WorldXf. This
location is subtracted from the light’s position to obtain the direction vector
from the surface to the light. Finally, the object’s position is transformed
into clip space, for use by the rasterizer. This is the one required output
from any vertex shader program.
Given the light’s direction and the surface normal in world space, the
pixel shader program computes the surface color:
float4 gooch_PS(vertexOutput IN) : COLOR
{
float3 Ln = normalize(IN.LightVec);
float3 Nn = normalize(IN.WorldNormal);
float ldn = dot(Ln,Nn);
float mixer = 0.5 * (ldn + 1.0);
float4 result = float4(lerp(CoolColor, WarmColor, mixer), 1);
return result;
}
The vector Ln is the normalized light direction and Nn the normalized
surface normal. By normalizing, the dot product ldn of these two vectors
then represents the cosine of the angle between them. We want to linearly
interpolate between the cool and warm tones using this value. The function
lerp() expects a mixer value between 0 and 1, where 0 means to use the
CoolColor, 1 the WarmColor, and values in between to blend the two. Since
the cosine of an angle gives a value in [-1, 1], the mixer computation
transforms this range to [0, 1]. This value is then used to blend the tones
and produce
a fragment with the proper color. These shaders are functions. An effect
file can consist of any number of functions and can include commonly used
functions from other effects files.
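To make the interpolation explicit, lerp(a, b, t) returns a + t*(b - a),
so the blend in gooch_PS could equivalently be written as:

float3 blended = CoolColor + mixer * (WarmColor - CoolColor);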
A pass typically consists of a vertex and pixel (and geometry) shader,
along with any state settings needed for the pass. (In DirectX 9 and
earlier, a pass can also have no shaders and instead control the
fixed-function pipeline.) A technique is a set of one or more passes to
produce the desired effect. This simple file has one technique, which has
one pass:

technique Gooch < string Script = "Pass=p0;"; > {
    pass p0 < string Script = "Draw=geometry;"; > {
        VertexShader = compile vs_2_0 std_VS();
        PixelShader = compile ps_2_a gooch_PS();
    }
}