34 3. The Graphics Processing Unit
[Figure 3.3 diagram: a shade tree. The final color is the product of the
copper color and a sum of two terms: the ambient term multiplied by the weight
of the ambient component, and the specular function (with inputs normal, view,
and surface roughness) multiplied by the weight of the specular component.]
    float ka=0.5, ks=0.5;
    float roughness=0.1;
    float intensity;
    color copper=(0.8,0.3,0.1);
    intensity = ka*ambient() +
                ks*specular(normal,view,roughness);
    final_color = intensity*copper;
Figure 3.3. Shade tree for a simple copper shader, and its corresponding shader language
program. (After Cook [194].)
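The program in Figure 3.3 can be mimicked on the CPU. The following Python sketch evaluates the same shade tree for a single surface point. The `ambient` and `specular` functions here are stand-ins assumed for illustration (a constant ambient term and a Blinn-Phong-style highlight with an assumed light direction); the figure does not define them.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ambient():
    # Assumption: a constant ambient light intensity.
    return 1.0

def specular(normal, view, roughness, light=(0.0, 0.0, 1.0)):
    # Assumption: Blinn-Phong-style highlight with exponent 1/roughness
    # and a fixed light direction. Not the definition used by the figure.
    n, v, l = normalize(normal), normalize(view), normalize(light)
    half = normalize(tuple(a + b for a, b in zip(v, l)))
    return max(dot(n, half), 0.0) ** (1.0 / roughness)

def copper_shader(normal, view):
    # Same structure as the shade tree: weighted sum, then tint by copper.
    ka, ks = 0.5, 0.5
    roughness = 0.1
    copper = (0.8, 0.3, 0.1)
    intensity = ka * ambient() + ks * specular(normal, view, roughness)
    return tuple(intensity * c for c in copper)
```

Calling `copper_shader((0, 0, 1), (0, 0, 1))` evaluates the tree for a surface facing both the viewer and the assumed light.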
very general: the ability to use computation results as texture coordinates
(dependent texture reads), and support for data types with extended range
and precision in textures and color buffers. One of the proposed data types
was a novel (at the time) 16-bit floating point representation. At this
time, no commercially available GPU supported programmable shading,
although most had highly configurable pipelines [898].
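To give a feel for the range and precision of a 16-bit floating point value, this Python sketch round-trips numbers through the IEEE 754 binary16 format using the standard `struct` module. It assumes the 16-bit format in question matches binary16 (1 sign bit, 5 exponent bits, 10 mantissa bits), which the text does not state explicitly.

```python
import struct

def to_half(x):
    """Round x to the nearest binary16 value and return it as a Python float."""
    # 'e' is the struct format code for a 2-byte half-precision float.
    return struct.unpack('<e', struct.pack('<e', x))[0]

# The largest finite binary16 value is 65504; integers are exact only up to 2048.
samples = [1.0, 0.1, 2049.0, 65504.0]
rounded = [to_half(x) for x in samples]
```

Values such as 0.1 come back slightly changed, which illustrates why the extra range of a float format comes at the cost of precision compared with a 32-bit float.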
In early 2001, NVIDIA’s GeForce 3 was the first GPU to support programmable
vertex shaders [778], exposed through DirectX 8.0 and extensions to OpenGL.
These shaders were programmed in an assembly-like language that was converted
by the drivers into microcode on the fly. Pixel shaders were also included in
DirectX 8.0, but pixel shader SM 1.1 fell short of actual programmability: the
very limited “programs” supported were converted into texture blending states
by the driver, which in turn wired together hardware “register combiners.”
These “programs” were not only limited in length (12 instructions or fewer) but
also lacked the two elements (dependent texture reads¹ and float data) that
Peercy et al. had identified as crucial to true programmability.
¹ The GeForce 3 did support a dependent texture read of sorts, but only in an
extremely limited fashion.
3.3. The Evolution of Programmable Shading 35
Shaders at this time did not allow for flow control (branching), so
conditionals had to be emulated by computing both terms and selecting
or interpolating between the results. DirectX defined the concept of a
Shader Model to distinguish hardware with different shader capabilities.
The GeForce 3 supported vertex shader model 1.1 and pixel shader model
1.1 (shader model 1.0 was intended for hardware that never shipped). During
2001, GPUs progressed closer to a general pixel shader programming
model. DirectX 8.1 added pixel shader models 1.2 to 1.4 (each meant for
different hardware), which extended the capabilities of the pixel shader
further, adding additional instructions and more general support for
dependent texture reads.
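The emulation of conditionals without flow control can be sketched as follows. Both branches are evaluated unconditionally and a 0-or-1 mask picks the result, which is roughly the effect of a comparison instruction followed by a linear interpolation. The function names and the lighting formula are illustrative assumptions, and Python stands in for shader assembly.

```python
def step(edge, x):
    """Comparison result as a number: 1.0 where x >= edge, else 0.0."""
    return float(x >= edge)

def lerp(a, b, t):
    """Linear interpolation; returns a when t == 0 and b when t == 1."""
    return a * (1.0 - t) + b * t

def shade(n_dot_l):
    # Both "branches" are always computed.
    lit = 0.2 + 0.8 * n_dot_l   # surface facing the light
    shadowed = 0.2              # surface facing away: ambient only
    # The mask selects between the two results instead of branching.
    mask = step(0.0, n_dot_l)
    return lerp(shadowed, lit, mask)
```

The cost of this approach is that the work for both outcomes is always paid, which is one reason real flow control was a significant addition in later shader models.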
The year 2002 saw the release of DirectX 9.0 including Shader Model 2.0
(and its extended version 2.X), which featured truly programmable vertex
and pixel shaders. Similar functionality was also exposed under OpenGL
using various extensions. Support for arbitrary dependent texture reads
and storage of 16-bit floating point values was added, finally completing
the set of requirements identified by Peercy et al. in 2000 [993]. Limits
on shader resources such as instructions, textures, and registers were
increased, so shaders became capable of more complex effects. Support for
flow control was also added. The growing length and complexity of shaders
made the assembly programming model increasingly cumbersome.
Fortunately, DirectX 9.0 also included a new shader programming language
called HLSL (High Level Shading Language). HLSL was developed by
Microsoft in collaboration with NVIDIA, which released a cross-platform
variant called Cg [818]. Around the same time, the OpenGL ARB
(Architecture Review Board) released a somewhat similar language for OpenGL,
called GLSL [647, 1084] (also known as GLslang). These languages were
heavily influenced by the syntax and design philosophy of the C
programming language and also included elements from the RenderMan Shading
Language.
Shader Model 3.0 was introduced in 2004 and was an incremental
improvement, turning optional features into requirements, further
increasing resource limits and adding limited support for texture reads in vertex
shaders. When a new generation of game consoles was introduced in late
2005 (Microsoft’s Xbox 360) and 2006 (Sony Computer Entertainment’s
PLAYSTATION® 3 system), they were equipped with Shader Model 3.0–level
GPUs. The fixed-function pipeline is not entirely dead: Nintendo’s
Wii console shipped in late 2006 with a fixed-function GPU [207].
However, this is almost certainly the last console of this type, as even mobile
devices such as cell phones can use programmable shaders (see Section 18.4.3).
Other languages and environments for shader development are available.
For example, the Sh language [837, 838] allows the generation and
combination [839] of GPU shaders through a C++ library. This open-source
project runs on a number of platforms. On the other end of the spectrum,
several visual programming tools have been introduced to allow artists
(most of whom are not comfortable programming in C-like languages) to
design shaders. Such tools include visual graph editors used to link
predefined shader building blocks, as well as compilers to translate the resulting
graphs to shading languages such as HLSL. A screenshot of one such tool
(mental mill, which is included in NVIDIA’s FX Composer 2) is shown in
Figure 3.4. McGuire et al. [847] survey visual shader programming systems
and propose a high-level, abstract extension of the concept.
Figure 3.4. A visual shader graph system for shader design. Various operations are
encapsulated in function boxes, selectable on the left. When selected, each function box
has adjustable parameters, shown on the right. Inputs and outputs for each function
box are linked to each other to form the final result, shown in the lower right of the
center frame. (Screenshot from “mental mill,” mental images, inc.)
The next large step in programmability came in 2007. Shader Model
4.0 (included in DirectX 10.0 [123] and also available in OpenGL via
extensions) introduced several major features, such as the geometry shader
and stream output.
Shader Model 4.0 included a uniform programming model for all shaders
(vertex, pixel and geometry), the common-shader core described earlier.
Resource limits were further increased, and support for integer data types
(including bitwise operations) was added. Shader Model 4.0 also is notable
in that it supports only high-level language shaders (HLSL for DirectX and
GLSL for OpenGL)—there is no user-writable assembly language interface,
such as found in previous models.
GPU vendors, Microsoft, and the OpenGL ARB continue to refine and
extend the capabilities of programmable shading. Besides new versions of
existing APIs, new programming models such as NVIDIA’s CUDA [211]
and AMD’s CTM [994] have been targeted at non-graphics applications.
This area of general-purpose computations on the GPU (GPGPU) is briefly
discussed in Section 18.3.1.
3.3.1 Comparison of Shader Models
Although this chapter focuses on Shader Model 4.0 (the newest at time of
writing), often developers need to support hardware that uses older shading
models. For this reason we give a brief comparison between the capabilities
of several recent shading models: 2.0 (and its extended version, 2.X), 3.0,
and 4.0.² A listing of all the differences is beyond the scope of this book;
detailed information is available from the Microsoft Developer Network
(MSDN) and their DirectX SDK [261].
We focus on DirectX here, because of its distinct releases, versus
OpenGL’s evolving levels of extensions, some approved by the OpenGL
Architecture Review Board (ARB), some vendor-specific. This extension
system has the advantage that cutting-edge features from a specific
independent hardware vendor (IHV) can be used immediately. DirectX 9 and
earlier support IHV variations by exposing “capability bits” that can be
examined to see if a GPU supports a feature. With DirectX 10, Microsoft has
moved sharply away from this practice and toward a standardized model
that all IHVs must support. Despite the focus here on DirectX, the
following discussion also has relevance to OpenGL, in that the associated
underlying GPUs of the same time periods have the same features.
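The capability-bit approach amounts to testing flags in a bitmask before choosing a code path. The sketch below uses made-up flag names rather than the actual DirectX 9 capability constants, and the device value is passed in directly where a real application would query the API.

```python
from enum import IntFlag

class Caps(IntFlag):
    # Hypothetical capability bits, for illustration only.
    DEPENDENT_TEXTURE_READ = 1
    FLOAT16_RENDER_TARGET = 2
    DYNAMIC_FLOW_CONTROL = 4

def supports(device_caps, required):
    """True if every bit in `required` is set in `device_caps`."""
    return (device_caps & required) == required

def choose_path(device_caps):
    # Prefer the most capable rendering path the device can run.
    if supports(device_caps, Caps.DYNAMIC_FLOW_CONTROL | Caps.FLOAT16_RENDER_TARGET):
        return "hdr_branching"
    if supports(device_caps, Caps.DEPENDENT_TEXTURE_READ):
        return "basic_programmable"
    return "fixed_function"
```

With a standardized model such as the one DirectX 10 adopts, this kind of per-feature probing largely disappears, since every conforming device is expected to support the same feature set.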
Table 3.1 compares the capabilities of the various shader models. In
the table, “VS” stands for “vertex shader” and “PS” for “pixel shader”
(Shader Model 4.0 introduced the geometry shader, with capabilities
similar to those of the vertex shader). If neither “VS” nor “PS” appears, the row
applies to both vertex and pixel shaders. Since the virtual machine is 4-way
SIMD, each register can store between one and four independent values.
“Instruction Slots” refers to the maximum number of instructions that the
shader can contain. “Max. Steps Executed” indicates the maximum
number of instructions that can be executed, taking branching and looping into
account. “Temp. Registers” shows the number of general-purpose registers
that are available for storing intermediate results. “Constant Registers”
indicates the number of constant values that can be input to the shader.
“Flow Control, Predication” refers to the ability to compute conditional
expressions and execute loops via branching instructions and predication
(i.e., the ability to conditionally execute or skip an instruction). “Textures”
shows the number of distinct textures (see Chapter 6) that can be accessed
by the shader (each texture may be accessed multiple times). “Integer
Support” refers to the ability to operate on integer data types with bitwise
operators and integer arithmetic. “VS Input Registers” shows the number
of varying input registers that can be accessed by the vertex shader.
“Interpolator Registers” are output registers for the vertex shader and input
registers for the pixel shader. They are so called because the values output
from the vertex shader are interpolated over the triangle before being sent
to the pixel shader. Finally, “PS Output Registers” shows the number of
registers that can be output from the pixel shader; each one is bound to
a different buffer, or render target.
² Shader Models 1.0 through 1.4 were early, limited versions that are no longer actively
used.
                             SM 2.0/2.X      SM 3.0          SM 4.0
Introduced                   DX 9.0, 2002    DX 9.0c, 2004   DX 10, 2007
VS Instruction Slots         256             512 (a)         4096
VS Max. Steps Executed       65536           65536           ∞
PS Instruction Slots         96 (b)          512 (a)         65536 (a)
PS Max. Steps Executed       96 (b)          65536           ∞
Temp. Registers              12 (a)          32              4096
VS Constant Registers        256 (a)         256 (a)         14 × 4096 (c)
PS Constant Registers        32              224             14 × 4096 (c)
Flow Control, Predication    Optional (d)    Yes             Yes
VS Textures                  None            4 (e)           128 × 512 (f)
PS Textures                  16              16              128 × 512 (f)
Integer Support              No              No              Yes
VS Input Registers           16              16              16
Interpolator Registers       8 (g)           10              16/32 (h)
PS Output Registers          4               4               8
(a) Minimum requirement (more can be used if available).
(b) Minimum of 32 texture and 64 arithmetic instructions.
(c) 14 constant buffers exposed (+2 private, reserved for Microsoft/IHVs), each of which
    can contain a maximum of 4096 constants.
(d) Vertex shaders are required to support static flow control (based on constant values).
(e) SM 3.0 hardware typically has very limited formats and no filtering for vertex
    textures.
(f) Up to 128 texture arrays, each of which can contain a maximum of 512 textures.
(g) Not including 2 color interpolators with limited precision and range.
(h) Vertex shader outputs 16 interpolators, which the geometry shader can expand
    to 32.
Table 3.1. Shader capabilities, listed by DirectX shader model version [123, 261, 946,
1055].
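When targeting several shader models, limits like those in Table 3.1 can be kept in a small lookup structure. The sketch below copies a few rows of the table and picks the lowest model that satisfies a shader's requirements. The field names and the `lowest_model` helper are hypothetical; the SM 2.0 and SM 3.0 pixel shader instruction counts are the minimum guarantees from the table, so particular hardware may allow more.

```python
# A few rows copied from Table 3.1 (DirectX shader models).
SHADER_MODEL_LIMITS = {
    "2.0": {"ps_instruction_slots": 96, "vs_textures": 0,
            "ps_output_registers": 4, "integer_support": False},
    "3.0": {"ps_instruction_slots": 512, "vs_textures": 4,
            "ps_output_registers": 4, "integer_support": False},
    "4.0": {"ps_instruction_slots": 65536, "vs_textures": 128 * 512,
            "ps_output_registers": 8, "integer_support": True},
}

def lowest_model(ps_instructions, vs_textures=0, render_targets=1,
                 needs_integers=False):
    """Return the lowest shader model meeting the requirements, or None."""
    for model in ("2.0", "3.0", "4.0"):
        limits = SHADER_MODEL_LIMITS[model]
        if (ps_instructions <= limits["ps_instruction_slots"]
                and vs_textures <= limits["vs_textures"]
                and render_targets <= limits["ps_output_registers"]
                and (limits["integer_support"] or not needs_integers)):
            return model
    return None
```

For example, a 64-instruction pixel shader fits Shader Model 2.0 on these criteria, but the same shader paired with a vertex shader that reads two textures requires at least Shader Model 3.0.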
3.4 The Vertex Shader
The vertex shader is the first stage in the functional pipeline shown in
Figure 3.1. While this is the first stage that does any graphical processing,
it is worth noting that some data manipulation happens before this stage.
In what DirectX calls the input assembler [123, 261], a number of streams
of data can be woven together to form the sets of vertices and primitives
sent down the pipeline. For example, an object could be represented by one
array of positions and one array of colors. The input assembler would create