3. The Graphics Processing Unit (2/5)


[Figure 3.3, shade tree diagram. Node labels: final color; copper color; weight of ambient component; ambient; weight of specular component; specular function (inputs: normal, view, surface roughness).]

    float ka=0.5, ks=0.5;
    float roughness=0.1;
    float intensity;
    color copper=(0.8,0.3,0.1);
    intensity = ka*ambient() +
                ks*specular(normal,view,roughness);
    final_color = intensity*copper;

Figure 3.3. Shade tree for a simple copper shader, and its corresponding shader language program. (After Cook [194].)
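To make the correspondence between the shade tree and its program concrete, here is a minimal Python sketch of the same computation. The figure does not define ambient() or specular(), so the versions below are hypothetical stand-ins: a constant ambient level, and a Blinn-style highlight for a single assumed light direction with an exponent of 1/roughness. Only the constants and the last two expressions are taken from the figure.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length (fails on the zero vector)."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ambient():
    """Hypothetical stand-in: a constant ambient light level."""
    return 1.0


def specular(normal, view, roughness, light=(0.0, 0.0, 1.0)):
    """Hypothetical stand-in: Blinn-style highlight for one assumed light."""
    n, v, l = normalize(normal), normalize(view), normalize(light)
    half = normalize(tuple(a + b for a, b in zip(l, v)))
    return max(dot(n, half), 0.0) ** (1.0 / roughness)


def copper_shader(normal, view):
    """Evaluate the shade tree of Figure 3.3 from the leaves to the root."""
    ka, ks = 0.5, 0.5            # weights of ambient and specular components
    roughness = 0.1
    copper = (0.8, 0.3, 0.1)
    intensity = ka * ambient() + ks * specular(normal, view, roughness)
    return tuple(intensity * c for c in copper)


if __name__ == "__main__":
    print(copper_shader((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))
```

Each interior node of the tree becomes one arithmetic operation, and each leaf becomes a constant or a function call. With the stand-ins above, viewing the surface along the assumed light direction gives an intensity of 1.0, so the result is the copper color itself.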

very general: the ability to use computation results as texture coordinates (dependent texture reads), and support for data types with extended range and precision in textures and color buffers. One of the proposed data types was a novel (at the time) 16-bit floating point representation. At this time, no commercially available GPU supported programmable shading, although most had highly configurable pipelines [898].
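As a rough illustration of what "extended range and precision" means for a 16-bit float, the sketch below rounds values through a half-precision encoding and through an 8-bit normalized fixed-point encoding. Two assumptions are mine, not the text's: the half format is the IEEE 754 binary16 layout that Python's struct module exposes as format character "e" (the text does not give the bit layout of the proposed type), and the 8-bit [0, 1] encoding is used as the point of comparison.

```python
import struct


def to_half(x):
    """Round x to the nearest binary16 value and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_unorm8(x):
    """Round x to an 8-bit normalized value in [0, 1] (assumed baseline)."""
    clamped = min(max(x, 0.0), 1.0)
    return round(clamped * 255) / 255


if __name__ == "__main__":
    for value in (0.001, 0.1, 4.0, 2049.0):
        print(value, to_half(value), to_unorm8(value))
```

Under these assumptions the half format keeps values above 1.0 (4.0 survives, whereas the 8-bit encoding clamps it to 1.0) and resolves small values that the 8-bit encoding rounds to zero (0.001). Its precision is still limited: with an 11-bit significand, 2049.0 rounds to 2048.0.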

In early 2001, NVIDIA’s GeForce 3 was the first GPU to support programmable vertex shaders [778], exposed through DirectX 8.0 and extensions to OpenGL. These shaders were programmed in an assembly-like language that was converted by the drivers into microcode on the fly. Pixel shaders were also included in DirectX 8.0, but pixel shader SM 1.1 fell short of actual programmability: the very limited “programs” supported were converted into texture blending states by the driver, which in turn wired together hardware “register combiners.” These “programs” were not only limited in length (12 instructions or fewer) but also lacked the two elements (dependent texture reads* and float data) that Peercy et al. had identified as crucial to true programmability.

* The GeForce 3 did support a dependent texture read of sorts, but only in an extremely limited fashion.

Shaders at this time did not allow for flow control (branching), so conditionals had to be emulated by computing both terms and selecting or interpolating between the results. DirectX defined the concept of a Shader Model to distinguish hardware with different shader capabilities. The GeForce 3 supported vertex shader model 1.1 and pixel shader model 1.1 (shader model 1.0 was intended for hardware that never shipped). During 2001, GPUs progressed closer to a general pixel shader programming model. DirectX 8.1 added pixel shader models 1.2 to 1.4 (each meant for different hardware), which extended the capabilities of the pixel shader further, adding more instructions and more general support for dependent texture reads.
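The emulation of conditionals without branching, mentioned above, can be sketched in a few lines. This is an illustrative Python version rather than period shader assembly; the function names and the lighting terms are hypothetical. Both candidate results are always computed, a comparison produces a 0.0 or 1.0 mask, and a linear interpolation picks between them.

```python
def compare_ge(x, edge):
    """Return 1.0 if x >= edge, else 0.0, as a numeric mask."""
    return float(x >= edge)


def lerp(a, b, t):
    """Linear interpolation: a when t == 0.0, b when t == 1.0."""
    return a + (b - a) * t


def select_branch_free(mask, if_true, if_false):
    """Pick one of two already-computed results using a 0.0/1.0 mask."""
    return lerp(if_false, if_true, mask)


def shade(n_dot_l):
    """Hypothetical lighting term: ambient only when facing away from the light."""
    lit = 0.2 + 0.8 * n_dot_l      # term for surfaces facing the light
    unlit = 0.2                    # term for surfaces facing away
    mask = compare_ge(n_dot_l, 0.0)
    return select_branch_free(mask, lit, unlit)


if __name__ == "__main__":
    print(shade(0.5), shade(-0.5))
```

The cost of this approach is that both terms are evaluated for every vertex or pixel, which is one reason the later addition of real flow control mattered for longer shaders.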

The year 2002 saw the release of DirectX 9.0 including Shader Model 2.0 (and its extended version 2.X), which featured truly programmable vertex and pixel shaders. Similar functionality was also exposed under OpenGL using various extensions. Support for arbitrary dependent texture reads and storage of 16-bit floating point values was added, finally completing the set of requirements identified by Peercy et al. in 2000 [993]. Limits on shader resources such as instructions, textures, and registers were increased, so shaders became capable of more complex effects. Support for flow control was also added. The growing length and complexity of shaders made the assembly programming model increasingly cumbersome. Fortunately, DirectX 9.0 also included a new shader programming language called HLSL (High Level Shading Language). HLSL was developed by Microsoft in collaboration with NVIDIA, which released a cross-platform variant called Cg [818]. Around the same time, the OpenGL ARB (Architecture Review Board) released a somewhat similar language for OpenGL, called GLSL [647, 1084] (also known as GLslang). These languages were heavily influenced by the syntax and design philosophy of the C programming language and also included elements from the RenderMan Shading Language.

Shader Model 3.0 was introduced in 2004 and was an incremental improvement, turning optional features into requirements, further increasing resource limits and adding limited support for texture reads in vertex shaders. When a new generation of game consoles was introduced in late 2005 (Microsoft’s Xbox 360) and 2006 (Sony Computer Entertainment’s PLAYSTATION 3 system), they were equipped with Shader Model 3.0-level GPUs. The fixed-function pipeline is not entirely dead: Nintendo’s Wii console shipped in late 2006 with a fixed-function GPU [207]. However, this is almost certainly the last console of this type, as even mobile devices such as cell phones can use programmable shaders (see Section 18.4.3).

Other languages and environments for shader development are available. For example, the Sh language [837, 838] allows the generation and combination [839] of GPU shaders through a C++ library. This open-source project runs on a number of platforms. On the other end of the spectrum, several visual programming tools have been introduced to allow artists (most of whom are not comfortable programming in C-like languages) to design shaders. Such tools include visual graph editors used to link predefined shader building blocks, as well as compilers to translate the resulting graphs to shading languages such as HLSL. A screenshot of one such tool (mental mill, which is included in NVIDIA’s FX Composer 2) is shown in Figure 3.4. McGuire et al. [847] survey visual shader programming systems and propose a high-level, abstract extension of the concept.

Figure 3.4. A visual shader graph system for shader design. Various operations are encapsulated in function boxes, selectable on the left. When selected, each function box has adjustable parameters, shown on the right. Inputs and outputs for each function box are linked to each other to form the final result, shown in the lower right of the center frame. (Screenshot from “mental mill,” mental images, inc.)

The next large step in programmability came in 2007. Shader Model 4.0 (included in DirectX 10.0 [123] and also available in OpenGL via extensions) introduced several major features, such as the geometry shader and stream output.

Shader Model 4.0 included a uniform programming model for all shaders (vertex, pixel and geometry), the common-shader core described earlier. Resource limits were further increased, and support for integer data types (including bitwise operations) was added. Shader Model 4.0 is also notable in that it supports only high-level language shaders (HLSL for DirectX and GLSL for OpenGL); there is no user-writable assembly language interface such as the one found in previous models.

GPU vendors, Microsoft, and the OpenGL ARB continue to refine and extend the capabilities of programmable shading. Besides new versions of existing APIs, new programming models such as NVIDIA’s CUDA [211] and AMD’s CTM [994] have been targeted at non-graphics applications. This area of general-purpose computations on the GPU (GPGPU) is briefly discussed in Section 18.3.1.


3.3.1 Comparison of Shader Models

Although this chapter focuses on Shader Model 4.0 (the newest at time of writing), developers often need to support hardware that uses older shader models. For this reason we give a brief comparison between the capabilities of several recent shader models: 2.0 (and its extended version, 2.X), 3.0, and 4.0. A listing of all the differences is beyond the scope of this book; detailed information is available from the Microsoft Developer Network (MSDN) and their DirectX SDK [261].

We focus on DirectX here, because of its distinct releases, versus OpenGL’s evolving levels of extensions, some approved by the OpenGL Architecture Review Board (ARB), some vendor-specific. This extension system has the advantage that cutting-edge features from a specific independent hardware vendor (IHV) can be used immediately. DirectX 9 and earlier support IHV variations by exposing “capability bits” that can be examined to see if a GPU supports a feature. With DirectX 10, Microsoft has moved sharply away from this practice and toward a standardized model that all IHVs must support. Despite the focus here on DirectX, the following discussion also has relevance to OpenGL, in that the associated underlying GPUs of the same time periods have the same features.
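The difference between the two approaches can be shown with a small sketch. It does not reproduce the actual DirectX structures: the flag names, the rendering-path names, and the integer "standard level" are all hypothetical and serve only to contrast per-feature queries with a single standardized guarantee.

```python
from enum import IntFlag


class Caps(IntFlag):
    """Hypothetical capability bits reported by a device."""
    VERTEX_TEXTURES = 1 << 0
    FLOAT16_TARGETS = 1 << 1
    DYNAMIC_FLOW_CONTROL = 1 << 2


def supports(device_caps, required):
    """True only if every required bit is set in device_caps."""
    return (device_caps & required) == required


def choose_path_with_caps(device_caps):
    """Capability-bit style: test each needed feature individually."""
    needed = Caps.VERTEX_TEXTURES | Caps.DYNAMIC_FLOW_CONTROL
    return "displacement" if supports(device_caps, needed) else "fallback"


def choose_path_standardized(standard_level):
    """Standardized style: one level check implies the whole feature set."""
    return "displacement" if standard_level >= 10 else "fallback"


if __name__ == "__main__":
    print(choose_path_with_caps(Caps.VERTEX_TEXTURES))
    print(choose_path_standardized(10))
```

With capability bits, every combination of features is a potential code path to write and test. A standardized model removes that per-feature branching, at the cost of not exposing vendor-specific features early.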

Table 3.1 compares the capabilities of the various shader models. In the table, “VS” stands for “vertex shader” and “PS” for “pixel shader” (Shader Model 4.0 introduced the geometry shader, with capabilities similar to those of the vertex shader). If neither “VS” nor “PS” appears, the row applies to both vertex and pixel shaders. Since the virtual machine is 4-way SIMD, each register can store between one and four independent values. “Instruction Slots” refers to the maximum number of instructions that the shader can contain. “Max. Steps Executed” indicates the maximum number of instructions that can be executed, taking branching and looping into account. “Temp. Registers” shows the number of general-purpose registers that are available for storing intermediate results. “Constant Registers” indicates the number of constant values that can be input to the shader. “Flow Control, Predication” refers to the ability to compute conditional expressions and execute loops via branching instructions and predication (i.e., the ability to conditionally execute or skip an instruction). “Textures” shows the number of distinct textures (see Chapter 6) that can be accessed by the shader (each texture may be accessed multiple times). “Integer Support” refers to the ability to operate on integer data types with bitwise operators and integer arithmetic. “VS Input Registers” shows the number of varying input registers that can be accessed by the vertex shader. “Interpolator Registers” are output registers for the vertex shader and input registers for the pixel shader. They are so called because the values output from the vertex shader are interpolated over the triangle before being sent to the pixel shader. Finally, “PS Output Registers” shows the number of registers that can be output from the pixel shader; each one is bound to a different buffer, or render target.

Footnote: Shader Models 1.0 through 1.4 were early, limited versions that are no longer actively used.

                            SM 2.0/2.X      SM 3.0           SM 4.0
Introduced                  DX 9.0, 2002    DX 9.0c, 2004    DX 10, 2007
VS Instruction Slots        256             ≥ 512 (a)        4096
VS Max. Steps Executed      65536           65536            ∞
PS Instruction Slots        ≥ 96 (b)        ≥ 512 (a)        ≥ 65536 (a)
PS Max. Steps Executed      ≥ 96 (b)        65536            ∞
Temp. Registers             ≥ 12 (a)        32               4096
VS Constant Registers       ≥ 256 (a)       ≥ 256 (a)        14 × 4096 (c)
PS Constant Registers       32              224              14 × 4096 (c)
Flow Control, Predication   Optional (d)    Yes              Yes
VS Textures                 None            4 (e)            128 × 512 (f)
PS Textures                 16              16               128 × 512 (f)
Integer Support             No              No               Yes
VS Input Registers          16              16               16
Interpolator Registers      8 (g)           10               16/32 (h)
PS Output Registers         4               4                8

(a) Minimum requirement (more can be used if available).
(b) Minimum of 32 texture and 64 arithmetic instructions.
(c) 14 constant buffers exposed (+2 private, reserved for Microsoft/IHVs), each of which can contain a maximum of 4096 constants.
(d) Vertex shaders are required to support static flow control (based on constant values).
(e) SM 3.0 hardware typically has very limited formats and no filtering for vertex textures.
(f) Up to 128 texture arrays, each of which can contain a maximum of 512 textures.
(g) Not including 2 color interpolators with limited precision and range.
(h) Vertex shader outputs 16 interpolators, which the geometry shader can expand to 32.

Table 3.1. Shader capabilities, listed by DirectX shader model version [123, 261, 946, 1055].
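The interpolation that gives interpolator registers their name can be sketched as a weighted blend of the three per-vertex outputs. The version below is a simplified assumption: it uses plain barycentric weights and ignores the perspective correction that real hardware applies, and the function name is hypothetical. Each register is modeled as a tuple of up to four values, matching the 4-way SIMD registers described in the text.

```python
def interpolate_register(v0, v1, v2, weights):
    """Blend three per-vertex register values with barycentric weights.

    v0, v1, v2: equal-length tuples written by the vertex shader.
    weights:    (w0, w1, w2), assumed non-negative and summing to 1.
    """
    w0, w1, w2 = weights
    return tuple(w0 * a + w1 * b + w2 * c for a, b, c in zip(v0, v1, v2))


if __name__ == "__main__":
    red = (1.0, 0.0, 0.0, 1.0)
    green = (0.0, 1.0, 0.0, 1.0)
    blue = (0.0, 0.0, 1.0, 1.0)
    # Value seen by the pixel shader at a point nearer the first vertex.
    print(interpolate_register(red, green, blue, (0.5, 0.25, 0.25)))
```

At a vertex the weights are (1, 0, 0), (0, 1, 0), or (0, 0, 1), so the pixel shader receives exactly what the vertex shader wrote there; everywhere else it receives a blend.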

3.4 The Vertex Shader

The vertex shader is the first stage in the functional pipeline shown in Figure 3.1. While this is the first stage that does any graphical processing, it is worth noting that some data manipulation happens before this stage. In what DirectX calls the input assembler [123, 261], a number of streams of data can be woven together to form the sets of vertices and primitives sent down the pipeline. For example, an object could be represented by one array of positions and one array of colors. The input assembler would create
