by Benjamin Lipchak
WHAT YOU'LL LEARN IN THIS CHAPTER:
How To | Functions You'll Use |
---|---|
Specify shader text | glProgramStringARB |
Switch between shaders | glBindProgramARB |
Create and delete shaders | glGenProgramsARB, glDeleteProgramsARB |
Set program parameters | glProgramEnvParameter4*ARB, glProgramLocalParameter4*ARB |
Query program parameters | glGetProgramEnvParameter*ARB, glGetProgramLocalParameter*ARB |
Set vertex attributes | glVertexAttrib*ARB, glVertexAttribPointerARB, glEnableVertexAttribArrayARB |
Query vertex attributes | glGetVertexAttrib*ARB |
Query program object state | glGetProgramivARB |
Low-level shaders provide relatively direct access to the current generation of underlying shader hardware. Every cycle of the shader can be scheduled with a vector instruction that operates on four components at a time. Low-level vertex and fragment shaders use the same commands for loading and managing shaders, and they offer nearly the same instruction sets. Their biggest difference is in their inputs and outputs.
As described in Chapter 19, “Programmable Pipeline: This Isn't Your Father's OpenGL,” vertex shaders take unprocessed vertices and their attributes (position, normal, colors, texture coordinates, and so on), and output clip-space position and new processed attributes (colors, texture coordinates, and so on). Fragment shaders, on the other hand, take fragments and their associated data as input, and they output a final fragment color and possibly a new depth.
We could devote an entire book to shaders, but we'll try to cover all the most important aspects in this chapter. Feel free to consult the extension specifications (GL_ARB_vertex_program and GL_ARB_fragment_program) as a complete reference. After you read this chapter, those specs may actually be decipherable!
In the following sections, we describe mostly what goes on inside the shader. First, though, you need to load shaders into OpenGL and be able to turn shaders on and off and switch between them. Then you can start writing shaders.
Like texture objects, buffer objects, occlusion queries, and other OpenGL objects, low-level shaders are loaded into objects, too—program objects in this case. First, you generate an unused program object name and then create it by binding it for the first time:
// Create shader objects, set shaders
glGenProgramsARB(2, ids);
glBindProgramARB(GL_VERTEX_PROGRAM_ARB, ids[0]);
glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, ids[1]);
In its initial state, the program object has no shader associated with it. If you try enabling it and drawing something, an error is thrown. So now you're ready to load up some real shaders.
You pass shaders into OpenGL as ASCII strings via glProgramStringARB, which takes a shader type, format, length, and pointer to the string containing the shader text:
glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                   strlen(vpString), vpString);
glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                   strlen(fpString), fpString);
The first argument indicates whether you're replacing the currently bound low-level vertex shader or fragment shader.
The format argument exists purely for future expandability, such as accepting a possible Unicode or binary bytecode representation. Currently, GL_PROGRAM_FORMAT_ASCII_ARB is the only game in town.
Your string need not be null-terminated. glProgramStringARB looks only at the number of characters you tell it to in the third argument, and those characters should all be part of the actual shader text. A null terminator, if present, should not be included in the length argument you pass in. Conveniently, the standard C string function strlen behaves this way, returning the length of a string excluding its terminator.
When OpenGL receives the glProgramStringARB command, it proceeds to parse the shader. If all goes well, your shader is compiled and optimized as necessary for the underlying hardware and is ready for rendering when you enable vertex and/or fragment shading.
If you have any syntax or semantic errors, or if the shader is too complex for the implementation to handle, an error is thrown. In this case, the currently bound shader is not replaced, and whatever shader was there before (if any) remains in place. You can find out where and why a problem occurred by querying for the error position and error string.
The error position is the byte offset into the shader string where the error occurred. If no error occurs, you get back –1. If the error is a semantic restriction that can be discovered only after the whole shader is parsed (for example, trying to use the same texture unit for both 2D and 3D texturing), the error position is set to the length of the shader.
The error string is the most useful way to diagnose your problem. It tells you the type of error that occurred and may provide additional hints, such as the line number where the error occurred. Using the error string to find the error in your shader is a lot easier than using the error position and trying to count out hundreds of characters by hand!
Listing 20.1 shows the code used to set up low-level shaders.
Example 20.1. Setting Up Low-Level Shaders
// Create, set and enable shaders
glGenProgramsARB(2, ids);

glBindProgramARB(GL_VERTEX_PROGRAM_ARB, ids[0]);
glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                   strlen(vpString), vpString);
glGetIntegerv(GL_PROGRAM_ERROR_POSITION_ARB, &errorPos);
if (errorPos != -1)
{
    fprintf(stderr, "Error in vertex shader at position %d!\n", errorPos);
    fprintf(stderr, "Error string: %s\n",
            glGetString(GL_PROGRAM_ERROR_STRING_ARB));
    Sleep(5000);
    exit(0);
}

glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, ids[1]);
glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                   strlen(fpString), fpString);
glGetIntegerv(GL_PROGRAM_ERROR_POSITION_ARB, &errorPos);
if (errorPos != -1)
{
    fprintf(stderr, "Error in fragment shader at position %d!\n", errorPos);
    fprintf(stderr, "Error string: %s\n",
            glGetString(GL_PROGRAM_ERROR_STRING_ARB));
    Sleep(5000);
    exit(0);
}

if (useVertexShader)
    glEnable(GL_VERTEX_PROGRAM_ARB);
if (useFragmentShader)
    glEnable(GL_FRAGMENT_PROGRAM_ARB);
Try adding a typographical error into one of the shaders loaded by the sample code in Listing 20.1. See how well the error string on your OpenGL implementation helps you narrow down the problem.
With shaders, just like other OpenGL objects, you need to clean up after yourself when you're done. Deleting your shaders frees the resources and makes the names available for use again later:
glDeleteProgramsARB(2, ids);
One more detail stands between us and diving into the actual shaders. Low-level vertex and fragment shaders are not part of core OpenGL. In previous chapters, we covered functionality that started out as extensions but has since been promoted to the core. However, with high-level shaders gaining popularity, these low-level shaders will likely never be promoted and will live their lives forever as ARB-approved extensions.
Their extension status does not make much difference when it comes to using them. At least on Windows platforms, any functionality more recent than OpenGL 1.1, whether core or extension, requires its entrypoint function pointers to be queried before use. Before doing that, you must also check for the presence of the extensions in the extension string. The sample code also uses the secondary color feature, which requires either OpenGL 1.4 or the GL_EXT_secondary_color extension. Listing 20.2 shows how to check whether an OpenGL implementation supports the required features.
Example 20.2. Checking for the Presence of OpenGL Features
// Make sure required functionality is available!
if (!gltIsExtSupported("GL_ARB_vertex_program"))
{
    fprintf(stderr, "GL_ARB_vertex_program extension is unavailable!\n");
    Sleep(2000);
    exit(0);
}
if (!gltIsExtSupported("GL_ARB_fragment_program"))
{
    fprintf(stderr, "GL_ARB_fragment_program extension is unavailable!\n");
    Sleep(2000);
    exit(0);
}

version = glGetString(GL_VERSION);
if (((version[0] != '1') || (version[1] != '.') ||
     (version[2] < '4') || (version[2] > '9')) &&          // 1.4+
    (!gltIsExtSupported("GL_EXT_secondary_color")))
{
    fprintf(stderr, "Neither OpenGL 1.4 nor GL_EXT_secondary_color"
                    " extension is available!\n");
    Sleep(2000);
    exit(0);
}

glGenProgramsARB = gltGetExtensionPointer("glGenProgramsARB");
glBindProgramARB = gltGetExtensionPointer("glBindProgramARB");
glProgramStringARB = gltGetExtensionPointer("glProgramStringARB");
glDeleteProgramsARB = gltGetExtensionPointer("glDeleteProgramsARB");

if (gltIsExtSupported("GL_EXT_secondary_color"))
    glSecondaryColor3f = gltGetExtensionPointer("glSecondaryColor3fEXT");
else
    glSecondaryColor3f = gltGetExtensionPointer("glSecondaryColor3f");

if (!glGenProgramsARB || !glBindProgramARB ||
    !glProgramStringARB || !glDeleteProgramsARB ||
    !glSecondaryColor3f)
{
    fprintf(stderr, "Not all entrypoints were available!\n");
    Sleep(2000);
    exit(0);
}
Each cycle, or instruction slot, of a low-level vertex or fragment shader can have a different instruction opcode, such as ADD, MUL, or MOV, to perform addition, multiplication, or a copy, respectively. These opcodes are your basic shader building blocks. Most instruction opcodes are followed by a single output and one or more inputs, separated by commas. Each instruction ends with a semicolon.
MUL myResult, myTemp, 2.0;   # multiplies each component of myTemp by 2,
                             # stores in myResult
With few exceptions, the vertex shader and fragment shader instruction sets are identical. In the following sections, we first discuss the significant overlap between the two, and then we deal with the instructions that are specific to one shader type or the other.
The instruction set can be categorized by the number of input arguments, whether the inputs are vector or scalar, and whether the result is vector or scalar. Vector in this case means a four-component vector, whereas a scalar is a single component. Table 20.1 categorizes all the instructions that are common to both vertex shaders and fragment shaders.
Table 20.1. Common Instruction Set
All instructions that output a vector operate independently on each component of the vector. For example, MUL actually performs four independent multiplication operations:
MUL myResult, myTemp1, myTemp2;
# This is the same as:
# myResult.x = myTemp1.x * myTemp2.x
# myResult.y = myTemp1.y * myTemp2.y
# myResult.z = myTemp1.z * myTemp2.z
# myResult.w = myTemp1.w * myTemp2.w
On the other hand, instructions that output a single scalar actually replicate that scalar to all components of the result vector:
RCP myResult, myTemp.x;
# This is the same as:
# myResult.x = myResult.y = myResult.z = myResult.w = 1.0 / myTemp.x
Only three instructions are specific to low-level vertex shaders: ARL, EXP, and LOG. Table 20.2 describes these instructions.
Table 20.2. Vertex-Specific Instruction Set
The ARL instruction is a special-purpose instruction used to load an address register, which is a single-component signed integer register type used for relative addressing. Before the address register is loaded, the address is floored so that it becomes the greatest integer less than or equal to the scalar input. We discuss relative addressing in a later section, “Addresses.”
The EXP and LOG instructions are lower-precision approximations of their EX2 and LG2 counterparts, except they put the result only in the third component of the result vector. The first two components are filled with some other marginally useful approximation factors, and the fourth component is 1. Because these instructions provide no additional benefit except possibly improved performance on some OpenGL implementations, they were removed from the fragment shader instruction set.
After the low-level vertex program extension was approved by the ARB, work began on a counterpart fragment program extension. All the instructions from the vertex extension were considered for inclusion, and only the three listed in Table 20.2 were removed.
Relative addressing within fragment shaders is not yet a feature available in today's hardware, so the ARL instruction was not included. Also, the low-precision EXP and LOG instructions were not particularly interesting compared to the full-precision versions, so they were dropped as well.
Quite a number of instructions particularly useful in the fragment domain were added. Table 20.3 lists them.
Table 20.3. Fragment-Specific Instruction Set
Fragment shaders introduce a new type of instruction: the texture instruction. The rest of the instructions fall into the ALU category because they perform arithmetic operations. TEX, TXB, and TXP are the three instructions that fall nicely into this new texture category.
Each of these first three texture instructions performs a texture lookup on the specified texture target (1D, 2D, 3D, CUBE) of the specified texture unit. For example, to sample from the cube map on texture unit 0, you use
TEX myResult, myTexCoord, texture[0], CUBE;
KIL, a unique instruction that can be used to stop all further fragment shader execution and discard the fragment, actually falls into the texture instruction category, too. It doesn't perform a texture lookup, but it may be implemented in hardware using the same resources.
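For example, a fragment shader might emulate an alpha test with KIL, which discards the fragment if any component of its input is negative. This is only a sketch; the texture unit and register names are illustrative:

```
!!ARBfp1.0
TEMP texColor, alphaTest;
TEX texColor, fragment.texcoord[0], texture[0], 2D;
# alphaTest goes negative wherever alpha < 0.5,
# so KIL discards exactly those fragments
SUB alphaTest, texColor.a, 0.5;
KIL alphaTest;
MOV result.color, texColor;
END
```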
The instruction set is great, but those opcodes won't do a thing for you without data to operate on and a place to store the result. The six different types of variables described in Table 20.4 can be used as inputs and/or outputs in your low-level shaders.
Table 20.4. Variable Types
All variables represent four-component floating-point vectors, except for address registers, which are signed integers.
Temporaries are the main work horses of low-level shaders. Unless you're writing to an output register, chances are you're writing to a temp. Before you can use a variable as a temp, you have to declare it. You can declare multiple temps at the same time if you want:
TEMP diffuseColor, specColor, myTexCoord;
Now you can use these variable names as inputs or outputs of any instruction.
Parameters are variables that never change during each run of a shader. You can think of them as constants, except that some parameters actually can be changed outside the shader. In any case, you can't write to a parameter during shader execution. You can use them only as instruction inputs. The three types of parameters are inline constants, state-bound parameters, and program parameters. Parameters can be declared with variable names, but they don't have to be, as we illustrate in the following sections.
Inline constants really are constants. They're set to specific values within the text of the shader, and they can never change. You can either set all four components of the vector to the same value, set each component to a unique value, or have unspecified components filled out with default values:
PARAM two = 2.0;                           # all 4 components contain 2
PARAM quarters = { 0.0, 0.25, 0.5, 0.75 }; # 4 unique values
PARAM pi = { 3.14159 };                    # vector gets padded out to
                                           # PI, 0, 0, 1
Be aware of the subtle differences between the use of braces and no braces! For example, 2.0 and {2.0} both have the same value in the first component, but the other three components differ. This is a common low-level shader writing pitfall.
You don't need to declare parameters if you don't want to. You can use them directly as instruction inputs:
MUL tripleCoord, myCoord, 3.0;
MUL scaledResult, {0.1, 0.2, 0.3, 0.4}, myResult;
For convenience, you can access a variety of OpenGL state in the form of state-bound parameters. When OpenGL state changes, the parameters are automatically updated to reflect the changes. This makes emulating fixed functionality more straightforward. For example, instead of manually loading up the modelview/projection (MVP) matrix into four parameter vectors, you can just use the MVP already available in OpenGL state to transform your vertex position:
DP4 result.position.x, vertex.position, state.matrix.mvp.row[0];
DP4 result.position.y, vertex.position, state.matrix.mvp.row[1];
DP4 result.position.z, vertex.position, state.matrix.mvp.row[2];
DP4 result.position.w, vertex.position, state.matrix.mvp.row[3];
Bindable state includes all transformation matrices and the properties of texture coordinate generation, color material, lighting, fog, clip planes, point size and attenuation, texture environment colors, and depth range. Some bindable state parameters are specific to vertex shaders or specific to fragment shaders. Refer to the GL_ARB_vertex_program and GL_ARB_fragment_program extension specifications for the complete list of state parameter bindings available to low-level vertex and fragment shaders.
In addition to the inline constants hard-coded into the text and the parameters bound to specific OpenGL state, you can use a third category of generic parameters. They can be loaded with any values and then reloaded later with different values.
Program parameters are divided into two categories: program local and program environment parameters. The local ones are specific to a single shader, whereas the environment parameters are shared by all shaders of a given type. That is, vertex shaders share one set of environment parameters, and fragment shaders share their own set. The parameters are loaded with the commands glProgramLocalParameter4*ARB and glProgramEnvParameter4*ARB, which take a slot number and four values. They are then referenced by program.local[n] and program.env[n] within the shader text.
So, if parameters are constants, why would you want to use program parameters instead of just hard-coding the constants into your shader? Certainly, the shader would be more readable if the constant value were explicit in the shader text. Let's consider an example. Maybe you're rendering a scene with a flickering candle, and the brightness of the flicker changes with every frame of rendering. You might have a shader that ends with something like this:
MUL finalColor, litColor, program.local[0]; # local 0 contains flicker factor
                                            # in range [0,1]
You could then reuse the shader unchanged and simply update the local parameter once per frame:
glProgramLocalParameter4fARB(GL_FRAGMENT_PROGRAM_ARB, 0,
                             0.75f, 0.75f, 0.75f, 0.75f);
renderScene();
glProgramLocalParameter4fARB(GL_FRAGMENT_PROGRAM_ARB, 0,
                             0.2f, 0.2f, 0.2f, 0.2f);
renderScene();
glProgramLocalParameter4fARB(GL_FRAGMENT_PROGRAM_ARB, 0,
                             0.5f, 0.5f, 0.5f, 0.5f);
renderScene();
The parameter is constant during each execution of the vertex or fragment shader, but it isn't constant over all rendered primitives over time. You can change program parameters (or state-bound parameters, for that matter) as often as you like outside glBegin/glEnd pairs.
You can declare an array of parameters that can be indexed either with absolute addressing or relative addressing. With absolute addressing, you supply the exact array index you want to use. Relative addressing is discussed later in the section “Addresses.”
The following are some examples of parameter array declarations. You can declare the array size if you want or let it be sized automatically. If you declare the size but then provide values beyond the size you declared, the shader will fail to parse:
# This one is explicitly sized to 10 vectors
PARAM myArray[10] = {2.0, {0.1, 0.2, 0.3, 0.4},
                     program.env[0..5], state.fog.color, -1.0};

# This one automatically gets sized to 6 vectors
PARAM myOtherArray[] = { state.matrix.mvp,
                         state.matrix.texture[0].row[1..2] };
Absolute addressing of the arrays is simply a matter of providing an index when using it in an instruction:
# Scale by 2, then subtract 1, courtesy of the multiply-then-add (MAD) instruction
MAD scaledAndBiased, myColor, myArray[0], myArray[9];
Like parameters, attributes are also read-only inputs. But unlike parameters, attributes tend to change on a per-execution basis. Each new vertex being shaded has a new position, and possibly new input colors and texture coordinates. The same is true of each fragment.
With attributes, like parameters, you can choose to declare them up front, or you can use them directly within an instruction:
TEMP nDotC;
ATTRIB vNorm = vertex.normal;            # declared attribute
DP3 nDotC, vNorm, vertex.color.primary;  # color was not declared
Vertex shaders and fragment shaders have their own sets of input attributes, so we cover them separately.
Table 20.5 lists all the input attributes available to vertex shaders. They correspond to all the OpenGL current vertex state that can be changed per-vertex within a glBegin/glEnd pair.
Table 20.5. Vertex Attributes
Attribute Binding | Components | Description |
---|---|---|
vertex.position | (x,y,z,w) | Object-space position |
vertex.normal | (x,y,z,1) | Normal |
vertex.color | (r,g,b,a) | Primary color |
vertex.color.primary | (r,g,b,a) | Primary color |
vertex.color.secondary | (r,g,b,a) | Secondary color |
vertex.fogcoord | (f,0,0,1) | Fog coordinate |
vertex.texcoord | (s,t,r,q) | Texture coordinate on unit 0 |
vertex.texcoord[n] | (s,t,r,q) | Texture coordinate on unit n |
vertex.attrib[n] | (x,y,z,w) | Generic attribute n |
The one vertex attribute you've probably never seen before is the generic attribute. These attributes have been introduced so that you can specify any kind of per-vertex data, not necessarily one of the kinds previously available with fixed functionality. Binormals, tangent vectors, you name it—anything you'd like to stream into a vertex shader, you can send in through a generic attribute.
You can use the many flavors of the glVertexAttrib*ARB command to set these generic attributes, or you can put generic attributes in a vertex array using glVertexAttribPointerARB and glEnableVertexAttribArrayARB.
One point to keep in mind is that on some implementations, these generic attributes overlap with the fixed functionality attributes. One important case is that calling glVertexAttrib on attribute 0 is guaranteed to be the same as calling glVertex, and vice versa. But you need to be more careful with all the other possible aliasing conflicts. Table 20.6 lists the conflicts between generic and fixed functionality attributes.
Table 20.6. Vertex Attribute Aliasing
Generic Binding | Overlapping Attribute | Overlapping Binding |
---|---|---|
vertex.attrib[0] | Vertex position | vertex.position |
vertex.attrib[1] | None | none |
vertex.attrib[2] | Normal | vertex.normal |
vertex.attrib[3] | Primary color | vertex.color.primary |
vertex.attrib[4] | Secondary color | vertex.color.secondary |
vertex.attrib[5] | Fog coordinate | vertex.fogcoord |
vertex.attrib[6] | None | none |
vertex.attrib[7] | None | none |
vertex.attrib[8] | Texture coordinate 0 | vertex.texcoord[0] |
vertex.attrib[8+n] | Texture coordinate n | vertex.texcoord[n] |
If you call the command to change one attribute on each line of the table, the other becomes undefined (for example, calling glVertexAttrib on attribute 2 undefines the normal set with glNormal, and vice versa). Also, you cannot bind to both attributes on each line of the table within the same shader, or your shader will fail to parse. This helps catch accidental aliasing bugs. This result would occur if, for example, you tried to use both vertex.attrib[4] and vertex.color.secondary within your shader.
Fragment attributes are the fragment's position and other associated data, interpolated across the primitive. No generic attributes are available here—just the same interpolants available via fixed functionality. Table 20.7 lists all the fragment attributes and their fragment shader bindings.
Table 20.7. Fragment Attributes
Attribute Binding | Components | Description |
---|---|---|
fragment.position | (x,y,z,1/w) | Window-space position, reciprocal of clip-space w |
fragment.color | (r,g,b,a) | Primary color |
fragment.color.primary | (r,g,b,a) | Primary color |
fragment.color.secondary | (r,g,b,a) | Secondary color |
fragment.texcoord | (s,t,r,q) | Texture coordinate 0 |
fragment.texcoord[n] | (s,t,r,q) | Texture coordinate n |
fragment.fogcoord | (f,0,0,1) | Fog coordinate |
Outputs are write-only registers that can be used to store the result of an instruction. Like input attributes, outputs are also necessarily different between low-level vertex and fragment shaders.
The set of vertex output registers for the most part represents the colors and coordinates that will be interpolated across the primitive to which the vertex belongs and will become available as fragment shader input attributes. Table 20.8 lists all the low-level vertex shader outputs.
Table 20.8. Vertex Outputs
Output Binding | Components | Description |
---|---|---|
result.position | (x,y,z,w) | Clip-space position |
result.color | (r,g,b,a) | Front-facing primary color |
result.color.primary | (r,g,b,a) | Front-facing primary color |
result.color.secondary | (r,g,b,a) | Front-facing secondary color |
result.color.front | (r,g,b,a) | Front-facing primary color |
result.color.front.primary | (r,g,b,a) | Front-facing primary color |
result.color.front.secondary | (r,g,b,a) | Front-facing secondary color |
result.color.back | (r,g,b,a) | Back-facing primary color |
result.color.back.primary | (r,g,b,a) | Back-facing primary color |
result.color.back.secondary | (r,g,b,a) | Back-facing secondary color |
result.fogcoord | (f,*,*,*) | Fog coordinate |
result.pointsize | (s,*,*,*) | Point size |
result.texcoord | (s,t,r,q) | Texture coordinate 0 |
result.texcoord[n] | (s,t,r,q) | Texture coordinate n |
The fog coordinate and point size outputs are scalar. Only the first component is used, and the others are ignored. The point size is used only during rasterization to affect the size of the point primitive being generated. It does not become available as a fragment shader input.
Notice that there are four color outputs from the vertex shader and only two color input attributes in the fragment shader. The reason is that the orientation of the primitive is determined during rasterization, at which point either the front-facing or back-facing colors are passed along to the fragment shader.
Any output that isn't written by the vertex shader becomes undefined, so if you then try to use it as an input in the fragment shader, you get garbage. Moral of the story: Make sure that you match up your fragment shader with a vertex shader that generates all the needed interpolants!
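For instance, a matched pair might look like the following sketch, where the vertex shader writes every interpolant the fragment shader reads (the texture unit and temporary names are illustrative):

```
!!ARBvp1.0
# transform position and pass along color and one texture coordinate
DP4 result.position.x, vertex.position, state.matrix.mvp.row[0];
DP4 result.position.y, vertex.position, state.matrix.mvp.row[1];
DP4 result.position.z, vertex.position, state.matrix.mvp.row[2];
DP4 result.position.w, vertex.position, state.matrix.mvp.row[3];
MOV result.color, vertex.color;
MOV result.texcoord[0], vertex.texcoord[0];
END

!!ARBfp1.0
# reads only interpolants the vertex shader above wrote
TEMP base;
TEX base, fragment.texcoord[0], texture[0], 2D;
MUL result.color, base, fragment.color;
END
```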
Fragment shaders have only two outputs, a final color and a depth (see Table 20.9).
Table 20.9. Fragment Outputs
Output Binding | Components | Description |
---|---|---|
result.color | (r,g,b,a) | Color |
result.depth | (*,*,d,*) | Depth coordinate |
The output color is passed along to subsequent per-fragment operations, such as alpha test and blending, and finally is stored in the framebuffer.
The depth output is handled a bit differently than other outputs. Whereas all other outputs are undefined if you don't write them, fragment depth defaults to the depth produced by rasterization if you don't write it. If you do write to the depth output, it overrides the rasterization depth, and this depth is passed along to subsequent stencil and depth test stages.
An alias isn't actually its own type of register. It's just a way of giving a new variable name to an existing register. Temporaries are limited resources, so aliases let you give meaningful new names to “recycled” registers. Here's a contrived example:
TEMP baseMap, outColor;
ALIAS lightMap = baseMap;

MOV outColor, fragment.color;

TEX baseMap, fragment.texcoord[0], texture[0], 2D;
MUL outColor, outColor, baseMap;

# This next texture lookup puts its result in the same
# physical temp as the last lookup, but gives it a new
# name to make the shader more easily readable
TEX lightMap, fragment.texcoord[1], texture[1], 2D;
MUL outColor, outColor, lightMap;
Address registers are used for relative addressing of parameter arrays. This type of addressing gives you access into an array using an arbitrarily computed index. Relative addressing is allowed only in low-level vertex shaders, not in fragment shaders.

Only the first component of an address register is used. The other three components might as well not exist; they can neither be read nor written. Before using relative addressing, you have to declare your address register and then write to it with the ARL (address register load) instruction.
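Here is a short sketch of declaring and loading an address register, then indexing a parameter array with it (register, array, and variable names are illustrative):

```
ADDRESS addr;
PARAM palette[4] = { program.env[0..3] };
...
ARL addr.x, myIndex.x;          # floor(myIndex.x) loaded into addr.x
MOV myColor, palette[addr.x];   # relative addressing into the array
```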
A few operations can be applied to input and output registers as part of each instruction. They are described in the following sections.
The first operation we'll discuss is input negate. You can negate each input argument to an instruction by putting a minus sign in front of it:
MOV negativeVal, -positiveVal;
Another modifier to input arguments is the swizzle suffix. This suffix swizzles, or rearranges, the components of an input register. This example takes a parameter vector and reverses the order of its components:
PARAM someConstant = { 1, 2, 3, 4 };
...
# The following swizzle results in 4,3,2,1
MUL result, result, someConstant.wzyx;
The swizzle can be any combination of the components x, y, z, and w, such as .zzzz, .xywy, or even the redundant .xyzw. It can also be a single component, where .x is equivalent to .xxxx. Low-level fragment shaders let you use the letters r, g, b, and a for your swizzles as well because colors are more predominant than coordinates within fragment shaders. .abgr is the same as .wzyx.
For scalar instructions that operate on a single input channel (COS, EX2, EXP, LG2, LOG, POW, RCP, RSQ, SCS, and SIN), you are forced to use a suffix to select the single component that will be used:
RCP oneOverZ, myCoord.z;
Swizzle suffixes determine which components to make available from each input register. Similarly, writemask suffixes determine which output components are written and which remain untouched:
MOV myResult.xyw, foo; # 3rd component stays as-is
The same component letters can be used for writemasks as for swizzles. Vertex shaders can use x, y, z, and w, whereas fragment shaders can also use r, g, b, and a for their writemasks. This is just in the name of readability. myColor.rgb is a lot easier to understand at first glance than myColor.xyz.
The final modifier is output clamp, also known as saturation. It clamps the result of an instruction to the range [0,1]. This modifier is most useful for colors, and therefore is available only in the low-level fragment shader.
To do an output clamp, you just add the _SAT suffix to your fragment shader instruction, as follows:
ADD myResult, primaryColor, secondaryColor;     # This could overflow
                                                # outside [0,1]
ADD_SAT myResult, primaryColor, secondaryColor; # Here we clamp to [0,1]
The only instruction this doesn't make sense for is the KIL instruction, which has no output.
Note that OpenGL automatically clamps the final color and depth outputs from your fragment shader before using them in subsequent pipeline stages, so you don't need to add _SAT yourself on the final writes to result.color or result.depth. The output clamp modifier is just there to facilitate the clamping of intermediate computations.
If you want to clamp a register value within a vertex shader, your best bet is to use the MIN and MAX instructions. This sequence can also be used in a fragment shader to clamp to an arbitrary range other than [0,1]:
MIN myValue, myValue, 1.0;
MAX myValue, myValue, 0.0;
OpenGL implementations have a limited number of resources available to low-level shaders. These resources include temporaries, parameters, instructions, and a few others. If you want your shader to run fast, or even run at all, you need to pay attention to these limits and try to minimize your resource consumption.
The first set of limits is the parser limits. These limits dictate the maximum number of resources that can be present in your shader for OpenGL to even consider trying to compile it. If you exceed any of these limits when calling glProgramStringARB, your shader will fail to parse, an error will be thrown, and the error string will reflect which resource you overused.
Table 20.10 lists each resource, the way to query its parser limit via glGetProgramivARB, and the minimum number that must be supported by all implementations. Limits are different for vertex shaders and fragment shaders, and some limits apply only to one or the other.
Table 20.10. Parser Resource Limit Queries
Resource | Query | VS Min. | FS Min. |
---|---|---|---|
Instructions | GL_MAX_PROGRAM_INSTRUCTIONS_ARB | 128 | 72 |
Temporaries | GL_MAX_PROGRAM_TEMPORARIES_ARB | 12 | 16 |
Parameters | GL_MAX_PROGRAM_PARAMETERS_ARB | 96 | 24 |
Program env parameters | GL_MAX_PROGRAM_ENV_PARAMETERS_ARB | 96 | 24 |
Program local parameters | GL_MAX_PROGRAM_LOCAL_PARAMETERS_ARB | 96 | 24 |
Attributes | GL_MAX_PROGRAM_ATTRIBS_ARB | 16 | 10 |
Addresses | GL_MAX_PROGRAM_ADDRESS_REGISTERS_ARB | 1 | n/a |
ALU instructions | GL_MAX_PROGRAM_ALU_INSTRUCTIONS_ARB | n/a | 48 |
Texture instructions | GL_MAX_PROGRAM_TEX_INSTRUCTIONS_ARB | n/a | 24 |
Texture indirections | GL_MAX_PROGRAM_TEX_INDIRECTIONS_ARB | n/a | 4 |
If your shader parsed successfully, you can call glGetProgramivARB to find out how many of each resource was counted by the parser. Table 20.11 lists these query tokens.
Table 20.11. Parser Resource Consumption Queries
Resource | Query |
---|---|
Instructions | GL_PROGRAM_INSTRUCTIONS_ARB |
Temporaries | GL_PROGRAM_TEMPORARIES_ARB |
Parameters | GL_PROGRAM_PARAMETERS_ARB |
Attributes | GL_PROGRAM_ATTRIBS_ARB |
Addresses (VS only) | GL_PROGRAM_ADDRESS_REGISTERS_ARB |
ALU instructions (FS only) | GL_PROGRAM_ALU_INSTRUCTIONS_ARB |
Texture instructions (FS only) | GL_PROGRAM_TEX_INSTRUCTIONS_ARB |
Texture indirections (FS only) | GL_PROGRAM_TEX_INDIRECTIONS_ARB |
You may find yourself asking, “What's a texture indirection?” Or you might not ask yourself that question until the first time you try loading a big shader, and you get an error string complaining about texture indirections. This is as good a place as any to address this resource.
Fixed functionality always used texture coordinate interpolants to sample from textures. But fragment shaders introduce the ability to use any arbitrarily computed temporary register as a texture coordinate. In fact, you can use the result of one texture lookup as the texture coordinate for another lookup. This is called a dependent lookup.
A chain of dependent lookups is simply one dependent lookup after another, where the result of one lookup becomes the texture coordinate for the next lookup, repeated some number of times. Some hardware implementations have an internal limit on the length of the dependency chain that can be used within a fragment shader. This is one of the most common fragment shader pitfalls, and if you take a moment to familiarize yourself with texture indirections, you can avoid hitting the ceiling of this resource.
The `GL_ARB_fragment_program` specification provides a fairly simple algorithm that parsers use to count texture indirections; in the specification, see issue 24, “What is a texture indirection, and how is it counted?” It's easy enough to run in your head while scanning your own fragment shader text, so you can find out where indirections are being introduced and eliminate unnecessary ones.
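To build intuition for that counting, here is a deliberately simplified sketch in C. This is not the specification's exact algorithm (it ignores write masks, swizzles, and several corner cases from issue 24); it models only the core rule that a texture instruction sourcing a temporary written earlier in the current phase begins a new indirection phase. The `Inst` structure and `count_indirections` helper are our own illustrative inventions.

```c
#include <string.h>

#define MAX_TEMPS 32

typedef struct {
    int is_tex;    /* 1 for a texture instruction (TEX, TXP, TXB), 0 for ALU */
    int src_temp;  /* index of the source temporary, or -1 for an attribute  */
    int dst_temp;  /* index of the destination temporary, or -1 if none      */
} Inst;

/* Approximate the texture-indirection count of an instruction sequence.
 * A new phase starts whenever a texture instruction reads a temporary
 * that was written earlier in the current phase. */
int count_indirections(const Inst *prog, int n)
{
    int written[MAX_TEMPS] = {0};  /* temps written in the current phase */
    int indirections = 1;          /* every program counts at least one  */

    for (int i = 0; i < n; i++) {
        if (prog[i].is_tex && prog[i].src_temp >= 0 &&
            written[prog[i].src_temp]) {
            indirections++;                      /* dependent lookup    */
            memset(written, 0, sizeof(written)); /* begin a new phase   */
        }
        if (prog[i].dst_temp >= 0)
            written[prog[i].dst_temp] = 1;
    }
    return indirections;
}
```

Running this mental model over a chain of three dependent lookups yields three indirections, which is why such chains hit the minimum limit of 4 so quickly.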
The parser limits reflect what you pass into the parser. All implementations should count resources against the parser limits in exactly the same way. But once your shader falls within these parser limits and successfully parses, you enter a world where all hardware is different. Optimizing compilers take over from here.
Native resource limits more closely reflect what the hardware really has to offer. For example, some hardware implementations might take eight native instructions to perform a sine/cosine (SCS), whereas others might take just one. In both cases, the parser counts this as a single instruction, but the native resource count varies.
Even if, on ideal mythical hardware, every instruction consumed just one native cycle, your parser limits and native limits might still differ. An implementation might advertise twice as many resources for its parser limit as it can actually support in hardware. This gives the optimizer an opportunity to reduce the shader enough to fit within the native limits. Such optimizations include instruction rescheduling, dead code removal, constant folding, and temporary register collapsing. Significantly reducing a shader's native resource consumption is possible, so it makes sense to let the compiler take a crack at bigger shaders.
Tables 20.12 and 20.13 list the native limit queries, as well as the queries to find out the native consumption of a successfully compiled and optimized shader. They are all queried via `glGetProgramivARB`.
Table 20.12. Native Resource Limit Queries

Resource | Query |
---|---|
Instructions | `GL_MAX_PROGRAM_NATIVE_INSTRUCTIONS_ARB` |
Temporaries | `GL_MAX_PROGRAM_NATIVE_TEMPORARIES_ARB` |
Parameters | `GL_MAX_PROGRAM_NATIVE_PARAMETERS_ARB` |
Attributes | `GL_MAX_PROGRAM_NATIVE_ATTRIBS_ARB` |
Addresses (VS only) | `GL_MAX_PROGRAM_NATIVE_ADDRESS_REGISTERS_ARB` |
ALU instructions (FS only) | `GL_MAX_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB` |
Texture instructions (FS only) | `GL_MAX_PROGRAM_NATIVE_TEX_INSTRUCTIONS_ARB` |
Texture indirections (FS only) | `GL_MAX_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB` |
Table 20.13. Native Resource Consumption Queries

Resource | Query |
---|---|
Instructions | `GL_PROGRAM_NATIVE_INSTRUCTIONS_ARB` |
Temporaries | `GL_PROGRAM_NATIVE_TEMPORARIES_ARB` |
Parameters | `GL_PROGRAM_NATIVE_PARAMETERS_ARB` |
Attributes | `GL_PROGRAM_NATIVE_ATTRIBS_ARB` |
Addresses (VS only) | `GL_PROGRAM_NATIVE_ADDRESS_REGISTERS_ARB` |
ALU instructions (FS only) | `GL_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB` |
Texture instructions (FS only) | `GL_PROGRAM_NATIVE_TEX_INSTRUCTIONS_ARB` |
Texture indirections (FS only) | `GL_PROGRAM_NATIVE_TEX_INDIRECTIONS_ARB` |
All native limits satisfied? (0/1) | `GL_PROGRAM_UNDER_NATIVE_LIMITS_ARB` |
Notice the last entry in Table 20.13. If `glProgramStringARB` returns without error, you can then perform this single query to determine whether all resources fell within native limits. If the result is true, you can expect your shader to be hardware accelerated. If it is false, your shader may be executed in software, which can be painfully slow. Or, if you prefer, you can query each resource individually for a finer-grained view of which native resources are near or exceeding their limits.
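Putting the pieces together, a load-time check might look like the following sketch. The function name `LoadFragmentShader` is hypothetical, and the code assumes a current context with the extension's entry points resolved; real code would do more thorough error handling.

```c
#include <stdio.h>
#include <string.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* Load a fragment shader and verify that it fits within native limits.
 * Returns 1 on success, 0 if it failed to parse or fell off the
 * hardware-accelerated path. */
int LoadFragmentShader(const char *text)
{
    GLint isNative;

    glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB,
                       GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(text), text);
    if (glGetError() == GL_INVALID_OPERATION) {
        /* Parse failed; the error string says which resource or token. */
        fprintf(stderr, "parse error: %s\n",
                glGetString(GL_PROGRAM_ERROR_STRING_ARB));
        return 0;
    }

    /* Parsed fine, but will it be hardware accelerated? */
    glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB,
                      GL_PROGRAM_UNDER_NATIVE_LIMITS_ARB, &isNative);
    if (!isNative) {
        fprintf(stderr, "shader exceeds native limits; "
                        "expect a software fallback\n");
        return 0;
    }
    return 1;
}
```

On a failed native-limits check, you could follow up with the individual `GL_PROGRAM_NATIVE_*` queries from Table 20.13 to see which resource overflowed.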
Like everything else in OpenGL, any state you can set can also be queried. Program parameters can be queried with the `glGetProgramEnvParameter*ARB` and `glGetProgramLocalParameter*ARB` commands. You can read back your whole shader text via `glGetProgramStringARB`. You can check the current vertex attributes with `glGetVertexAttrib*ARB` or the vertex attribute array pointer with `glGetVertexAttribPointervARB`. To see whether a given name represents an existing shader, call `glIsProgramARB`. You can find the details of all these queries in the reference section.
The shader grammar and behavior can be altered with options. An `OPTION` line appears at the beginning of a shader, before any real instructions. `GL_ARB_vertex_program` and `GL_ARB_fragment_program` provide the options listed in the following sections, but future extensions may introduce additional options.
Usually, a vertex shader is required to output to `result.position`. If the position-invariant option is present, you cannot output to `result.position`; instead, the vertex is automatically transformed to clip space for you:

!!ARBvp1.0
OPTION ARB_position_invariant;
Using this option is not only a convenience when you don't need fancy vertex transformation; it also helps ensure that the transformation is identical with or without vertex shaders, so you don't have to worry about precision artifacts when multipass rendering.
Each fog application fragment option is another convenience option. Fragment shaders subsume the fog stage and thus are responsible for performing their own fog computations. However, if you specify one of the three fog options shown here, the work will be done for you, just as in fixed functionality. You simply choose linear, exponential, or second-order exponential:

!!ARBfp1.0
OPTION ARB_fog_linear;
OPTION ARB_fog_exp;    # You can't actually specify more than one of these
OPTION ARB_fog_exp2;   # in your shader, or it will fail to parse!
Some fragment shader hardware implementations may support multiple floating-point calculation and internal storage precisions that either exceed or fall short of OpenGL's minimum precision requirements. You can hint to the driver whether you would prefer your fragment shader to run at more or less than the default precision by using one of these options. Remember, this is just a hint, and some implementations simply ignore it:

!!ARBfp1.0
OPTION ARB_precision_hint_fastest;   # As with the fog options, specify
OPTION ARB_precision_hint_nicest;    # at most one of these per shader
As you can tell by the length and density of this chapter, low-level shaders are a mix of rocket science and brain surgery. And we haven't even talked about any applications yet; that will start in Chapter 22, “Vertex Shading: Do-It-Yourself Transform, Lighting, and Texgen.” Seeing these shaders in action will make them less intimidating.
What we covered here is the mechanics of low-level shaders. You've been exposed to myriad instruction opcodes, variable types, and input and output modifiers. Finally, we discussed queries and shader options. Soon enough we'll put all this information to work for us.
glDisableVertexAttribArrayARB

Purpose: | Disables a vertex attribute array. |
Include File: | <glext.h> |
Syntax: | void glDisableVertexAttribArrayARB(GLuint index); |
Description: | This function behaves like glDisableClientState, except that it disables a generic vertex attribute array rather than a conventional fixed-functionality array. |
Parameters: | index — GLuint: The index of the vertex attribute array to disable, in the range 0 to GL_MAX_VERTEX_ATTRIBS_ARB – 1. |
Returns: | None. |
See Also: | glEnableVertexAttribArrayARB, glVertexAttribPointerARB |
glEnableVertexAttribArrayARB

Purpose: | Enables a vertex attribute array. |
Include File: | <glext.h> |
Syntax: | void glEnableVertexAttribArrayARB(GLuint index); |
Description: | This function behaves like glEnableClientState, except that it enables a generic vertex attribute array rather than a conventional fixed-functionality array. |
Parameters: | index — GLuint: The index of the vertex attribute array to enable, in the range 0 to GL_MAX_VERTEX_ATTRIBS_ARB – 1. |
Returns: | None. |
See Also: | glDisableVertexAttribArrayARB, glVertexAttribPointerARB |
glIsProgramARB

Purpose: | Queries whether a name is a shader name. |
Include File: | <glext.h> |
Syntax: | GLboolean glIsProgramARB(GLuint program); |
Description: | This function queries whether the specified name is the name of a shader. |
Parameters: | program — GLuint: The name to query. |
Returns: | GLboolean: GL_TRUE if program is the name of an existing shader; GL_FALSE otherwise. |
See Also: | glGenProgramsARB, glDeleteProgramsARB, glBindProgramARB |