Chapter 19. Programmable Pipeline: This Isn't Your Father's OpenGL

by Benjamin Lipchak

WHAT YOU'LL LEARN IN THIS CHAPTER:

  • The responsibilities of the conventional fixed functionality OpenGL pipeline

  • The pipeline stages that can be replaced by new programmable pipeline shaders

  • The shader extensions that expose this new functionality

Graphics hardware has traditionally been designed to quickly perform the same rigid set of hard-coded computations. Different steps of the computation can be skipped, and parameters can be adjusted, but the computations themselves remain fixed. That's why this old paradigm of GPU design is called fixed functionality.

There has been a trend toward designing general-purpose graphics processors. Just like CPUs, these GPUs can be programmed with arbitrary sequences of instructions to perform virtually any imaginable computation. The biggest difference is that GPUs are tuned for the floating-point operations most common in the world of graphics.

Think of it this way: Fixed functionality is like a cookie recipe. OpenGL allows you to change the recipe a bit here and there. Change the amount of each ingredient, change the temperature of the oven. You don't want chocolate chips? Fine. Disable them. But one way or another, you end up with cookies.

Enter programmability. Want to pick your own ingredients? Fine. Want to cook in a microwave or a frying pan or on the grill? Have it your way. Instead of cookies, you can bake a cake or grill sirloin or heat up leftovers. The possibilities are endless. The entire kitchen and all its ingredients, appliances, pots, and pans are at your disposal. These are the inputs and outputs, instruction set, and temporary register storage of a programmable pipeline stage.

In this chapter, we cover the conventional OpenGL pipeline and then describe the parts of it that can be replaced by programmable stages.

Out with the Old

Before we talk about replacing it, let's consider the conventional OpenGL rendering pipeline. The first several stages operate per-vertex. Then the primitive is rasterized to produce fragments. Finally, each fragment is textured and fogged, and other per-fragment operations are applied before it is written to the framebuffer. Figure 19.1 diagrams the fixed functionality pipeline.

Figure 19.1. This fixed functionality rendering pipeline represents the old way of doing things.

The per-vertex and per-fragment stages of the pipeline are discussed separately in the following sections.

Fixed Vertex Processing

The per-vertex stages start with a set of vertex attributes as input. These attributes include object-space position, normal, primary and secondary colors, a fog coordinate, and texture coordinates. The final result of per-vertex processing is clip-space position, front-facing and back-facing primary and secondary colors, a fog coordinate, texture coordinates, and point size. What happens in between is broken into four stages.

Vertex Transformation

In fixed functionality, the vertex position is transformed from object space to clip space. This is achieved by multiplying the object space coordinate first by the modelview matrix to put it into eye space. Then it's multiplied by the projection matrix to reach clip space.

The application has control over the contents of the two matrices, but these matrix multiplications always occur. The only way to “skip” this stage would be to load identity matrices, so you end up with the same position you started with.
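For reference, here is a minimal C sketch of how an application typically supplies those two matrices; the helper name and the particular frustum and translation values are only illustrative.

#include <GL/gl.h>

/* Illustrative helper: load the two matrices the fixed pipeline always applies. */
void SetupTransforms(void)
{
    glMatrixMode(GL_PROJECTION);                  /* eye space -> clip space */
    glLoadIdentity();
    glFrustum(-1.0, 1.0, -1.0, 1.0, 1.0, 100.0);

    glMatrixMode(GL_MODELVIEW);                   /* object space -> eye space */
    glLoadIdentity();
    glTranslatef(0.0f, 0.0f, -5.0f);              /* push the scene in front of the eye */
}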

Each vertex's normal is also transformed, this time from object space to eye space for use during lighting. The normal is transformed by the inverse transpose of the modelview matrix, after which it is optionally rescaled or normalized. Lighting wants the normal to be a unit vector, so unless you're passing in unit-length normals and your modelview matrix leaves them unit length, you'll need to either rescale them (if the modelview matrix introduces only uniform scaling) or fully normalize them.
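A quick C sketch of the two toggles involved; GL_RESCALE_NORMAL requires OpenGL 1.2, and the helper name is only illustrative.

#include <GL/gl.h>

/* Illustrative helper: keep normals unit length after the modelview transform. */
void SetupNormalHandling(int uniformScaleOnly)
{
    if (uniformScaleOnly)
        glEnable(GL_RESCALE_NORMAL);   /* cheap fix when only uniform scaling is involved */
    else
        glEnable(GL_NORMALIZE);        /* full renormalization for arbitrary modelview matrices */
}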

Chapters 4, “Geometric Transformations: The Pipeline,” and 5, “Color, Materials, and Lighting: The Basics,” covered transformations and normals.

Lighting

Lighting takes the vertex color, normal, and position as its raw data inputs. Its output is two colors, primary and secondary, and in some cases a different set of colors for front and back faces. Controlling this stage are the color material properties, light properties, and a variety of glEnable/glDisable toggles.

Lighting is highly configurable. You can enable any number of lights, up to an implementation-dependent maximum of at least eight, each with myriad parameters such as position, color, and type. You can specify material properties to simulate different surface appearances. You can also enable two-sided lighting to generate different colors for front- and back-facing polygons.
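As a quick reminder of that configurability, here is a minimal C sketch that enables one directional light, sets a couple of material properties, and turns on two-sided lighting; the specific values are arbitrary.

#include <GL/gl.h>

void SetupLighting(void)
{
    GLfloat lightPos[]     = { 1.0f, 1.0f, 1.0f, 0.0f };   /* w = 0: directional light */
    GLfloat lightDiffuse[] = { 1.0f, 1.0f, 1.0f, 1.0f };
    GLfloat matSpecular[]  = { 1.0f, 1.0f, 1.0f, 1.0f };

    glEnable(GL_LIGHTING);
    glEnable(GL_LIGHT0);
    glLightfv(GL_LIGHT0, GL_POSITION, lightPos);
    glLightfv(GL_LIGHT0, GL_DIFFUSE, lightDiffuse);

    glMaterialfv(GL_FRONT, GL_SPECULAR, matSpecular);
    glMaterialf(GL_FRONT, GL_SHININESS, 64.0f);

    glLightModeli(GL_LIGHT_MODEL_TWO_SIDE, GL_TRUE);        /* different front/back colors */
}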

You can skip lighting entirely by disabling it. However, when it is enabled, the same hard-coded equations are always used. See Chapters 5 and 6, “More on Colors and Materials,” for a refresher on fixed functionality lighting details.

Texture Coordinate Generation and Transformation

The final per-vertex stage of the fixed functionality pipeline involves processing the texture coordinates. Each texture coordinate can optionally be generated automatically by OpenGL. There are several choices of generation equations to use. In fact, a different mode can be chosen for each component of each texture coordinate. Or, if generation is disabled, the current texture coordinate associated with the vertex is used instead.

Whether or not texture generation is enabled, each texture coordinate is always transformed by its texture matrix. If it's an identity matrix, the texture coordinate is not affected.
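A short C sketch of both halves of this stage, generation and transformation; sphere mapping and the scale factor are arbitrary choices.

#include <GL/gl.h>

void SetupTexCoordProcessing(void)
{
    /* Generate s and t automatically (sphere mapping, for example) */
    glTexGeni(GL_S, GL_TEXTURE_GEN_MODE, GL_SPHERE_MAP);
    glTexGeni(GL_T, GL_TEXTURE_GEN_MODE, GL_SPHERE_MAP);
    glEnable(GL_TEXTURE_GEN_S);
    glEnable(GL_TEXTURE_GEN_T);

    /* Every coordinate is then run through the texture matrix */
    glMatrixMode(GL_TEXTURE);
    glLoadIdentity();
    glScalef(2.0f, 2.0f, 1.0f);       /* tile the texture twice in s and t */
    glMatrixMode(GL_MODELVIEW);
}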

This texture coordinate processing stage is covered in Chapters 8, “Texture Mapping: The Basics,” and 9, “Texture Mapping: Beyond the Basics.”

Clipping

If any of the vertices transformed in the preceding sections happen to fall outside the view volume, clipping must occur. Clipped vertices are discarded, and depending on the type of primitive being drawn, new vertices may be generated at the intersection of the primitive and the view volume. Colors, texture coordinates, and other vertex properties are assigned to the newly generated vertices by interpolating their values along the clipped edge. Figure 19.2 illustrates a clipped primitive.

Figure 19.2. All three of this triangle's vertices are clipped out, but six new vertices are introduced.

The application may also enable user clip planes. These clip planes further restrict the clip volume so that even primitives within the view volume can be clipped. This technique is often used in medical imaging to “cut” into a volume of, for example, MRI data to inspect tissues deep within the body.
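A minimal C sketch of a single user clip plane; the plane equation shown simply discards everything below y = 0.

#include <GL/gl.h>

void SetupUserClipPlane(void)
{
    /* Points where Ax + By + Cz + D >= 0 (in eye space) are kept;
       this plane clips away everything below y = 0. */
    GLdouble plane[] = { 0.0, 1.0, 0.0, 0.0 };

    glClipPlane(GL_CLIP_PLANE0, plane);
    glEnable(GL_CLIP_PLANE0);
}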

Fixed Fragment Processing

The per-fragment stages start out with a fragment and its associated data as input. This associated data is composed of various values interpolated across the line or triangle, including one or more texture coordinates, primary and secondary colors, and a fog coordinate. The result of per-fragment processing is a single color that will be passed along to subsequent per-fragment operations, including depth test and blending. Again, four stages of processing are applied.

Texture Application and Environment

Texture application is the most important per-fragment stage. Here, you take all the fragment's texture coordinates and its primary color as input. The output will be a new primary color. How this happens is influenced by which texture units are enabled for texturing, which texture images are bound to those units, and what texture function is set up by the texture environment.

For each enabled texture unit, the 1D, 2D, 3D, or cube map texture bound to that unit is used as the source for a lookup. Depending on the format of the texture and the texture function specified on that unit, the result of the texture lookup will either replace or be blended with the fragment's primary color. The resulting color from each enabled texture unit is then fed in as a color input to the next enabled texture unit. The result from the last enabled texture unit is the final output for the texturing stage.
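To make the chaining concrete, here is a hedged C sketch of two texture units stacked together. It assumes OpenGL 1.3 (or ARB_multitexture) for glActiveTexture, and the two texture object parameters are hypothetical.

#include <GL/gl.h>

void SetupTwoTextureUnits(GLuint baseTex, GLuint detailTex)
{
    /* Unit 0: modulate the incoming primary color with the base texture */
    glActiveTexture(GL_TEXTURE0);
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, baseTex);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

    /* Unit 1: add a detail texture on top of unit 0's result */
    glActiveTexture(GL_TEXTURE1);
    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, detailTex);
    glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_ADD);
}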

Many configurable parameters affect the texture lookup, including texture coordinate wrap modes, border colors, minification and magnification filters, level-of-detail clamps and biases, depth texture and shadow compare state, and whether mipmap chains are automatically generated. Fixed functionality texturing was covered in detail in Chapters 8 and 9.

Color Sum

The color sum stage starts with two inputs: a primary and a secondary color. The output is a single color. There's not a lot of magic here. If color sum is enabled, or if lighting is enabled with a separate specular color, the primary and secondary colors' red, green, and blue channels are added together and then clamped back into the range [0,1]. If color sum is not enabled, the primary color is passed through as the result. The alpha channel of the result always comes from the primary color's alpha; the secondary color's alpha is never used by the fixed functionality pipeline.
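A tiny C sketch of the two ways this stage gets switched on; GL_COLOR_SUM requires OpenGL 1.4 (or EXT_secondary_color), and the separate specular setting requires OpenGL 1.2.

#include <GL/gl.h>

void EnableColorSum(void)
{
    /* Either ask for the addition explicitly... */
    glEnable(GL_COLOR_SUM);

    /* ...or get it implicitly by lighting with a separate specular color */
    glLightModeli(GL_LIGHT_MODEL_COLOR_CONTROL, GL_SEPARATE_SPECULAR_COLOR);
}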

Fog Application

If fog is enabled, the fragment's color is blended with a constant fog color based on a computed fog factor. That factor is computed according to one of three hard-coded equations: linear, exponential, or second-order exponential. These equations base the fog factor on the current fog coordinate, which may be the approximate distance from the vertex to the eye, or an arbitrary value set per-vertex by the application.
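For reference, the linear factor is (end - c) / (end - start), the exponential factor is exp(-density * c), and the second-order exponential factor is exp(-(density * c)^2), where c is the fog coordinate. Below is a minimal C sketch selecting the second-order mode; the density and color are arbitrary, and the last call (OpenGL 1.5 naming) switches to an explicit per-vertex fog coordinate.

#include <GL/gl.h>

void SetupFog(void)
{
    GLfloat fogColor[] = { 0.5f, 0.5f, 0.5f, 1.0f };

    glEnable(GL_FOG);
    glFogi(GL_FOG_MODE, GL_EXP2);            /* f = exp(-(density * c)^2) */
    glFogf(GL_FOG_DENSITY, 0.15f);
    glFogfv(GL_FOG_COLOR, fogColor);

    /* Optional: use an explicit per-vertex fog coordinate instead of eye distance */
    glFogi(GL_FOG_COORD_SRC, GL_FOG_COORD);
}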

For more details on fixed functionality fog, see Chapter 6.

Antialiasing Application

Finally, if the fragment belongs to a primitive that has smoothing enabled, one piece of associated data is a coverage value. That value is 1.0 in most cases, but for fragments on the edge of a smooth point, line, or polygon, the coverage falls somewhere between 0.0 and 1.0. The fragment's alpha value is multiplied by this coverage value, which, combined with subsequent blending, produces smooth edges for these primitives. Chapter 6 discussed this behavior.
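A short C sketch of the state that makes those coverage values visible; the blend function shown is the usual choice for smoothed points and lines.

#include <GL/gl.h>

void EnableSmoothLines(void)
{
    glEnable(GL_LINE_SMOOTH);
    glHint(GL_LINE_SMOOTH_HINT, GL_NICEST);

    /* Coverage lands in alpha, so blend on alpha to fade the edges */
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
}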

In with the New

That trip down memory lane was intended both to refresh your memory on the various stages of the current pipeline and to give you an appreciation of the configurable but hard-coded computations that happen at each step of the way. Now forget everything you just read. We're going to replace the majority of it and roll in the new world order: shaders.

Shaders are also sometimes called programs, and the terms are usually interchangeable. And that's what shaders are—application-defined customized programs that take over the responsibilities of fixed functionality pipeline stages. I prefer the term shader because it avoids confusion with the typical definition of program, which can mean any old application.

Figure 19.3 illustrates the simplified pipeline where previously hard-coded stages are subsumed by custom programmable shaders.

Figure 19.3. The block diagram looks simpler, but in reality these shaders can do everything the original fixed stages could do, plus more.

Programmable Vertex Shaders

As suggested by Figure 19.3, the inputs and outputs of a vertex shader remain the same as those of the fixed functionality stages being replaced. The raw vertices and all their attributes are fed into the vertex shader, rather than the fixed transformation stage. Out the other side, the vertex shader spits out texture coordinates, colors, point size, and a fog coordinate, which are passed along to the clipper, just like the output of the final fixed functionality per-vertex stage. A vertex shader is a drop-in replacement for the three per-vertex stages that precede clipping.

Replacing Vertex Transformation

What you do in your vertex shader is entirely up to you. The absolute minimum (if you want anything to draw) would be to output a clip-space vertex position. Every other output is optional and at your sole discretion. How you generate your clip-space vertex position is your call. Traditionally, and to emulate fixed functionality transformation, you would want to multiply your input position by the modelview and projection matrices to get your clip-space output.

But say you have a fixed projection and you're sending in your vertices already in clip space. In that case, you don't need to do any transformation. Just copy the input position to the output position. Or, on the other hand, maybe you want to turn your Cartesian coordinates into polar coordinates. You could add extra instructions to your vertex shader to perform those computations.

Replacing Lighting

If you don't care what the vertex's colors are, you don't have to perform any lighting computations. You can just copy the color inputs to the color outputs, or if you know the colors will never be used later, you don't have to output them at all, and they will become undefined. Beware: if you do try to use them later after not outputting them from the vertex shader, undefined usually means garbage!

If you do want to generate more interesting colors, you have limitless ways of going about it. You could emulate fixed functionality lighting by adding instructions that perform these conventional computations, maybe customizing them here or there. You could also color your vertices based on their positions, their surface normals, or any other input vector.

Replacing Texture Coordinate Processing

If you don't need texture coordinate generation, you don't need to code it into your vertex shader. The same goes for texture coordinate transformation. If you don't need it, don't waste precious shader cycles implementing it. You can just copy your input texture coordinates to their output counterparts. Or, as with colors, if you won't use the texture coordinate later, don't waste your time outputting it at all. For example, if your graphics card supports eight texture units, but you're going to use only three of them for texturing later in the pipeline, there's no point in outputting the other five. Doing so would just consume resources unnecessarily.

You now understand the input and output interfaces of vertex shaders, which are largely the same as those of their fixed functionality counterparts. But there's been a lot of hand waving about adding code to perform the desired computations within the shader. This would be a great place for an example of a vertex shader, wouldn't it? Alas, this chapter covers only the what, where, and why of shaders. The next four chapters are devoted to the how, so you'll have to be patient and use your imagination. Consider this the calm before the storm. In a few pages, you'll be staring at more shaders than you ever hoped to see.

Fixed Functionality Glue

In between the vertex shader and fragment shader, there remain a couple of fixed functionality stages that act as glue between the two shaders. One of them is the clipping stage described previously, which clips the current primitive against the view volume and in so doing possibly adds or removes vertices. After clipping, the perspective divide by W occurs, yielding normalized device coordinates. These coordinates then undergo the viewport and depth range transformations, which yield the final window-space coordinates. Then it's on to rasterization.
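The viewport and depth range transformations are still controlled by two familiar calls, sketched here in C; the window dimensions are placeholders.

#include <GL/gl.h>

void SetupWindowTransform(GLsizei width, GLsizei height)
{
    glViewport(0, 0, width, height);   /* NDC x,y in [-1,1] map to this rectangle */
    glDepthRange(0.0, 1.0);            /* NDC z in [-1,1] maps to this depth range */
}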

Rasterization is the fixed functionality stage responsible for taking the processed vertices of a primitive and turning them into fragments. Whether a point, line, or polygon primitive, this stage produces the fragments to “fill in” the primitive and interpolates all the colors and texture coordinates so that the appropriate values are assigned to each fragment. Figure 19.4 illustrates this process.

Figure 19.4. Rasterization turns vertices into fragments.

Depending on how far apart a primitive's vertices are, the ratio of fragments to vertices can be quite high. For a highly tessellated object, though, you might find all three vertices of a triangle mapping to the same single fragment. As a general rule, significantly more fragments are processed than vertices, but as with all rules, there are exceptions.

Rasterization is also responsible for making lines the desired width and points the desired size. It may apply stipple patterns to lines and polygons. It generates partial coverage values at the edges of smooth points, lines, and polygons, which later are multiplied into the fragment's alpha value during antialiasing application. If requested, rasterization culls out front- or back-facing polygons and applies depth offsets.
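A few of the calls that steer this stage, as a hedged C sketch; the particular sizes and offsets are arbitrary.

#include <GL/gl.h>

void SetupRasterizationState(void)
{
    glPointSize(4.0f);                  /* desired point size */
    glLineWidth(2.0f);                  /* desired line width */

    glEnable(GL_CULL_FACE);             /* discard back-facing polygons */
    glCullFace(GL_BACK);

    glEnable(GL_POLYGON_OFFSET_FILL);   /* apply a depth offset to filled polygons */
    glPolygonOffset(1.0f, 1.0f);
}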

In addition to points, lines, and polygons, rasterization also generates the fragments for bitmaps and pixel rectangles (drawn with glDrawPixels). But these primitives don't originate from normal vertices. Instead, where interpolated data is usually assigned to fragments, those values are adopted from the current raster position. See Chapter 7, “Imaging with OpenGL,” for more details on this subject.

Programmable Fragment Shaders

The same texture coordinates, fog coordinate, and colors are available to the fragment shader as were previously available to the fixed functionality texturing stage. The same single color output is expected out of the fragment shader that was previously expected from the fixed functionality fog stage. Just as with vertex shaders, you may choose your own adventure in between the input interface and output interface.

Replacing Texturing

The single most important capability of a fragment shader is performing texture lookups. For the most part, these texture lookups are unchanged from fixed functionality in that most of the texture state is set up outside the fragment shader. The texture image is specified and all its parameters are set the same as though you weren't using a fragment shader. The main difference is that you decide within the shader when and if to perform a lookup and what to use as the texture coordinate.

You're not limited to using texture coordinate 0 to index into texture image 0. You can mix and match coordinates with different textures, using the same texture with different coordinates or the same coordinate with different textures. Or you can even compute a texture coordinate on the fly within the shader. This flexibility was impossible with fixed functionality.

The texture environment previously included a texture function that determined how the incoming fragment color was mixed with the texture lookup results. That function is now ignored, and it's up to the shader to combine colors with texture results. In fact, you might choose to perform no texture lookups at all and rely only on other computations to generate the final color result. A fragment shader could simply copy its primary color input to its color output and call it a day. Not very interesting, but such a “passthrough” shader might be all you need when combined with a fancy vertex shader.

Replacing Color Sum

Replacing the color sum is simple. This stage just adds together the primary and secondary colors. If that's what you want to happen, you just add an instruction to do that. If you're not using the secondary color for anything, ignore it.

Replacing Fog

Fog application is not as easy to emulate as color sum, but it's still reasonably easy. First, you calculate the fog factor from the fragment's fog coordinate and some constant value such as the fog density. Fixed functionality dictated the use of linear, exponential, or second-order exponential equations, but with shaders you can make up your own equation. Then you blend a constant fog color with the fragment's unfogged color, using the fog factor to determine how much of each goes into the blend. You can achieve all this in just a handful of instructions. Or you can add no instructions at all and forget about fog. The choice is yours.

Introduction to Shader Extensions

Enough with the hypotheticals. If you've made it this far, you must have worked up an appetite for some real shaders by now. In the following sections, we introduce the different OpenGL extensions that expose programmable shaders. These extensions were developed and approved by the OpenGL Architecture Review Board (ARB), and as such are widely supported by graphics card vendors throughout the industry.

Low-Level Extensions

The low-level extensions are GL_ARB_vertex_program and GL_ARB_fragment_program, used for replacing the fixed functionality vertex stages and fragment stages, respectively.

Much like assembly language versus C, this first set of extensions operates at a low level, giving more direct access to the features and resources of the GPU. As with assembly language, you trade programming at a more cumbersome, detail-oriented, and often complex level for fuller control of the hardware and improved performance.

Strictly speaking, you're not really coding at the assembly level because each hardware vendor has a unique GPU design, each with its own native instruction representation and instruction set. Each has its own limits on the number of registers, constants, and instructions. What you're capturing in these low-level extensions is just the lowest common denominator of functionality that's available from all vendors.
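To give you a feel for the mechanism before Chapter 20 covers it in earnest, here is a hedged C sketch of loading and enabling a vertex program with GL_ARB_vertex_program. It assumes the extension's entry points have already been obtained from the driver (for example, via wglGetProcAddress or glXGetProcAddressARB) and that programText holds source such as Listing 19.1 below.

#include <string.h>
#include <GL/gl.h>
#include <GL/glext.h>

/* Assumes the ARB_vertex_program entry points have already been fetched. */
GLuint LoadVertexProgram(const char *programText)
{
    GLuint id;

    glGenProgramsARB(1, &id);
    glBindProgramARB(GL_VERTEX_PROGRAM_ARB, id);
    glProgramStringARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                       (GLsizei)strlen(programText), programText);

    glEnable(GL_VERTEX_PROGRAM_ARB);    /* replace fixed functionality vertex processing */
    return id;
}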

Listings 19.1 and 19.2 are your first exposure to these low-level shaders. Consider them to be “Hello World” shaders, even though technically they don't say hello at all.

Listing 19.1. A Simple GL_ARB_vertex_program Vertex Shader

!!ARBvp1.0
# This is our Hello World vertex shader
# notice how comments are preceded by '#'

ATTRIB iPos = vertex.position;         # input position
ATTRIB iPrC = vertex.color.primary;    # input primary color

OUTPUT oPos = result.position;         # output position
OUTPUT oPrC = result.color.primary;    # output primary color
OUTPUT oScC = result.color.secondary;  # output secondary color

PARAM mvp[4] = { state.matrix.mvp };   # modelview * projection matrix

TEMP tmp;                              # temporary register

DP4 tmp.x, iPos, mvp[0];               # Multiply input position by MVP
DP4 tmp.y, iPos, mvp[1];
DP4 tmp.z, iPos, mvp[2];
DP4 tmp.w, iPos, mvp[3];

MOV oPos, tmp;                         # Output clip-space coord

MOV oPrC, iPrC;                        # Copy primary color input to output

RCP tmp.w, tmp.w;                      # tmp now contains 1/W instead of W
MUL tmp.xyz, tmp, tmp.w;               # tmp now contains persp-divided coords
MAD oScC, tmp, 0.5, 0.5;               # map from [-1,1] to [0,1] and output
END

Listing 19.2. A Simple GL_ARB_fragment_program Fragment Shader

!!ARBfp1.0
# This is our Hello World fragment shader

ATTRIB iPrC = fragment.color.primary;    # input primary color
ATTRIB iScC = fragment.color.secondary;  # input secondary color

OUTPUT oCol = result.color;              # output color

LRP oCol.rgb, 0.5, iPrC, iScC;           # 50/50 mix of two colors
MOV oCol.a, iPrC.a;                      # ignore secondary color alpha
END

If these shaders are not self-explanatory, don't despair! Chapter 20, “Low-Level Shading: Coding to the Metal,” will make sense of it all. Basically, the vertex shader emulates fixed functionality vertex transformation by multiplying the object-space vertex position by the modelview/projection matrix. Then it copies its primary color input to its output unchanged. Finally, it generates a secondary color based on the post-perspective-divide normalized device coordinates. Because these will be in the range [-1,1], the shader also divides by 2 and adds 1/2 to map them into the color range [0,1]. The fragment shader is left with the simple task of blending the primary and secondary colors together. Figure 19.5 shows a sample scene rendered with these shaders.

Figure 19.5. The colors are pastel tinted by the objects' positions in the scene.

High-Level Extensions

Programming GPUs in a high-level language means less code, more readable code, and thus more productivity. That language is the OpenGL Shading Language (GLSL). It looks a lot like C, but with built-in data types and functions that are useful to vertex and fragment shaders.

Four extensions are involved here: GL_ARB_shader_objects, GL_ARB_vertex_shader, GL_ARB_fragment_shader, and GL_ARB_shading_language_100. The first extension describes the mechanism for loading and switching between shaders and is shared by the next two extensions, one covering vertex shader specifics and one for fragment shader specifics. The fourth extension describes the GLSL language itself, again shared by vertex shaders and fragment shaders.
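As a preview of that loading mechanism (Chapter 21 covers it properly), here is a hedged C sketch that compiles a vertex shader and a fragment shader and links them into a program object with the ARB entry points. As before, it assumes those entry points have already been obtained, and that vsSource and fsSource hold GLSL text such as Listings 19.3 and 19.4 below.

#include <GL/gl.h>
#include <GL/glext.h>

/* Assumes the GL_ARB_shader_objects entry points have already been fetched. */
GLhandleARB BuildProgram(const GLcharARB *vsSource, const GLcharARB *fsSource)
{
    GLhandleARB vs, fs, prog;

    vs = glCreateShaderObjectARB(GL_VERTEX_SHADER_ARB);
    glShaderSourceARB(vs, 1, &vsSource, NULL);
    glCompileShaderARB(vs);

    fs = glCreateShaderObjectARB(GL_FRAGMENT_SHADER_ARB);
    glShaderSourceARB(fs, 1, &fsSource, NULL);
    glCompileShaderARB(fs);

    prog = glCreateProgramObjectARB();
    glAttachObjectARB(prog, vs);
    glAttachObjectARB(prog, fs);
    glLinkProgramARB(prog);

    glUseProgramObjectARB(prog);        /* switch from fixed functionality to these shaders */
    return prog;
}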

There is a confusing similarity in extension names between the low level and the high level: GL_ARB_*_program versus GL_ARB_*_shader. Just remember that the low-level ones are called programs, and the high-level ones are called shaders. The distinction exists only in the extension names; in reality, they're all shaders.

Notice how Listings 19.3 and 19.4, which perform the same computations as the low-level Hello World shaders, are expressed in fewer lines of more readable code.

Listing 19.3. A Simple GLSL Vertex Shader

void main(void)
{
    // This is our Hello World vertex shader
    // notice how comments are preceded by '//'

    // normal MVP transform
    vec4 clipCoord = gl_ModelViewProjectionMatrix * gl_Vertex;
    gl_Position = clipCoord;

    // Copy the primary color
    gl_FrontColor = gl_Color;

    // Calculate NDC
    vec3 ndc = clipCoord.xyz / clipCoord.w;

    // Map from [-1,1] to [0,1] before outputting
    gl_FrontSecondaryColor = vec4(ndc * 0.5 + 0.5, 1.0);
}

Listing 19.4. A Simple GLSL Fragment Shader

// This is our Hello World fragment shader
void main(void)
{
    // Mix primary and secondary colors, 50/50
    gl_FragColor = mix(gl_Color, vec4(vec3(gl_SecondaryColor), 1.0), 0.5);
}

Chapter 21, “High-Level Shading,” will help you understand this code if it isn't readable enough already.

Summary

In this chapter, we outlined the conventional per-vertex and per-fragment pipeline stages, setting the stage for their wholesale replacement by programmable stages. We briefly introduced both the low-level and high-level shading extensions that step in for their fixed functionality counterparts.

High-level shader compilers are improving rapidly, and like C compilers, they soon will be generating hardware code that's as good as or better than hand-coded assembly. Although the low-level extensions are currently very popular, expect the high-level extensions to gain mindshare in the near future as GPU compiler technology continues to advance.
