Chapter 4

Geometric Transformations

Geometric transformations come into play all the time in computer graphics, and learning how to manipulate them correctly will save you hours of painful debugging. In this chapter, we will take an informal approach, starting from simple intuitive examples and then generalizing.

4.1 Geometric Entities

In computer graphics we deal with three entities: scalars, points and vectors. A scalar is a one-dimensional entity that we use to express the magnitude of something, for example the temperature of a body or the weight of a car. A point is a location in space, for example the tip of your nose, and a vector is a direction, for example the direction in which you are walking or the orientation of a satellite antenna. We did not specify the dimensions of points and vectors because they depend on the dimension of the space we are dealing with. If we work in bidimensional space, then points and vectors will have 2 dimensions; if we are in three-dimensional space (as we will be in most practical cases) then they will have 3 dimensions, and so on. In this book we will use italics for scalars, bold for points and bold italics for vectors:

  1. scalars: a, b, α
  2. points: p = [p_x, p_y, p_w]^T, q = [q_x, q_y, q_w]^T
  3. vectors: v = [v_x, v_y, v_w]^T, u = [u_x, u_y, u_w]^T

Also, in general we will often use p and q to indicate points. We will use a set of operations on these entities to express transformation of the objects they represent. These operations, shown in Figure 4.1, are:

Figure 4.1

Points and vectors in two dimensions.

  • point–vector sum: we can add vector v0 to point p0 and obtain point p1. We can think of this operation as moving, or translating, point p0 by v0;
  • point–point subtraction: in the same way, we can subtract point p0 from point p1 and obtain vector v0. We can think of this as the method of finding the translation required to move point p0 to point p1.
  • vector–vector sum: again, if we think of a vector as a translation then it will be obvious that the result of two consecutive translations is also a translation. That means v0 + v1 = v1 + v0 = v2.
  • scalar–vector multiplication: this operation maintains the direction of the vector but scales its length by a factor. So instead of writing p4 = p0 + v2 + v2 we can write p4 = p0 + 2v2
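
As a quick, informal illustration, these operations translate directly into code; the following is a minimal JavaScript sketch (the helper names are ours, not part of any library):

// 2D points and vectors as arrays [x, y] (illustrative sketch only).
function addPointVector(p, v) { return [p[0] + v[0], p[1] + v[1]]; }     // point + vector = point
function subPoints(p1, p0)    { return [p1[0] - p0[0], p1[1] - p0[1]]; } // point - point = vector
function addVectors(u, v)     { return [u[0] + v[0], u[1] + v[1]]; }     // vector + vector = vector
function scaleVector(s, v)    { return [s * v[0], s * v[1]]; }           // scalar * vector = vector

var p0 = [1, 1];
var v2 = [2, 1];
var p4 = addPointVector(p0, scaleVector(2, v2)); // p4 = p0 + 2 v2 = [5, 3]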

4.2 Basic Geometric Transformations

A generic geometric transformation is a function that maps points to points and/or vectors to vectors. In the computer graphics domain we are only interested in a small subclass of transformations that, luckily, are the easiest to handle.

From now on, when we talk about transformation of an object, the intended meaning is that we apply a transformation to all the points of that object.

We will start by looking at transformations in 2D because they are easier to understand with the help of our intuition, and then we will move on to their 3D counterparts.

4.2.1 Translation

A translation is a transformation applied to a point defined as:

T_v(p) = p + v

We have already seen an example of translation in the previous section when considering point-vector sum. Figure 4.2.(a) shows the (very basic) drawing of a car as seen from above and its translation by v = [1, 2]T.

Figure 4.2

Examples of translation (a), uniform scaling (b) and nonuniform scaling (c).

4.2.2 Scaling

A scaling is a transformation applied to a point or a vector defined as:

S_{(s_x, s_y)}(p) = \begin{bmatrix} s_x p_x \\ s_y p_y \end{bmatrix}

A scaling multiplies each coordinate of a point by a scalar value, called the scaling factor. As the word implies, the effect of scaling is to change the size of the object. Figure 4.2.(b) shows the effect of applying S_{(2,2)} to our car. When all the scaling factors are equal, in this case when s_x = s_y, the scaling is said to be uniform, non-uniform otherwise (see Figure 4.2.(c)).

4.2.3 Rotation

A rotation is a transformation applied to a point or a vector defined as:

R_\alpha(p) = \begin{bmatrix} p_x \cos\alpha - p_y \sin\alpha \\ p_x \sin\alpha + p_y \cos\alpha \end{bmatrix}

where α is the rotation angle, that is the angle by which the point or vector is rotated around the origin. This formula for rotation is less trivial and needs a small proof. Referring to Figure 4.3, consider the point p at distance ρ from the origin. The vector connecting the point p to the origin makes an angle β with the X-axis. The coordinates of p can be written as:

p = \begin{bmatrix} p_x \\ p_y \end{bmatrix} = \begin{bmatrix} \rho \cos\beta \\ \rho \sin\beta \end{bmatrix}

Figure 4.3

Computation of the rotation of a point around the origin.

If we rotate the point p with respect to the origin by an angle α in the counterclockwise direction we obtain a new point p′, which is still at a distance ρ from the origin and such that the vector p′ − 0 forms an angle α with p − 0, where 0 is the origin, that is, the vector [0, 0]T. Again, let us express the coordinates of p′ in terms of the angle p′ − 0 forms with the x axis.

p' = \begin{bmatrix} \rho \cos(\beta + \alpha) \\ \rho \sin(\beta + \alpha) \end{bmatrix}

Using the trigonometric equality cos(β + α) = cos β cos α − sin β sin α we get:

p'_x = \rho \cos(\beta + \alpha) = \rho \cos\beta \cos\alpha - \rho \sin\beta \sin\alpha = p_x \cos\alpha - p_y \sin\alpha

A very similar derivation, using the equality sin(β + α) = sin β cos α + cos β sin α, completes the proof.

4.2.4 Expressing Transformation with Matrix Notation

If we want to express a series of consecutive transformations, even a simple translation followed by a rotation, the expressions easily become very long and practically unmanageable. Here we show how things may be simplified using matrix notation. You may have noticed how both scaling and rotation are expressed as a linear combination of coordinates:

p'_x = a_{xx} p_x + a_{xy} p_y
p'_y = a_{yx} p_x + a_{yy} p_y

Thus, we could conveniently write these transformations in a more compact way:

p' = \begin{bmatrix} a_{xx} & a_{xy} \\ a_{yx} & a_{yy} \end{bmatrix} p

with:

S_{(s_x, s_y)}(p) = S_{(s_x, s_y)}\, p = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} p
R_\alpha(p) = R_\alpha\, p = \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} p

Note that we indicate the matrix corresponding to a geometric transformation with the same symbol as the function. Also note that this does not capture translation, which is an addition to the point coordinates and not a linear combination of them. So, if we want to write a rotation followed by a translation, we still have to write R_α p + v.

In order to extend the use of matrix notation to translation we have to express points and vectors in homogeneous coordinates. Homogeneous coordinates will be explained in more detail in Section 4.6.2.2; here we only need to know that a point in Cartesian coordinates p = [p_x, p_y]^T can be written in homogeneous coordinates as p̄ = [p_x, p_y, 1]^T, while a vector v = [v_x, v_y]^T can be written as v̄ = [v_x, v_y, 0]^T. The first thing to notice is that now we can distinguish between points and vectors just by looking at whether the last coordinate is 1 or 0, respectively. Also note that the sum of two vectors in homogeneous coordinates gives a vector (0 + 0 = 0), the sum of a point and a vector gives a point (1 + 0 = 1) and the subtraction of two points gives a vector (1 − 1 = 0), just like we saw earlier in this section.

In order to simplify notation, we define the following equivalences:

\vec{p} = \bar{p} - [0, 0, 1]^T, \qquad \bar{p} = [0, 0, 1]^T + \vec{p}

that is, if we have a point p̄ we write as p⃗ the vector from the origin to p̄ and, vice versa, if we have a vector p⃗ we write as p̄ the point obtained by adding the vector to the origin.

With homogeneous coordinates, we only need one more column and one more row for matrix vector multiplication. In particular, we will use matrices of the form:

\begin{bmatrix} a_{xx} & a_{xy} & v_x \\ a_{yx} & a_{yy} & v_y \\ 0 & 0 & 1 \end{bmatrix}          (4.1)

Notice what the product of such a matrix by the point p̄ looks like:

\begin{bmatrix} a_{xx} & a_{xy} & v_x \\ a_{yx} & a_{yy} & v_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} p_x \\ p_y \\ 1 \end{bmatrix} = \begin{bmatrix} a_{xx} p_x + a_{xy} p_y + v_x \\ a_{yx} p_x + a_{yy} p_y + v_y \\ 1 \end{bmatrix} = \begin{bmatrix} \begin{bmatrix} a_{xx} & a_{xy} \\ a_{yx} & a_{yy} \end{bmatrix} p + v \\ 1 \end{bmatrix}

In other words, now we are also able to express the translation of a point in our matrix notation, by arranging the translation vector in the upper part of the last column of the matrix. Note that if we multiply the matrix with a vector [v_x, v_y, 0]^T the translation part will have no effect, because its elements are multiplied by 0. This is consistent with the fact that vectors represent a direction and a magnitude (the length of the vector, hence a scalar value): the direction can be changed by a rotation or by a non-uniform scaling, the magnitude can be changed by a scaling, but neither of them is affected by translation.
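
To make the effect of the last column concrete, here is a minimal sketch (our own helper, assuming 3 × 3 matrices stored as row-major flat arrays) showing how the same matrix affects a point and a vector:

// Apply a 3x3 homogeneous matrix (row-major flat array of 9 values) to [x, y, w].
function apply3(m, p) {
  return [m[0] * p[0] + m[1] * p[1] + m[2] * p[2],
          m[3] * p[0] + m[4] * p[1] + m[5] * p[2],
          m[6] * p[0] + m[7] * p[1] + m[8] * p[2]];
}

// Rotation by 90 degrees composed with a translation by (3, 2).
var M = [0, -1, 3,
         1,  0, 2,
         0,  0, 1];

apply3(M, [1, 0, 1]); // point:  [3, 3, 1] -> rotated and translated
apply3(M, [1, 0, 0]); // vector: [0, 1, 0] -> rotated only, the translation has no effect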

4.3 Affine Transformations

So far we have seen three types of transformations: translation, scaling and rotation, and we have seen how they can conveniently be expressed with a 3 × 3 matrix by using homogeneous coordinates. You may wonder whether matrices like the one in Equation (4.1) can also represent other kinds of transformations and how these transformations are characterized. In fact the matrix in (4.1) expresses the class of affine transformations. A transformation is said to be affine if:

  • It preserves collinearity. This means that points that lie on a line before being transformed also do so after being transformed.
  • It preserves proportions. This means that, given three points p1, p2 and p3 that lie on a line, the ratio ||p2 − p1|| / ||p3 − p1|| is preserved after the transformation. Here ||v|| represents the norm (that is, magnitude or length) of vector v.

Figure 4.4 shows an example of an affine transformation applied to three points. Among the affine transformations, the rotation and the translation are called rigid transformations because they only move the object they are applied to, preserving all the angles and lengths. The scaling is not rigid, but if uniform it preserves the angles. Also, parallel lines are mapped to parallel lines.

Figure 4.4

(Left) Three collinear points. (Right) The same points after an affine transformation.

4.3.1 Composition of Geometric Transformations

Suppose you want to move the car in Figure 4.5 from position A to position B. To do so, you can first rotate the car clockwise by 45° and then translate by (3, 2). The matrix for a clockwise rotation by 45° is:

R_{-45°} = \begin{bmatrix} \cos(-45°) & -\sin(-45°) & 0 \\ \sin(-45°) & \cos(-45°) & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{bmatrix}

Figure 4.5

Combining rotation and translation.

and the matrix for a translation by (3, 2) is:

T_{(3, 2)} = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}

so you can apply the rotation R−45° to all the points of the car and bring it to the position marked with A′, then apply the translation T(3, 2):

T_{(3, 2)}(R_{-45°}\, p) = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \left( \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{bmatrix} p \right)

since the matrix product is associative, that is A (B C) = (A B) C, where every term of the product can be either a matrix or a vector with compatible dimensions, we can multiply the two matrices first and so obtain a matrix that will be applied to p:

T_{(3, 2)}(R_{-45°}\, p) = (T_{(3, 2)}\, R_{-45°})\, p = \left( \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{bmatrix} \right) p = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 3 \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 2 \\ 0 & 0 & 1 \end{bmatrix} p = M\, p

Matrices of this form, containing a rotation and a translation, are often referred to as roto-translation matrices. They are very common since they are the tools you have to use to move things around, keeping their shape intact.
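
The following sketch (our own mul3 helper, same row-major convention as the earlier sketch) shows the composition in code and, by comparing T R with R T, that the order of multiplication matters:

// Product of two 3x3 matrices stored as row-major flat arrays.
function mul3(a, b) {
  var r = new Array(9);
  for (var i = 0; i < 3; ++i)
    for (var j = 0; j < 3; ++j)
      r[3 * i + j] = a[3 * i] * b[j] + a[3 * i + 1] * b[3 + j] + a[3 * i + 2] * b[6 + j];
  return r;
}

var c = Math.cos(-Math.PI / 4), s = Math.sin(-Math.PI / 4);
var R = [c, -s, 0,   s, c, 0,   0, 0, 1]; // rotation by -45 degrees
var T = [1, 0, 3,   0, 1, 2,   0, 0, 1];  // translation by (3, 2)

var TR = mul3(T, R); // rotate first, then translate: the car example above
var RT = mul3(R, T); // translate first, then rotate: a different transformation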

In general we can apply any number of affine transformations and obtain a single corresponding matrix as we did above. We will now see that some of them are especially useful in practical situations.

4.3.2 Rotation and Scaling about a Generic Point

The rotation matrix we have used so far performs a rotation about the origin, that is about the point [0, 0, 1]^T. Very often you want to rotate an object about a point different from the origin; for example, you may want to rotate the car about its barycenter. Note that in the example in Figure 4.5 we just hid this fact by placing the car so that its barycenter coincided with the origin.

A rotation by α about a generic point c is obtained as the composition of three transformations, illustrated in Figure 4.6.

Figure 4.6

How to make an object rotate around a specified point.

  1. Apply the translation T_{-c} so that the point c coincides with the origin.
  2. Apply the rotation R_α.
  3. Apply the translation T_c = T_{-c}^{-1} so that the point c returns to its initial position.

In one formula:

R_{\alpha, c} = T_{-c}^{-1}\, R_\alpha\, T_{-c}

where we indicate with Rα, c the matrix corresponding to the rotation by α about the point c.

It is interesting here to see what the matrix R_{α, c} looks like. To do so we may carry out the whole product, or we can switch back to Cartesian coordinates and write the transformation of point p (let us drop the subscripts to simplify the notation):

p' = R(p - c) + c = R\,p - R\,c + c = \underbrace{R\,p}_{\text{rotation}} + \underbrace{(I - R)\,c}_{\text{translation}}

so that after all, we have a roto-translation matrix:

R_{\alpha, c} = \begin{bmatrix} R & (I - R)\,c \\ 0 & 1 \end{bmatrix}          (4.2)

The very same considerations hold for the scaling, just putting a scaling matrix S(sx, sy) in place of the rotation matrix Rα, thus:

S_{(s_x, s_y), c} = \begin{bmatrix} S & (I - S)\,c \\ 0 & 1 \end{bmatrix}          (4.3)
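
As an illustration, Equation (4.2) can be turned into code directly; a possible sketch for the 2D case (the function name is ours):

// Rotation by alpha (radians) about the point c = [cx, cy], as in Equation (4.2):
// the upper-left block is R, the last column is (I - R) c.
function rotationAbout(alpha, c) {
  var cos = Math.cos(alpha), sin = Math.sin(alpha);
  var tx = c[0] - (cos * c[0] - sin * c[1]); // x component of (I - R) c
  var ty = c[1] - (sin * c[0] + cos * c[1]); // y component of (I - R) c
  return [cos, -sin, tx,
          sin,  cos, ty,
          0,    0,   1];
}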

4.3.3 Shearing

Shearing is an affine transformation where the point p is scaled along one dimension by an amount proportional to its coordinate in another dimension (see the example in Figure 4.7).

Figure 4.7

Example of shearing for h = 0 and k = 2.

Sh_{(h, k)}(p) = \begin{bmatrix} 1 & k & 0 \\ h & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} p = \begin{bmatrix} p_x + k\,p_y \\ h\,p_x + p_y \\ 1 \end{bmatrix}

Note that shearing can also be obtained as a composition of a rotation and a non-uniform scaling (see Exercise 5, Section 4.13).

4.3.4 Inverse Transformations and Commutative Properties

The inverse of an affine transformation is easily computed as:

\begin{bmatrix} A & v \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} & -A^{-1} v \\ 0 & 1 \end{bmatrix}

The meaning of the inverse of an affine transformation is to restore a transformed object to its original shape. For example, it is geometrically intuitive that T_v^{-1} = T_{-v}, R_α^{-1} = R_{-α} and S_{(s_x, s_y)}^{-1} = S_{(1/s_x, 1/s_y)} (the proof of these equivalences is left for the exercises). As a side note, we anticipate here that rotation matrices are orthonormal, meaning that R^{-1} = R^T. So the inverse of a rototranslation is simply obtained as:

\begin{bmatrix} R & c \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} R^T & -R^T c \\ 0 & 1 \end{bmatrix}          (4.4)
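
For example, a sketch of Equation (4.4) for a 2D rototranslation (our own helper, same matrix layout as in the previous sketches):

// Invert a 3x3 rototranslation [R c; 0 1]: the inverse is [R^T, -R^T c; 0 1].
function invertRotoTranslation(m) {
  var r00 = m[0], r01 = m[1], cx = m[2];
  var r10 = m[3], r11 = m[4], cy = m[5];
  return [r00, r10, -(r00 * cx + r10 * cy),
          r01, r11, -(r01 * cx + r11 * cy),
          0,   0,   1];
}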

Matrix product is associative, which means that A (B C) = (A B) C, but, in general, not commutative, which means that A B ≠ B A. Among the affine transformations, the commutative property holds only between two translations and between uniform scaling and rotation (and, of course, always if at least one of the two transformations is the identity or if the two are equal). The general non-commutativity of matrix multiplication means, in other words, that “the order of multiplication matters”. It matters a lot, and remembering this will save us a lot of debugging time.

4.4 Frames

A frame is a way to assign values to the components of geometric entities. In two dimensions, a frame is defined by a point and two non-collinear axes. Figure 4.8 shows our car and two distinct frames. If you were given only frame F0 and were asked to write the coordinates of the point p on the front of the car, it would be obvious to say (2.5, 1.5). The presence of the other frame implies the relative nature of the coordinates of point p, which are (1.4, 2.1) with respect to frame F1.

Figure 4.8

Coordinates of a point are relative to the frame.

Note that the origin and axes of these frames are themselves expressed in a frame. The frame we use to express all other frames is called the canonical frame and has origin [0, 0, 1]^T and axes u = [1, 0, 0]^T and v = [0, 1, 0]^T. To change frame means to express the components of points and vectors given in one frame in terms of another frame, which is what we have just done visually in the previous example. Here we want to show how to compute the transformation required to change frame from F0 to F1. Let us start by changing the coordinates of a point given in one frame F0 into the canonical frame. We can do this geometrically by starting from the origin of F0, moving p_{0x} units along vector u_0 and p_{0y} units along vector v_0:

p = O_0 + p_{0x}\, u_0 + p_{0y}\, v_0

Note that we can write this formula in matrix notation by arranging the axes as column vectors and the origin as the last column of a matrix (that we will call M0):

p = \underbrace{\begin{bmatrix} u_{0x} & v_{0x} & O_{0x} \\ u_{0y} & v_{0y} & O_{0y} \\ 0 & 0 & 1 \end{bmatrix}}_{M_0} p_0

It should not be a surprise that this matrix is just another affine transformation, because we have just applied two scalings and two point-vector sums. In fact there is more: the upper left 2 × 2 matrix obtained by the axes of F0 is the rotation matrix that brings the canonical axes to coincide with the F0 axes and the last column is the translation that brings the origin of the canonical frame to coincide with F0. So the matrix that expresses the coordinates p0 in the canonical frame is just a rototranslation.

p = \begin{bmatrix} R_{uv} & O_0 \\ 0 & 1 \end{bmatrix} p_0

If we apply the same steps to p1 in frame F1 we obtain:

p = \underbrace{\begin{bmatrix} R_0 & O_0 \\ 0 & 1 \end{bmatrix}}_{M_0} p_0 = \underbrace{\begin{bmatrix} R_1 & O_1 \\ 0 & 1 \end{bmatrix}}_{M_1} p_1

therefore:

M_1^{-1} M_0\, p_0 = p_1

The matrix M_1^{-1} M_0 transforms a point (or a vector) given in the frame F0 into the frame F1. Note that from Section 4.3.4 we know how to invert rototranslation matrices like M1 without resorting to algorithms for the inversion of generic matrices.
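
In code, the change of frame can be sketched by reusing the helpers introduced above (apply3, mul3 and invertRotoTranslation); the two frames below are made up for the example:

// M0 and M1 express frames F0 and F1 in the canonical frame
// (axes as the first two columns, origin as the last column).
var M0 = [1,  0, 2,   0, 1, 1,   0, 0, 1]; // F0: canonical axes, origin at (2, 1)
var M1 = [0, -1, 4,   1, 0, 0,   0, 0, 1]; // F1: rotated by 90 degrees, origin at (4, 0)

// A point p0 given in F0 is expressed in F1 as p1 = M1^-1 M0 p0.
var F0toF1 = mul3(invertRotoTranslation(M1), M0);
var p1 = apply3(F0toF1, [0.5, 0.5, 1]);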

4.4.1 General Frames and Affine Transformations

So far we have only seen orthonormal frames, which are characterized by having normalized and orthogonal axes. In general, the axes of a frame do not have to be orthogonal or normalized, and then any affine transformation may be considered a change of frame. The direct implication is that applying an affine transformation to an object is the same as changing the frame in which it is expressed. Whether we refer to an affine transformation as a transformation of the coordinates of the object or transformation of the reference frame is a mere matter of what is more intuitive for our inner representation of reality. If, for example, we are asked to find the transformation that brings our car forward we think of it as an object transformation and not as bringing the reference frame backward, but, if we go down to the math, there is no difference.

4.4.2 Hierarchies of Frames

A very important concept to grasp when using frames is the hierarchical way they can be arranged for an intuitive and practical description of the scene. We could describe the drawing in Figure 4.9 as a car with four frames: one centered in the middle of the car, one on a corner of the driver cabin, one on the center of the steering wheel and one on the keyhole. Now consider the following sentences:

  • The keyhole is on the right of the steering wheel.
  • The steering wheel is inside the driver cabin.
  • The driver cabin is on the left side of the car.

Figure 4.9

(Right) An example of relations among frames. (Left) How it can be represented in a graph.

In these sentences we have expressed the position of the keyhole in the frame of the steering wheel, the position of the steering wheel in the frame of the driver cabin and the position of the driver cabin in the frame of the car. In the above description we first established a hierarchical relationship between the frames where the parent of a frame F is the frame in which the values of F, its origin and its axes, are expressed.

The hierarchy and the relations among the frames can be conveniently illustrated with a directed graph as in Figure 4.9. Each node corresponds to a frame, each arc corresponds to the transformation that converts the coordinates from the tail frame to the head frame and the thick arrows show the hierarchical relation among the frames. The matrix representing the frame and the matrix, which transforms the coordinates of the current frame to the coordinates of the parent frame, are the same thing, so the arc (KH, SW) is associated with matrix KH and consequently its opposite arc, (SW, KH), is associated with the inverse KH−1.

Note that if we apply a transformation to one of the frames, such a transformation will also have an effect on all the nodes of the subtree rooted at that node. So, if we translate the frame C the frames DC, SW and KH will be translated in the same way, although C will be the only matrix to change.

4.4.3 The Third Dimension

Everything we learned so far about transformations in two dimensions holds in any dimension. To extend the formulas to the 3D case we only need to add the z component to points and vectors and a fourth row and column to the matrices. This is true for all the transformations except for rotations, which need some more explanation, provided in the next section.

Here we start by extending the notion of frames to 3D space. Since we have shown all our examples with the axes x and y in the plane, it is straightforward to add a third axis z orthogonal to the plane XY to obtain a reference frame for 3D space. There are two possible directions for this vector: one pointing upward and one pointing downward. This choice determines the orientation of the frame, or handedness of the coordinate system.

The term handedness derives from the rule illustrated in Figure 4.10. If we set our hands as shown in the figure and assign x, y and z to the thumb, index and middle fingers, respectively, we will obtain a left-handed reference system (LHS) with the left hand and a right-handed reference system (RHS) with the right hand.

Figure 4.10

Handedness of a coordinate system.

The right-handed system is somehow the more common choice. Just consider when you draw the x and y axes on paper: if you want to add a third axis to make a three-dimensional frame, it would come out of the paper towards you (RHS), and not through the table (LHS).

The RHS also gives us a mnemonic rule for the cross product between two vectors, that is: x × y = z. So if you have two generic vectors a and b and assign a to your right thumb and b to your right index-finger, the middle finger gives you the direction of a × b.

4.5 Rotations in Three Dimensions

In Section 4.2.3 we showed how to compute a rotation around the origin of the bidimensional plane. If we consider the bidimensional plane as an infinitely thin slice of 3D space passing through the origin [0, 0, 0,1]T and perpendicular to the z-axis (the axis [0, 0, 1, 0]T) then we can see how the rotation in two dimensions is also a rotation in 3D around the z axis:

R_{\alpha, z} = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 & 0 \\ \sin\alpha & \cos\alpha & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}          (4.5)

Note that applying a rotation about the z axis only changes the x and y coordinates. In other words, a point rotated around an axis always lies on the plane passing through the point and orthogonal to that axis. In fact, a rotation in 3D is defined with respect to an axis, called the axis of rotation, which is usually specified as r = o_r + t dir, t ∈ (−∞, +∞). All the techniques to implement a rotation that we will see consider the case where the rotation axis passes through the origin, which means r = [0, 0, 0, 1]^T + t dir (see Figure 4.11). This is not a limitation since, as we did in Section 4.3.2, we can compose transformations to find the rotation around a generic axis. This can be achieved by applying a translation to make the generic axis pass through the origin, applying the rotation and then translating back.

Figure 4.11

An example of rotation around an axis.

Figure 4.12

How to build an orthogonal frame starting with a single axis.

4.5.1 Axis–Angle Rotation

The first technique to find the rotation around an axis of rotation r (defined as previously specified) is a composition of transformations:

  1. Apply the transformation that makes the rotation axis r coincide with the z axis.
  2. Apply the rotation Rα, z using the matrix in equation (4.5).
  3. Apply the transformation that makes the z axis coincide with the rotation axis r.

We know how to rotate a point about the z axis, but what about the first transformation (and its inverse)? Suppose that we have a whole frame F_r whose z axis coincides with r. The affine transformation F_r^{-1} brings the frame F_r to coincide with the canonical frame, which means the axis r is transformed to the axis z, just like we wanted. In matrix form:

R_{\alpha, r} = F_r\, R_{\alpha, z}\, F_r^{-1}          (4.6)

So all we are missing are the x and y axes of the frame F_r, which we will compute in the next section.

4.5.1.1 Building Orthogonal 3D Frames from a Single Axis

As stated in the previous section, it is useful to see how we can build a 3D orthogonal frame starting from a partial specification of it, say the axis z_r (which we want to correspond to r). The x_r axis will have to be orthogonal to z_r. We use a property of the vector (or cross) product: given two vectors a and b, the vector c = a × b is orthogonal to both, that is, c · a = 0 and c · b = 0 (see Appendix B). If we take any vector a, we can define the x_r axis as:

x_r = r × a

and then do the same operation to obtain y_r:

y_r = r × x_r

It should be clear that the choice of a determines the frame. We must be careful about choosing a vector a not collinear with vector r, because the vector product of two collinear vectors gives a null vector. If we pick three random numbers to build the vector a, the chances of creating a vector collinear to r are infinitely small, but not zero. Furthermore the finite precision of the computer representation of real numbers also degrades the quality of the vector product for quasi collinear vectors. So, instead of taking the chance, we consider the position of the smallest component of vector r, let us call it i, and define a as the vector with value 1 for the component i and 0 for the other components. In other words, we take the canonical axis “most orthogonal” to r.
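
A sketch of this construction in code (plain JavaScript with our own cross and normalize helpers; r is assumed to be already normalized):

function cross(a, b) {
  return [a[1] * b[2] - a[2] * b[1],
          a[2] * b[0] - a[0] * b[2],
          a[0] * b[1] - a[1] * b[0]];
}
function normalize(v) {
  var l = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
  return [v[0] / l, v[1] / l, v[2] / l];
}

// Build an orthonormal frame {xr, yr, r} from a single normalized axis r.
function frameFromAxis(r) {
  // a = canonical axis "most orthogonal" to r: 1 at the position of r's smallest component.
  var a = [0, 0, 0], i = 0;
  if (Math.abs(r[1]) < Math.abs(r[i])) i = 1;
  if (Math.abs(r[2]) < Math.abs(r[i])) i = 2;
  a[i] = 1;
  var xr = normalize(cross(r, a));
  var yr = normalize(cross(r, xr));
  return { x: xr, y: yr, z: r };
}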

4.5.1.2 Axis–Angle Rotations without Building the 3D Frame

Note that there are infinitely many orthogonal frames having z_r as their z axis, and all of them can be used for our rotation. This means that whatever choice we make for vector a, we will always end up with the same rotation matrix; in turn, this means that if we simplify the expression, a will disappear from it, and the expression will be written only in terms of r and α.

Instead of performing the tedious algebraic simplification we will now show this with a geometric approach. Considering Figure 4.13, let p be the point we want to rotate by α around the axis r and let p′ be the arrival position. p and p′ lie on a plane orthogonal to the rotation axis r: let us build a new frame F taking the intersection of such a plane with the axis r as the origin O_F, and the vectors x_F = p − O_F, y_F = r × x_F and r as axes. The coordinates of point p′ with respect to the frame F are [cos α, sin α, 0]^T, which means that the coordinates of the point p′ in the canonical frame are:

p' = \begin{bmatrix} x_F & y_F & r & O_F \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\alpha \\ \sin\alpha \\ 0 \\ 1 \end{bmatrix} = \cos\alpha\, x_F + \sin\alpha\, y_F + 0\, r + O_F

Figure 4.13

Rotation around an axis without building a frame.

O_F is the projection of p onto the rotation axis, which is:

O_F = (p · r)\, r

and thus:

x_F = p - O_F = p - (p · r)\, r
y_F = r × x_F = r × (p - O_F) = r × p - r × (p · r)\, r = r × p

so we have the formula for rotating the point p around the axis r by α:

p' = \cos\alpha\, p + (1 - \cos\alpha)(p · r)\, r + \sin\alpha\, (r × p)          (4.7)
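
Equation (4.7) maps directly to code; a self-contained sketch (assuming r is normalized; the helper names are ours):

function dot3(a, b)   { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
function cross3(a, b) {
  return [a[1] * b[2] - a[2] * b[1],
          a[2] * b[0] - a[0] * b[2],
          a[0] * b[1] - a[1] * b[0]];
}

// Rotate point p around the normalized axis r by alpha radians, Equation (4.7):
// p' = cos(a) p + (1 - cos(a)) (p . r) r + sin(a) (r x p)
function rotateAxisAngle(p, r, alpha) {
  var c = Math.cos(alpha), s = Math.sin(alpha);
  var pr = dot3(p, r);
  var rxp = cross3(r, p);
  return [c * p[0] + (1 - c) * pr * r[0] + s * rxp[0],
          c * p[1] + (1 - c) * pr * r[1] + s * rxp[1],
          c * p[2] + (1 - c) * pr * r[2] + s * rxp[2]];
}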

4.5.2 Euler Angles Rotations

For some applications it is convenient to specify a rotation as a combination of rotations around the canonical axes. The most well known example is the control of an aircraft in a flight simulator: referring to Figure 4.14, consider the three possible rotations of the aircraft. The rotation around the Y axis is called yaw (aircraft turning left or right), the rotation around the X axis is called pitch (head up or down) and the rotation around the Z axis is called roll (inclination to left or right). Every possible orientation of the aircraft (which means every possible rotation) may be obtained by properly specifying the angles for yaw, pitch and roll, that we call α, β and γ respectively, and composing the three rotations.

Figure 4.14

A gimbal and the rotation of its rings.

With the name Euler angles we usually refer to the angles by means of which we specify a rotation. How they specify a rotation, that is which axes are used, in which order the axis rotations are applied and if they are intrinsic or extrinsic, is a matter of convention.

Here we show a classic example: the aircraft mounted on a gimbal. A gimbal is a system of three concentric rings, r1, r2 and r3, where each ring is bound to the first outer ring through a support that allows rotation. We may associate a frame to each ring, F1, F2 and F3, that coincides with the canonical frame if α = β = γ = 0.

As shown in Figure 4.14 the ring r1 lies on the plane XY of frame F1, which rotates around the canonical Y axis, the ring r2 lies on the plane XZ of frame F2 which rotates around the X axis of the frame F1, and the ring r3 lies on the plane YZ of frame F3, which rotates around the Z axis of the frame F2. Figure 4.15 shows the corresponding hierarchy of frames and the transformations/changes of coordinates between them (which are all rotations around a canonical axis).

Figure 4.15

Scheme of the relations among the three rings of a gimbal.

So the values of α, β and γ determine the frame F3 as:

F_3 = R_{y\alpha}\, R_{x\beta}\, R_{z\gamma}

Note that rotation R_{xβ} is specified with respect to its outer frame, that is, the canonical frame transformed by rotation R_{yα}, and similarly rotation R_{zγ} is expressed in the canonical frame rotated by R_{yα} R_{xβ}. In this case we speak of intrinsic rotations, while if all the rotations were expressed with respect to the canonical frame we would have extrinsic rotations.

The mapping between angles and resulting rotations is not bijective: in fact, the same rotation can be obtained with different combinations of angles.

Figure 4.16 illustrates a very annoying phenomenon: the gimbal lock. If a rotation of β = π/2 is applied, rings r1 and r3 both make the aircraft roll, by a total angle of α + γ, so we have infinitely many values of α and γ leading to the same rotation, and one degree of freedom is lost: more precisely, the gimbal is no longer able to make the aircraft yaw and the rotation is locked to 2 degrees of freedom (pitch and roll). This means that when we use Euler angles we need to take these degenerate configurations into account, for example by making the discontinuity points correspond to rotations we are not interested in.

Figure 4.16

Illustration of gimbal lock: when two rings rotate around the same axis one degree of freedom is lost.

4.5.3 Rotations with Quaternions

Quaternions are mathematical entities that can be used to represent rotations in a way that is more efficient than matrices and more robust than Euler angles. They provide a sound way to interpolate rotations and are easy to implement. The only bad news is that quaternions are a bit more difficult to understand.

Quaternions are an extension of complex numbers with two more imaginary parts:

a = a_w + i\,a_x + j\,a_y + k\,a_z

usually found in short form as

a = (a_w, \mathbf{a})

with \mathbf{a} = (a_x, a_y, a_z). So, quaternions can also be seen as points in four dimensions.

Quaternions can be summed component by component just like points in two or three dimensions:

a + b = (a_w + b_w,\ i(a_x + b_x) + j(a_y + b_y) + k(a_z + b_z))

The multiplication is done using the following rules:

i^2 = j^2 = k^2 = ijk = -1

which gives:

ab = (a_w b_w - \mathbf{a} \cdot \mathbf{b},\ a_w \mathbf{b} + b_w \mathbf{a} + \mathbf{a} \times \mathbf{b})

The identity is 1 = (1, 0, 0, 0) and the inverse can be calculated as:

a^{-1} = \frac{1}{\|a\|^2}\, (a_w, -\mathbf{a})

where the squared magnitude of a quaternion is \|a\|^2 = a_w^2 + a_x^2 + a_y^2 + a_z^2.

Finally, the quaternions in the form (0, x, y, z) represent the points (x, y, z) in 3D space.

Concerning rotation, the following quaternion:

q = \left(\cos\frac{\alpha}{2},\ \sin\frac{\alpha}{2}\, \mathbf{r}\right)

with \|\mathbf{r}\| = 1 may be used to rotate a point p = (0, p_x, p_y, p_z) around the axis r by α. More precisely, going back to the example of Figure 4.13, we can demonstrate that:

p' = q\, p\, q^{-1}          (4.8)

The proof can be obtained in a few simple derivations.

q\, p\, q^{-1} = \left(\cos\frac{\alpha}{2},\ \sin\frac{\alpha}{2}\mathbf{r}\right) (0, \mathbf{p}) \left(\cos\frac{\alpha}{2},\ -\sin\frac{\alpha}{2}\mathbf{r}\right)
= \left(\cos\frac{\alpha}{2},\ \sin\frac{\alpha}{2}\mathbf{r}\right) \left(\sin\frac{\alpha}{2}(\mathbf{p} \cdot \mathbf{r}),\ \cos\frac{\alpha}{2}\,\mathbf{p} - \sin\frac{\alpha}{2}(\mathbf{p} \times \mathbf{r})\right)
= \left(0,\ \left(\cos^2\frac{\alpha}{2} - \sin^2\frac{\alpha}{2}\right)\mathbf{p} + 2\sin^2\frac{\alpha}{2}\,\mathbf{r}(\mathbf{p} \cdot \mathbf{r}) + 2\sin\frac{\alpha}{2}\cos\frac{\alpha}{2}\,(\mathbf{r} \times \mathbf{p})\right)

Now, using the following trigonometric equivalences:

\cos^2\frac{\alpha}{2} - \sin^2\frac{\alpha}{2} = \cos\alpha \qquad 2\sin^2\frac{\alpha}{2} = 1 - \cos\alpha \qquad 2\cos\frac{\alpha}{2}\sin\frac{\alpha}{2} = \sin\alpha

we obtain precisely Equation (4.7).
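
A sketch of quaternion rotation in code (quaternions stored as [w, x, y, z]; the helper names are ours), which should give the same result as the axis–angle formula above:

// Quaternion product, quaternions stored as [w, x, y, z].
function qMul(a, b) {
  return [a[0] * b[0] - a[1] * b[1] - a[2] * b[2] - a[3] * b[3],
          a[0] * b[1] + a[1] * b[0] + a[2] * b[3] - a[3] * b[2],
          a[0] * b[2] - a[1] * b[3] + a[2] * b[0] + a[3] * b[1],
          a[0] * b[3] + a[1] * b[2] - a[2] * b[1] + a[3] * b[0]];
}

// Rotate point p around the normalized axis r by alpha radians using q p q^-1 (Equation (4.8)).
function qRotate(p, r, alpha) {
  var c = Math.cos(alpha / 2), s = Math.sin(alpha / 2);
  var q    = [c,  s * r[0],  s * r[1],  s * r[2]];
  var qInv = [c, -s * r[0], -s * r[1], -s * r[2]]; // the inverse of a unit quaternion is its conjugate
  var res = qMul(qMul(q, [0, p[0], p[1], p[2]]), qInv);
  return [res[1], res[2], res[3]];
}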

4.6 Viewing Transformations

At this point we know how to manipulate objects to create our 3D scene but we did not say anything about how these objects will be projected to the 2D screen by the rasterization-based pipeline. Let us imagine that you are in the scene with a camera: are you looking at our car from the front, from the back, from far away? Are you using a zoom or a wide angle lens?

4.6.1 Placing the View Reference Frame

We can specify our view of a scene by answering three questions: from what position are we looking, called the view point; towards which direction are we looking, called the viewing direction; and towards which direction the spike of the “pickelhaube helmet” we are wearing is pointing, called the up direction (see Figure 4.17). Conventionally we express these data with a frame, called the view reference frame, whose origin is at the view point, whose y axis is oriented like the up direction and whose z axis points opposite to the viewing direction, while the x axis, pointing toward the right of the viewer, is found as x = y × z (again see Figure 4.17). With the term view transformation we indicate the transformation that expresses all the coordinates with respect to the view reference frame.

Figure 4.17

View reference frame.

So we can write our view frame in matrix form as:

V = \begin{bmatrix} x_x & y_x & z_x & o_x \\ x_y & y_y & z_y & o_y \\ x_z & y_z & z_z & o_z \\ 0 & 0 & 0 & 1 \end{bmatrix}

Hence, the inverse of the V matrix allows us to transform the coordinates of our scene to the view reference frame. Since V describes an orthogonal frame, we know, from Section 4.3.4, that V−1 can be computed by:

V^{-1} = \begin{bmatrix} R_{xyz}^T & -R_{xyz}^T\, o \\ 0 & 1 \end{bmatrix}, \qquad R_{xyz} = \begin{bmatrix} x_x & y_x & z_x \\ x_y & y_y & z_y \\ x_z & y_z & z_z \end{bmatrix}

4.6.2 Projections

Establishing the projection means describing the type of camera we are using to observe the scene. In the following we will assume that the scene is expressed in the view reference frame, therefore the point of view is centered at the origin, the camera is looking toward the negative z axis and its up direction is y.

4.6.2.1 Perspective Projection

Suppose you are standing behind an infinitely large window and looking outside with only one eye. You can draw a line from each point you see to your eye and, for every such point, the corresponding line will pass through the window. This example describes the perspective projection: the point of view (your eye) is called the center of projection C, the plane of the window is the plane of projection (or image plane) VP, and the lines are the projectors. So, given a 3D point p, a center of projection C and a plane of projection VP, its perspective projection p′ is the intersection between the line passing through p and C, and the plane VP. In our case C = [0, 0, 0, 1]^T and the plane VP is orthogonal to the z axis. Figure 4.18 shows an example where the plane VP is z = −d. In this case it is easy to find the formula for the projection of the point p by observing that the triangles Cap′ and Cbp are similar and hence the ratios between corresponding sides are equal:

p'_y : d = p_y : (-p_z) \implies p'_y = \frac{p_y}{-p_z / d}

the same holds for the x coordinate, therefore

p' = \begin{bmatrix} \dfrac{p_x}{-p_z / d} \\ \dfrac{p_y}{-p_z / d} \\ -d \\ 1 \end{bmatrix}          (4.9)

Figure 4.18

The perspective projection.

Note that our example is only an ideal description, since the eye is not a zero-dimensional point. Another, more common, way to introduce perspective projection is by means of the pinhole camera, represented in Figure 4.19. The pinhole camera consists of a light-proof box with a single infinitely small hole, so that each point in front of the camera projects onto the side opposite the hole. Note that in this way the image formed is inverted (up becomes down, left becomes right), and this is why we prefer the eye-in-front-of-the-window example.

Figure 4.19

The pinhole camera.

4.6.2.2 Perspective Division

We introduced homogeneous coordinates in section 4.2.4 almost like a trick to be able to express translations in matrix notation. In fact homogeneous coordinates do more: they allow us to express the equivalence of points with respect to their projections. Referring to Figure 4.18, note that not just point p but all the points that lie on the line passing through C and p project onto the point p′, with the single exception of the center of projection. In homogeneous coordinates we can write this equivalence as:

p = \begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix} = \begin{bmatrix} \lambda_0 p_x \\ \lambda_0 p_y \\ \lambda_0 p_z \\ \lambda_0 \end{bmatrix} = \begin{bmatrix} \lambda_1 p_x \\ \lambda_1 p_y \\ \lambda_1 p_z \\ \lambda_1 \end{bmatrix} = \begin{bmatrix} \lambda_2 p_x \\ \lambda_2 p_y \\ \lambda_2 p_z \\ \lambda_2 \end{bmatrix} = \cdots

Note that the fourth component can be any real number λ_i ≠ 0, but when it is 1 the coordinates are said to be in canonical form. When a point is in canonical form we can represent it in 3D space by simply considering its first three coordinates. Consider again our point p′ in Equation (4.9): if we multiply all the components by −p_z/d we obtain the following equivalence:

p' = \begin{bmatrix} \dfrac{p_x}{-p_z / d} \\ \dfrac{p_y}{-p_z / d} \\ -d \\ 1 \end{bmatrix} = \begin{bmatrix} p_x \\ p_y \\ p_z \\ -\dfrac{p_z}{d} \end{bmatrix}

Note that the difference is that now no component of p′ appears in the denominator and so the perspective projection can be written in matrix form:

P_{prsp}\, p = \underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -\frac{1}{d} & 0 \end{bmatrix}}_{\text{perspective projection}} \begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix} = \begin{bmatrix} p_x \\ p_y \\ p_z \\ -\frac{p_z}{d} \end{bmatrix} = \begin{bmatrix} \dfrac{p_x}{-p_z / d} \\ \dfrac{p_y}{-p_z / d} \\ -d \\ 1 \end{bmatrix}

The operation of turning a point in homogeneous coordinates into its canonical form consists of dividing each component by the fourth one, and it is called perspective division. There is a fundamental difference between the projection matrix and all the matrices we have encountered so far: the last row of the projection matrix is not [0, 0, 0, 1] as it is for affine transformations. In fact, the perspective transformation is not affine in that, although it preserves collinearity, it does not preserve ratios between distances.
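
In code, projecting a point therefore amounts to a 4 × 4 matrix product followed by the division by the fourth component; a minimal sketch (our own helpers, row-major matrices, d chosen arbitrarily):

// Multiply a 4x4 row-major matrix by a homogeneous point [x, y, z, w].
function apply4(m, p) {
  var r = [0, 0, 0, 0];
  for (var i = 0; i < 4; ++i)
    r[i] = m[4 * i] * p[0] + m[4 * i + 1] * p[1] + m[4 * i + 2] * p[2] + m[4 * i + 3] * p[3];
  return r;
}

// Perspective division: bring a homogeneous point back to its canonical form.
function perspectiveDivide(p) {
  return [p[0] / p[3], p[1] / p[3], p[2] / p[3], 1];
}

var d = 2;
var Prsp = [1, 0, 0, 0,
            0, 1, 0, 0,
            0, 0, 1, 0,
            0, 0, -1 / d, 0];

var clip = apply4(Prsp, [1, 2, -4, 1]); // [1, 2, -4, 2]
var proj = perspectiveDivide(clip);     // [0.5, 1, -2, 1], a point on the plane z = -d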

4.6.2.3 Orthographic Projection

Projections where all the projectors are parallel are called parallel projections. If the projectors are also orthogonal to the projection plane we have orthographic projections (see Figure 4.20). In this case the projection matrix is trivially obtained by setting the z coordinate to d. To be more precise, the x and y values of the projected points are independent of d, so we just consider the projection plane z = 0. In matrix notation:

P_{orth}\, p = \underbrace{\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{\text{orthographic projection}} \begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix} = \begin{bmatrix} p_x \\ p_y \\ 0 \\ 1 \end{bmatrix}

Figure 4.20

The orthographic projection.

Unlike the perspective projection, the orthographic projection is an affine transformation (although not an invertible one). The orthographic projection can also be seen as an extreme case of perspective projection, where the distance between the center of projection and the projection plane is infinite.

4.6.3 Viewing Volume

The term viewing volume indicates the portion of 3D space that is seen by our ideal camera. For example, if we take one of the projections above and a specified rectangular region of the viewing plane, called the viewing window, the viewing volume consists of the portion of space whose projection falls inside such a viewing window. By this definition, the viewing volume has infinite extent, because points that fall inside the viewing window may be infinitely far from the view point. However, in computer graphics we bound the viewing volume with two other planes, called the near plane and the far plane. Figure 4.21 (Top) shows two examples of viewing volumes, one obtained with a perspective projection and one with an orthographic projection. The planes that bound the viewing volume are called clipping planes because they are used to “clip” the geometric primitives against it so as to discard from the rest of the pipeline the parts of the objects that would not be visible (details on this process will be given later in Section 5.4).

Figure 4.21

All the projections convert the viewing volume into the canonical viewing volume.

4.6.3.1 Canonical Viewing Volume

We have seen that, depending on the projection, the viewing volume can be either a parallelepiped or a frustum (a truncated pyramid).

As we know from Chapter 1.3.2, projection is not the last operation of the rendering pipeline; many other operations such as clipping, rasterization, and so on, are at work. This means that at a certain point of the rasterization-based pipeline the algorithms would have to be “parametric” on the specific viewing volume. A more elegant and efficient option is to establish a common interface between the projection and the rest of the pipeline. In fact, in the pipeline it is imposed that the projection always transforms its corresponding viewing volume into the Canonical Viewing Volume (CVV), which is by definition the cube aligned with the axes and with corners [−1, −1, −1] and [1, 1, 1].

Figure 4.21 shows the viewing volumes corresponding to an orthographic projection and a perspective projection and their mapping to the canonical viewing volume. The orthogonal and perspective projections taking into account this mapping become:

P_{orth} = \underbrace{\begin{bmatrix} \frac{2}{r-l} & 0 & 0 & -\frac{r+l}{r-l} \\ 0 & \frac{2}{t-b} & 0 & -\frac{t+b}{t-b} \\ 0 & 0 & -\frac{2}{f-n} & -\frac{f+n}{f-n} \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{\text{orthographic projection}} \qquad P_{persp} = \underbrace{\begin{bmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{bmatrix}}_{\text{perspective projection}}          (4.10)

In these transformations, n indicates the distance of the near plane, f the distance of the far plane, and t, b, l, r are the values that define the top, the bottom, the left and the right delimitation of the viewing window, respectively, all expressed with respect to the observer (that is, in view space).

Be aware that a point transformed by the projection matrix is not yet in its canonical form: it will be so only after its normalization. So, the projection matrix does not bring points from view space to the CVV but to a four-dimensional space called clip space, because it is the space where clipping is done. The Canonical Viewing Volume is also referred to as Normalized Device Coordinates (NDC) space, to indicate the normalization applied to the homogeneous coordinates; coordinates in NDC space are called normalized device coordinates.

4.6.4 From Normalized Device Coordinates to Window Coordinates

We have seen how, after projection and perspective division, the viewing volume is mapped to the canonical viewing volume. More specifically, the viewing window is mapped to the face (−1, −1, −1), (1, −1, −1), (1, 1, −1), (−1, 1, −1) of the canonical viewing volume. The last transformation we need maps this face to the rectangular portion of the screen we designated for rendering, called the viewport. Figure 4.22 shows the face of the canonical viewing volume (in the middle) and the viewport (on the right). Note that the viewport is a portion of a bigger rectangle, called the application window, which is the region of the screen your application can draw into. In order to transform the face of the CVV to the viewport, we may do the following (see Figure 4.22): 1) apply a translation by [1, 1]^T to bring the bottom-left corner to (0, 0)^T; 2) scale by (v_X − v_x)/2 along the x axis and by (v_Y − v_y)/2 along the y axis; 3) translate by (v_x, v_y)^T. In matrix form:

W = \begin{bmatrix} 1 & 0 & v_x \\ 0 & 1 & v_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \frac{v_X - v_x}{2} & 0 & 0 \\ 0 & \frac{v_Y - v_y}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} \frac{v_X - v_x}{2} & 0 & \frac{v_X - v_x}{2} + v_x \\ 0 & \frac{v_Y - v_y}{2} & \frac{v_Y - v_y}{2} + v_y \\ 0 & 0 & 1 \end{bmatrix}

Figure 4.22

From CVV to viewport.
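
A sketch of this mapping in code (with the viewport identified by its bottom-left corner (vx, vy) and top-right corner (vX, vY); the names are ours):

// Map a point from the [-1, 1] x [-1, 1] face of the CVV to window coordinates.
function ndcToWindow(p, vx, vy, vX, vY) {
  return [(p[0] + 1) * (vX - vx) / 2 + vx,
          (p[1] + 1) * (vY - vy) / 2 + vy];
}

ndcToWindow([-1, -1], 0, 0, 800, 600); // [0, 0]     bottom-left corner of the viewport
ndcToWindow([ 1,  1], 0, 0, 800, 600); // [800, 600] top-right corner of the viewport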

4.6.4.1 Preserving Aspect Ratio

The aspect ratio of a window is the ratio between its width and its height. If the aspect ratio of the viewing window (which is defined in world space) and the aspect ratio of the viewport (which is defined in screen coordinates) are different, we will see our scene deformed, because the combination of projection and viewport transformations will produce a non-uniform scaling.

The most common situation is that the viewport is fixed and coincides with the canvas size and we would like to set a viewing window that does not have the same aspect ratio. If we do not want to see a distorted image (and usually we don’t) we have to set a viewing window that is as close as possible to the one we had in mind and that has the same aspect ratio as the viewport.

Let Vw and Vh be the sizes of the viewing window we would like to set and Ww and Wh the sizes of the viewport. We have different aspect ratios, so:

\frac{V_w}{V_h} \neq \frac{W_w}{W_h}

and we want to change Vw and Vh so that the aspect ratios are the same. We can easily express all the ways to modify these values by introducing two scaling factors kw and kh and writing:

V'_w = V_w\, k_w \qquad V'_h = V_h\, k_h

so that

\frac{V'_w}{V'_h} = \frac{W_w}{W_h}

Note that in order to change the aspect ratio just one coefficient would be enough, but we also want to choose how the width and the height of the viewing window change. For example, we may want to keep the width fixed, so k_w = 1 and k_h = \frac{W_h}{W_w}\, \frac{V_w}{V_h}.

A fairly common choice is not to cut anything from our intended projection and to enlarge one of the two sizes. This is achieved by setting k_w = 1 and solving for k_h if \frac{V_w}{V_h} > \frac{W_w}{W_h}; we set k_h = 1 and solve for k_w otherwise. You may easily verify both geometrically and algebraically that the unknown coefficient is greater than 1, that is, we enlarge the viewing window.
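
A sketch of this adjustment in code (variable names follow the text):

// Enlarge the intended viewing window (Vw x Vh) so that its aspect ratio
// matches that of the viewport (Ww x Wh) without cutting anything.
function adjustViewingWindow(Vw, Vh, Ww, Wh) {
  var kw = 1, kh = 1;
  if (Vw / Vh > Ww / Wh)
    kh = (Wh / Ww) * (Vw / Vh); // viewing window relatively wider: enlarge its height
  else
    kw = (Ww / Wh) * (Vh / Vw); // viewing window relatively taller: enlarge its width
  return [Vw * kw, Vh * kh];
}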

4.6.4.2 Depth Value

Note that the transformation from clip space to window coordinates does not involve the z component, so the distance of the point from the near plane is apparently lost. Although this topic will be treated in Section 5.2, here we anticipate that a memory buffer of the same size as the viewport, called the depth buffer or z-buffer, is used to store the depth value of the object points visible through each pixel. Conventionally, the value of z ∈ [−1, +1] (in NDC space) is linearly mapped to the range [0, 1]. However, we must note that the mapping of z from [near, far] to the CVV is not linear (more details about this will be given in Section 5.2.4).

4.6.5 Summing Up

Figure 4.23 summarizes the various properties preserved by all the transformations discussed in this chapter.

Figure 4.23

Transformations                Length   Angles   Ratio   Collinearity
Translation                    Yes      Yes      Yes     Yes
Rotation                       Yes      Yes      Yes     Yes
Uniform scaling                No       Yes      Yes     Yes
Non-uniform scaling            No       No       Yes     Yes
Shearing                       No       No       Yes     Yes
Orthogonal projection          No       No       Yes     Yes
Generic affine transformation  No       No       Yes     Yes
Perspective transformation     No       No       No      Yes

Summary of the geometric properties preserved by the different geometric transformations.

4.7 Transformations in the Pipeline

As we learned in Chapter 1.3.2, when vertices enter the pipeline, their position is the one specified by the application. The vertex shader processes one vertex at a time and transforms its position into its clip space coordinates. The transformation applied by the shader may be obtained as the concatenation of transformations as shown in Figure 4.24. The first transformation is conventionally called the model transformation and it is used to place the object in the scene. For example, the primitive cube created in Section 3.9 is centered at the origin, but if in our scene we want it to be centered at coordinates [20, 10, 20]^T, our model transformation will have to be a translation. So, the vertices of every object will be subjected to a proper model transformation to bring the object into the right position and with the right proportions. The next transformation is the view transformation, which transforms all the coordinates so that they are expressed in the view reference frame, as we saw in Section 4.6.1. Finally, we apply the projection transformation, which transforms all the coordinates into clip space, as we saw in Section 4.6.2.

Figure 4.24

Logic scheme of the transformations in the pipeline.

Please note that this decomposition of the transformation applied to the vertices coordinates is purely logical and it is not encoded in the WebGL API in any way. We may program the vertex shader to output coordinates of the vertex in any way we want; it does not have to be done by means of a matrix multiplication or even to be linear. We make this point because it marks an important difference with respect to the old fixed pipeline. In the fixed pipeline, the state of the API only exposed two 4 × 4 matrices that you could use to specify the transformation to apply: the MODELVIEW matrix and the PROJECTION matrix. Each vertex was then transformed by the concatenation of these transformations.

4.8 Upgrade Your Client: Our First 3D Client

In Section 3.9 we have seen how to specify a polygonal shape and we created some JavaScript objects for them: the Box, the Cylinder, the Cone and the Track. Now we will use these shapes to assemble all the elements of our first working client.

Figure 4.25 shows a scheme of how to create a simple model both for a tree and the car and gives names to these transformations (M0,…M9). The first task is to compute these transformations.

Figure 4.25

Using basic primitives and transformations to assemble the race scenario.

4.8.1 Assembling the Tree and the Car

As shown in Figure 4.25, the trunk of the tree is obtained by transforming the cylinder with diameter 2 and height 2 into one with diameter 0.5 and height 0.8, so we only need a non-uniform scaling:

M_1 = S_{(0.25, 0.4, 0.25)} = \begin{bmatrix} 0.25 & 0 & 0 & 0 \\ 0 & 0.4 & 0 & 0 \\ 0 & 0 & 0.25 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

The upper part of the tree is also obtained by a non-uniform scaling, this time of the cone. However, the scaled cone must also be translated along the Y axis by 0.8 units in order to be put onto the trunk. Therefore:

M_0 = T_{(0, 0.8, 0)}\, S_{(0.6, 1.65, 0.6)} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0.8 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0.6 & 0 & 0 & 0 \\ 0 & 1.65 & 0 & 0 \\ 0 & 0 & 0.6 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

Our first car will be a very simple one, made of a box for the car's body and four cylinders for the wheels. We want to transform the cube centered at (0, 0, 0) with side 2 into the box with sizes (2, 1, 4) and with its bottom face on the plane Y = 0.3. We first apply a translation by 1 along the Y axis to make the bottom face of the cube lie on the plane Y = 0. In this way we can apply the scaling transformation S(1, 0.5, 2) to obtain the box with the right proportions while keeping the bottom face on the plane Y = 0. Finally, we translate again along Y by the radius of the wheel. All together:

M_6 = T_{(0, 0.3, 0)}\, S_{(1, 0.5, 2)}\, T_{(0, 1, 0)}

The transformation M3 turns the cylinder into a wheel centered at the origin. We can first translate along Y by −1 to have the cylinder centered at the origin, then rotate by 90° around the Z axis, then scale to obtain the 0.6 diameter and 0.1 width:

M_3 = S_{(0.05, 0.3, 0.3)}\, R_{90°, z}\, T_{(0, -1, 0)}

The matrices from M4 to M7 are simply translations that bring the wheels to their place on the side of the car.

4.8.2 Positioning the Trees and the Cars

At this point we have assembled a tree and a car in their right proportions so no scaling will be involved in M8 and M9. Transformation M8 brings the assembled tree on its position along the circuit. The transformation will be a simple translation T(tx, ty, tz), where [tx,ty,tz]T is the position of the tree (if the ground is all flat ty will be 0). Transformation M9 brings the car onto its position and orientation in the track. In the NVMC framework, the position and rotation of the car is specified by a frame Fc, which, as we have seen in Section 4.4.1, corresponds to a roto-translation matrix.

4.8.3 Viewing the Scene

Now that we have found all the matrices to build the scene from the given primitives, we have to specify from where we are looking at the scene and what kind of projection we are using. As a first example we decide to watch the whole scene from above with an orthogonal projection (an over-head view). So let us say we are positioned at [0.0, 10, 0.0] and looking down along the Y axis. The corresponding viewing frame is described by the following matrix:

V = \begin{bmatrix} u_x & v_x & w_x & O_x \\ u_y & v_y & w_y & O_y \\ u_z & v_z & w_z & O_z \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 10 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

Let us say that the projection is such that it includes the whole circuit, which is a square of 200m × 200m. Since we have set the point of view along the y axis, we will need to include 100m on each side (left, right, top and bottom) in the viewing volume. So we take the matrix in (4.10) and replace r and t with 100, and l and b with −100. The near plane n may be set to 0 and the far plane f so that it includes the ground. Since we are looking from [0, 10, 0]^T, f may be set to a value greater than 10, for example f = 11 as in the matrix below (we will see in Section 5.2 how to choose the far plane correctly to avoid artifacts due to limited numerical representation).

P = \begin{bmatrix} 2/200 & 0 & 0 & 0 \\ 0 & 2/200 & 0 & 0 \\ 0 & 0 & -2/11 & -11/11 \\ 0 & 0 & 0 & 1 \end{bmatrix}

As explained in Section 4.6.4.1, these values may need to be changed according to the viewport chosen.

4.9 The Code

At this point the problem of writing the necessary code to show our simple scene may be put as follows: we have the function drawObject that activates the rendering of a primitive and four kinds of primitives: Cube, Cone, Cylinder and Street. Furthermore, we have the matrices that transform each instance of each primitive in order to compose our scene. We know from Chapter 1 that the coordinates of every vertex are transformed in the vertex transformation and attribute setup phases, and more precisely in the vertex shader. So what we have to do is to make sure that, when a primitive is rendered, the vertex shader applies the right transformation. We use Figure 4.26, which shows the whole hierarchy of frames/transformations of our scene, as a reference. This is essentially the same as Figure 4.25, but this time we highlight the hierarchy and also add the viewing transformations. What you find in this hierarchy is similar to what you find when you consider a scene described with a scene graph, which is in fact a hierarchical organization of a scene commonly used to model and optimize the rendering of complex scenes. A detailed description of a scene graph is beyond the scope of this book, so we mention it only to give an intuitive idea of it.

Figure 4.26

Hierarchy of transformations for the whole scene.

4.10 Handling the Transformations Matrices with a Matrix Stack

The code can be just a sequence of three steps: 1) compute the proper transformation; 2) set the vertex shader to apply it; 3) render the primitive. Let us consider how the code would look for drawing the first two wheels:

// draw the wheel 1 (w1)
M = P * Inverse(V) * M_9 * M_4 * M_3;
SetMatrixInTheVertexShader(M);
draw(w1);
// draw the wheel 2 (w2)
M = P * Inverse(V) * M_9 * M_5 * M_3;
SetMatrixInTheVertexShader(M);
draw(w2);
// …

Although this way to apply transformations is correct, we can see it requires unnecessary computation. Specifically, the first three matrices of the multiplication, P, V−1 and M9 are the same for all the wheels, so the obvious thing to do is to compute their product in advance and store it in a variable. We may apply the same idea to avoid computing PV−1 three times (for the car, the trees and the track).

In order to generalize this mechanism we note from Figure 4.26 that the order in which the matrices are multiplied before drawing any shape describes a path from the clip space node to the shape node. In the figure we use a path of cyan arrows for the case of the first wheel. With this interpretation if two or more paths share a subpath, it is convenient to compute and store the product of the matrices corresponding to that subpath.

The simple and elegant way to implement this mechanism is by assuming to access directly the transformation matrix used by the vertex shader, which we will call current, and to keep this matrix updated with the value resulting from the composition of transformations applied. Furthermore, we use a stack (that is, a last-in first-out data structure) of matrices that stores all the values of current that we want to save for later use, which we will call a matrix stack. The table in Figure 4.9 shows the sequence of matrix operations we do to draw the first wheel. Note that every time we traverse a node with more than one child we make a push of the matrix on the stack, because we know we will need it again for all the other children.

In the example, when we draw the cylinder for the first wheel (w1), we can see that current = P V−1 M9 M4 M3. Then we make a pop from the stack and reset current to the last value pushed on the stack, which is P V−1 M9.
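
Before looking at the SpiderGL code, here is a minimal sketch of what such a matrix stack might look like (an illustrative class of our own, not the SpiderGL one):

// Minimal matrix stack: 'current' holds the composed transformation,
// push/pop save and restore it while the hierarchy is traversed.
function MatrixStack(identity, multiplyFn) {
  this.current = identity;
  this._saved = [];
  this.multiply = function (m) { this.current = multiplyFn(this.current, m); };
  this.push = function () { this._saved.push(this.current); };
  this.pop = function () { this.current = this._saved.pop(); };
}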

We are now ready to comment the code in Listing 4.2. At line 295 we create a SpiderGL object SglMatrixStack. Like the name suggests, this object implements a stack of matrices, so it provides the methods push and pop. The matrix we referred to as current is a member of the object SglMatrixStack. The function multiply(P) right-multiplies the matrix current, which means it performs current = current P.

The lines between 229 and 240 (Listing 4.1) set the projection matrix and the view matrix. Although we can manually set these matrices as those shown in Section 4.8.3, SpiderGL provides the utilities to build these matrices by specifying a few intuitive parameters. SglMat4.ortho builds an orthogonal projection matrix by specifying the extremes of the viewing volume. SglMat4.lookAt creates the view matrix (that is, the inverse of the view frame) by passing the eye position, the point towards which we are looking and the up direction.

Note that we set the viewing volume taking into account the ratio between width and height of the viewport (which in this case has the same size as the canvas) to preserve the aspect ratio, as explained in Section 4.6.4.1.

At line 242 we push the current matrix, which is now invV (the projection P is not kept on the stack: it was stored in its own uniform at line 235), and then we multiply by the car frame (returned by the function myFrame).

229  var ratio = width / height;
230  var bbox = this.game.race.bbox;
231  var winW = (bbox [3] - bbox [0]);
232  var winH = (bbox [5] - bbox [2]);
233  winW = winW * ratio * (winH / winW);
234  var P = SglMat4.ortho([-winW / 2, -winH / 2, 0.0], [winW / 2, winH / 2, 21.0]);
235  gl.uniformMatrix4fv(this.uniformShader.uProjectionMatrixLocation, false, P);
236  
237  var stack = this.stack;
238  stack.loadIdentity();
239  // create the inverse of V
240  var invV = SglMat4.lookAt([0, 20, 0], [0, 0, 0], [1, 0, 0]);
241  stack.multiply(invV);
242  stack.push();

LISTING 4.1: Setting projection and modelview matrix.

290  NVMCClient.onInitialize = function () {
291 var gl = this.ui.gl;
292 NVMC.log("SpiderGL Version : " + SGL_VERSION_STRING + "\n");
293 this.game.player.color = [1.0, 0.0, 0.0, 1.0];
294 this.initMotionKeyHandlers();
295 this.stack = new SglMatrixStack();
296 this.initializeObjects(gl);
297 this.uniformShader = new uniformShader(gl);

LISTING 4.2: Actions performed upon initialization.

 2  var vertexShaderSource = "
 3 uniform  mat4 uModelViewMatrix;  
 4 uniform  mat4 uProjectionMatrix; 
 5 attribute vec3 aPosition;    
 6 void main(void)      
 7 {	      
 8 gl_Position = uProjectionMatrix * uModelViewMatrix  
 9  * vec4(aPosition, 1.0);    
10}";
11  
12  var fragmentShaderSource = "
13 precision highp float;    
14 uniform vec4 uColor;     
15 void main(void)      
16 {       
17  gl_FragColor = vec4(uColor);   
18} ";

LISTING 4.3: A basic shader program.

Listing 4.3 shows the vertex and the fragment shaders. With respect to the shader shown in Listing 2.8, we added a uniform variable for the color, so that every render call will output fragments of color uColor, and the matrices uProjectionMatrix (which we set at line 235 of Listing 4.1) and uModelViewMatrix (which we set to stack.matrix before rendering any shape) to transform the vertex position from object space to clip space.

Lines 147-148 of Listing 4.4 show the typical pattern for drawing a shape: we first set uModelViewMatrix to the value of the current matrix in the stack (stack.matrix) and then make the render call. A snapshot of the resulting client is shown in Figure 4.27.

Figure 4.27

Figure showing a snapshot from the very first working client.

A snapshot from the very first working client. (See client http://envymycarbook.com/chapter4/0/0.html.)

147  gl.uniformMatrix4fv(this.uniformShader.uModelViewMatrixLocation, false, stack.matrix);
148  this.drawObject(gl, this.cylinder, [0.8, 0.2, 0.2, 1.0], [0, 0, 0, 1.0]);

LISTING 4.4: Setting the ModelView matrix and rendering.
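For context, here is a hedged sketch of how this pattern might combine with the matrix stack inside the drawing routine to draw one wheel. It assumes stack is the SglMatrixStack created at initialization (this.stack), and wheelFrame and wheelShape are hypothetical names for the matrices labelled M4 and M3 in Figure 4.26.

stack.push();                                  // save the modelview up to the car frame
stack.multiply(wheelFrame);                    // place the wheel on the car body (M_4)
stack.multiply(wheelShape);                    // turn the unit cylinder into a wheel (M_3)
gl.uniformMatrix4fv(this.uniformShader.uModelViewMatrixLocation, false, stack.matrix);
this.drawObject(gl, this.cylinder, [0.8, 0.2, 0.2, 1.0], [0, 0, 0, 1.0]);
stack.pop();                                   // restore the car frame for the next wheel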

4.10.1 Upgrade Your Client: Add the View from above and behind

To make things more interesting, suppose we want to place the view reference frame as in Figure 4.28, which gives the classic view from above and behind, where you see part of your car and the street ahead. This reference frame cannot be constant because it stays behind the car, and so it is expressed in the frame of the car F0, which changes as the car moves. In terms of our hierarchy, this view reference frame is a child of the car's frame F0. In terms of transformations, the view reference frame can easily be expressed in the frame of the car by rotating the canonical frame around the x axis by −20° and then translating it by (0, 3, 1.5):

Vc0 = T(0, 3.0, 1.5) Rx(−20°)

Figure 4.28

Figure showing a view reference frame for implementing the view from behind the car.

A view reference frame for implementing the view from behind the car.

Please note that Vc0 is expressed in the frame of the car F0, and not (yet) in the world reference frame. As for the wheels and the car's body, to obtain the view reference frame in world coordinates we just need to multiply it by F0:

V0 = F0 Vc0

Since we will create more view frames, and we want to give some structure to the code, we create an object that represents a specific view frame and the way we manipulate it. In this case we call this object ChaseCamera; its implementation is shown in Listing 4.5.

74  function ChaseCamera() {
75 this.position  = [0.0, 0.0, 0.0];
76 this.keyDown  = function (keyCode) {}
77 this.keyUp   = function (keyCode) {}
78 this.mouseMove   = function (event) {};
79 this.mouseButtonDown = function (event) {};
80 this.mouseButtonUp = function () {}
81 this.setView  = function (stack, F_0) {
82  var Rx = SglMat4.rotationAngleAxis(sglDegToRad(-20), [1.0, 0.0, 0.0]);
83  var T = SglMat4.translation([0.0, 3.0, 1.5]);
84  var Vc_0 = SglMat4.mul(T, Rx);
85  var V_0 = SglMat4.mul(F_0, Vc_0);
86  this.position = SglMat4.col(V_0 ,3);
87  var invV = SglMat4.inverse(V_0);
88  stack.multiply(invV);
89};
90 };

LISTING 4.5: The ChaseCamera sets the view from above and behind the car. (Code snippet from http://envymycarbook.com/chapter4/1/1.js.)

Lines 76 to 80 define functions to handle key and mouse events. For this specific camera they do nothing, since the camera simply depends on the car position and orientation, which is passed to the function setView at line 81. The goal of this function is to update the matrix stack (passed as the parameter stack) with the viewing transformation. Lines 82 to 88 simply build the frame as defined by the equations above; finally, its inverse is right-multiplied onto the stack's current matrix.

4.11 Manipulating the View and the Objects

In every interactive application the user must be able to manipulate the scene and/or change the point of view using some input interface such as the mouse, the keyboard, the touch screen or a combination of them. This means translating events (e.g., clicks and/or movements of the mouse) into modifications of the transformations in the pipeline shown in Figure 4.24. In a 3D scene this is typically done following one of two paradigms: camera-in-hand (CIH) and world-in-hand (WIH). The CIH paradigm corresponds to the idea that the user moves in the scene holding a camera, like in a first-person shooter video game or in any subjective sequence of a movie. It is useful when the 3D scene is “big” with respect to the user (for example, a building or our circuit). The WIH paradigm is applied when the scene is something that could virtually be held in the palm of one's hand and observed from the outside, such as a model of a car. In this case the view is fixed and the scene is rotated, scaled and translated in order to be observed.

In terms of our transformation pipeline, the CIH paradigm is implemented by changing the view transformation, while the WIH paradigm is implemented by adding one more transformation between the model and the view transformations, that is, by transforming the scene in view space.

4.11.1 Controlling the View with Keyboard and Mouse

If you have ever played a first-person shooter game you are already familiar with the so-called WASD mode, where the keys w, a, s and d are used to move forward, left, backward and right, respectively, and the mouse is used to rotate the view (most often to aim at some enemy).

All we need to do to move the point of view is to add to the origin of the view reference frame V a vector pointing in the direction we want to move along. Note that we usually want to specify such a direction, let us call it t_V, in the view reference frame itself (e.g., “move to my right”), so we will need to express t_V in the world reference frame and then add it to the current origin of the frame V:

v_o = v_o + V_R t_V          (4.11)
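As an illustration, a minimal sketch of Equation (4.11) using only SpiderGL calls that appear later in this chapter, assuming the origin is stored in this.position and the axes V_R in a 4 × 4 matrix this.orientation (the representation the PhotographerCamera of Listing 4.6 will adopt):

var t_V = [0.1, 0.0, 0.0];                              // a small step expressed in the view frame, e.g., "to my right"
var t_world = SglMat4.mul3(this.orientation, t_V);      // V_R t_V: the same direction in world space
this.position = SglVec3.add(this.position, t_world);    // v_o = v_o + V_R t_V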

In order to change the orientation of the view reference frame with the mouse, a common strategy is to map mouse movements to Euler angles and rotate the frame axes (V_R) accordingly. Typically, the change of orientation is limited to yaw and pitch, therefore we can store the current direction as the angles α and β (following the notation of Section 4.5.2) and obtain the z axis of the view reference frame as:

v_oz = R_x(β) R_y(α) v_oz          (4.12)

Then we can build the rest of the reference frame as shown in Section 4.5.1.1:

v_ox = [0, 1, 0]^T × v_oz,     v_oy = v_oz × v_ox          (4.13)

The values of α and β are obtained by converting the mouse movement from pixels, as returned by the onMouseMove function, to degrees:

α = dx (maxYaw / width),     β = dy (maxPitch / height)

Note that the choice of [0, 1, 0]^T guarantees that v_ox will be parallel to the XZ plane, which is what we want in this kind of interaction (no roll). Also note that this choice does not work if we are aiming straight up, that is, when v_oz = [0, 1, 0]^T. This problem is usually avoided by limiting the pitch to 89.9°.
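A possible sketch of Equations (4.12)–(4.13), under the assumption that the total yaw and pitch angles alpha and beta (in degrees) are applied to the canonical z axis; cross and normalize are small hand-written helpers, since only SglMat4 functions already used in the listings are assumed here.

function cross(a, b) {
  return [a[1] * b[2] - a[2] * b[1],
          a[2] * b[0] - a[0] * b[2],
          a[0] * b[1] - a[1] * b[0]];
}
function normalize(v) {
  var l = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
  return [v[0] / l, v[1] / l, v[2] / l];
}
var R_beta = SglMat4.rotationAngleAxis(sglDegToRad(beta), [1, 0, 0]);
var R_alpha = SglMat4.rotationAngleAxis(sglDegToRad(alpha), [0, 1, 0]);
var v_oz = SglMat4.mul3(SglMat4.mul(R_beta, R_alpha), [0, 0, 1]);   // Equation (4.12)
var v_ox = normalize(cross([0, 1, 0], v_oz));                       // parallel to the XZ plane (no roll)
var v_oy = cross(v_oz, v_ox);                                       // completes the frame, Equation (4.13)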

A more elegant implementation is to always update the original view reference frame (and not build a new one each time). Instead of keeping the total rotation angles α and β, we keep the increments dα and dβ. Then we first rotate V_R around the y axis of the world reference frame:

V_R = R_y(dα) V_R

Note that rotating a frame (or, more generally, applying any affine transformation to it) is simply done by rotating each of its axes. Then we apply a rotation to V_R around its own x axis, which is done as shown in Section 4.4:

V_R = (V_R R_x(dβ) V_R⁻¹) V_R = R_y(dα) V_R R_x(dβ)          (4.14)

where the parenthesized term is the rotation around V_R's x axis.
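A minimal sketch of Equation (4.14), where d_alpha and d_beta are the per-frame yaw and pitch increments in degrees and V_R is the 4 × 4 matrix holding the view axes; this anticipates what lines 40-42 of Listing 4.6 will do:

var R_dalpha = SglMat4.rotationAngleAxis(sglDegToRad(d_alpha), [0, 1, 0]); // yaw around the world y axis
var R_dbeta = SglMat4.rotationAngleAxis(sglDegToRad(d_beta), [1, 0, 0]);   // pitch around the frame's own x axis
V_R = SglMat4.mul(SglMat4.mul(R_dalpha, V_R), R_dbeta);                    // V_R = R_y(dα) V_R R_x(dβ)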

4.11.2 Upgrade Your Client: Add the Photographer View

The photographer stands at a position alongside the track and takes pictures of the race. So what we want is the ability to control a view reference frame whose origin is between 50 cm and 180 cm from the ground. We also want to enable the photographer to lock the camera target onto a specific car, which will be very useful for taking photos with the technique called panning photography (more on this in Section 10.1.5). As we did with the ChaseCamera, we create another object, which we call PhotographerCamera, but this time the key and mouse events will affect the camera position and orientation.

Listing 4.6 shows the properties of the camera. Note that this time we do not store the view frame as a single 4 × 4 matrix. Instead, we have the property this.position (a three-dimensional point) for the origin of the frame and the property this.orientation (a 4 × 4 matrix) for the axes. The only reason for this is that we will always handle position and orientation separately, so it is simply handier to store them explicitly.

The position is updated by the function updatePosition, which is called at every frame. At line 58 we convert the direction of movement from photographer space to world space and add it to the current position, as shown in Equation (4.11); then we clamp the y value (that is, the height) between 0.5 and 1.8. The direction is determined by the mouse movement in the function mouseMove. At lines 35 and 36 we compute the angles alpha and beta from the horizontal and vertical mouse movements, respectively, and lines 40 to 42 are the implementation of Equation (4.14).

The function setView simply uses the lookAt function to compute the viewing transformation. If the variable lockToCar is true we use the car position as the target for the lookAt function; otherwise we use the columns of the matrix orientation to define the target and the up vector.

 7  function PhotographerCamera() {
 8 this.position = [0, 0, 0];
 9 this.orientation = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];
10 this.t_V = [0, 0, 0];
11 this.orienting_view = false;
12 this.lockToCar = false;
13 this.start_x = 0;
14 this.start_y = 0;
15  
16 var me = this ;
17 this.handleKey = {};
18 this.handleKey["Q"] = function () {me.t_V = [0, 0.1, 0];};
19 this.handleKey["E"] = function () {me.t_V = [0, -0.1, 0];};
20 this.handleKey["L"] = function () {me.lockToCar= true;};
21 this.handleKey["U"] = function () {me.lockToCar= false;};
22  
23 this.keyDown = function (keyCode) {
24  if (this.handleKey[keyCode])
25  this.handleKey[keyCode](true);
26}
27  
28 this.keyUp = function (keyCode) {
29  this.delta = [0, 0, 0];
30}
31  
32 this.mouseMove = function (event) {
33  if (!this.orienting_view) return;
34  
35  var alpha = (event.offsetX - this.start_x)/10.0;
36  var beta = (event.offsetY - this.start_y)/10.0;
37  this.start_x = event.offsetX;
38  this.start_y = event.offsetY;
39  
40  var R_alpha = SglMat4.rotationAngleAxis(sglDegToRad(alpha), [0, 1, 0]);
41  var R_beta = SglMat4.rotationAngleAxis(sglDegToRad (beta), [1, 0, 0]);
42  this.orientation = SglMat4.mul(SglMat4.mul(R_alpha, this.orientation), R_beta);
43};
44  
45 this.mouseButtonDown = function (event) {
46  if (!this.lockToCar) {
47  this.orienting_view = true;
48  this.start_x = event.offsetX;
49  this.start_y = event.offsetY;
50 }
51};
52  
53 this.mouseButtonUp = function () {
54  this.orienting_view = false;
55}
56  
57 this.updatePosition = function (t_V){
58  this.position = SglVec3.add(this.position, SglMat4.mul3(this.orientation, t_V));
59  if (this.position[1] > 1.8) this.position[1] = 1.8;
60  if (this.position[1] < 0.5) this.position[1] = 0.5;
61}
62 
63 this.setView = function (stack, carFrame) {
64  this.updatePosition (this.t_V)
65  var car_position = SglMat4.col(carFrame ,3);
66  if (this.lockToCar)
67  var invV = SglMat4.lookAt(this.position, car_position, [0, 1, 0])
68  else
69  var invV = SglMat4.lookAt(this.position, SglVec3.sub(this.position, SglMat4.col(this.orientation, 2)), SglMat4.col(this.orientation, 1));
70  stack.multiply(invV);
71};
72 };

LISTING 4.6: Setting the view for the photographer camera. (Code snippet from http://envymycarbook.com/chapter4/1/1.js.)

Figure 4.29 shows a snapshot from the client while viewing from the photographer’s point of view.

Figure 4.29

Figure showing adding the photographer point of view. (See client http://envymycarbook.com/chapter4/1/1.html.)

Adding the photographer point of view. (See client http://envymycarbook.com/chapter4/1/1.html.)

4.11.3 Manipulating the Scene with Keyboard and Mouse: the Virtual Trackball

The virtual trackball is a convenient way to map mouse movements to rotations of the scene (or of a single object). The idea is that we have a sphere centered at a fixed point in world space, and we can grab any point of its visible surface and drag it in order to make the sphere, and the scene with it, rotate around the sphere's center. Figure 4.30 shows how a virtual trackball can be implemented. p0 and p1 are two consecutive positions on screen during a mouse movement, and p′0 and p′1 are their projections on the sphere. The corresponding rotation is obtained by bringing p′0 to p′1 along the shortest path on the sphere's surface, which is done by rotating p′0 around the axis (p′0 − c) × (p′1 − c) by the angle

Figure 4.30

Figure showing the virtual trackball implemented with a sphere.

The virtual trackball implemented with a sphere.

θ = asin( ||(p′0 − c) × (p′1 − c)|| / (||p′0 − c|| ||p′1 − c||) )          (4.15)

We can see several problems with this implementation. The first one is that nothing happens when the projection of the mouse position does not intersect the sphere, which is both unpleasant to see and a source of jerky movements when the projection leaves the sphere at one point and reappears at a faraway point. Furthermore, when the projection approaches the border of the sphere the surface becomes very steep, and small mouse movements correspond to big angles. Also, it is not possible to obtain a rotation around the z axis (roll), because all the points will always be in the positive-z halfspace.

A naive solution to the first two problems would be to use a sphere so big that its projection covers the whole viewing window. Apart from the fact that this is not always possible, because the center of the sphere may be too close to the viewing plane, note that the ratio between mouse movement and angle depends on the sphere radius: if the sphere is too big, large mouse movements will result in small rotation angles, making the trackball nearly useless.

A common and effective solution consists of using, as the surface, a combination of the sphere and a hyperbolic surface, as shown in Figure 4.31:

S:   z = sqrt(r² − (x² + y²))          if x² + y² < r²/2
     z = (r²/2) / sqrt(x² + y²)        if x² + y² ≥ r²/2

Figure 4.31

Figure showing a surface made by the union of a hyperboloid and a sphere.

A surface made by the union of a hyperboloid and a sphere.

This surface coincides with the sphere near the center of the trackball and then smoothly approaches the XY plane. The amount of rotation could still be determined by Equation (4.15), although the farther the points p′0 and p′1 are from the center, the smaller the angle would be with respect to their distance. Instead, we can use:

θ = ||p′1 − p′0|| / r

that is, the angle covered by an arc of length ||p′1 − p′0|| (see Figure 4.32, left side).
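To make the whole trackball step concrete, here is a hedged sketch assuming x0, y0 and x1, y1 are two consecutive mouse positions already expressed in the trackball plane relative to its center, and r is the trackball radius (the helper name is illustrative only):

function projectOnTrackballSurface(x, y, r) {
  var d2 = x * x + y * y;
  var z = (d2 < r * r / 2.0) ? Math.sqrt(r * r - d2)            // spherical part
                             : (r * r / 2.0) / Math.sqrt(d2);   // hyperbolic part
  return [x, y, z];
}
var p0 = projectOnTrackballSurface(x0, y0, r);
var p1 = projectOnTrackballSurface(x1, y1, r);
var axis = [p0[1] * p1[2] - p0[2] * p1[1],                      // p0 x p1, the rotation axis
            p0[2] * p1[0] - p0[0] * p1[2],
            p0[0] * p1[1] - p0[1] * p1[0]];
var l = Math.sqrt(axis[0] * axis[0] + axis[1] * axis[1] + axis[2] * axis[2]);
axis = [axis[0] / l, axis[1] / l, axis[2] / l];                 // a real implementation should guard against p0 == p1
var d = SglVec3.sub(p1, p0);
var dist = Math.sqrt(d[0] * d[0] + d[1] * d[1] + d[2] * d[2]);
var angle = dist / r;                                           // θ = ||p1 - p0|| / r, in radians
var rot = SglMat4.rotationAngleAxis(angle, axis);               // to be accumulated into tbMatrix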

Figure 4.32

Figure showing the virtual trackball implemented with a hyperboloid and a sphere.

The virtual trackball implemented with a hyperboloid and a sphere.

4.12 Upgrade Your Client: Create the Observer Camera

The improvement we are about to make to our client will be very helpful for the rest of the development. We will add the Observer Camera, a view that allows us to fly around freely in WASD mode and to switch to a trackball mode when we want to examine some detail. This is a tool that, during the development of the game, will enable us to examine details of the scene from a privileged point of view. Suppose we want to use WASD mode to move the point of view near the car, then use the virtual trackball to make the world rotate about the car center in order to examine it, and then switch back to WASD mode to go and look at some other detail of the scene.

We define the object ObserverCamera, which has the same member functions as the ChaseCamera and the PhotographerCamera. The camera is a direct implementation of the formulas in Sections 4.11.1 and 4.11.3. The only detail left to discuss is what happens when we switch from trackball mode to WASD mode. Note that the WASD mode works by changing the view frame, while the trackball makes the world rotate about a pivot point, that is, without changing the view frame. This rotation is stored in the matrix tbMatrix, which multiplies the stack matrix right after the view transformation, so that the whole scene is rotated. If we use the virtual trackball and then switch to WASD mode, we cannot simply ignore the matrix tbMatrix: our view frame has not changed since the beginning of the rotation, so discarding tbMatrix would be as if we had never used the trackball mode at all.
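For reference, a hedged sketch of how the two matrices might be combined on the stack at every frame while in trackball mode, with this.V holding the observer's view frame and this.tbMatrix the accumulated trackball rotation:

stack.loadIdentity();
stack.multiply(SglMat4.inverse(this.V));   // viewing transformation V^-1
stack.multiply(this.tbMatrix);             // trackball rotation, applied right after the view
// ... model transformations and render calls follow as usual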

What we do is the following: every time a mouse movement controlling the trackball has ended, we convert the trackball rotation into a camera transformation and reset the trackball transformation to the identity. In other words, we change the view frame so as to obtain exactly the same transformation as with the trackball. Let us see how this is done. During the trackball manipulation, a generic point p of the scene is brought into view space by:

p′ = V⁻¹ tbMatrix p

When we switch to WASD mode we want to forget about tbMatrix, but keep the same global transformation, so we need a new view reference frame V_wasd such that:

p′ = V_wasd⁻¹ p = V⁻¹ tbMatrix p

so we have:

V_wasd⁻¹ = V⁻¹ tbMatrix,   hence   V_wasd = tbMatrix⁻¹ V

The listing below shows the implementation of the mouseButtonUp function. orbiting is a Boolean variable that is set to true when the camera is in trackball mode and the mouse button is pressed, and to false when the mouse button is released.

144  this.mouseButtonUp = function (event) {
145 if (this.orbiting) {
146  var invTbMatrix = SglMat4.inverse(this.tbMatrix);
147  this.V = SglMat4.mul(invTbMatrix, this.V);
148  this.tbMatrix = SglMat4.identity();
149  this.rotMatrix = SglMat4.identity();
150  this.orbiting = false;
151}else
152 this.aiming = false;
153 };

Figure 4.33 shows a snapshot from the client while viewing the scene with the Observer Camera.

Figure 4.33

Figure showing adding the Observer point of view with WASD and Trackball Mode.

Adding the Observer point of view with WASD and Trackball Mode. (See client http://envymycarbook.com/chapter4/2/2.html.)

4.13 Self-Exercises

4.13.1 General

  1. Find the affine 2D transformation that transforms the square centered at (0, 0) with sides of length 2 into the square with corners {(2, 0), (0, 2), (−2, 0), (0, −2)} and find its inverse transformation.
  2. Provide a counter example to the following statements: a non-affine transformation is not invertible; translation and rotation are commutative transformations.
  3. Prove that an affine transformation maps parallel lines to parallel lines. Hint: express the lines in parametric form as L = p + t d and then apply an affine transformation to both.
  4. Consider the square with unitary sides centered at (0.5, 0.5) and apply the following transformations:

    • rotation by 45°
    • scaling by 0.5 along the y axis
    • rotation by −30°

    Write down the shear transformation that gives the same result.

  5. Describe what happens to the image formed in a pinhole camera when the hole is not punctiform.

4.13.2 Client Related

  1. In order to improve the visualization of the steering of the car, allow the front wheels of the car to rotate by 30° about the y axis on the left and on the right.
  2. In order to tell the front of the car from its back, apply a transformation along the z axis so that the front will look half as wide and half as high as the back. Can you do it with an affine transformation? Hint: draw the transformed version of the box and answer the question: do parallel lines stay parallel after this transformation?
  3. Add the front view, which is the view looking forward from the car (so the car is no longer visible).
  4. In Listing 4.1 we show how to set the projection matrix so as to preserve the aspect ratio. Note that the way we did it works on the assumption that the width/height ratio of the viewport is greater than that of the wanted viewing window. What happens if we set the viewport to 300 × 400? Will we still see the whole circuit? Fix the code in Listing 4.1 so that it works in every case.