The Big Chain of Transforms

On its way through the graphics pipeline, 3D content undergoes numerous transformations to become a rendered output. This is the part of the pipeline we have covered so far in this section of the book while examining content creation. The pipeline transformations occur across the many reference frames in the scene (represented by TransformGroups and Transform3Ds). The basic set of transformations in any 3D graphics pipeline was established early on by Sutherland, who recognized the utility of projective geometry for chaining them together with computational ease. In fact, when we refer to “the 3D graphics pipeline,” for all intents and purposes we are describing what is frequently called the Sutherland pipeline.
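
To make the chaining concrete, here is a minimal sketch (the class name and the specific angle and translation values are invented for illustration) that composes two Transform3Ds and hands the product to a TransformGroup; each reference frame in the scene adds another link of this kind to the chain.

import javax.media.j3d.Transform3D;
import javax.media.j3d.TransformGroup;
import javax.vecmath.Vector3d;

public class ChainDemo {
    public static TransformGroup buildChainedGroup() {
        // A rotation about the Y axis (45 degrees, chosen arbitrarily).
        Transform3D rotate = new Transform3D();
        rotate.rotY(Math.PI / 4.0);

        // A translation five meters down the -Z axis (also arbitrary).
        Transform3D translate = new Transform3D();
        translate.setTranslation(new Vector3d(0.0, 0.0, -5.0));

        // Chaining: the product applies the rotation first, then the translation.
        Transform3D chained = new Transform3D(translate);
        chained.mul(rotate);   // chained = translate * rotate

        // A TransformGroup holds the composite as one node in the scene graph.
        return new TransformGroup(chained);
    }
}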

It turns out that the physical environment in which the user exists also corresponds to a kind of Sutherland pipeline running the other way. Because both pipelines converge at a common point, the screen, a common space can be created in which a point in one space maps onto a point in the other. This space can be considered a coexistence space.

None of this is very important when you simply want to display a 3D model on a single flat screen as we have done throughout the last two chapters. Java 3D makes some basic assumptions about where your head and eyes are and renders the correct view.

However, when you want to use more advanced 3D graphics displays, such as stereo viewing or controlling the view with head tracking, you will quickly run into a complex series of transformations that must be applied to get the correct view. Encountering these transforms will most likely make you a little nauseated because keeping track of them all is quite challenging. Nonetheless, a number of simplifying assumptions can be used to make the problem more tractable. In general, there are two basic series of transformations corresponding to two general classes of viewing situations. We describe these next.

Two Fundamental Series of Transforms

Most viewing situations fall into one of two classes depending on whether the display (by this, we mean the physical screen) is head mounted or room mounted. The vast majority of output situations are single screens without head tracking. Such systems are the simplest form of room mounted display. In the head mounted situation, the screens are attached to the head (that is, when the head moves, the screens follow).

Whether the screen is head mounted or room mounted only really makes a difference when head tracking is incorporated. We want to use head tracking for two different purposes, depending on which of the two viewing situations is in play. The following descriptions apply to head tracking setups only. Remember that a typical head tracking setup has two relevant hardware components: the tracker base, a base station that emits and receives data, and the tracker sensor, a measurement device for detecting x, y, z, pitch, roll, and yaw. An excellent review of head tracking hardware and theory was presented at SIGGRAPH 2001 by Danette Allen, Gary Bishop, and Greg Welch from UNC. The course notes are available from

http://cave.cs.nps.navy.mil/Courses/cd1/courses/11/11cdrom.pdf

In the head-mount situation, we want the camera to be slaved to the user's head so that when the user's head moves and rotates, the 3D view of the scene moves with it. Importantly, the relationship between our eyes and the screen does not change in the head-mount situation; therefore, the projection matrix remains constant. In the head-mount situation, the sensor is rigidly attached to the head-mounted display device.

In the room-mount situation, movement of our head changes the projection matrix but has no effect on the camera. This matters only when viewing in stereo; with a room mounted display, there is no reason to use head tracking unless you are viewing in stereo. In this case, the tracker base (and not the sensor) is attached to the display (or to some central point among multiple displays).

To summarize, when the screen(s) are fixed in the room, Java 3D operates under the assumption that the tracker base is fixed to the display. Thus the room, the screen, and the tracker base all exist in the same reference frame. Alternatively, when the screen(s) are fixed to the user's head, Java 3D computes the transforms assuming that the tracker sensor is attached to the display. In this case the head, the display screen, and the tracker sensor exist in the same reference frame.
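
In code, these assumptions show up as a pair of fixed calibration transforms that you supply. The following is a minimal sketch for the room-mounted case, assuming you already have a Canvas3D and a View in hand; the identity transforms used here are only placeholders for real, measured calibration data.

import javax.media.j3d.Canvas3D;
import javax.media.j3d.Screen3D;
import javax.media.j3d.Transform3D;
import javax.media.j3d.View;

public class TrackerCalibration {
    // Room-mounted case: the tracker base is fixed relative to the screen,
    // so we tell Java 3D where the base sits relative to the image plate.
    public static void calibrateRoomMounted(Canvas3D canvas, View view) {
        Screen3D screen = canvas.getScreen3D();

        // Measured offset of the tracker base from the screen's image plate
        // (identity is a stand-in for real calibration data).
        Transform3D trackerBaseToImagePlate = new Transform3D();
        screen.setTrackerBaseToImagePlate(trackerBaseToImagePlate);

        // Where coexistence sits relative to the tracker base (again a placeholder).
        Transform3D coexistenceToTrackerBase = new Transform3D();
        view.getPhysicalEnvironment()
            .setCoexistenceToTrackerBase(coexistenceToTrackerBase);

        view.setTrackingEnable(true);
    }
}

For the head-mounted case, the corresponding calibration calls are Screen3D.setHeadTrackerToLeftImagePlate() and setHeadTrackerToRightImagePlate(), which fix the sensor relative to each display.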

Java 3D determines which of these two situations to use in rendering by seeing whether the View policy is View.HMD_VIEW or View.SCREEN_VIEW.
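
Selecting between them is a single call on the View object. A minimal sketch (the class and method names are just illustrations):

import javax.media.j3d.View;

public class ViewPolicySetup {
    public static void useHeadMountedPolicy(View view) {
        // Tell Java 3D to compute views for a head-mounted display;
        // View.SCREEN_VIEW selects the room-mounted model and is the default.
        view.setViewPolicy(View.HMD_VIEW);
    }
}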

Understanding Viewing Through a Remote Telepresence Robot

One way to understand the different viewing situations is to consider a remote controlled video equipped robot. Such a robot was used in psychological and brain imaging investigations during my time at Michigan State University (see Figure 13.1). The robot work described here was developed in collaboration with the Robotics and Automation Lab in the College of Engineering at MSU.

Figure 13.1. Telerobot with control system.


The robot can be configured in a number of ways and makes a nice heuristic model for understanding the Java 3D view model.

The robot is drivable by a remote user from a distant location using a joystick. In the most basic configuration, the robot has a single camera attached to it that continuously streams video to the remote user.

The robot platform corresponds well to the ViewPlatform in Java 3D. Moving the joystick moves the robot (including the camera) around in the remote location, much as we experience when we use a navigation behavior (see Chapter 12, “Interaction with the Virtual World”). The remote user can see the world from the perspective of the robot. The user's eyes are in the world of the robot, and hence everything the user sees is from the robot's cyclopean view.

One important point to note is that what the remote user sees on the screen is determined partially by the properties of the camera lens and partially by the properties of the display device. For example, the camera could have a wide-angle lens, and the user could have a simple monitor on which to view the video. We want the user to suspend reality and imagine being embodied by the robot. One way to help suspend the user's reality is to project the video so that the objects in the remote world are their proper size in the user's world. In other words, if we know the amount of space that the robot's video camera covers, we can create a nice immersive effect by projecting the image to the user at the same apparent size he would encounter if he actually were the robot. For example, we could have some very large screens several feet from the user and project large images, or we could have small screens right in front of the user's eyes and project little images, thereby achieving roughly the same effect. That, in essence, is the important idea of apparent size.
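
As a rough sketch of the arithmetic (all numbers here are invented for illustration): if the camera covers a horizontal field of view of theta degrees, a screen of physical width w preserves apparent size when viewed from a distance d such that theta = 2 * atan(w / 2d). Solving for d:

import static java.lang.Math.tan;
import static java.lang.Math.toRadians;

public class ApparentSize {
    // Distance (in the same units as screenWidth) at which a screen of the
    // given width subtends the camera's horizontal field of view.
    public static double matchingViewDistance(double screenWidth, double cameraFovDegrees) {
        return (screenWidth / 2.0) / tan(toRadians(cameraFovDegrees) / 2.0);
    }

    public static void main(String[] args) {
        // Example: a 60-degree lens and a 2-meter-wide projection screen.
        double d = matchingViewDistance(2.0, 60.0);
        System.out.printf("Sit about %.2f m from the screen%n", d);   // roughly 1.73 m
    }
}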

A number of interesting configurations are possible given sufficient bandwidth. In another simple configuration, three cameras are mounted on the robot: the original camera pointing straight ahead, with two additional cameras rotated 45° to either side within the view plane. The remote user could then sit in a room with three screens and three projectors in back-projection mode. Again, in order for the subject to feel more embodied within the robot, we need to maintain the apparent size of the robot's scene.

Adding Immersive Head Tracking to the Robot

Now consider what would happen if we configured the system so that a head tracker is attached to the head of the remote user. Moreover, say that we mount the cameras on a 3-degree-of-freedom motorized tripod that is slaved to the head tracker of the remote user. Thus, when the remote user moves his head, the camera will rotate around the x, y, and z axes accordingly. (For now, we will leave out the translations.)

Two very different view scenarios occur with this setup. We could have the screen(s) attached to the user's head or attached to the room. If we have a single little screen (say 0.3 m x 0.2 m) attached to the user's head, immediately in front of the eyes, we could have a pretty realistic robot's perspective. Even if we don't scale the image to the size of the screen, we will feel “pretty robotic.” A scaling problem will only make us feel big or small.

The analogy carries over directly to the virtual world, where the camera (defined by the view frustum) is slaved to the head tracker. Again, the ViewPlatform always corresponds to the robot itself. Because the tripod and camera are attached to the robot, they form a parent-child relationship, just as the ViewPlatform and camera do in the virtual world.

Note that a human would generally return his head to the forward position when walking, although not always. Giving the remote user this sense of which way is forward can be accomplished by incorporating some body reference, such as a fake set of shoulders mounted on the robot, or by reference to the robot's arm. The body reference is analogous to geometry attached to the ViewPlatform.

Robot View as a Window

The last part of the robot analogy is a little harder to imagine. This is the case in which the screen is not attached to the remote operator's head. We begin by returning to the original one-camera cyclopean case with no head tracking. In this case, we are looking at the robot's remote world as if we were sitting on the robot with our faces in front of a large window. This window is defined by the field of view of our camera lens. The viewing volume is analogous to viewing a large fish tank through a small window, much like those underground exhibits at the zoo in which you get a little view of the underground or underwater life of beavers and other such creatures. Note that with a single camera, we get little depth information. Furthermore, if we move our heads relative to the window, it makes little difference to what we see.

But now imagine what would happen if we had a pair of video cameras streaming video to the remote user. In this case, when we move back and forth, our stereo vision is going to be seriously affected. Each eye is going to have a slightly different and separate view frustum, and when the head moves, the view frustum for each eye is going to change differently. Thus, if we use the same projection matrix we were using at the start, our stereo cues are going to be invalid. In the case of the robot's video cameras, we are going to have to translate and rotate the cameras independently to get a proper 3D capture. Indeed, the brain does some remarkable oculomotor control to do just that in the real world. Similar to the brain, our robot will need to use head tracking in order to appropriately change the view frustum.
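
Java 3D does this per-eye bookkeeping for you once head tracking is configured, but the underlying geometry is easy to sketch. The following hypothetical helper (not part of the Java 3D API; all names and numbers are invented for illustration) computes the asymmetric near-plane extents for one eye looking through a fixed screen rectangle centered at the origin of the screen plane. Notice that the extents change whenever the eye position changes and that the two eyes get different frustums.

public class OffAxisFrustum {
    // Returns {left, right, bottom, top} of the near clipping plane for an eye
    // at (eyeX, eyeY), a distance eyeZ in front of a screen with the given half
    // extents. All distances are in meters; eyeZ and near must be positive.
    public static double[] nearPlaneExtents(double eyeX, double eyeY, double eyeZ,
                                            double halfWidth, double halfHeight,
                                            double near) {
        double scale = near / eyeZ;                         // similar triangles
        double left   = (-halfWidth  - eyeX) * scale;
        double right  = ( halfWidth  - eyeX) * scale;
        double bottom = (-halfHeight - eyeY) * scale;
        double top    = ( halfHeight - eyeY) * scale;
        return new double[] { left, right, bottom, top };
    }

    public static void main(String[] args) {
        // Eyes 6.5 cm apart, 0.8 m from a 1.0 m x 0.75 m screen (made-up values).
        double[] leftEye  = nearPlaneExtents(-0.0325, 0.0, 0.8, 0.5, 0.375, 0.1);
        double[] rightEye = nearPlaneExtents( 0.0325, 0.0, 0.8, 0.5, 0.375, 0.1);
        System.out.println(java.util.Arrays.toString(leftEye));
        System.out.println(java.util.Arrays.toString(rightEye));
    }
}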

Attaching the Reference Frames to the Renderer

All that remains is to attach one of the two reference frames described previously to the scene. This is achieved by setting the matrix that maps the physical world to a place in the virtual world, and it can be done because we know the transformation between the physical world and coexistence and the transformation from coexistence to the virtual world.
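
In Java 3D terms, that attachment happens where the ViewPlatform sits in the scene graph: the platform's view attach policy says how the physical user lines up with the platform's origin, and the TransformGroup above it places that origin in the virtual world. A minimal sketch, with an arbitrarily chosen placement:

import javax.media.j3d.Transform3D;
import javax.media.j3d.TransformGroup;
import javax.media.j3d.View;
import javax.media.j3d.ViewPlatform;
import javax.vecmath.Vector3d;

public class AttachCoexistence {
    public static TransformGroup buildViewPlatform(View view) {
        ViewPlatform platform = new ViewPlatform();
        // NOMINAL_HEAD (the default policy) puts the user's nominal head
        // position at the ViewPlatform origin.
        platform.setViewAttachPolicy(View.NOMINAL_HEAD);

        // Place the platform, and with it the physical user, in the virtual world.
        Transform3D place = new Transform3D();
        place.setTranslation(new Vector3d(0.0, 1.7, 10.0));   // arbitrary spot
        TransformGroup platformGroup = new TransformGroup(place);
        platformGroup.addChild(platform);

        view.attachViewPlatform(platform);
        return platformGroup;
    }
}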

We will give examples of these two situations and how to attach coexistence later. For the moment, we want to discuss why the Java 3D view model is powerful for addressing the many variations of these two basic setups.
