By combining the last three steps, it is possible to obtain an equation that associates the computer image coordinates (M, N) with the 3-D coordinates (x, y, z) of the object points in the camera system:

\frac{\lambda x}{z} = x^* + R_x = x^*(1 + kr^2) = (M - O_m)\frac{S_x L_x}{\mu M_x}(1 + kr^2)    (2.39)

\frac{\lambda y}{z} = y^* + R_y = y^*(1 + kr^2) = (N - O_n)S_y(1 + kr^2)    (2.40)

Substituting eq. (2.28) and eq. (2.29) into the above two equations gives

M = \lambda\,\frac{r_1 X + r_2 Y + r_3 Z + T_x}{r_7 X + r_8 Y + r_9 Z + T_z}\;\frac{\mu M_x}{(1 + kr^2)S_x L_x} + O_m    (2.41)

N = \lambda\,\frac{r_4 X + r_5 Y + r_6 Z + T_y}{r_7 X + r_8 Y + r_9 Z + T_z}\;\frac{1}{(1 + kr^2)S_y} + O_n    (2.42)
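To make the chain from eqs. (2.39)-(2.42) concrete, the following minimal Python sketch maps a world point to computer image coordinates (M, N). All calibration values (R, T, λ, Sx, Lx, Mx, μ, Sy, Om, On) are made up for illustration only, and the radial distortion is switched off (k = 0) so that the mapping can be evaluated directly; with k ≠ 0 the distorted coordinates would have to be solved for iteratively.

```python
import numpy as np

# Hypothetical calibration values (illustration only, not from the text).
R = np.eye(3)                          # rotation entries r1..r9 (here: no rotation)
T = np.array([0.0, 0.0, 500.0])        # translation (Tx, Ty, Tz), in mm
lam = 8.0                              # focal length lambda, in mm
Sx, Lx, Mx, mu = 0.01, 640, 640, 1.0   # x-direction sensor/scan parameters
Sy = 0.01                              # y-direction pixel spacing, in mm
Om, On = 320, 240                      # image center in computer coordinates

def world_to_pixel(P):
    """Map a world point P = (X, Y, Z) to computer image coordinates (M, N),
    following eqs. (2.39)-(2.42) with the distortion coefficient k set to 0."""
    x, y, z = R @ P + T                            # camera-system coordinates, eqs. (2.28)-(2.29)
    M = (lam * x / z) * mu * Mx / (Sx * Lx) + Om   # eq. (2.41) with k = 0
    N = (lam * y / z) / Sy + On                    # eq. (2.42) with k = 0
    return M, N

print(world_to_pixel(np.array([10.0, -5.0, 0.0])))   # -> (336.0, 232.0)
```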

2.2 Stereo Imaging

Stereo imaging techniques are effective for capturing the depth information that is often lost in normal image acquisition, because the projection from 3-D to 2-D maps all the world points along a projection line onto the same image point. In stereo imaging, two separate images of a scene are captured. These two images can be obtained either by using two cameras at different locations simultaneously, or by using one camera that moves from one location to another and takes the two images consecutively.

There are several models for stereo imaging. A basic model is described first; its variations are introduced later.

2.2.1 Parallel Horizontal Model

The parallel horizontal model is the basic model used in stereo imaging. Several of its variations are now popularly employed in different applications.

2.2.1.1 Disparity and Depth

The parallel horizontal model is depicted in Figure 2.12. Two cameras are located along a horizontal line, and their optical axes are parallel. The coordinate systems of the two cameras differ only in the positions of their origins (i.e., all their corresponding axes are parallel). The distance between their origins is called the baseline of the system.

For a world point W viewed by the two cameras, its projections onto the two images are located differently in the two image coordinate systems. This difference is called the disparity, and it is this difference that carries the depth information.

Figure 2.12: Parallel horizontal model.

Figure 2.13: Disparity in the parallel horizontal model.

Taking a profile (corresponding to the XZ plane) of Figure 2.12 and putting the first camera coordinate system coincident with the world coordinate system, the parallel horizontal model can be depicted by Figure 2.13.

In Figure 2.12, the X coordinate of a world point W under consideration is negative. Following eq. (2.1), one equation can be obtained from the first camera coordinate system as

X = \frac{x_1}{\lambda}(Z - \lambda)    (2.43)

Similarly, another equation can be obtained from the second camera coordinate system as

B - X = \frac{B - x_2}{\lambda}(Z - \lambda)    (2.44)

Combining eq. (2.43) and eq. (2.44) to eliminate X gives

\frac{B\lambda}{Z - \lambda} = x_1 - x_2 + B    (2.45)

In eq. (2.45), the absolute value of the right side is just the disparity. Suppose this absolute value is represented by d. The depth Z can be obtained by

Z = \lambda\left(1 + \frac{B}{d}\right)    (2.46)

In eq. (2.46), both B and λ are determined by the setup of the camera system (the baseline and the focal length). Once the disparity between the two image points corresponding to the same world point is determined, the distance between the world point and the camera can easily be calculated. With the Z value in hand, the X and Y coordinates of the world point can also be determined.
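A small Python sketch of this computation follows. The disparity d is taken as given (it is the absolute value appearing in eq. (2.45)); the baseline, focal length, and image coordinates are made-up values used only for illustration, and X and Y are recovered by back-substituting Z into the projection relation of eq. (2.43).

```python
def parallel_stereo(d, B, lam, x1=None, y1=None):
    """Recover depth (and optionally X, Y) in the parallel horizontal model.
    d:   disparity (the absolute value on the right-hand side of eq. (2.45))
    B:   baseline between the two camera origins
    lam: focal length lambda
    x1, y1: image coordinates of the point in the first camera (optional)."""
    Z = lam * (1.0 + B / d)                              # eq. (2.46)
    X = None if x1 is None else x1 * (Z - lam) / lam     # from eq. (2.43)
    Y = None if y1 is None else y1 * (Z - lam) / lam     # same relation for y
    return Z, X, Y

# Illustrative numbers: B = 60 mm, lam = 8 mm, measured disparity d = 0.5 mm.
print(parallel_stereo(d=0.5, B=60.0, lam=8.0, x1=0.4, y1=0.2))   # Z = 968.0
```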

2.2.1.2 Angular-Scanning Model

In the angular-scanning model, the stereo system rotates to capture a panorama image composed of various views. The camera used is called an angle-scanning camera. The principle of such a system can be explained with the help of Figure 2.14. When using an angle-scanning camera to capture images, the pixels are uniformly distributed in the image plane according to the azimuth and elevation of the lens. If the XZ plane is considered the equator plane of the earth and the Y-axis is pointing to the North Pole, the azimuth will correspond to the longitude and the elevation will correspond to the latitude. In Figure 2.14, the azimuth is the angle between the YZ plane and the line connecting the camera center C and the world point W, and the elevation is the angle between the XZ plane and the plane including the X-axis and the world point W.

The distance between the world point and its image point can be represented with the help of the azimuth. According to Figure 2.13, the following two relations can be established

\tan\theta_1 = \frac{X}{Z}    (2.47)

\tan\theta_2 = \frac{B - X}{Z}    (2.48)

Combining eq. (2.47) and eq. (2.48) and solving for the Z coordinate of point W yields

Z = \frac{B}{\tan\theta_1 + \tan\theta_2}    (2.49)

Equation (2.49) establishes the relation between the distance Z (i.e., the depth information of the world point) and the tangents of the two azimuth angles. Equation (2.49) can also be written in a form like eq. (2.46), because the influences of the disparity and the focal length are implicitly included in the azimuth angles. Finally, if the elevation is ϕ, which is the same for both cameras, the X and Y coordinates of point W are

Figure 2.14: Stereoscopic imaging by an angular-scanning camera.

X = Z\tan\theta_1    (2.50)

Y = Z\tan\phi    (2.51)
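The following sketch evaluates eqs. (2.49)-(2.51) for a made-up pair of azimuth angles and a common elevation; the baseline and angle values are only illustrative.

```python
import math

def angular_scan_position(theta1, theta2, phi, B):
    """(X, Y, Z) of a world point from the two azimuth angles theta1, theta2 and
    the common elevation phi (all in radians), following eqs. (2.49)-(2.51)."""
    Z = B / (math.tan(theta1) + math.tan(theta2))   # eq. (2.49)
    X = Z * math.tan(theta1)                        # eq. (2.50)
    Y = Z * math.tan(phi)                           # eq. (2.51)
    return X, Y, Z

print(angular_scan_position(math.radians(10), math.radians(12),
                            math.radians(5), B=0.6))
```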

2.2.2 Focused Horizontal Model

The optical axes of the two cameras are not necessarily parallel; they can also converge toward a common point. Such a group of models is called the focused horizontal model. Only the case shown in Figure 2.15 is considered here as an example. In Figure 2.15, the arrangement in the XZ plane is depicted. Such a camera system rotates one camera coordinate system clockwise and the other camera coordinate system counterclockwise. In Figure 2.15, the baseline is still B. The two optical axes cross at the point (0, 0, Z), forming an angle of 2θ.

Now consider how to determine the coordinates of a world point W(X, Y, Z), once the image coordinates (x1, y1) and (x2, y2) are known. From the triangle enclosed by the X and Z axes as well as the connecting line between the focus points of the two cameras, it is easy to obtain

Z = \frac{B\cos\theta}{2\sin\theta} + \lambda\cos\theta    (2.52)

As shown in Figure 2.15, two perpendicular lines can be drawn from point W to the two optical axes of the cameras, respectively. Since both of these perpendicular lines make an angle θ with the X-axis,

Figure 2.15: Disparity in the focused horizontal model.

\frac{x_1}{\lambda} = \frac{X\cos\theta}{r - X\sin\theta}    (2.53)

\frac{x_2}{\lambda} = \frac{X\cos\theta}{r + X\sin\theta}    (2.54)

where r is the unknown distance from either camera center to the point where the two optical axes cross. Combining eq. (2.53) and eq. (2.54) and eliminating r and X gives

\frac{x_1}{\lambda\cos\theta + x_1\sin\theta} = \frac{x_2}{\lambda\cos\theta - x_2\sin\theta}    (2.55)

Solving eq. (2.55) for λ cos θ gives λ cos θ = 2x1x2 sin θ/(x1 − x2). Substituting this into eq. (2.52) yields

Z = \frac{B}{2}\,\frac{\cos\theta}{\sin\theta} + \frac{2x_1 x_2\sin\theta}{d}    (2.56)

Similar to eq. (2.46), eq. (2.56) also relates the distance Z between the world point and the image plane to a disparity d. However, to solve eq. (2.46), only the disparity d is needed, while to solve eq. (2.56), the individual values of x1 and x2 (whose difference gives d = x1 − x2) are also needed. On the other hand, it is seen from Figure 2.15 that

r = \frac{B}{2\sin\theta}    (2.57)

Substituting eq. (2.57) into the expression for X obtained from eq. (2.53) or eq. (2.54) gives

X = \frac{B}{2\sin\theta}\,\frac{x_1}{\lambda\cos\theta + x_1\sin\theta} = \frac{B}{2\sin\theta}\,\frac{x_2}{\lambda\cos\theta - x_2\sin\theta}    (2.58)
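The sketch below evaluates eqs. (2.56)-(2.58) for one consistent set of made-up values (the image coordinates x1 and x2 were chosen so that they satisfy eq. (2.55) for the given θ and λ); it also checks that the two expressions for X in eq. (2.58) agree.

```python
import math

def focused_stereo(x1, x2, theta, lam, B):
    """Depth Z and lateral position X for the focused horizontal model,
    using eqs. (2.56)-(2.58). theta is the half convergence angle in radians."""
    d = x1 - x2                                                   # disparity in eq. (2.56)
    Z = (B / 2.0) * math.cos(theta) / math.sin(theta) \
        + 2.0 * x1 * x2 * math.sin(theta) / d                     # eq. (2.56)
    r = B / (2.0 * math.sin(theta))                               # eq. (2.57)
    X1 = r * x1 / (lam * math.cos(theta) + x1 * math.sin(theta))  # eq. (2.58), first form
    X2 = r * x2 / (lam * math.cos(theta) - x2 * math.sin(theta))  # eq. (2.58), second form
    return Z, X1, X2

# Made-up, but mutually consistent, values: theta = 15 deg, lam = 8, B = 60.
print(focused_stereo(x1=0.5, x2=0.4838, theta=math.radians(15), lam=8.0, B=60.0))
```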

2.2.3 Axis Model

In the axis model, the two camera positions are arranged along the optical axis: one camera captures the second image after the first by moving along the optical axis. The second image can be obtained at a point closer to or farther away from the world point than the first image, as shown in Figure 2.16. In Figure 2.16, only the XZ plane is depicted. The first camera coordinate system is coincident with the world coordinate system, and the second camera moves toward the world point. Both image coordinate systems are coincident with their corresponding camera coordinate systems. The only difference between the two camera coordinate systems is a translation ΔZ along the Z-axis.

For each camera, eq. (2.1) can be applied. This gives (only the coordinate X is considered here, as the computation for Y would be similar)

Figure 2.16: Disparity in the axis model.

\frac{X}{x_1} = \frac{Z - \lambda}{\lambda}    (2.59)

\frac{X}{x_2} = \frac{Z - \lambda - \Delta Z}{\lambda}    (2.60)

Combining eq. (2.59) and eq. (2.60) gives

X = \frac{\Delta Z}{\lambda}\,\frac{x_1 x_2}{x_2 - x_1}    (2.61)

Z = \lambda + \Delta Z\,\frac{x_2}{x_2 - x_1}    (2.62)
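As a quick numerical illustration of eqs. (2.61)-(2.62), the sketch below recovers X and Z from two made-up image coordinates measured before and after a camera displacement ΔZ along the optical axis.

```python
def axis_model(x1, x2, dZ, lam):
    """Recover X and Z in the axis model, eqs. (2.61)-(2.62).
    x1, x2: image coordinates before and after the camera moves a distance dZ
    along the optical axis toward the point; lam: focal length lambda."""
    X = (dZ / lam) * x1 * x2 / (x2 - x1)     # eq. (2.61)
    Z = lam + dZ * x2 / (x2 - x1)            # eq. (2.62)
    return X, Z

# Illustrative values only: the image of the point moves from 0.40 to 0.44
# when the camera advances 100 units with lam = 8.
print(axis_model(x1=0.40, x2=0.44, dZ=100.0, lam=8.0))   # -> (55.0, 1108.0)
```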

Compared to the parallel horizontal model, the common viewing field of the two cameras in the axis model is just the viewing field of the second camera (the one nearer to the world point). Therefore, the boundary of the common viewing field can easily be determined. In addition, since the camera moves along the optical axis, the occlusion problem can be avoided. These factors suggest that the axis model is less affected by the uncertainty in locating corresponding points than the horizontal model.

2.3 Image Brightness

An image pattern is often a brightness pattern in which the brightness values of images represent different properties of a scene. To quantitatively describe image brightness, some physical parameters should first be introduced.

2.3.1 Luminosity

Radiometry measures the energy of electromagnetic radiation. A basic quantity in radiometry is a radiation flux (or radiation power), whose unit is W. Light is a kind of electromagnetic radiation. Visible light has a spectrum ranging from 400 nm to 700 nm. Photometry measures the energy of light radiation. In photometry, radiation power is measured by luminous flux, whose unit is lm.

2.3.1.1 Point Source and Extended Source

When the scale of a light source is sufficiently small or it is so distant from the observer that the eye cannot identify its form, it is referred to as a point source. The luminous intensity I of a point source Q emitted along a direction r is defined as the luminous flux in this direction within a unit of solid angle (its unit is sr), as shown in Figure 2.17(a).

Take a solid-angle element dΩ about the r-axis and suppose that the luminous flux within it is dΦ. The luminous intensity emitted by the point source along the r direction is

I = \frac{d\Phi}{d\Omega}    (2.63)

The unit of luminous intensity is cd (1 cd = 1 lm/sr).

Real light sources always have emitting surfaces of finite extent and are therefore also referred to as extended sources. On the surface of an extended source, each surface element dS contributes a luminous intensity dI along the r direction, as shown in Figure 2.17(b). The total luminous intensity of an extended source along the r direction is the sum of the luminous intensities of all its surface elements.

2.3.1.2 Brightness and Illumination

In Figure 2.17(b), suppose the angle between the r direction and the normal direction N of the surface element dS is θ. When an observer looks along the r direction, the projected surface area is dS′ = dS cos θ. The brightness B of a surface element dS, viewed along the r direction, is defined as the luminous intensity per unit of projected surface area in the r direction. In other words, the brightness B is the luminous flux emitted into a unit solid angle by a unit of projected surface area:

B = \frac{dI}{dS'} = \frac{dI}{dS\cos\theta} = \frac{d\Phi}{d\Omega\,dS\cos\theta}    (2.64)

The unit of brightness is cd/m2.

Figure 2.17: Point source and extended source of light.

The illumination of a surface illuminated by some light source is defined as the luminous flux on a unit surface area. It is also called irradiance. Suppose the luminous flux on a surface dS is dΦ, then the illumination on this surface, E, is

E = \frac{d\Phi}{dS}    (2.65)

The unit of illumination is lx or lux (1 lx = 1 lm/m2).
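A small numerical illustration of eqs. (2.63)-(2.65) follows; the flux, solid angle, area, and viewing angle are made-up values chosen only to show how the units combine.

```python
import math

d_phi = 12.0               # luminous flux, in lm, through the solid angle below
d_omega = 0.5              # solid-angle element, in sr
dS = 0.25                  # surface element, in m^2
theta = math.radians(30)   # angle between the r direction and the surface normal N

I = d_phi / d_omega                            # luminous intensity, eq. (2.63), in cd
B = d_phi / (d_omega * dS * math.cos(theta))   # brightness, eq. (2.64), in cd/m^2
E = d_phi / dS                                 # illumination, eq. (2.65), in lx

print(f"I = {I:.1f} cd, B = {B:.1f} cd/m^2, E = {E:.1f} lx")
```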

2.3.1.3 Factors Influencing the Density of an Image

There are a number of factors that determine the electrical strength of the signal forming the image of an object in a real scene (Joyce, 1985):

(1)Reflectivity: The reflectivity is actually what people are trying to measure. In the case of an object viewed in transmitted light, both the thickness and the light absorbency of the object have an influence.

(2)Brightness of the light source and the efficiency of the optical train carrying the illumination to the specimen (e. g., condenser lenses, filters, and apertures in microscopic imaging systems).

(3)Absorption and reflection: In the image-forming part of the optical train, photons will be absorbed by the optical elements, and will also be reflected by various surfaces that they encounter. These reflections can end up in unwanted places, and the light from the particular portion of the object under consideration may be considerably enhanced by reflections arising from different parts of the object (glare).

(4)Conversion: The photon hits the light-sensitive surface of the acquiring device (e. g., a CCD) and its energy is converted to electrical energy in a linear or nonlinear manner.

(5)Amplification: The output from the acquiring device is then amplified, again in a nonlinear manner.

(6)Digitization: The signal is then digitized. The output of the digitizer may be passed through a look-up table converter, in which case the output is some predetermined function of the input.

The presence of these influencing factors means that the density of the images needs to be interpreted very carefully.

2.3.2 A Model of Image Brightness

An image can be taken as a 2-D brightness function f(x, y), where the brightness is a measurement of radiation energy. Therefore, f(x, y) must be nonzero and finite:

0 < f(x, y) < \infty    (2.66)

When capturing an image of a real scene, the brightness of the image is determined by two quantities: the amount of light incident on the scene and the amount of light reflected by the objects in the scene. The former is called the illumination component and is represented by a 2-D function i(x, y). The latter is called the reflection component and is represented by a 2-D function r(x, y). i(x, y) is determined by the energy emitted by the light source and by the distance between the source and the scene (here, a point source is considered). r(x, y) is given by the fraction of the incident light that is reflected, which is determined by the surface properties of the object. Some example values for typical surfaces are 0.01 for black velvet, 0.65 for stainless steel, 0.90 for a silver plate, and 0.93 for white snow. The value of f(x, y) should be proportional to both i(x, y) and r(x, y). This can be written as

f(x, y) = i(x, y)\,r(x, y)    (2.67)

According to the nature of i(x, y) and r(x, y), the following two conditions must be satisfied

0 < i(x, y) < \infty    (2.68)

0 < r(x, y) < 1    (2.69)

Equation (2.68) means that the incident energy is always greater than zero (only the case where radiation actually arrives at the surface is considered) and cannot be infinite (for a physically realizable situation). Equation (2.69) means that the reflectivity is always bounded by 0 (total absorption) and 1 (total reflection).

The value of f(x, y) is often called the gray-level value at (x, y), and can be denoted as g. Following eq. (2.67) to eq. (2.69), the gray-level values of f(x, y) are also bound by two values: Gmin and Gmax. For images captured differently, both Gmin and Gmax vary. The restriction for Gmin is that Gmin must be positive if there is illumination. The restriction for Gmax is that Gmax must be finite. In a real application, the gray-level span [Gmin, Gmax] is always converted to an integer range [0, G]. When an image is displayed, the pixel with g = 0 is shown as black and the pixel with g = G is shown as white. All intermediate values are shown as shades from black to white.
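The following sketch evaluates the brightness model of eq. (2.67) and the conversion of the resulting span [Gmin, Gmax] to an integer gray-level range [0, G]. The illumination value is made up; the reflectance values are the ones quoted above for black velvet, stainless steel, a silver plate, and white snow.

```python
import numpy as np

i = 1000.0                              # illumination component i(x, y), made-up value
r = np.array([0.01, 0.65, 0.90, 0.93])  # reflection component r(x, y) for four surfaces
f = i * r                               # eq. (2.67): brightness values

G = 255                                 # target integer gray-level range [0, G]
g_min, g_max = f.min(), f.max()         # play the role of Gmin and Gmax
g = np.round((f - g_min) / (g_max - g_min) * G).astype(int)

print(f)   # [ 10. 650. 900. 930.]
print(g)   # [  0 177 247 255]
```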

2.4 Sampling and Quantization

Corresponding to the two parts in f(x, y), the process of image acquisition consists of two parts:

(1)Geometry: Used to determine where in the 3-D scene the image position (x, y) comes from.

(2)Radiometry (or photometry): Used to determine how bright a point (x, y) in the image is, and how the brightness value is related to the optical properties of the corresponding 3-D surface point.

When converting an analogue image into a digital image, two processes must be performed: sampling and quantization. The former establishes the size of the image (the ranges of x and y), while the latter establishes the span of values of f (the dynamic range of f).

2.4.1 Spatial Resolution and Amplitude Resolution

An image f(x, y) must be digitized both in space and in amplitude to be processed by computers. Sampling is the digitization of the spatial coordinates (x, y), and quantization is the digitization of the amplitude f.

Suppose that F, X, and Y are integer sets, f ∈ F, x ∈ X, y ∈ Y. Sampling a continuous image can be accomplished by taking equally spaced samples in the form of a 2-D array. A spatially digitized image with a spatial resolution of N × N consists of N² pixels. Quantization of the above spatially digitized image can be accomplished by assigning equally spaced gray levels to each element in the image. The gray-level resolution (amplitude resolution, in general) is determined by the range of values for all pixels. An image with a gray-level resolution G has G distinct values.

For processing an image by computers, both N and G are taken as powers of 2, given by

N = 2^n    (2.70)

G = 2^k    (2.71)

Example 2.4 The spatial resolutions of several display formats

The spatial resolutions of several commonly used display formats are as follows:

(1)SIF (Source Input Format) in the NTSC system has a spatial resolution of 352 × 240, while SIF in the PAL system has a spatial resolution of 352 × 288, which is also the spatial resolution of CIF (Common Intermediate Format). QCIF (Quarter Common Intermediate Format) has a spatial resolution of 176 × 144.

(2)VGA: 640 × 480; CCIR/ITU-R 601: 720 × 480 (for NTSC) or 720 × 576 (for PAL); HDTV: 1440 × 1152 or 1920 × 1152.

(3)The screen of a (normal) TV has a length/height ratio of 4:3, while that of an HDTV screen is 16:9. To display an HDTV program on the screen of a normal TV, two formats can be used (as shown in Figure 2.18). One is the double-frame format, which keeps the original aspect ratio. The other is the whole-scan format, which only intercepts a part of the original program along the horizontal direction. The former retains the whole view at a reduced resolution; the latter keeps only a part of the view, but at the original resolution of that part. For example, suppose a normal TV has the same number of lines as an HDTV and needs to display an HDTV program with a spatial resolution of 1920 × 1080. If the double-frame format is used, the resolution becomes 1440 × 810; if the whole-scan format is used, it becomes 1440 × 1080 (the arithmetic is spelled out in the sketch after Figure 2.18).

Figure 2.18: Displaying HDTV program on the screen of a normal TV.
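The arithmetic behind the two display formats in item (3) can be written out in a few lines (assuming, as above, a 4:3 screen with the same 1080 lines as the HDTV frame):

```python
hd_w, hd_h = 1920, 1080          # HDTV frame (16:9)
tv_w = hd_h * 4 // 3             # width of a 4:3 screen with the same 1080 lines -> 1440

# Double-frame format: keep the whole picture, shrink it to the 1440-pixel width.
scale = tv_w / hd_w                                        # 0.75
double_frame = (round(hd_w * scale), round(hd_h * scale))  # (1440, 810)

# Whole-scan format: keep the original resolution, crop horizontally to 1440.
whole_scan = (tv_w, hd_h)                                  # (1440, 1080)

print(double_frame, whole_scan)
```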

The data space needed to store an image is also determined by spatial resolution and amplitude resolution. According to eq. (2.70) and eq. (2.71), the number of bits needed to store an image is

b = N^2 k    (2.72)

A digital image is an approximation of an analogue image. How many samples and gray levels are required for a good approximation? Theoretically, the larger the values of N and k, the better the approximation. Practically, the storage and the processing requirements will increase very quickly with the increase of N and k. Therefore, the values of N and k should be maintained on a reasonable scale.

Example 2.5 Storage and processing of image and video

A 512 × 512 image with 256 gray levels requires 2,097,152 bits of storage. A byte consists of 8 bits, so the above image needs 262,144 bytes of storage. A 1024 × 1024 color image needs 3.15 Mbytes of storage, which equals the requirement for a 750-page book. Video is used to indicate an image sequence, in which each image is called a frame. Suppose that a color video has a frame size of 512 × 512; then the data volume for 1 second of video would be 512 × 512 × 8 × 3 × 25 bits, or about 19.66 Mbytes.

To process a color video with a frame size of 1024 × 1024, it is necessary to handle 1024 × 1024 × 8 × 3 × 25 bits ≈ 78.64 Mbytes of data per second. Suppose that ten floating-point operations (FLOPS) are required for each pixel. One second of video then needs nearly 1 billion FLOPS. Parallel processors can increase the processing speed by using many processors simultaneously. The most optimistic estimation suggests that the processing time of a parallel process can be reduced to (ln J)/J of that of a sequential process, where J is the number of parallel processors (Bow, 2002). According to this estimation, if 1 million processors are used to treat one second of video, each processor still needs to have the capability of nearly a hundred million FLOPS.
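The figures quoted in this example can be checked with a few lines of arithmetic (byte counts use 1 Mbyte = 10^6 bytes, as the text does):

```python
bits_512_gray = 512 * 512 * 8                  # 2,097,152 bits
bytes_512_gray = bits_512_gray // 8            # 262,144 bytes

bytes_1024_color = 1024 * 1024 * 3             # ~3.15 Mbytes for a 1024 x 1024 color image

video_512 = 512 * 512 * 8 * 3 * 25             # bits for 1 s of 512 x 512 color video
video_1024 = 1024 * 1024 * 8 * 3 * 25          # bits for 1 s of 1024 x 1024 color video

print(bits_512_gray, bytes_512_gray)               # 2097152 262144
print(bytes_1024_color / 1e6)                      # 3.145728 (Mbytes)
print(video_512 / 8 / 1e6, video_1024 / 8 / 1e6)   # 19.6608 and 78.6432 (Mbytes)
```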

2.4.2 Image Quality Related to Sampling and Quantization

In the following, the relationship of image quality to sampling and quantization is discussed. Spatial resolution and amplitude resolution are directly determined by sampling and quantization, respectively. Therefore, the subjective quality of images degrades as the spatial resolution and the amplitude resolution of the images decrease. Three cases are studied here; only sampling at equal intervals and uniform quantization are considered.

2.4.2.1 The Influence of Spatial Resolution

For a 512 × 512 image with 256 gray levels, if the number of gray levels is kept while its spatial resolution is reduced (by pixel replication), a checkerboard effect with graininess is produced. Such an effect is most visible around region boundaries in the image and becomes more pronounced as the spatial resolution of the image gets lower.
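The experiment just described (subsample, then enlarge again by pixel replication) can be sketched in a few lines of Python with numpy; the synthetic gradient image below merely stands in for the 512 × 512 test image of Figure 2.19.

```python
import numpy as np

def reduce_spatial_resolution(img, factor):
    """Keep every 'factor'-th pixel and replicate each kept pixel so that the
    displayed size stays the same; this reproduces the checkerboard effect."""
    small = img[::factor, ::factor]
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

x = np.arange(512)
img = (np.add.outer(x, x) % 256).astype(np.uint8)   # synthetic 512 x 512, 256 levels
low = reduce_spatial_resolution(img, factor=8)      # behaves like a 64 x 64 image
print(img.shape, low.shape)                         # (512, 512) (512, 512)
```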

Example 2.6 Effect of reducing spatial resolution

Figure 2.19 shows a set of images with different spatial resolutions. Figure 2.19(a) is a 512 × 512 image with 256 gray levels. Other figures are produced by keeping the gray levels while reducing the spatial resolution in both the horizontal and vertical directions to half of the previous image. The spatial resolution of Figure 2.19(b) is 256 × 256, Figure 2.19(c) is 128 × 128, Figure 2.19(d) is 64 × 64, Figure 2.19(e) is 32 × 32, and Figure 2.19(f) is 16 × 16.

Figure 2.19: The effects of reducing the number of pixels in an image.

The effects induced by reducing the spatial resolution of images appear in different forms in these images. For example, the serration at the visor in Figure 2.19(b), the graininess of hairs in Figure 2.19(c), the blurriness of the whole image in Figure 2.19(d), the nearly unidentifiable face in Figure 2.19(e), and the hardly recognizable item in Figure 2.19(f).

2.4.2.2 The Influence of Amplitude Resolution

For a 512 × 512 image with 256 gray levels, keeping its spatial resolution while reducing its number of gray levels (by merging two adjacent levels into one) degrades the image quality. Such an effect is almost invisible when more than 64 gray levels are used. Reducing the number of gray levels further produces ridge-like structures in the image, especially in areas with smoothly varying gray levels. Such structures become more pronounced as the number of gray levels is reduced. This effect is called false contouring and is perceived in images displayed with 16 or fewer gray levels. It is generally most visible in the smooth areas of an image.
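Reducing the number of gray levels by merging adjacent levels can be sketched as follows; the smooth synthetic ramp makes the loss of distinct levels (and hence the false contours) easy to verify numerically.

```python
import numpy as np

def reduce_gray_levels(img, levels):
    """Requantize a 256-level image (values 0..255) to 'levels' gray levels by
    merging adjacent levels, mapping each bin to a representative value."""
    step = 256 // levels
    return (img // step) * step + step // 2

ramp = np.tile(np.arange(256, dtype=np.uint8), (64, 1))   # smooth 64 x 256 ramp
for n in (64, 16, 8, 4, 2):
    q = reduce_gray_levels(ramp, n)
    print(n, "levels ->", len(np.unique(q)), "distinct values")
```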

Example 2.7 Effects of reducing amplitude resolution

Figure 2.20 shows a set of images with different amplitude resolutions. Figure 2.20(a) is the 512 × 512 image with 256 gray levels from Figure 2.19(a). The other figures are produced by keeping the spatial resolution while reducing the number of gray levels. The number of gray levels is 64 in Figure 2.20(b), 16 in Figure 2.20(c), 8 in Figure 2.20(d), 4 in Figure 2.20(e), and 2 in Figure 2.20(f).

The effects of reducing the number of gray levels are hardly noticed in Figure 2.20(b), but they start to make an appearance in Figure 2.20(c). In Figure 2.20(d), many false contours can be seen in the cap, shoulder, etc. Those effects are very noticeable in Figure 2.20(e), and Figure 2.20(f) looks like a woodcarving.

Figure 2.20: The effects of reducing the number of gray levels.

2.4.2.3 The Influence of Spatial Resolution and Amplitude Resolution

The above two examples show the influences of spatial resolution and amplitude resolution, separately. Experiments consisting of subjective tests on images with varying spatial resolutions and amplitude resolutions have shown (Huang, 1965) the following:

(1)The quality of an image decreases as the spatial and amplitude resolutions are reduced. Only in a few cases, with a fixed spatial resolution, does a reduction in amplitude resolution improve the quality of the image.

(2)For images with many details, only a small number of gray levels is sufficient to represent them.

(3)For various images represented by the same number of gray levels, their subjective qualities can be quite different.

Example 2.8 Effect of reducing spatial and amplitude resolutions

Figure 2.21 shows a set of images with varying spatial and amplitude resolutions. Figure 2.21(a) is a 256 × 256 image with 128 gray levels. Figure 2.21(b) is a 181 × 181 image with 64 gray levels. Figure 2.21(c) is a 128 × 128 image with 32 gray levels. Figure 2.21(d) is a 90 × 90 image with 16 gray levels. Figure 2.21(e) is a 64 × 64 image with 8 gray levels, and Figure 2.21(f) is a 45 × 45 image with 4 gray levels.

Figure 2.21: The effects of reducing both the number of pixels and gray levels.

Comparing Figure 2.21 with Figure 2.19 and Figure 2.20, it can be seen that the image quality degrades more quickly when both the number of pixels and the number of gray levels are reduced.

2.4.3 Sampling Considerations

It is clear that sampling plays an important role in digital image acquisition. In the following, a more theoretical basis for this phenomenon will be discussed.

2.4.3.1 Sampling Theorem

Functions whose area under the curve is finite can be represented in terms of sines and cosines of various frequencies (Gonzalez and Woods, 2002). The sine/cosine component with the highest frequency determines the highest "frequency content" of the function. Suppose that this highest frequency is finite and that the function is of unlimited duration (such functions are called band-limited functions). Then, according to the Shannon sampling theorem, if the function is sampled at a rate (equal to or) greater than twice its highest frequency, the original function can be recovered completely from its samples.

Briefly stated, the sampling theorem tells us that if the highest frequency component in a signal f(x) is given by w0 (if f(x) has a Fourier spectrum F(w), f(x) is band-limited to frequency w0 if F(w) = 0 for all |w| > w0), the sampling frequency must be chosen such that ws > 2w0 (note that this is a strict inequality).

If the function is under-sampled, a phenomenon called aliasing corrupts the sampled image. The corruption is in the form of additional frequency components introduced into the sampled function. These are called aliased frequencies.

The sampling process can be modeled by

\hat{f}(x) = f(x)\sum_{n=-\infty}^{+\infty}\delta(x - nx_0)    (2.73)

which says that sampling is the multiplication of the signal f(x) with an ideal impulse train with a spacing of x0. The spacing is related to the sampling frequency by ws = 2π/x0.

Equation (2.73) can be rewritten as

\hat{f}(x) = \sum_{n=-\infty}^{+\infty} f(x)\,\delta(x - nx_0) = \sum_{n=-\infty}^{+\infty} f(nx_0)\,\delta(x - nx_0)    (2.74)

The set of samples can now be identified as {fn} = {f(nx0) | n = −∞, . . ., −1, 0, +1, . . ., +∞}. An infinite number of samples is necessary to represent a band-limited signal with the total fidelity required by the sampling theorem. If the signal were limited to some interval, say x1 to x2, then the sum in eq. (2.74) could be reduced to a finite sum with N samples, where N ≈ (x2 − x1)/x0. The signal, however, could then not be band-limited, and the sampling theorem could not be applied.

The above argument just shows that for any signal and its associated Fourier spectrum, the signal can be limited in extent (space-limited) or the spectrum can be limited in extent (band-limited) but not both.

The exact reconstruction of a signal from its associated samples requires an interpolation with the sinc(•) function (an ideal low-pass filter). With an impulse response h(x), the reconstruction result is given by

f(x) = \sum_{n=-\infty}^{+\infty} f(nx_0)\,h(x - nx_0) = \sum_{n=-\infty}^{+\infty} f(nx_0)\,\frac{\sin[(w_s/2)(x - nx_0)]}{(w_s/2)(x - nx_0)}    (2.75)
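To make eq. (2.75) concrete, the sketch below samples a band-limited test signal above twice its highest frequency and rebuilds intermediate values by sinc interpolation. The signal, the sampling interval, and the (necessarily finite) window of samples are all made-up choices; the finite sum only approximates the infinite one.

```python
import numpy as np

w0 = 2 * np.pi * 5.0            # highest (and only) frequency in f(x): 5 Hz

def f(x):
    return np.sin(w0 * x)       # band-limited test signal

x0 = 1.0 / 12.0                 # sampling interval -> ws = 2*pi/x0 > 2*w0
n = np.arange(-200, 201)        # finite window of samples around x = 0
samples = f(n * x0)

def reconstruct(x):
    """Evaluate the interpolation sum of eq. (2.75) at position x."""
    ws = 2 * np.pi / x0
    u = (ws / 2.0) * (x - n * x0)
    return np.sum(samples * np.sinc(u / np.pi))   # np.sinc(t) = sin(pi*t)/(pi*t)

for x in (0.013, 0.21, 0.37):
    print(x, f(x), reconstruct(x))                # reconstruction matches f closely
```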
