Chapter 19

Introduction to image quality and system performance

Sophie Triantaphillidou

All images © Sophie Triantaphillidou unless indicated.

INTRODUCTION

In this chapter we will review the quantification of image quality, as carried out by imaging scientists, designers and engineers, who are concerned with the design, implementation and manufacturing of imaging systems.

The quantification of image quality is often related to a number of image quality attributes, also referred to as image quality dimensions, which contribute to the overall impression of the quality of images. Many of these attributes have already been mentioned in previous chapters (e.g. resolution, sharpness). Here we will clearly identify them and list the most useful objective imaging performance measures employed in their quantification. In Chapters 20–24 we will discuss each of these extensively: their purpose and their individual measurement for both analogue and digital imaging systems.

Later in the chapter we will introduce image psychophysics and psychometric scaling. These methods are employed to quantify the perceived quality of images and to derive thresholds of visibility of image defects or artefacts, as well as just-noticeable differences between image stimuli. Lastly, there will be an introduction to image quality metrics and models, designed to predict the visual impression of image quality.

SUBJECTIVE AND OBJECTIVE IMAGE QUALITY

Figure 19.1 illustrates a typical imaging chain. Every component in the chain contributes to the overall quality of the image. In many cases, the purpose of image quality measurement is to quantify the relationship between the scene and the image (the input and the output in the imaging chain respectively). However, often the ‘scene’ is not at our disposal, or it is even irrelevant to the preferred reproduction, which might include purposeful distortion; thus we may simply quantify the quality of the produced image. At the end of the imaging chain there is the observer, who makes subjective judgements on the quality of the image.

Before we discuss methods employed in the evaluation of image quality, first let us define it: strictly speaking, image quality is the subjective impression of goodness that the image conveys. Amongst many well-known image scientists, Jacobson pointed out the fact that the perceived quality is very difficult to define. Engeldrum has defined image quality as ‘the integrated set of perceptions of the overall degree of excellence of an image’ and Keelan as ‘an impression of its (the image’s) merit or excellence, as perceived by an observer neither associated with the act of photography, nor closely involved with the subject matter depicted’. All given definitions point to the fact that, ultimately, image quality is a subjective attribute that relates to human perception, memory, experience, etc. After all, it is the human observer that decides whether an image looks good. If we now revisit Figure 19.1 we establish that the objective relationship between the input and the output in the imaging chain has to correlate with the subjective (observer’s) judgement of the image.

image

Figure 19.1   A typical imaging chain.

Image quality assessment

In order to produce reliable measurements it is necessary to understand the factors influencing image quality. The subjective assessment of image quality is a function of the human visual system (HVS), as well as the quality criteria of the observer. The HVS is highly dependent on the viewing conditions as well as the physical properties of the test stimuli, i.e. the test images used in the evaluation. The quality criteria of the observer are based on various cognitive factors, such as memory, influences, experience, expectations and many more, and result in a variation of assessments amongst individuals as well as temporal variations for any one individual. Different observers may prefer slightly different colour reproduction of the same subject, or the same observer may alter their preference over time because of, for example, contemporary trends in colour printing. The purpose and context in which the image is to be used also influences the subjective assessment of image quality. A photograph printed in a newspaper is usually of a ‘good-enough’ quality for illustrative purposes, but should the same relatively low-quality print be displayed in an exhibition, its quality would probably be judged as disappointing.

Measurements relating to image quality assume that there is a functional relationship between the subjective impression of image quality and some selected attributes of the observed stimuli (introduced later). This assumption is based on experience gained from psychophysical experiments, the purpose of which is to provide quantification of qualitative attributes (see later section on image psychophysics). Psychophysical tests are performed under controlled viewing conditions and measure subjective image quality, or attributes of it, using a panel of observers and statistical analysis to quantify the observers’ responses. Objective image quality, on the other hand, employs performance measurements, carried out on images or imaging systems. Useful performance measurements are those which correlate successfully with the subjective impression of image quality. Further, computation and modelling is used to produce complex objective measures, such as image quality models and metrics. These often incorporate models of the HVS along with performance measurements taken from images or systems (see later section on image quality metrics/models).

Basic image quality attributes (or dimensions)

The appearance of images has traditionally been considered to be influenced by five basic image quality attributes: tone (or contrast), colour, resolution, sharpness and noise. These main attributes, also referred to as image quality dimensions, along with the associated visual descriptors, are summarized in Table 19.1. They are assessed using either subjective or objective measurements, collectively (i.e. assessments of the overall image quality) or individually. They are measured either purposefully by image quality evaluators, or considered unconsciously when we look at images. The individual subjective assessment of these attributes is a subject of debate, since many have argued that judgements of image attributes are unlikely to be independent from other attributes, while the relationship between them has been studied extensively. Figure 19.2 demonstrates an example of the interrelationship between resolution, sharpness and contrast in perceived image quality.

Tone reproduction is concerned with the reproduction of intensity and intensity differences between the original and the image, i.e. objective tone reproduction, as well as with the observer’s impression of these qualities, i.e. subjective tone reproduction. The reproduction of tones is the most important attribute of image quality, being a critical component of the subjective impression of both the excellence of an image and the fidelity of the reproduction. Its importance lies in the fact that the achromatic visual channel carries the majority of the visual information (see Chapters 4 and 5). Other image quality attributes, such as colour, resolution and sharpness, are governed by the contrast of the image and their measurement relies on optimum tone reproduction. As we saw in Chapter 8, the objective tone reproduction of silver-halide-based photographic media is evaluated sensitometrically from the characteristic curve. In a similar fashion, the relationship between input and output intensities is plotted for the evaluation of the tone reproduction of electronic devices. Such plots are referred to generally as transfer functions. Chapter 21 introduces the theory of tone reproduction and measures related to digital imaging.
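As a minimal numerical illustration of a transfer function, the sketch below (Python with NumPy; the data, function name and the gamma value of 2.2 are illustrative assumptions, not taken from this chapter) fits a single gamma value to a synthetic set of input levels and output luminances. Because a power-law transfer function is a straight line in log-log space, its slope gives gamma directly:

```python
import numpy as np

def estimate_gamma(input_levels, output_luminance):
    """Fit a power-law transfer function Y = Y_max * (x / x_max)^gamma
    via a straight-line fit in log-log space; the slope is gamma."""
    x = np.asarray(input_levels, dtype=float)
    y = np.asarray(output_luminance, dtype=float)
    # Drop zero points, which are undefined in log space
    mask = (x > 0) & (y > 0)
    logx = np.log(x[mask] / x.max())
    logy = np.log(y[mask] / y.max())
    gamma, _ = np.polyfit(logx, logy, 1)
    return gamma

# Synthetic device: 8-bit input, hypothetical gamma of 2.2
levels = np.arange(0, 256)
luminance = 100.0 * (levels / 255.0) ** 2.2
print(round(estimate_gamma(levels, luminance), 2))  # → 2.2
```

In practice measured data are noisy and real devices depart from a pure power law, so a single fitted gamma is only a summary of the full transfer function.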

Table 19.1   Image attributes examined in image quality assessments and associated perceptual attributes

IMAGE ATTRIBUTE

VISUAL DESCRIPTION

Tone

Macroscopic contrast or reproduction of intensity

Colour

Reproduction of brightness or lightness, colourfulness or chroma and hue

Resolution

Reproduction of fine detail

Sharpness

Microscopic contrast or reproduction of edges

Noise

Spurious information

image

Figure 19.2   Image (a) was photographed with a lens with poor resolution but has increased contrast. Image (b) was photographed with a lens with good resolution but has less contrast than (a). Image (b) may look sharper than (a) from reading distance, but once the viewing distance is increased, (a) will look sharper than (b) due to its increased contrast.

© Carl Zeiss

The objective evaluation of colour reproduction is achieved via densitometry, spectral colour definition, colorimetry or colour appearance modelling. In Chapter 5, basic colour theory and the objectives of colour reproduction were introduced. Which objective is appropriate for a given application depends on various criteria, including the purpose of the reproduction, restrictions imposed by physical limitations of imaging systems, illumination and viewing constraints. Chapter 22 discusses extensively the colour reproduction of silver-based media. Digital colour imaging media function in their own intrinsic colour dimensions. These vary, from one type of medium to another (e.g. cathode ray tube (CRT) vs. liquid crystal display (LCD)), also from one device to another (e.g. LCD vs. digital camera), resulting in somewhat arbitrary colour reproduction, when colour characterization and management between devices and media is not involved in the image making. Chapter 23 deals with the colour reproduction of digital imaging devices and its evaluation, and Chapter 26 is dedicated to digital colour management.

Resolution is a spatial image attribute. It is concerned with the ability of a system to reproduce fine detail, i.e. high spatial frequency information in images. This ability is a function of the number of basic image points, or point spread functions (PSFs) per unit distance – in other words, a function of the size of a basic image point (see Chapter 7). Traditionally, resolution in photographic and optical systems is determined by measuring the resolving power, in line pairs per mm, using bar targets and visual estimates with the aid of a microscope. The estimation of resolving power is critically dependent on the contrast of the target and each stage in the evaluation (i.e. microscope, visual acuity and experience of the evaluator). Also, resolving power measurements cannot be cascaded through the imaging components of a chain. Other methods involve measuring the limiting resolution from the PSF of the system, or deriving it from the system’s modulation transfer function (MTF) (see Chapter 24). It is worth noting that, in digital imaging systems, resolution often refers to the number of available pixels per unit distance, as in dots per inch (dpi or ppi) for input devices and printers, or to the addressable resolution (or addressability) in displays, i.e. the number of points that are addressed by the graphics card adaptor. This should not be confused with the system’s ‘true’ resolution relating to the effective size of the smallest image point formed by the system.
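Deriving the limiting resolution from an MTF amounts to finding the spatial frequency at which the curve falls to a chosen threshold. The sketch below (Python with NumPy; the Gaussian MTF model and the 10% threshold are illustrative assumptions, not prescribed by this chapter) interpolates that crossing point from sampled MTF data:

```python
import numpy as np

def limiting_resolution(freqs, mtf, threshold=0.1):
    """Return the spatial frequency at which the MTF first falls to
    the given threshold, using linear interpolation between samples."""
    freqs = np.asarray(freqs, dtype=float)
    mtf = np.asarray(mtf, dtype=float)
    below = np.nonzero(mtf <= threshold)[0]
    if below.size == 0:
        return None  # MTF never reaches the threshold in the measured range
    i = below[0]
    if i == 0:
        return freqs[0]
    # Interpolate between the two bracketing samples
    f0, f1 = freqs[i - 1], freqs[i]
    m0, m1 = mtf[i - 1], mtf[i]
    return f0 + (m0 - threshold) * (f1 - f0) / (m0 - m1)

# Hypothetical system MTF: Gaussian fall-off with frequency
f = np.linspace(0, 100, 201)       # cycles per mm
mtf = np.exp(-(f / 40.0) ** 2)
print(round(limiting_resolution(f, mtf), 1))  # → 60.7
```

The threshold value itself is a convention; different standards and applications place the limiting resolution at different MTF levels.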

Like resolution, sharpness is a spatial attribute concerned with definition, more specifically the definition of edges. Objective measures of sharpness, however, take into account the microscopic image contrast and thus correlate with the subjective impression of definition, i.e. the impression received by an observer viewing well-resolved image elements. Although resolution and sharpness are very closely related, there are cases where images of low resolution may appear sharper than images of high resolution due to their increased contrast (see example in Figure 19.2). Traditionally, in photographic and optical systems, objective sharpness has been evaluated by measuring either the acutance or the system’s MTF. Although acutance measurements produce a single figure of merit for assessing image sharpness, the MTF has been a far more successful measure, since it is a function describing the reproduction of micro-image contrast at all available spatial frequencies and falls to some threshold value, which is considered as the limiting system resolution. Today, MTF is the dominant measure in the evaluation of sharpness in all imaging devices. The theory of MTF is strictly applicable to linear and stationary systems, and allows the determination of the overall system MTF from the MTF of the individual imaging components (see Chapter 7). It is necessary to compensate for various non-linearities present in the system for its accurate evaluation. This can be achieved by correcting for input-to-output (transfer) non-linearities or restricting the test target to a very low contrast, i.e. an intensity range where the system is assumed to behave linearly. Chapter 24 discusses in detail objective methods for evaluating image sharpness and MTF measurements. Figure 19.3 demonstrates the fact that although the limiting resolution of two systems might be similar, their sharpness may differ greatly.

Image noise is defined as the intensity fluctuations around a mean signal which are introduced by the imaging system. Noise obscures image features, particularly fine detail, but it is more visible in uniform areas, i.e. areas of low frequencies. It is generally objectionable: as image noise increases, image quality decreases monotonically. It is inevitable in all imaging systems, including the HVS, due to the statistical nature of light. Theoretically, it is random and signal independent, but in real systems it is semi-random because it is often signal dependent; that is, it is often controlled by the input intensity. There are various types of distortion referred to as non-random noise, such as pixel-to-pixel non-uniformities in an imaging sensor, or CRT spot misposition produced by inaccurate digital-to-analogue converters. In silver-based photographic media, noise is the result of the binary process of sensitization of silver grains, whilst sources of photographic noise include grain clusters resulting from the random distribution of grains, fog and development effects. In electronic sensors common sources are electronic, photoelectronic and quantization noise. The subjective impression of image non-uniformities due to noise is termed graininess or noisiness. The objective aspect in terms of spatial variations in the image is termed granularity (an expression that has its origins in traditional photographic media and is not used to describe electronic noise) or simply noise. Noise can be objectively evaluated statistically, or in terms of its components at different spatial frequencies (see Chapter 24).
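The two objective routes mentioned above, a simple statistic and a frequency decomposition, can be sketched as follows (Python with NumPy; the uniform grey patch, the noise level and the row-averaging scheme are illustrative assumptions): the total variance summarizes the fluctuations, while a one-dimensional noise power spectrum shows how they are distributed over spatial frequency.

```python
import numpy as np

def noise_statistics(patch):
    """Compute the total noise variance and a simple 1-D noise power
    spectrum from a nominally uniform image patch, by averaging the
    power spectra of the individual rows."""
    patch = np.asarray(patch, dtype=float)
    fluctuations = patch - patch.mean()
    variance = fluctuations.var()
    row_spectra = np.abs(np.fft.rfft(fluctuations, axis=1)) ** 2
    nps = row_spectra.mean(axis=0) / patch.shape[1]
    return variance, nps

# Hypothetical sensor patch: uniform grey plus Gaussian noise (sigma = 2)
rng = np.random.default_rng(0)
patch = 128.0 + rng.normal(0.0, 2.0, size=(64, 64))
variance, nps = noise_statistics(patch)
print(round(float(variance), 1))  # close to sigma^2 = 4
```

For white noise the spectrum is flat; structured (non-random) noise such as streaks would instead concentrate power at particular frequencies.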

image

Figure 19.3   Systems a and b have approximately the same limiting resolution (indicated with the red arrows), but different image sharpness.

Adapted from Burns (2006)

Imaging performance measures related to the attributes discussed above are presented in Table 19.2.

The image quality circle

To simplify the understanding of image quality, that is, the relationship between the perception of image quality attributes, related objective or physical measures (physical image parameters; examples in Table 19.2), technological imaging system parameters and the observer’s preference, Engeldrum (2000) has proposed a step-by-step approach called the image quality circle (IQC). According to this approach, the goal of an imaging designer is to relate the technological variables of the imaging system to the quality preference of the observer and, in the case of a commercial system, to the preference of the customer. The IQC, as illustrated in Figure 19.4, breaks the relationship down into a series of measurable steps. It contains four elements (shown in grey rectangles), which are linked to each other via models and algorithms (shown in white rectangles).

Table 19.2   Imaging performance measures relating to the objective evaluation of imaging systems

IMAGE ATTRIBUTE

MEASURES

Tone

Characteristic curve, density differences, transfer function and OECF, contrast, gamma, histogram, dynamic range

Colour

Spectral power distribution, CIE tristimulus values, colour appearance values, CIE colour differences

Resolution

Resolving power, imaging cell, limiting resolution

Sharpness

Acutance, ESF, PSF, LSF, MTF

Noise

Granularity, noise power spectrum, autocorrelation function, total variance (σ²TOTAL)

Image content and efficiency

Information capacity, entropy, detective quantum efficiency

image

Figure 19.4   The image quality circle (IQC). The key elements are in grey rectangles and the connecting links in white rectangles.

Adapted from Engeldrum (2000)

The four elements of the IQC approach are:

•  Customer quality preference is defined as the overall image quality, as judged or rated by observers.

•  Customer perceptions are the subjective perceptions of the basic image quality attributes or dimensions introduced above. These are often called the ‘ness’ of the attribute, for example sharpness, graininess, etc.

•  Physical image parameters are measurable system parameters we ascribe to image quality, such as the physical measures in Table 19.2.

•  Technology variables are the parameters of the system that can affect image quality. These are governed by physical constraints. System designers and manufacturers manipulate them in the best possible way to change the quality of their systems. Examples include: the size of a pixel, which affects resolution; and the choice of camera colour filter arrays, which affects colour reproduction.

These four elements are linked to one another via models and algorithms as shown in Figure 19.4:

•  Image quality models are empirical (statistical) models that link the customer perceptions (e.g. sharpness, graininess) to quality preferences or overall image quality.

•  Visual algorithms are used to compute a value of the subjective perception, such as sharpness, from objective quality measures, i.e. imaging performance measures, such as the gradient of an edge. In the context of IQC, ‘visual’ implies that the spatial properties and the nonlinear response aspects of the HVS are incorporated in the algorithm.

•  System models are analytical models that predict the physical image parameters from the technology variables. One example might be the model of the system’s point spread function developed by knowing technology variables such as the pixel aperture and the sampling interval.
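A system model of the kind described in the last bullet point can be sketched for the simplest case: an ideal square pixel aperture, whose MTF is the magnitude of a sinc function of the aperture width. The Python sketch below is a textbook idealization under stated assumptions (a perfectly uniform 5 micron aperture, no optics or diffusion), not a model of any particular sensor:

```python
import numpy as np

def aperture_mtf(freq, aperture_width):
    """MTF of an ideal square pixel aperture: |sinc(a * f)|, where a is
    the aperture width and f the spatial frequency.  NumPy's sinc is
    the normalized form, sin(pi x) / (pi x)."""
    return np.abs(np.sinc(np.asarray(freq) * aperture_width))

# Hypothetical 5 micron (0.005 mm) pixel aperture
f = np.array([0.0, 50.0, 100.0, 200.0])   # cycles per mm
print(aperture_mtf(f, 0.005))
```

The first null falls at 1/a (here 200 cycles/mm); a full system model would cascade this aperture MTF with those of the lens and any optical anti-aliasing filter, as discussed in Chapter 7.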

Further image quality attributes

Apart from the five basic image quality attributes introduced earlier in this chapter, there are a large number of other attributes that are related either to the objective quantification of systems or the subjective quantification of images. Some frequently employed ones are discussed in this section.

One way to look at the objective quality of an image is in terms of the amount of information it contains. Information theory deals with the objective quantification of information and was originally applied to communication channels. It has been used extensively in quantifying information in images, or the information that imaging systems convey. Related measures, such as the information capacity and entropy, have been shown to correlate with quality and image usefulness.
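Of the information-theoretic measures mentioned above, first-order entropy is the simplest to compute: it is derived from the normalized grey-level histogram. The sketch below (Python with NumPy; the test images are synthetic and illustrative) shows that a uniformly random 8-bit image approaches the 8 bits/pixel maximum, while a constant image carries no information:

```python
import numpy as np

def image_entropy(image, levels=256):
    """First-order (Shannon) entropy of an image in bits per pixel,
    computed from the normalized grey-level histogram."""
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(1)
noisy = rng.integers(0, 256, size=(256, 256))  # uniformly random pixels
print(round(image_entropy(noisy), 2))          # close to the 8 bits/pixel maximum
```

Note that first-order entropy ignores spatial structure; pure noise maximizes it, which is one reason entropy alone is not a quality measure but one component of measures such as information capacity.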

Another physical attribute that relates to image sensors, in particular, is their effectiveness in recording light signals. The detector’s speed is related to how fast the detector reacts to light, while sensitivity and detective quantum efficiency are related to how sensitive to light the detector is. Measures relating to these attributes are discussed in Chapters 20 and 24.

In addition to attributes introduced up to this point, which relate to physical quantities in systems and have objective measures associated with them, two visuo-cognitive attributes are commonly used in the evaluation of image quality: usefulness and naturalness. These attributes relate to the ‘successful interpretation’ of the image, which is based on the idea that the interpretation process should result in a satisfactory match between visual representation and knowledge of reality. Usefulness is defined as the precision of the visual representation of the image. Naturalness is defined as the degree of correspondence between the visual representation and the knowledge of reality, as stored in memory. Thus, in this context a ‘good-quality’ image implies that the visual representation of the image should be sufficiently precise, and the correlation between the visual representation and knowledge of reality as stored in memory should be high. The usefulness and naturalness of images are only evaluated subjectively. Objective measures indicating good image sharpness, accurate tone reproduction or only small shifts in the reproduced hue have been shown to correlate successfully with both the usefulness and naturalness of images.

DIGITAL IMAGE ARTEFACTS

Digital images suffer from artefacts that, generally, cannot be classified in a conventional manner or quantified using classical objective measures, as, for example, those listed in Table 19.2. Digital image artefacts can be considered as ‘noise’, since they are information originating from the system – they are not components of the signal – and are objectionable, i.e. their presence decreases image quality. The most common are listed in Table 19.3, along with their origins and image areas that are more susceptible to each artefact. Based on their origins they can broadly be classified as follows: artefacts 1–3 are due to the nature of the digitization process, which involves spatial sampling and quantization; artefacts 4–8 result from digital image processes, such as image compression and sharpening (Chapters 27–29); artefacts 9–11 are types of non-isotropic noise, often originating from errors in digital printing; artefacts 12 and 13 are also common to analogue systems.

Table 19.3   Common digital image artefacts, their sources and image areas which are more susceptible to each artefact

image

There are a few objective measures dealing with the measurement of individual artefacts, such as the evaluation of colour misregistration in digital input devices via the spatial frequency response (SFR; Chapter 24), or several image quality metrics designed specifically to quantify the effect of compression artefacts. In the majority of cases, however, the impact of digital image artefacts is evaluated subjectively, by determining visual thresholds or by measuring the decrease in overall image quality due to their presence.

DISTORTION, FIDELITY AND QUALITY

Under the general expression ‘image quality’ three different aspects, all concerned with the assessment of images or imaging systems, may be referred to: image distortion, image fidelity and image quality (as defined earlier on page 345 in its true, narrower sense). Although they investigate different aspects of images, the terminology attached to them is often applied inconsistently in the literature. Figure 19.5 summarizes the components involved in the assessment of image distortion, image fidelity and image quality. It also illustrates the points in the imaging chain where related evaluation is carried out.

Image distortion

Image distortion simply assesses physical differences in images, for example differences in pixel values, densities or spectral power distributions. It is only assessed objectively. Distortion measures and metrics are most often applied to evaluate numerical differences between digital images and more specifically the effects of various image-processing algorithms, such as image compression (see examples in Chapter 29). These measures do not take into account the visual significance of the error and therefore results from distortion assessments do not often correlate with perceived image quality. An example can be seen in Figure 19.6, which illustrates an original image (a) and two distorted versions (b and c): (b) is created by simply adding a positive constant to the original and (c) by adding the same constant, but with a random sign. Images (b) and (c) have exactly the same measured distortion (calculated from the difference image) with respect to the original but their quality differs drastically. Image (b) is simply a lighter version of the original and can be a perfectly acceptable reproduction, but image (c) is a much ‘noisier’ version and consequently of much lower visual quality than (b).
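The point made by Figure 19.6 can be reproduced numerically: a constant offset and a random-sign offset of the same magnitude are equidistant from the original under a classic distortion measure such as mean squared error, yet one looks merely lighter while the other looks noisy. A small Python sketch (synthetic image data, illustrative constant of 10):

```python
import numpy as np

def mse(a, b):
    """Mean squared error: a classic distortion measure that ignores
    the visual significance of the error."""
    return float(np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2))

rng = np.random.default_rng(2)
original = rng.integers(50, 200, size=(32, 32)).astype(float)

c = 10.0
lighter = original + c                         # constant offset: like image (b)
signs = rng.choice([-1.0, 1.0], size=original.shape)
noisier = original + c * signs                 # random-sign offset: like image (c)

# Both distorted versions are the same distance from the original
print(mse(original, lighter) == mse(original, noisier))  # → True
```

Both versions have an MSE of exactly c², which is why distortion measures alone correlate poorly with perceived quality.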

Image fidelity

Image fidelity assesses whether an image is visually distorted. It is concerned with relative thresholds, i.e. minimum changes/differences in images that can be visually detected. Image fidelity involves the HVS. Its physiological and cognitive functions are employed when psychophysical tests are carried out to evaluate subjective image fidelity. These can be threshold experiments designed to determine the just-noticeable difference (JND) in a stimulus and matching techniques employed to determine whether two stimuli are perceptibly different (see section on psychophysics later in the chapter). Fidelity judgements may be related to the observer’s mental prototype used to produce a match of a memorized colour; for example colours of skin, green grass and blue sky may be biased towards idealized prototypes of memory colours of the real objects. The fidelity of images is objectively quantified using models of the reproduction system/image along with models of the HVS describing the lower-order processing of the visual cortex, and in some cases psychophysical functions dealing with the probability of signal detection.

image

Figure 19.5   Image distortion, image fidelity and image quality, and points in the imaging chain where related evaluation is carried out.

Adapted from Ford (1997) and further from Triantaphillidou (2001)

image

Figure 19.6   (a) Original image. (b) Distorted version created by adding a positive constant to each pixel value of the original. (c) Distorted version created by adding the same constant but with a random sign.

Although in the first instance we might think that an image of high fidelity is of a high quality too, it is known that image fidelity does not always correlate with the subjective impression of image quality, i.e. threshold judgements are unrelated to suprathreshold judgements. Observers may notice the difference between two images and yet prefer the distorted version to the original. In many applications accurate or faithful reproduction is not the ultimate goal. For example, a slightly blurred reproduction of a portrait might be assessed to be of a higher quality than the sharper original, since blurring provides a ‘softer’ appearance to skin. Similarly, a digitally sharpened reproduction of a photograph that includes a lot of fine detail may be more appealing than the original. In pictorial imaging, deliberate distortions are often introduced to match the preferred reproduction of some colours such as Caucasian skin, blue sky colours and green grass (see Chapter 5). In most consumer cameras, manufacturers aim to adjust their devices and algorithms so that customers find the resulting images pleasing, rather than faithful reproductions – which may be the aim of scientific imaging devices.

Image quality

Image quality is concerned with threshold points as well as suprathreshold magnitudes. The difference between image fidelity and image quality is often not clearly distinguished. It has been described as the difference between the ability to discriminate between two images and the preference of one image over another, or the difference between the visibility of a factor (such as an artefact) and the degree to which that factor is bothersome. Apart from the physiological and cognitive factors of the HVS, the subjective assessment of quality also involves the quality criteria of the observers, since it deals with the acceptability in the case of loss of fidelity and the measurement of the perceptual magnitudes of error. It is subjectively quantified by suprathreshold judgements, using preference, categorical or other scaling methods aiming to create subjective quality scales (see section on psychophysics). Objective quantification involves measurements of the system’s performance or modelling of one or several system attributes. These are often combined with models of the HVS to produce models of image quality and image quality metrics (see later section on image quality models and metrics).

Some image attributes and/or characteristics have a stronger impact on observers than others. Thus, in objective quality assessments errors of similar magnitude may not be equally acceptable. In the reproduction of colours, for example, some colours are more important than others (greys, blue sky, foliage, skin); in colorimetric errors hue shifts are more serious than chroma differences. A distinction between the perceptibility (the just-noticeable difference) and the acceptability (the maximum colour deviation) of differences is essential.
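The colorimetric errors mentioned above are commonly expressed as CIE colour differences. As a minimal sketch (Python with NumPy; the example CIELAB values are hypothetical), the 1976 ΔE*ab difference is simply the Euclidean distance between two colours in CIELAB space:

```python
import numpy as np

def delta_e_76(lab1, lab2):
    """CIE 1976 colour difference: Euclidean distance in CIELAB.
    A delta E of roughly 1 is often quoted as a just-noticeable
    difference; acceptability thresholds are application dependent
    and usually larger."""
    return float(np.linalg.norm(np.asarray(lab1, float) - np.asarray(lab2, float)))

# Two greys differing only in lightness L*
print(delta_e_76([50.0, 0.0, 0.0], [52.0, 0.0, 0.0]))  # → 2.0
```

Because equal ΔE*ab steps are not equally visible everywhere in colour space, and hue errors matter more than chroma errors, later formulae such as CIEDE2000 weight the lightness, chroma and hue components differently.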

The use of a reference image

When measuring distortion or fidelity there is always the need for a reference image (or at least a ‘model’ of it) – that is, an original against which the distortion or fidelity of the reproduction is assessed. In image quality evaluations there may or may not be a reference original. Having a reference is not always possible but depends on the imaging application. In lossy image compression, for example, the distortion, fidelity or quality of the compressed image is readily quantified against the uncompressed original. However, when measuring the quality of an imaging device, such as a camera lens, a reference image does not exist in the first place. In such a case, quality is evaluated subjectively, simply by judgements of the resulting images, and objectively by measuring the lens’s imaging performance using methods that involve test targets and related objective measures, such as those in Table 19.2.

TEST CHARTS AND TEST SCENES

Test charts

A wide range of test charts (also referred to as test targets), specifically designed to incorporate objects or image elements, are employed in the objective evaluation of the performance of imaging systems. They are usually designed to measure a specific image attribute, such as colour, tone or resolution. Test charts are calibrated in such a way that they have known and constant properties; these are often provided by the target manufacturer, or measured in the laboratory. They are considered as the input in the imaging system, i.e. ‘the imaging signal’. The response of the system to this signal is used to quantify its imaging performance. A number of commonly employed test charts are illustrated in Figure 19.7. Details on the design and use of specific charts are provided in various relevant chapters in this book. It is important to note that the response of the system is often signal dependent and thus results obtained by imaging a specific test chart may not always be representative of the system’s overall performance.

image

Figure 19.7   Commonly employed test charts for the evaluation of imaging performance. Top left: Kodak Q60 Target. Top right: GretagMacbeth™ Color Checker Chart. Bottom left: 1951 USAF Resolution Test Chart. Bottom middle: test chart for digital camera SFR and OECF measurements, designed by Don Williams, Image Science Associates. Bottom right: star chart, designed by Imatest.

image

Figure 19.8   Top: two images from the ISO 12640:1997 set. Bottom: black-and-white version of ‘Lena’, a test image commonly employed for the evaluation of image-processing algorithms.

Test scenes

In addition to test charts, real scenes, or images of them, can be used as the ‘imaging signal’ to evaluate distortion, fidelity or image quality. A variety of images representing scenes with different physical characteristics (for example, colours or spatial configurations) are often chosen carefully and purposefully by the experimenters. Various criteria have been devised to help the selection of these scenes (see later in the chapter). Some test scenes are commonly employed and have been established as ‘standard’ test images within the imaging community. Examples include the ISO set (see Bibliography) and ‘Lena’, an image that is widely used in image-processing applications (see Figure 19.8).

IMAGE PSYCHOPHYSICS

The subjective evaluation of image quality uses visual psychophysics, which involves the study of visual stimulus–response relationships. Psychophysics, when properly employed, has the capacity to produce accurate quantitative results from qualitative judgements. Human observers and statistical analysis of the resulting observations are the basis of psychophysics and psychometric scaling. The latter, with respect to image quality, deals with the measurement of human response to the continuum of image quality and/or its individual perceptual attributes, so-called ‘nesses’, e.g. sharpness, noisiness. As described by Engeldrum: ‘these are called the “nesses” to emphasize the perceptual as opposed to the physical nature of these attributes’. In particular, it deals with the creation of psychological rulers, or psychometric scales, against which subjective quality, or an image’s ‘ness’, can be measured.

Psychometric scales

According to Engeldrum, the processes that must take place when developing scale values (i.e. numerical values obtained from the scaling study) are:

•   Selection and preparation of samples

•   Selection of observers

•   Determination of observers’ task or question

•   Presentation of the samples to the observers and recording of their judgement

•   Data analysis of the observers’ responses and generation of the sample scale values.

Four types of psychometric scales were defined during the 1940s by Stanley S. Stevens, a renowned American psychophysicist who referred to measurement as ‘the process of assigning numbers to objects or events by rules’. Since then, these scales have been the standard reference classification and measurement system in psychophysics. They are briefly described below and illustrated in Figure 19.9:

•  Nominal scale. Samples or categories are identified solely by labels (numbers or names). A typical example is images classified by categories, such as architectural, natural scenes, portraits.

•  Ordinal scale. Samples or qualitative categories are ordered along the scale but there is no information about the distances along the scale. Category names or numbers in an ordinal scale simply represent the order of the events – for example, from ‘best’ to ‘worst’ quality or from ‘least’ to ‘most’. The scale has greater than or less than properties.

•  Interval scale. Numbers are used for measurement in these scales, which have the property of distance, as when using a ruler. Equal differences in scale values represent equal perceptual differences between the samples with respect to a ‘ness’ or overall image quality. Interval scales have no fixed zero point; they are usually floating scales (i.e. they provide relative values), in that they remain valid under linear transformations.

•  Ratio scale. This is an interval scale with an origin equal to zero. The origin may not be experimentally measurable. Unlike the interval scales, ratio scales are fixed with respect to an origin of the scale.

The statistical analysis employed to derive each type of scale increases in complexity as the scales increase in sophistication, from nominal to ratio. The observer’s skill and the amount of data also increase with increasing scale complexity.

image

Figure 19.9   Stevens’ psychometric scales.

The psychometric function

It is also possible to measure image fidelity using psychophysics, where observations are used to determine thresholds of visibility between an original and a reproduction (i.e. absolute thresholds), or to determine just-noticeable differences (JNDs) in the psychological continuum of a ‘ness’. The probability of ‘seeing a difference’ is the principle on which the concept of thresholds and JNDs is based. The observer is asked to respond ‘yes’ or ‘no’ to the question ‘do you see the attribute?’ or ‘do you see a difference between stimuli?’ and the responses are accumulated over a number of observers. The observers’ responses vary even when the stimulus is constant. This variation is described statistically by a probability distribution, whose cumulative distribution is referred to as the psychometric function. A typical such function is illustrated in Figure 19.10. The absolute threshold is usually taken as the point where 50% of the observers can just see the attribute or ‘ness’, for example an artefact resulting from image compression. This corresponds to the 0.5 probability point on the psychometric curve. A stimulus’s JND is the stimulus change required to produce a just-noticeable difference in the perception of the ‘ness’, for example a JND in the intensity of the artefacts. It is usually defined as the point where 75% of the observers see the ‘ness’ in the stimulus and corresponds to the 0.75 probability on the psychometric function. The range of intensities that generates the range of probabilities between 0.25 and 0.75 is referred to as the interval of uncertainty, over which the judgements can go either way (‘yes’ or ‘no’). In most derivations of the psychometric function the curve is symmetrical about the threshold, and thus the interval of uncertainty is twice the JND.

image

Figure 19.10   The psychometric function.
Adapted from Engeldrum (2000)
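The fitting of a psychometric function of this kind can be sketched with a hand-rolled probit regression using only the Python standard library: the observed ‘yes’ proportions are z-transformed, a straight line is fitted, and the 0.5 and 0.75 probability points are read off as the threshold and the JND. The data below are purely hypothetical.

```python
from statistics import NormalDist

# Hypothetical yes/no data: stimulus intensity -> proportion of "yes"
# responses, accumulated over a panel of observers (illustration only).
data = {1.0: 0.10, 2.0: 0.25, 3.0: 0.50, 4.0: 0.75, 5.0: 0.90}

nd = NormalDist()
xs = list(data)
zs = [nd.inv_cdf(p) for p in data.values()]  # probit transform of proportions

# Least-squares fit of z = a*x + b: a hand-rolled probit regression.
n = len(xs)
mx, mz = sum(xs) / n, sum(zs) / n
a = sum((x - mx) * (z - mz) for x, z in zip(xs, zs)) / sum((x - mx) ** 2 for x in xs)
b = mz - a * mx

threshold = -b / a             # intensity at P = 0.5 (absolute threshold)
jnd = nd.inv_cdf(0.75) / a     # intensity change from P = 0.5 to P = 0.75
```

With symmetric data such as these, the fitted threshold falls at the mid-point of the stimulus range, and the JND is the intensity increment corresponding to one 0.6745 z-unit.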

Scaling methods

There are various procedures for developing psychometric scales and, in some cases, different procedures give different measurements. The choice of scaling method depends mainly on (i) whether threshold or suprathreshold assessments are required and (ii) the type of scale to be derived. Other factors influencing the experimenter’s choice of scaling method are the number of images to be presented to the observers (and thus the experimental time), the sample set confusion (i.e. whether adjacent sample images are easily distinguishable or not, a requirement for some methods) and the complexity of the statistical operations linked to each experimental method.

For the determination of thresholds and JNDs, the following scaling methods can be employed:

•  Method of limits. Both absolute threshold and JNDs can be determined with this method. Closely spaced image stimuli of increasing or decreasing ‘ness’ are presented to the observer, who is asked whether he or she detects the ‘ness’, or a noticeable difference in the ‘ness’ – e.g. difference in image colourfulness.

•  Method of adjustments. This is similar to the method of limits, except that it is the observer that adjusts the ‘ness’ in the image sample until it is just visible (for threshold determination), or until it matches a standard. This method can be easily implemented in computer displays, by employing a slider that ‘modifies’ the ‘ness’ in the sample.

•  Method of constant stimuli. There are several variations of this method, used to determine both absolute thresholds and JNDs. In all of them, a constant set of samples is selected by the experimenter, who has previously defined, by pilot experiments, the range and closeness of the ‘ness’ of these samples.

•  Paired comparison method. This is one of the most employed variations of the method of constant stimuli, where pairs of pre-selected image stimuli are presented to the observers, who are asked to reply ‘yes’ or ‘no’ to the question of whether there is a visual difference between the members of the presented pair. In a complete experiment all n² ordered pairs are presented, where n is the number of images. As the number of comparisons increases very rapidly with the number of image stimuli, usually only n(n − 1)/2 pairs are presented. In this most common case, image pair A–B is assumed to generate the same observer response as pair B–A, and in an A–A comparison each sample is assumed to be selected over the other 50% of the time. A great advantage of this very popular method is that a psychometric function, and thus the JND, can be determined for each of the n samples used in the comparisons. It is used for both printed and displayed images.

The most common methods implemented in suprathreshold experiments (in the creation of scale values of image ‘ness’ or overall quality) are:

•  Rank order method. Here images (preferably a small number) are ordered from best to worst (or the reverse) according to one physical attribute or ‘ness’, or their overall quality. While it is the simplest method to employ when printed images are evaluated, it is rather impractical for assessing displayed samples, since monitors are not big enough and also generally fail to provide the necessary spatial uniformity for displaying more than two images simultaneously. Ordinal scales are generated with this method.

•  Paired comparison method. Here all possible pairs of selected image samples are shown to the observer (see above), who is asked to select the preferred image in terms either of the selected attribute (‘ness’) or of overall quality. This two-alternative forced-choice method is based on the law of comparative judgements, which relates the proportion of times one stimulus is judged greater than another to the perceptual difference between them. The proportions are used to generate ordinal scales, but the real advantage of the method is that paired-comparison data can be transformed to interval and ratio scales by the application of z-deviates of the Gaussian function. A triplet comparison method that reduces assessment time is an alternative to the paired comparison method; it is described in ISO 20462-2:2005 for producing scales calibrated in JNDs.

•  Magnitude estimation, direct and indirect rating scales. These require the assignment of numbers to represent the intensity of sensory experience. A reference image might be presented to the observer, who assigns a number to it (rates it according to its ‘ness’ or overall quality) and judges the other stimuli in terms of this reference. If a reference has not been used in the first place, then the images are assessed in the context of the entire test set. Interval and ratio scales can be generated with these methods. Many potential problems are associated with methods that assign numbers, because people use numbers in different ways and give them different meanings.

•  Categorical scaling method. This is based on the law of categorical judgements, which relates the relative position of stimuli to a number of categories. The observers view one sample at a time and are asked to place it into a specific category according to the sample’s ‘ness’ or overall quality. The number of categories usually ranges between five and nine; they may be labelled with names. Categorical scaling is a popular method because data collection is simple, scale meanings are easily understood and each judgement is quick to perform; thus, a large number of image samples can be used. Ordinal or interval scales can be created with this method. The latter requires not only the derivation of the sample scale value, but also the definition of category boundaries, which, although they are assumed to be perceptually equally distant, in reality rarely are. Table 19.4 lists the ITU 1995 five-point qualitative scale, which has been widely applied in the evaluation of the quality of television pictures and more generally in the imaging field, and provides an example of how it could be used to define the acceptability of an image artefact. Figure 19.11 illustrates an example of a relative interval scale, generated using categorical scaling, with unequally distant category boundaries (broken lines), where each image sample is placed on the scale according to its perceived quality scale value.
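The transformation of paired-comparison proportions into an interval scale via z-deviates, described above, can be sketched as follows (Thurstone’s Case V, with equal dispersions assumed); the preference matrix is invented for illustration.

```python
from statistics import NormalDist

# Hypothetical preference proportions: p[i][j] is the proportion of trials
# in which stimulus i was preferred over stimulus j (diagonal fixed at 0.5).
p = [[0.50, 0.70, 0.90],
     [0.30, 0.50, 0.75],
     [0.10, 0.25, 0.50]]

z = NormalDist().inv_cdf
n = len(p)

# Case V of the law of comparative judgements: the row means of the
# z-deviates estimate the interval scale values.
scale = [sum(z(p[i][j]) for j in range(n)) / n for i in range(n)]

# Interval scales float, so shift the lowest value to zero for readability.
lo = min(scale)
scale = [s - lo for s in scale]
```

Because an interval scale has no fixed origin, only the differences between the resulting scale values are meaningful.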

Table 19.4   A five-category qualitative scale of image quality

QUALITY     ARTEFACT
Excellent   Imperceptible
Good        Perceptible but not annoying
Fair        Slightly annoying
Poor        Annoying
Bad         Very annoying
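Deriving interval scale values from categorical data of the kind in Table 19.4 can be sketched using the equal-dispersion simplification of the law of categorical judgements: cumulative category proportions are z-transformed, and each sample’s scale value is read off relative to the category boundaries. The judgement counts below are hypothetical.

```python
from statistics import NormalDist

z = NormalDist().inv_cdf

# Hypothetical judgement counts: one row per image sample, one column per
# category, ordered from worst ('Bad') to best ('Excellent').
counts = [[10, 20, 40, 20, 10],
          [ 2,  8, 30, 40, 20]]

zmat = []
for row in counts:
    total = sum(row)
    cum, zs = 0, []
    for c in row[:-1]:              # boundaries lie between adjacent categories
        cum += c
        zs.append(z(cum / total))   # z-deviate of cumulative proportion
    zmat.append(zs)

# Equal-dispersion form of the law of categorical judgements:
# z[j][k] = boundary_k - scale_j, so scale values are minus the row means.
scale = [-sum(zs) / len(zs) for zs in zmat]
```

The second sample, whose judgements pile up in the higher categories, receives the higher scale value.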

Table 19.5 lists the most common psychometric scaling methods and the types of results they may produce.

The statistical operations employed for the derivation of thresholds, JNDs and scale values are beyond the scope of this chapter. Amongst many, Boynton and Bartleson (1984) and Engeldrum (2000) are two classic textbooks that provide detailed guidelines. Further, the International Organization for Standardization Technical Committee 42 has produced a three-part standard on psychophysical experimental methods for estimating image quality (ISO 20462:2005). It discusses specifications regarding observers, test stimuli, instructions, viewing conditions, data analysis and reporting of results. In addition, it describes two perceptual methods, the triplet comparison technique (see above) and the quality ruler, which yield results calibrated in JNDs.

SCENE DEPENDENCY IN IMAGE QUALITY

An unsurprising result that stems from the variations in original scene configuration of test scenes in image quality assessments is that, in many subjective investigations, image quality is found to be scene dependent. That is, the results are shown to vary with the content of the images used for the investigations. The following paragraphs differentiate between three major origins of scene dependency in image quality measurements.

image

Figure 19.11   An interval scale with unequally distant category boundaries (broken lines). Each image is placed on the scale according to its perceived quality scale value.

Original image from Master Kodak PhotoCD

Table 19.5   The most common psychometric scaling methods

image

1.   Scene dependency resulting from the observer’s quality criteria. It is well known that image classes and scene content exert an influence on the observer judgements of image quality. Psychophysical experiments that took place as early as the 1960s indicated that the sharpness of portraits is judged by observers differently than the sharpness of landscapes and architectural scenes. More specifically, in very-high-quality portraiture, subjective quality decreased as objective sharpness increased. This result confirmed what practical photographers had known for years: ‘soft focus’ renders the skin smoother and thus more pleasing. In the same way, it is widely known that observers’ preference of critical colours, such as green grass, blue sky and skin colours, differs from the actual colours themselves.
   An example of this type of scene dependency, which originates from the observer’s quality criteria, is illustrated in Figure 19.12. In this figure, two test scenes have been subjected to the same objective amount of sharpening (versions (a) and (c)) and blurring (versions (b) and (d)). Most observers have judged the blurred version of the portrait to be of a higher quality than the sharpened version, whereas the opposite preference was reported for the ‘door’ scene.

2.   Scene dependency due to the visibility of an artefact. Noise and other artefacts are more or less visible in an image, depending on whether original scene features mask the artefact or reveal it. Scene dependency due to the visibility of an artefact is also known as scene susceptibility. Variations in scene susceptibility occur when the same objective amount of an artefact, such as noise, streaking or banding, is present in images, but it is more or less evident in different types of scenes, or in different areas within the same scene. For example, the digital artefact of streaking is more evident in clear-sky image areas (i.e. relatively uniform, light areas) than in image areas of high-frequency signal or in extensive dark areas, which visually mask the streaking. Similarly, for a given print granularity, graininess (i.e. the subjective measure of photographic granularity) usually decreases with increasing print density, and hence dark areas in prints are less visually susceptible to photographic noise.
   Figure 19.13 provides a demonstration of scene susceptibility to noise in digital images. Images (b) and (d) are noisy versions of images (a) and (c) respectively. In image (b) most observers agree that the noise significantly reduces the overall quality of the image, since it is largely evident against the low-frequency original image content. In image (d) the same amount of uniform noise has been digitally added but, due to the abundance of high-frequency information in the original, the noise is hardly visible and thus affects the image quality less.
   The two types of scene dependency introduced so far are commonly seen in results from subjective experiments. They occur because of functions of the HVS, or they are due to the observers’ quality criteria.

image

Figure 19.12   Sharpened (a, c) and blurred (b, d) versions of two images with different scene content.

Original image of (a), (b) from Master Kodak PhotoCD.

3.   Scene dependency of digital processes. A digital image process may objectively change pixel values to a greater or lesser amount in a digital image, depending on the original scene features (colours, spatial properties, etc.). For example, image sharpening will modify more pixels in an image with many contours, lines and fine detail than in one with mostly slowly varying or uniform areas. Thus, scene dependency of digital processes (or image-processing algorithms) has an objective nature; it may also have different visual consequences depending on the type of scene used for processing and its characteristics.
   A classical case is that of image compression. Applying the same objective amount of compression (compression ratio) to two different images, one with mostly high- and the other with mostly low-frequency information, will discard different amounts of information, since most lossy compression algorithms discard mostly high spatial frequencies (see Chapter 29). Figure 19.14 illustrates the results of this effect, where both the ‘Motor race’ and ‘Boats’ images have been compressed at a ratio of 60:1. With both JPEG and JPEG 2000 compression schemes, the objective damage introduced by the compression is more severe in the ‘Motor race’ image than in the ‘Boats’ image. However, the artefacts in the image ‘Boats’ might be more objectionable as they occur in flat image areas, whereas in ‘Motor race’ they are visually masked by the high frequencies that are abundant in this image.
   Independently of the type of scene dependency, its main causes (variations in original test image characteristics) as well as its consequences are the same. Scene dependency makes the analysis of results and inter-laboratory comparisons problematic. It often biases mean ratings, and this is why it is common practice to exclude outlying results, or ‘odd’ scenes, in subjective quality measurements. Additionally, the evaluation of image quality models and metrics is complex and often incomplete due to scene dependency in the perceived quality.
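The objective side of this scene dependency can be illustrated with a toy measure of high-frequency content: the sum of squared differences between adjacent samples, which is larger for busy, textured content than for slowly varying content. The ‘scanlines’ below are hypothetical stand-ins for image rows.

```python
# Two hypothetical image "scanlines": a smooth gradient and a busy texture.
smooth = [i / 63 for i in range(64)]
busy = [i % 2 for i in range(64)]   # alternating pattern = high frequency

def hf_energy(signal):
    """Sum of squared adjacent-sample differences: a crude proxy for the
    high-frequency energy a lossy codec would have to spend bits encoding."""
    return sum((b - a) ** 2 for a, b in zip(signal, signal[1:]))
```

At a fixed compression ratio, the codec must discard far more of the busy scanline’s energy than the smooth one’s, which is the objective root of the scene dependency described above.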

image

Figure 19.13   Original (a, c) and noisy (b, d) versions of two images with different scene content.

Original images from Master Kodak PhotoCD.

image

Figure 19.14   Original and compressed versions of two images with different scene content.
Original images from Master Kodak PhotoCD.

Choice of test scenes

There are methods to overcome some of the problems caused by scene dependency in image quality evaluations. Using a representative set of test stimuli (e.g. well-illuminated subjects, in focus) and excluding atypical stimuli (e.g. some objects out of focus, very high or very low dynamic range scenes) from the set of test stimuli is the most common. Nowadays many experimenters incorporate in their test sets images from the ISO test set, or other known test images (samples in Figure 19.8). A small number of standard test images, however, does not effectively represent the range and variety of different scenes that photographers, artists and consumers may wish to record and reproduce successfully. Furthermore, scenes that deviate in content from representative test scenes may not be reproduced appropriately, since they are not in accordance with the ‘average’ reproduction derived from image quality results.

Bartleson, in an extensive study on psychophysical methods, suggested that the test stimuli should be chosen by a clearly defined set of procedures, and proposed five ways for choosing a sample of stimuli: (i) the random independent sample; (ii) the stratified sample; (iii) the contrast sample; (iv) the purposeful sample; and (v) the identical sample. In all cases apart from the randomly chosen samples, some decisions have to be made by the experimenter on the features and content of the test images. These are usually assessed by inspection and require the experimenter’s intuition and judgement.

One way to identify scene features that play a significant role in image quality assessments is to look closely at the visual description of the image quality attributes (Table 19.1), as well as the susceptible image areas in various digital image artefacts (Table 19.3). Here is a summary of original scene features and characteristics in test scenes that influence image quality:

•   Illumination and luminance contrast

•   Hue, chroma and colour contrast

•   Amount of contour lines and edges

•   Amount of slowly varying areas (low-frequency information)

•   Busy areas with high amount of detail/texture (high-frequency information)

•   Camera-to-subject distance

•   Spatial distribution of subjects.

IMAGE QUALITY METRICS AND MODELS

Image quality metrics (IQMs) are objective measures designed to produce a single figure (or a set of values) aiming to describe the overall quality of images. They often employ physical performance measures of various attributes of the imaging chain, combined with models of functions and properties of the HVS (Figure 19.15). True visual IQMs take the HVS into account; otherwise, they measure only the distortion between an original and a reproduction (see Chapter 29) and rarely relate to perceived image quality. The performance of an IQM is tested by correlating the metric’s values with ratings produced in subjective assessments.

There are a very large number of IQMs and models employed in imaging laboratories. Many metrics have their origins in photographic systems and have been adapted for predicting the quality of digital imaging systems. Others are specifically designed to evaluate digital image quality. A number of metrics have been developed to measure spatial image quality, taking into account contrast, sharpness and noise, independently or in combination. Examples are Barten’s SQRI and Jacobson and Töpfer’s PIC, which produce visually weighted signal-to-noise ratios. Another example of a spatial metric is Jenkin’s EPIC, published in the mid-2000s, which returns a value related to the perceived information capacity of images. All the above metrics employ a model of the contrast sensitivity function (see Chapter 4) to filter the frequency responses of the image/system, and models of the eye’s noise.
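The flavour of such visually weighted measures can be sketched numerically: an MTF is weighted by a contrast sensitivity function and integrated, in the spirit of Barten’s square-root integral. Every constant below is an illustrative assumption, not a calibrated value, and the two MTFs are hypothetical systems.

```python
import math

def csf(u):
    """Illustrative contrast sensitivity function with a band-pass shape.
    The constants are arbitrary placeholders, not Barten's calibrated values."""
    return 60.0 * u * math.exp(-0.2 * u)

def sqri_like(mtf, u_max=60.0, steps=2000):
    """Numerically integrate sqrt(MTF(u) * CSF(u)) du/u, in the spirit of
    Barten's square-root integral (a sketch, not his exact formulation)."""
    du = u_max / steps
    total = 0.0
    for k in range(1, steps + 1):
        u = k * du
        total += math.sqrt(mtf(u) * csf(u)) / u * du
    return total / math.log(2)

def sharp_mtf(u):      # hypothetical well-corrected system
    return math.exp(-(u / 40.0) ** 2)

def blurry_mtf(u):     # hypothetical degraded system
    return math.exp(-(u / 15.0) ** 2)
```

The degraded system, whose MTF falls off sooner within the band the CSF weights most heavily, returns the lower figure, as a visually weighted metric should.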

Metrics in colour appearance spaces have been used for the evaluation of colour image quality. Although the CIELAB colour difference, ΔE*ab, is considered a colour metric (see Chapter 5), the first true colour IQM was proposed by Pointer and is referred to as the Colour Reproduction Index (CRI). It is based on the Hunt colour appearance model of colour vision. The first metric that included both colour and spatial image properties was Zhang and Wandell’s sCIELAB, a spatial extension to CIELAB developed in the 1990s, which applies three pre-processing stages before computation of the modified colour difference metric ΔE*. Later, Johnson and Fairchild presented an extension of sCIELAB, a modular framework developed for calculating image differences and predicting image quality. The framework was designed to be used with advanced colour difference formulae, such as CIEDE2000, and performs spatial filtering in the frequency, rather than the spatial, domain. Further independent pre-processing modules are added for adaptation, spatial localization and local contrast detection, before pixel-by-pixel colour difference calculations are performed (Figure 19.16).
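The pre-filtering idea behind sCIELAB can be sketched in miniature: blur each channel to mimic the contrast sensitivity function before taking per-pixel differences, so that imperceptible high-frequency errors stop contributing to the difference figure. The sketch below works on a single hypothetical opponent-channel scanline, with a plain Gaussian standing in for the actual sCIELAB filters.

```python
import math

def gauss_kernel(sigma):
    """Normalized 1-D Gaussian kernel with a radius of three sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-x * x / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def blur(row, kernel):
    """Convolve a row with the kernel, clamping indices at the edges."""
    r = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - r, 0), len(row) - 1)
            acc += w * row[idx]
        out.append(acc)
    return out

# One hypothetical opponent-channel scanline and a reproduction that
# differs only by alternating high-frequency noise.
ref = [50.0] * 32
rep = [50.0 + (2.0 if i % 2 else -2.0) for i in range(32)]

k = gauss_kernel(2.0)
plain_diff = sum(abs(a - b) for a, b in zip(ref, rep)) / len(ref)
filtered_diff = sum(abs(a - b)
                    for a, b in zip(blur(ref, k), blur(rep, k))) / len(ref)
# The CSF-style blur suppresses the invisible high-frequency error, so the
# filtered difference is far smaller than the raw pixel difference.
```

A plain per-pixel ΔE would report a large error here; the filtered difference agrees far better with what an observer would see.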

image

Figure 19.15   Measures or models describing the image’s or the imaging system’s attributes, together with models of the HVS, are used in IQMs.

Adapted from Jacobson and Triantaphillidou (2002)

image

Figure 19.16   Johnson and Fairchild’s modular framework for obtaining colour image differences.

From Johnson (2003).

There are a large number of IQMs that are intended for display assessment and image-processing applications and are designed to assess image fidelity. They involve frequency analysis (i.e. decomposing the image into several frequency channels, or sub-bands), models of the contrast sensitivity function, visual masking and an error-pooling method (i.e. an error-summing method, for example Minkowski’s) to obtain a spatial map of perceptually weighted error. A widely known metric of this kind is Daly’s Visible Differences Predictor (VDP). Finally, several models have been produced specifically to measure the effects of image compression, which also include frequency decomposition, such as Watson’s DCT-based metric and wavelet-based metrics.
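Error pooling of the Minkowski type mentioned above can be sketched as follows; the exponent and the error values are illustrative.

```python
def minkowski_pool(errors, p=4.0):
    """Collapse a map of per-pixel visual errors into a single figure.
    p = 1 gives the mean absolute error; larger p weights the worst errors
    more heavily, approaching the maximum error as p grows."""
    n = len(errors)
    return (sum(abs(e) ** p for e in errors) / n) ** (1.0 / p)
```

For an error map with one severe localized artefact, a large exponent reports a much higher pooled error than the plain mean, matching the observation that quality judgements are dominated by the worst visible defects.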

In reality, there is no unique measure of image quality, because of its subjective and multidimensional nature, as well as the problems of scene dependency described earlier. The purpose of useful metrics is to provide overall, average ratings or figures that can be used to compare, in a visually meaningful manner, the performance of imaging systems. The reader is referred to the extended Bibliography on the subject.

BIBLIOGRAPHY

Barten, P.G.J., 1999. Contrast Sensitivity and its Effects on Image Quality. SPIE Optical Engineering Press, Bellingham, WA, USA.

Boynton, R.M., Bartleson, C.L., 1984. Optical Radiation Measurements, Volume 5, Visual Measurements, Part II, Chapters 7–9. Academic Press, New York, USA.

Burns, P.D., 2006. Evaluating digital camera and scanner imaging performance. Short course, RPS Digital Futures 2006 Conference, London, UK.

Engeldrum, P.G., 1995. A framework for image quality models. Journal of Imaging Science and Technology 39 (4), 259–270.

Engeldrum, P.G., 2000. Psychometric Scaling: A Toolkit for Imaging Systems Development. Imcotek Press, Winchester, MA, USA.

Ford, A.M., 1997. Relationship Between Image Quality and Image Compression. Ph.D. thesis, University of Westminster, London, UK.

Ford, A.M., 1999. Colour Imaging: Vision and Technology, Part 4: Quality Assessment. In: MacDonald, L.W., Luo, R.M. (Eds.), Wiley, Chichester, UK.

ISO 12640:1997, 1997. Graphic Technology: Prepress Digital Data Exchange – CMYK Standard Color Image Data (CMYK/SCID).

ISO 20462:2005, Parts 1, 2 and 3 2005. Photography: Psychophysical Experimental Methods for Estimating Image Quality.

Jacobson, R.E., 1995. An evaluation of image quality metrics. Journal of Photographic Science 43, 7–16.

Jacobson, R.E., Triantaphillidou, S., 2002. Colour Image Science: Exploiting Digital Media, Part 5: Image Quality. In: MacDonald, W.L., Luo, M.R. (Eds.), Wiley, Chichester, UK.

Janssen, T.J.W.M., Blommaert, F.J.J., 1997. Image quality semantics. Journal of Imaging Science and Technology 41 (5), 555–560.

Jenkin, R.B., Triantaphillidou, S., Richardson, M.A. 2007. Effective Pictorial Information Capacity (EPIC) as an image quality metric. Proc. SPIE/IS&T Electronic Imaging: Image Quality and System Performance IV, San Jose, CA, USA, pp. O1–O9.

Johnson, G.M., Fairchild, M.D., 2003. Measuring images: differences, quality and appearance. Proc. SPIE/IS&T Electronic Imaging Conference, Santa Clara, 5007, 51–60.

Keelan, W.B., 2002. Handbook of Image Quality: Characterization and Prediction. Marcel Dekker, New York, USA.

MacDonald, L.W., Jacobson, R.E., 2006. Digital Heritage: Applying Digital Imaging to Cultural Heritage, Chapter 13: Assessing Image Quality. In: MacDonald, L.W. (Ed.), Butterworth-Heinemann, Oxford, UK.

Shaw, R. (Ed.), 1976. Selected Readings in Image Evaluation. SPIE Press, Bellingham, WA, USA.

Triantaphillidou, S., 2001. Image Quality in the Digitisation of Photographic Collections. Ph.D. thesis, University of Westminster, London, UK.

Triantaphillidou, S., Allen, E., Jacobson, R.E., 2007. Image quality of JPEG vs JPEG 2000 image compression schemes, Part 2: Scene analysis. Journal of Imaging Science and Technology 51 (3), 259–270.

Zhang, X., Wandell, B.A., 1997. A spatial extension of CIELAB for digital color-image reproduction. J. SID Digest 96, 61–63.

Intensity is used as a generic term for physical quantities such as luminance, illuminance, transmittance, reflectance, and density.
