6
QoE Subjective and Objective Evaluation Methodologies

Hong Ren Wu

RMIT, Australia

The discussions in previous chapters have highlighted an increasing emphasis on Quality of Experience (QoE) [1] compared with Quality of Service (QoS) [2] in audio-visual communication, broadcasting, and entertainment applications, which signals a transition from technology-driven services to user-centric (or perceived) quality-assured services [3]. QoE as defined by ITU-T SG 12 is application or service specific and influenced by user expectations and context [1], and therefore necessitates assessments of the perceived service quality and/or utility (or usefulness) of the service [4]. Subjective assessment and evaluation of the service are imperative to establish the ground truth for objective assessment and measures, which aid in the design/optimization of devices, products, systems, and services in either online or offline mode [1]. Assessment or prediction of QoE in multimedia services has to take account of at least three major factors, including audio signal quality perception, visual signal quality perception, and the interaction or integration of the perceived audio and visual signal quality [5–7], considering coding, transmission, and application/service conditions [3, 6, 8, 9]. This chapter focuses on the issues underpinning the theoretical frameworks/models and methodologies for QoE subjective and objective evaluation of visual signal communication services.

Subjective picture quality assessment methods for television and multimedia applications have been standardized [10, 11]. Issues relevant to human visual perception and quality scoring or rating are discussed in this chapter, while readers are referred to the standards documents and/or other monographs regarding specific details of the aforementioned standards [10–12].


Figure 6.1 A color image sequence example: (a) image sequence; (b) three-level DWT decomposition sub-band designations; (c) three-color channel signals in RGB space; (d) three-color channel signals in YCBCR space

The Human Visual System (HVS) can be modeled in either the pixel domain or the transform/sub-band decomposition domain, and the same can be said about picture quality metric formulations. Given a color image sequence or video as shown in Figure 6.1(a), x[n, i, ζ], N1 pixels high and N2 pixels wide, where n = [n1, n2] for 0 ≤ n1 ≤ N1 − 1 and 0 ≤ n2 ≤ N2 − 1, with I frames for 0 ≤ i ≤ I − 1, in tricolor space Ξ = {Y, CB, CR} (or Ξ = {R, G, B}) for ζ ∈ {1, 2, 3} corresponding to, for example, the Y, CB, CR (or R, G, B) channels (cf. Figure 6.1(d) or (c)) [13, 14], respectively, its transform or decomposition is represented by X[k, b, j, ζ], as shown, for example, in Figure 6.1(b) for a Discrete Wavelet Transform (DWT) decomposition, where k = [k1, k2] defines the position (row and column indices) of a coefficient in a block of a frequency band b of slice j in the decomposition domain. For an s-level DWT decomposition, b = [s, θ], where θ ∈ {θ0|LL band, θ1|LH band, θ2|HL band, θ3|HH band} and s = 3 levels per frame, as shown in Figure 6.1(b). It is noted that, as shown in Figure 6.1(c) and (d), the three component channels in the YCBCR color space are better decorrelated than those in the RGB color space, facilitating picture compression. Other color spaces, such as opponent color spaces, which are considered perceptually uniform [15], have often been used in color picture quality assessment [16–18].
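
A quick numerical illustration of this decorrelation property is sketched below in Python (illustrative only; the BT.601 full-range conversion equations are standard, but the synthetic test image and its statistics are hypothetical stand-ins for Figure 6.1(c) and (d)):

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """BT.601 full-range RGB -> YCbCr conversion (rgb: float array in [0, 255])."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y) + 128.0
    cr = 0.713 * (r - y) + 128.0
    return np.stack([y, cb, cr], axis=-1)

def channel_correlations(img3):
    """Pairwise correlation coefficients between the three color channels."""
    return np.corrcoef(img3.reshape(-1, 3).T)

# Pseudo-natural test image: smooth luminance structure shared across R, G, B,
# mimicking the strong inter-channel correlation of natural RGB pictures.
rng = np.random.default_rng(0)
base = np.cumsum(np.cumsum(rng.normal(size=(256, 256)), axis=0), axis=1)
base = 255.0 * (base - base.min()) / (np.ptp(base) + 1e-12)
rgb = np.stack([base,
                0.9 * base + 10.0 * rng.normal(size=base.shape),
                0.8 * base + 20.0 * rng.normal(size=base.shape)], axis=-1)

print("RGB channel correlations:\n", channel_correlations(rgb).round(3))
print("YCbCr channel correlations:\n", channel_correlations(rgb_to_ycbcr(rgb)).round(3))
```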

6.1 Human Visual Perception and QoE Assessment

Quantitative assessment of visual signal quality as perceived by the HVS inevitably involves human observers participating in subjective tests which elicit quality ratings using one scale or another [4, 10–12]. Quantification of subjective test results is usually based on the Mean Opinion Score (MOS), which indicates the average rating value qualified by a measure of deviation (e.g., standard deviation, variance, minimum and maximum values, or a confidence interval), acknowledging the subjectivity and statistical nature of the assessment [10–12]. This section discusses a number of issues associated with human visual perception, which affect subjective rating outcomes and, thereafter, their reliability and relevance when used as the ground truth for objective or computational metric designs. Models or approaches to computational QoE metric designs to date [19–27] are analyzed to appreciate their theoretical grounding and strengths in terms of prediction accuracy, consistency, and computational complexity, as well as their limitations, with respect to QoE assessment and prediction for picture codec design, network planning and performance optimization, and service performance assessment.
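
As an illustration of how an MOS and its qualifying confidence interval are typically computed from raw ratings, consider the following minimal sketch; the ratings are hypothetical, and the t-distribution interval is one common choice (cf. [10]):

```python
import numpy as np
from scipy import stats

def mos_with_ci(scores, confidence=0.95):
    """Mean Opinion Score with a t-distribution confidence interval,
    as commonly reported alongside subjective ratings (cf. ITU-R BT.500)."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    mos = scores.mean()
    half_width = stats.t.ppf((1 + confidence) / 2, df=n - 1) * scores.std(ddof=1) / np.sqrt(n)
    return mos, (mos - half_width, mos + half_width)

ratings = [5, 4, 4, 3, 5, 4, 4, 5, 3, 4]   # hypothetical ACR ratings from ten observers
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```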

Low-level human vision has been characterized by spatial, temporal, and color vision, and visual attention (or foveation). There are well-formulated HVS models which have been successfully used in Image (or Video) Quality Assessment (IQA or VQA) and evaluation [19–21] and in perception-based picture coder designs [3]. It is noted that the majority of these models have their parameters derived from threshold vision tests to estimate or predict the Just-Noticeable Difference (JND) [28], with a few exceptions [29, 30]. When these threshold vision models are extended to supra-threshold conditions, in which most visual communication, broadcast, and entertainment applications to date operate [3], the selection of the model parameters relies heavily on a regression process to achieve the best fit to subjective test data [18]. The relevance, accuracy, and reliability of subjective test data therefore directly affect the performance of objective QoE metrics [4, 7, 18, 25]. Four issues have emerged over the years of acquiring ground-truth subjective test data in terms of perceived picture quality and QoE, and are worth noting.

First and foremost, the picture quality perceived, and thereafter the ratings given, by human observers are affected by what they have seen or experienced prior to a specific subjective test. Contextual effects (considered as short-term or limited test sample pool effects) have been reported using standardized subjective test methods [31]. Affordability notwithstanding, users' expectations are understandably influenced by their benchmark experience or point of reference as to what constitutes the "best" picture quality they have seen or experienced. This benchmark experience (long-term or global sample pool effects) will drive subsequent ratings on a given quality scale. For an observer who has never viewed or experienced, for example, an uncompressed YCBCR 4:4:4 component color video of Standard Definition (SD) [13] or full High Definition (HD) [14] on a broadcast video monitor designed for critical picture evaluation, it would be highly uncertain what response one could hope to elicit from the observer when asked whether any chroma distortion was present in the 4:2:2 or 4:2:0 component video in a subjective test. While chroma sub-sampling has been widely used in video products to take advantage of the HVS's insensitivity to chroma component signals as an initial step in image data compression, the chroma distortions so caused are not always negligible, as commonly believed, especially on quality (e.g., broadcast or professional-grade) video monitors. Figure 6.2 uses contrast-enhanced difference images between the uncompressed Barbara test image in component YCBCR 4:4:4 format [13, 14] and its chroma sub-sampled versions in component 4:2:2, 4:2:0, and 4:1:1 formats, respectively, to illustrate chromatic distortions. The same may be said about responses from observers used in an image or video quality subjective evaluation test to whom various picture coding artifacts or distortions [32–34] are unknown. Issues associated with the consistency and reliability of subjective test results as reported or reviewed (cf., e.g., [7, 35]) aside, subjective picture quality assessment data collected from observers with limited benchmark experience is deemed not to have a reliable/desirable anchor (or reference) point, making the analysis results difficult to interpret or inconclusive [7], if not questionable, and does not inspire confidence in their application to objective quality metric design and optimization. In other words, using observers with minimal knowledge, experience, or expectations in subjective picture quality evaluation tests generates data with a varying or no reference point, often lowering expectations of what is considered "Excellent" picture quality, and possibly running a real risk of racing to the bottom in the practice of quality assessment, control, and assurance for visual signal communications, broadcast, and entertainment applications.

Second, it has long been acknowledged that human perception and judgment in a psychophysical measurement task usually perform better in comparison tasks than in casting an absolute rating. Nevertheless, an Absolute Category Rating (ACR) or absolute rating scale has been used widely in subjective picture quality evaluations [7, 10–12]. To address fluctuations in subjective test data using absolute rating schemes, due to the aforementioned contextual effects and the varying experience and expectations of observers, a Multiple Reference Impairment Scale (MRIS) subjective test method for digital video was


Figure 6.2 Examples of chroma distortions in the Barbara test image due to chroma sub-sampling: (a) the uncompressed image in component YCBCR 4:4:4 format; (b) the image in component 4:2:2 format; (c) the image in component 4:2:0 format; (d) the image in component 4:1:1 format; (e) contrast-enhanced difference image between (a) and (b); (f) contrast-enhanced difference image between (a) and (c); (g) contrast-enhanced difference image between (a) and (d). Contrast enhancement with a bias of 128 was applied to the difference images shown in (e)–(g) for better visualization in PDF format or using photo-quality printing

reported in [36], where a five-reference impairment scale (R5 to R1) was used, with the original, xo, serving as the uncorrupted picture reference, xR5 (R5), and reference distorted pictures xR4, xR3, xR2, and xR1 defined, respectively, in terms of their perceptibility of impairment as perceptible but not annoying (R4), slightly annoying (R3), annoying (R2), and very annoying (R1). The observers compared the processed picture xp with the original xR5 to determine whether the impairment was perceptible, or with xRi for i ∈ {1, 2, 3, 4} when there was a perceptible distortion, to rate xp as better than, similar to, or worse than xRi. This approach led to a comparative rating scale based on forced-choice methods, which significantly reduced the deviation in the subjective test data and alleviated the contextual effects. Ideally, following the conventional distortion-detection strategy, each of the reference distortion scales represented by xRi would correspond to JND levels [3, 36] or Visual Distortion Units (VDUs) [29].

Third, an issue not altogether disassociated from the second is the HVS's response under two distinct picture quality assessment conditions: where artifacts and distortions are at or around the visibility threshold (usually found in high-quality pictures), and where they are at supra-threshold levels (commonly associated with medium- and low-quality pictures). HVS models based on threshold vision tests and detection of the JND have been available and widely adopted in objective picture quality/distortion measures/metrics [19–27]. While picture distortion measures based on these models have been successfully employed in perceptually lossless picture coding [3, 37, 38], applications of JND models to picture processing, coding, and transmission or storage at supra-threshold levels have revealed that the supposition of linearly scaling JND models is not fully supported by various experimental studies [15, 29, 39]; therefore, further investigations are required to identify, delineate, and better model HVS characteristics/responses under supra-threshold conditions [29, 30, 35, 39, 40]. It was argued in [30] that for assessment of a high-quality picture with distortions near the visibility threshold, the HVS tends to look past the picture and perform a distortion detection task, whilst for evaluation of a low-quality picture with obviously visible distortions of a highly supra-threshold nature, the HVS tends to look past (or to be more forgiving toward) the distortions and look for the content of the picture. This hypothesis is consistent with the HVS behavior revealed by contextual effects. To cover the full range of picture quality as perceived or experienced by human observers, HVS modeling and quantitative QoE measurement may need to take account of two, instead of one, assessment strategies which the HVS seems to adopt: distortion detection, which is commonly used in threshold vision tests, and gradation of degradations of image appearance, which is practiced in supra-threshold vision tests [30].

Fourth, it is becoming increasingly clear that the assessment of QoE requires more than the evaluation of picture quality alone, with a need to differentiate the measurement of the perceived resemblance of a picture at hand to the original from that of the usefulness of the picture to an intended task. It was reported in [4] that QoP (perceived Quality of Pictures) on a five-point ACR scale and UoP (perceived Utility of Pictures) anchored by the Recognition Threshold (RT, assigned "0") and the Recognition Equivalence Class (REC, assigned "100") could be related by an approximately nonlinear function, and that QoP did not predict UoP well, and vice versa. Detailed performance comparisons of natural scene statistical model-based [27] and image feature-based QoP and UoP metrics have been reported in [4]. Compared with subjective test methodologies and procedures for QoP assessments, which have been standardized over the years [10–12], subjective tests for UoP assessments for various specific applications may face further challenges, ranging from the most critical to minimal efforts, and require the participation of targeted human observers who have the necessary domain knowledge of the intended applications (e.g., radiologists and radiographers in medical diagnostic imaging) [37].


Figure 6.3 R-D optimization considering a perceptual distortion measure [3] or a utility score [4] for QoE-regulated services compared with the MSE.

Source: Adapted from Wu et al., 2014 [91]

QoP and QoE assessments are not just for their own sakes; they are linked closely to visual signal compression and transmission, where Rate–Distortion (R-D) theory is applied for product, system, and service quality optimization [41–43]. From an R-D optimization perspective [44–46], it is widely understood that the use of raw mathematical distortion measures, such as the Mean Squared Error (MSE), does not guarantee visual superiority, since the HVS does not compute the MSE [3, 47]. In RpD (Rate-perceptual-Distortion) optimization [48], where perceptual distortion or utility measures matter, the setting of the rate constraint, Rc, in Figure 6.3 is redundant from a perceptual-distortion-controlled coding viewpoint. The perceptual bit-rate constraint, Rpc, makes more sense and delivers a picture quality comparable with JND1. In comparison, Rc is neither sufficient to guarantee a distortion level at JNND (Just-Not-Noticeable Difference) nor necessary to achieve, for example, JND1 in Figure 6.3. By the same token, Dc is ineffective at holding a designated visual quality appreciable to the HVS, since it can neither guarantee JND2 nor is it necessary to deliver JND3. As the entropy defines the lower bound of the bit rate required for information-lossless picture coding [49, 50], the perceptual entropy [48] sets the minimum bit rate required for perceptually lossless picture coding [37, 38]. Similarly, in UoP-regulated picture coding in terms of a utility measure, a utility entropy can be defined as the minimum bit rate required to reconstruct a picture and achieve complete feature recognition equivalent to perceptually lossless pictures, including the original, as illustrated in Figure 6.3.
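
The distinction between Rc and Rpc can be made concrete with a small sketch. The operating points below are hypothetical; the point is only that a raw rate constraint and a perceptual distortion target generally select different operating points:

```python
# Quality-regulated operating-point selection, contrasting a bit-rate constraint
# Rc with a perceptual-distortion target (e.g., JNND or JND1). The candidate
# (rate, MSE, perceptual distortion) triples below are hypothetical.
candidates = [
    # (rate_bpp, mse, perceptual_distortion_in_JND_units)
    (0.25, 95.0, 3.2),
    (0.50, 48.0, 2.1),
    (0.75, 30.0, 1.4),
    (1.00, 21.0, 1.0),
    (1.50, 12.0, 0.6),
    (2.00,  8.0, 0.3),
]

def min_rate_for_perceptual_target(points, d_target):
    """Smallest rate whose *perceptual* distortion meets the target (RpD view)."""
    feasible = [p for p in points if p[2] <= d_target]
    return min(feasible, key=lambda p: p[0]) if feasible else None

def best_under_rate_constraint(points, r_c):
    """Lowest-MSE point under a raw rate constraint (conventional R-D view)."""
    feasible = [p for p in points if p[0] <= r_c]
    return min(feasible, key=lambda p: p[1]) if feasible else None

print("Rate needed for JND1 quality:", min_rate_for_perceptual_target(candidates, 1.0))
print("Best point under Rc = 0.75 bpp:", best_under_rate_constraint(candidates, 0.75))
# Note: Rc = 0.75 bpp yields 1.4 JND units here; it neither guarantees JNND
# nor is it necessary for JND1, which the perceptual target reaches at 1.0 bpp.
```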

6.2 Models and Approaches to QoE Assessment

Objective QoP measures or metric designs for the purpose of QoE assessment can be classified based on the model and approach which they use and follow. The perceptual distortion or perceived difference between the reference and the processed visual signal can be formulated by applying the HVS process either to the two signals individually before visually significant signal differences are computed, or to the differences of the two signals in varying forms to weigh up their perceptual contributions to the overall perceptual score [19, 21].

6.2.1 Feature-Driven Models with Principal Decomposition Analysis

A feature extraction-based approach to picture quality metric design formulates a linear or nonlinear cost function of various distortion measures using features extracted from given reference and processed pictures, considers aspects of the HVS (e.g., the Contrast Sensitivity Function (CSF), luminance adaptation, and spatiotemporal masking effects), and optimizes coefficients to maximize the correlation of the picture quality/distortion estimate with the MOS from subjective test data.

An objective Picture Quality Scale (PQS) was introduced by Miyahara in [51] and further refined in [52]. The design philosophy of the PQS is summarized in [53], leading to a metric construct consisting of the generation of visually adjusted and/or weighted distortion and distortion feature maps (i.e., images), the computation and normalization of distortion indicators (i.e., measures), the decorrelation of principal perceptual indicators by Principal Decomposition Analysis (PDA), and the pooling of principal distortion indicators, with weights determined by multiple regression analysis to fit subjective test data (e.g., MOS), to form the quality estimator (i.e., the PQS in this case). Among the features considered by the PQS are a luminance coding error, considering the contrast sensitivity and brightness sensitivity described by the Weber–Fechner law [15], a perceptible difference normalized as per Weber's law, perceptible blocking artifacts, perceptible correlated errors, and localized errors at high-contrast/intensity transitions subject to visual masking. The PDA is used to decorrelate any overlap between these distortion indicators, which are based on feature maps extracted more or less empirically; it has been omitted in many later distortion metric implementations, only to be compensated for by the regression (or optimization) process in terms of the least-mean-square error, linear correlation, or some other measure [18].
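
A minimal sketch of this PDA-plus-regression pooling stage follows; the distortion indicators, their correlation structure, and the MOS values are all synthetic, and the decorrelation is performed here via an eigen-decomposition of the indicator covariance:

```python
import numpy as np

# PQS-style pooling: decorrelate overlapping distortion indicators by principal
# decomposition, then pool the principal indicators by multiple regression
# against MOS. All data below are synthetic illustrations.
rng = np.random.default_rng(1)
n_pictures, n_indicators = 50, 5
F = rng.normal(size=(n_pictures, n_indicators))          # normalized distortion indicators
F[:, 1] = 0.8 * F[:, 0] + 0.2 * F[:, 1]                  # overlapping (correlated) indicators
mos = 4.5 - 1.2 * F[:, 0] - 0.6 * F[:, 2] + 0.1 * rng.normal(size=n_pictures)

# Principal decomposition: project indicators onto decorrelated axes.
Fc = F - F.mean(axis=0)
_, eigvecs = np.linalg.eigh(np.cov(Fc.T))
Z = Fc @ eigvecs                                          # principal perceptual indicators

# Multiple regression (least squares) to fit the quality estimator to MOS.
A = np.hstack([Z, np.ones((n_pictures, 1))])
w, *_ = np.linalg.lstsq(A, mos, rcond=None)
pqs_estimate = A @ w
print("Fit correlation with MOS:", np.corrcoef(pqs_estimate, mos)[0, 1].round(3))
```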

A similar approach was followed by an early representative video quality assessment metric by the Institute for Telecommunication Sciences (ITS) [54], (s-hat), leading to the standardized Video Quality Metric (VQM) in the ANSI and ITU-T objective perceptual video quality measurement standards [16, 55]. Seven parameters (six impairment indicators/measures and one picture quality improvement indicator/measure) are used in linear combination to form the general VQM model, with parameters optimized using an iterative nested least-squares algorithm to fit a set of subjective training data. The general VQM model was reported in [55] to have performed statistically better than, or at least equivalently to, the other models recommended in [16] in either the 525-line or the 625-line video test.

Various picture distortion or quality metrics designed using this approach rely on extraction of spatial and/or temporal features, notably edge features [52, 55, 87], which are deemed to be visually significant in the perception of picture quality, and a pooling strategy for formulation of an overall distortion measure with parameters optimized by a regression process to fit a set of subjective test data.

6.2.2 Natural Scene Statistics Model-Based Perceptual Metrics

The Natural Scene Statistics (NSS) model-based approach to QoP measurement is based on the hypothesis that modeling natural scenes and modeling the HVS are dual problems, and that QoP can be captured by NSS [27, 56]. Of particular interest are the Structural Similarity Index (SSIM) [57] and its variants, the Visual Information Fidelity (VIF) measure [58], and the texture similarity measure [59], the former two of which have been highly referenced and used in QoP performance benchmarking in recent years, as well as frequently applied to perceptual picture coding design using RpD optimization [3].

6.2.2.1 Structural Similarity [57]

Formulation of the SSIM is based on the assumption that structural information perception plays an important role in perceived QoP by the HVS, and that structural distortions, due to additive noise, low-pass-filtering-induced blurring, and other coding artifacts, affect perceived picture quality more than non-structural distortions such as a change in brightness or contrast, a spatial shift or rotation, or a Gamma correction or change [47]. The SSIM replaces pixel-by-pixel comparisons with comparisons of regional statistics [57]. The SSIM for monochrome pictures measures the similarity between a reference/original image, xref[n], and a processed image, xp[n], N1 pixels high and N2 pixels wide, in luminance as approximated by the picture mean intensities, μref and μp, contrast as estimated by the picture standard deviations, σref and σp, and structure as measured by the cross-correlation coefficient between xref[n] and xp[n], σrefp/(σref σp). It is defined as follows [57]:

SSIM(xref, xp) = [l(xref, xp)]^α [c(xref, xp)]^β [s(xref, xp)]^γ,    (6.1)

where the luminance, contrast, and structure similarity measures are, respectively,

l(xref, xp) = (2μref μp + al) / (μref² + μp² + al),    (6.2)

c(xref, xp) = (2σref σp + ac) / (σref² + σp² + ac),    (6.3)

and

s(xref, xp) = (σrefp + as) / (σref σp + as).    (6.4)

al, ac, and as are constants to avoid instability, with values selected proportional to the dynamic range of pixel values; α > 0, β > 0, and γ > 0 are parameters defining the relative importance of the three components; and, for the vector index set, N, encompassing all pixel locations of xref[n] and xp[n], with card( · ) denoting the cardinality of a set,

μref = (1/card(N)) Σ_{n∈N} xref[n],    (6.5)

σref² = (1/(card(N) − 1)) Σ_{n∈N} (xref[n] − μref)²,    (6.6)

and

σrefp = (1/(card(N) − 1)) Σ_{n∈N} (xref[n] − μref)(xp[n] − μp),    (6.7)

with μp and σp defined analogously for xp[n].

To address the issues with the non-stationary nature of spatial (and temporal) picture and distortion signals, as well as the visual attention of the HVS, the SSIM is applied locally (e.g., to a defined window), leading to the windowed SSIM, SSIMW(xref, xp). Sliding this window across the entire picture pixel by pixel results in a total of M SSIMW values, one for each window position in the picture. The overall SSIM is then computed as the average of these windowed SSIMs, as follows:

SSIM(xref, xp) = (1/M) Σ_{m=1}^{M} SSIMW,m(xref, xp).    (6.8)


Figure 6.4 An information-theoretic framework used by VIF measurement (after [58]), where C = S ⊙ U is the GSM, an RF, serving as the NSS model in the wavelet domain and approximating the reference image; Ck and Uk are M-dimensional vectors consisting of non-overlapping blocks of M coefficients in a given sub-band; U is a Gaussian vector RF with zero mean and covariance CU; S is an RF of positive scalars; the symbol "⊙" defines the element-by-element product of two RFs [58]; and K is the set of location indices in the wavelet decomposition domain. D = GC + V is the RF representing the distorted image in the same sub-band, where G is a deterministic scalar field and V is a stationary additive zero-mean Gaussian noise RF with covariance σv²I (I being the identity matrix), white and independent of S and U. E = C + N and F = D + N′ model the HVS visual distortions to the reference C and to the channel/coding-distorted D, respectively, with the RFs N and N′ being zero-mean uncorrelated multivariate Gaussians of M dimensions with covariance σn²I, σn² being the variance of the visual noise; b is the sub-band index, with the VIF computed over the sub-band(s) selected as critical

An 11 × 11 circular-symmetric Gaussian weighting function with a standard deviation of 1.5 samples, normalized to a unit sum, was used in [57] for computation of the mean, standard deviation, and cross-covariance in (6.5)–(6.7), respectively, to avoid blocking artifacts in the SSIM map. The SSIM has been extended to color images [60] and video [61, 90].
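
For concreteness, a compact Python sketch of the windowed SSIM of (6.1)–(6.8) follows, using a Gaussian sliding window as in [57]; the parameter choices (k1, k2, α = β = γ = 1, and folding (6.3) and (6.4) into a single term via as = ac/2) follow common practice rather than anything prescribed in this chapter:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ssim_map(x_ref, x_p, sigma=1.5, data_range=255.0, k1=0.01, k2=0.03):
    """Windowed SSIM per (6.1)-(6.8) with a Gaussian window as in [57].
    Constants a_l = (k1*L)^2 and a_c = (k2*L)^2; with a_s = a_c/2, the
    contrast and structure terms (6.3)-(6.4) combine into one expression."""
    x_ref = x_ref.astype(np.float64)
    x_p = x_p.astype(np.float64)
    al, ac = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_r = gaussian_filter(x_ref, sigma)
    mu_p = gaussian_filter(x_p, sigma)
    var_r = gaussian_filter(x_ref ** 2, sigma) - mu_r ** 2
    var_p = gaussian_filter(x_p ** 2, sigma) - mu_p ** 2
    cov = gaussian_filter(x_ref * x_p, sigma) - mu_r * mu_p
    lum = (2 * mu_r * mu_p + al) / (mu_r ** 2 + mu_p ** 2 + al)
    cs = (2 * cov + ac) / (var_r + var_p + ac)   # contrast * structure, combined
    return lum * cs

rng = np.random.default_rng(2)
ref = rng.uniform(0, 255, size=(128, 128))
proc = np.clip(ref + rng.normal(0, 10, size=ref.shape), 0, 255)
print("Mean SSIM:", ssim_map(ref, proc).mean().round(4))   # (6.8): average over map
```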

6.2.2.2 Visual Information Fidelity [58]

VIF formulation takes an information-theoretic approach to QoP assessment, where mutual information is used as the measure in formulating the source (natural scene picture statistics) model, the distortion model, and the HVS "visual distortion" model. As shown in Figure 6.4, a Gaussian Scale Mixture (GSM) model, C, in the wavelet decomposition domain is used to represent the reference picture. A Random Field (RF), D = GC + V, models the attenuation (such as blur and contrast changes) and additive noise of the channel and/or coding, representing equal perceptual annoyance from the distortion instead of modeling specific image artifacts. All HVS effects are considered as uncertainty and treated as visual distortion, which is modeled as a stationary, zero-mean, additive white Gaussian noise model, N (or N′ for the processed picture), in the wavelet domain. For a selected sub-band b, where b = [s, θ] with level s and orientation θ, in the wavelet transform domain, the VIF measure is defined as

VIF[b] = I(CN[b]; FN[b] | ξN[b]) / I(CN[b]; EN[b] | ξN[b]),    (6.9)

where the mutual information between the reference image and the image perceived by the HVS in sub-band b is I(CN[b]; EN[b] | ξN[b]), with ξN[b] being a realization of N elements of the scalar field S for a given reference image; that between the processed image and the image perceived by the HVS is I(CN[b]; FN[b] | ξN[b]); E = C + N and F = D + N′ are the perceived reference and processed signals, respectively; and the sub-band(s) selected are those critical for VIF computation.

When there is no distortion, the VIF equals unity. When the VIF is greater than unity, the processed picture is perceptually superior to the reference picture, as may be the case in a visually enhanced picture.
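
The following sketch evaluates the ratio in (6.9) for the scalar (M = 1) special case, in which the mutual informations admit closed Gaussian-capacity forms; the sub-band statistics, channel gain, and noise variances are hypothetical:

```python
import numpy as np

def vif_subband(c_var, g, sigma_v2, sigma_n2=2.0):
    """Scalar (M = 1) special case of (6.9) for one sub-band, after [58]:
    the GSM reduces to per-coefficient signal variances c_var, the channel
    to D = g*C + V with noise variance sigma_v2, and the HVS to additive
    visual noise of variance sigma_n2. Each mutual information term becomes
    a sum of 0.5*log2(1 + SNR) capacities (the 0.5 factors cancel)."""
    c_var = np.maximum(np.asarray(c_var, dtype=float), 1e-10)
    info_distorted = np.sum(np.log2(1.0 + (g ** 2) * c_var / (sigma_v2 + sigma_n2)))
    info_reference = np.sum(np.log2(1.0 + c_var / sigma_n2))
    return info_distorted / info_reference

# Hypothetical sub-band statistics: blur lowers the channel gain g below one
# and adds noise, so the VIF drops below unity, as discussed above.
rng = np.random.default_rng(3)
coeff_variances = rng.exponential(scale=25.0, size=1000)
print("VIF (mild blur):  ", round(vif_subband(coeff_variances, g=0.9, sigma_v2=1.0), 3))
print("VIF (strong blur):", round(vif_subband(coeff_variances, g=0.5, sigma_v2=5.0), 3))
```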

6.2.2.3 Textural Similarity

The Structural Texture Similarity Metric (STSIM) measures perceived texture similarity between a reference picture and its processed counterpart, to address an issue with the SSIM, which tends to give low similarity values to textures that are perceptually similar. The framework used by the STSIM consists of sub-band decomposition (e.g., using steerable filter banks); computation of a set of statistics including the mean, variance, horizontal and vertical autocorrelations, and cross-band correlation; statistical comparisons; and pooling of scores across statistics, sub-bands, and window positions. More detailed discussions and reviews of various textural similarity metrics can be found in [59].

6.2.3 HVS Model-Based Perceptual Metrics

The HVS model-based approach devises picture quality metrics to simulate human visual perception, using a model characterizing low-level vision, in terms of spatial vision, temporal vision, color vision, and foveation, for picture quality estimation. Three types of HVS model have emerged, including JND models, multichannel Contrast Gain Control (CGC) models, and supra-threshold models, which have been applied successfully to picture quality assessment and perceptual picture coding design using RpD optimization [3]. The multichannel structure of the HVS decomposes the visual signal into several spatial, temporal, and orientation bands, where masking parameters are determined based on human visual experiments [16, 18].

6.2.3.1 JND Models

The HVS cannot perceive all changes in an image/video, nor does it respond to varying changes in a uniform manner [15, 63]. In picture coding, JND threshold detection-based HVS models are reported extensively [19–26, 28] and used in QoP assessment, perceptual quantization for picture coding, and perceptual distortion measures in RpD performance optimization for visual signal processing and transmission services [3].

The JND models reported currently in the literature consider (1) spatial/temporal CSF, which describes the sensitivity of the HVS to each frequency component, as determined by psychophysical experiments; (2) background Luminance Adaptation (LA), which refers to how the contrast sensitivity of the HVS changes as a function of the background luminance; and (3) Contrast Masking (CM), which refers to the masking effect of the HVS in the presence of two or more simultaneous frequency components. The JND model can be represented in either the spatiotemporal domain or the transform/decomposition domain, or both. Examples of JND models are found with CSF, CM, and LA modeling in the DCT domain [64–67], and CSF and CM modeling using sub-band decomposition [68–71]; or in the pixel domain [72], where the key issue is to differentiate edge from textured regions [73].

A general luminance JND model in the sub-band decomposition domain is given by [3, 26]

JNDSD[k, b, j] = TB[k, b, j] ∏_℘ ε℘[k, b, j],    (6.10)

where TB[k, b, j] is the base visibility threshold at location k in sub-band b of frame j, determined by the spatiotemporal CSF, and ε℘[k, b, j], ℘ ∈ {intra, inter, temp, lum, …}, represents the different elevation factors due to intra-band (intra) masking, inter-band (inter) masking, temporal (temp) masking, luminance (lum) adaptation, and so on. The frame index j is redundant for single-frame images.

It is well known that HVS sensitivity reaches its maximum at the fovea, over two degrees of visual angle, and decreases toward the peripheral retina, which spans 10–15° of visual angle [74]. While the JND accounts for the local response, Visual Attention (VA) models the global response. In the sub-band decomposition domain, the Foveated JND (FJND) can be modeled as follows [3]:

FJND[k, b] = JNDSD[k, b] · fVA(V[k]),    (6.11)

where JNDSD[k, b] is defined in (6.10), and fVA(V[k]) denotes the modulatory function determined by V[k], usually taking a smaller value with a larger V[k], which denotes the VA estimation corresponding to spatial frequency location k in band b. The JND is a special case of the FJND when VA is not considered and fVA reduces to unity (i.e., fVA(V[k]) ≡ 1).
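
A small sketch of (6.10) and (6.11) follows; the base thresholds, elevation factors, attention map, and the particular modulatory function f_va are all hypothetical placeholders:

```python
import numpy as np

def jnd_subband(base_threshold, elevation_factors):
    """Luminance JND per (6.10): base visibility threshold from the
    spatiotemporal CSF, elevated multiplicatively by masking/adaptation factors."""
    jnd = np.asarray(base_threshold, dtype=float)
    for factor in elevation_factors:          # intra, inter, temp, lum, ...
        jnd = jnd * np.asarray(factor, dtype=float)
    return jnd

def fjnd_subband(jnd_sd, visual_attention, f_va=lambda v: 1.0 / (1.0 + v)):
    """Foveated JND per (6.11): JND modulated by a visual-attention map V[k].
    The modulatory function f_va is a hypothetical placeholder that decreases
    with larger V[k], as the text requires; f_va = 1 recovers the plain JND."""
    return jnd_sd * f_va(np.asarray(visual_attention, dtype=float))

base = np.full((8, 8), 4.0)                   # hypothetical base thresholds of a band
lum_adapt = np.full((8, 8), 1.2)              # hypothetical luminance-adaptation factor
intra_mask = np.full((8, 8), 1.5)             # hypothetical intra-band masking factor
va_map = np.zeros((8, 8)); va_map[2:5, 2:5] = 1.0   # attended region -> lower threshold
print(fjnd_subband(jnd_subband(base, [lum_adapt, intra_mask]), va_map))
```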

There are two approaches to JND modeling for color pictures (i.e., modeling of the Just-Noticeable Color Difference (JNCD)). Each color component channel can be modeled independently, in a similar way to that in which the luminance JND model is formulated. Alternatively, the JNCD can be modeled by a base visibility threshold of distortion for all colors, JNCD00(n), modulated by the masking effect of the non-uniform neighborhood (measured by the variance), represented by an elevation factor εV(n), and by a scale function, SG(n), modeling the masking effect induced primarily by local changes of luminance (measured by the gradient of the luminance component), assuming that the CIELAB color space Ξ = {L, a, b} is used and that ζ = 1 corresponds to the L component, as follows:

JNCDSD[n] = JNCD00(n) · εV(n) · SG(n),    (6.12)

where n is the pixel coordinate vector in a pixel domain formulation.

Based on the JND model, a Peak Signal-to-Perceptual-Noise Ratio (PSPNR) was devised in [76] as follows:

PSPNR[i, ζ] = 10 log10 { 255² / [ (1/(N1N2)) Σ_n ( max{ |e[n, i, ζ]| − JNDST[n, i, ζ], 0 } )² ] },    (6.13)

where

e[n, i, ζ] = xref[n, i, ζ] − xrec[n, i, ζ],    (6.14)

xref and xrec are the reference and the reconstructed pictures, respectively,

JNDST[n, i, ζ] = JNDp[n, i, ζ] · εtemp[n, i],    (6.15)

and

JNDp[n] = JNDpL[n] + JNDpT[n] − κ · min{ JNDpL[n], JNDpT[n] }.    (6.16)

In (6.16), the luminance adaptation factor JNDpL[n] at pixel location n can be decided according to the luminance in the pixel neighborhood; the texture masking factor JNDpT[n] can be determined via the weighted average of gradients around n [72] and refined with more detailed signal classification [73]; κ accounts for the overlapping effect between JNDpL and JNDpT, with 0 < κ ≤ 1. For video, the factor JNDp[n] is multiplied further by an elevation factor, as in (6.15), to account for the temporal masking effect, which is depicted by a convex function, f( · ), of the inter-frame change formulated in (6.17) [76]:

εtemp[n, i] = f( ( x[n, i] − x[n, i − 1] + xBG[n, i] − xBG[n, i − 1] ) / 2 ),    (6.17)

where x[n, i] denotes the pixel value of the ith frame and xBG[n, i] the average background luminance of the ith frame.

When JNDST[n, i, ζ]|∀ζ ≡ 0 in (6.13), the PSPNR reduces to the PSNR.
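
A direct implementation of (6.13) and (6.14) can be sketched as follows, with a flat, hypothetical JND profile standing in for a real JNDST map; setting the JND map to zero recovers the PSNR, as noted above:

```python
import numpy as np

def pspnr(x_ref, x_rec, jnd_map, peak=255.0):
    """PSPNR per (6.13)-(6.14): only the error energy exceeding the JND
    threshold at each pixel is counted as perceptual noise."""
    e = x_ref.astype(np.float64) - x_rec.astype(np.float64)
    perceptible = np.maximum(np.abs(e) - jnd_map, 0.0)
    mspe = np.mean(perceptible ** 2)
    return np.inf if mspe == 0 else 10.0 * np.log10(peak ** 2 / mspe)

rng = np.random.default_rng(4)
ref = rng.uniform(0, 255, size=(64, 64))
rec = np.clip(ref + rng.normal(0, 3, size=ref.shape), 0, 255)
jnd = np.full(ref.shape, 4.0)            # hypothetical flat JND profile
print("PSNR-equivalent (JND = 0):", round(pspnr(ref, rec, np.zeros_like(ref)), 2), "dB")
print("PSPNR (JND = 4):          ", round(pspnr(ref, rec, jnd), 2), "dB")
```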

A Visual Signal-to-Noise Ratio (VSNR) is devised in [77] using a wavelet-based visual model of masking and summation, which claims low computational complexity and low memory requirements.

6.2.3.2 Multichannel Vision Model

Contrast Gain Control (CGC) [78] has been used successfully in varying implementations for JND detection, QoP assessment, and perceptual picture coding, in either standalone or embedded forms [18–21, 37, 43, 79–82]. The Picture Quality Rating (PQR), extended from Sarnoff's original Visual Discrimination Model (JNDmetrix™) [83, 84], is extensively documented in the ITU-T J.144 Recommendation and frequently used as a benchmark [16].

An example of the CGC model in the visual decomposition domain used in [82] is described briefly here, where a perceptual distortion measure is embedded in the RpD optimization of a standards-compliant coder. As shown in Figure 6.5, it consists of a frequency transform (with the 9/7 filter) [85], CSF weighting, intra-band and inter-orientation masking, and detection and pooling.

Given the Mallat DWT decomposition [86] of an image, x[n, ζ], for ζ ∈ {1, 2, 3}, denoted XDWT[k, b, ζ], where b = [s, θ] defines the decomposition level or scale s ∈ {1, 2, ..., 5}, representing five levels, and the orientation θ ∈ Θ = {θ0|LL band, θ1|LH band, θ2|HL band, θ3|HH band}, representing the three oriented bands plus the isotropic LL band, and k = [k1, k2] with k1 and k2 as the row and column spatial frequency indices within the band specified by b, the CGC model for a designated color channel has a masking response function of the form [82]

Rz[k, b] = ρz Ez[k, b] / (σz + Iz[k, b]),    (6.18)

where ζ is assumed to be 1 (representing the luminance Y component) and omitted to simplify the mathematical expressions, Ez[k, b] and Iz[k, b] are the excitation and inhibition functions, ρz and σz are the scaling and saturation coefficients, and z ∈ {Θ, ϒ}, with Θ and ϒ specifying the inter-orientation and intra-frequency masking domains, respectively.

images

Figure 6.5 Multichannel contrast gain control model. Source: Tan et al., 2014 [82]. Reproduced with permission of Dr. Tan

The excitation and inhibition functions of the two domains (i.e., z ∈ {Θ, ϒ}) are given as follows:

EΘ[k, b] = (XCSF[k, b])^pΘ,    (6.19)

Eϒ[k, b] = (XCSF[k, b])^pϒ,    (6.20)

and

IΘ[k, b] = Σ_{θ∈Θ} (XCSF[k, [s, θ]])^q,    (6.21)

Iϒ[k, b] = (1/card(Ms(k))) Σ_{k′∈Ms(k)} (XCSF[k′, b])^q + λ²,    (6.22)

where the exponents pz and q represent, respectively, the excitatory and inhibitory nonlinearities and are governed by the condition pz > q > 0 according to [78]; Ms(k) is a neighborhood area surrounding XCSF[k, b], whose population depends on the frequency level, s ∈ {1, 2, 3, 4, 5} (from lowest to highest; cf. Figure 6.1(b)), such that card(Ms(k)) = (2s + 1)²; and XCSF[k, b] contains the weighted transform coefficients, accounting for the CSF, defined as

XCSF[k, b] = Wδ XDWT[k, b].    (6.23)

In (6.21), the summation represents the sum of transformed coefficients spanning all oriented bands. The variation in neighborhood windowing associated with Ms(k) in (6.22) addresses the uneven spatial coverage between different resolution levels in a multi-resolution transform. The spatial variance, λ², in (6.22) has been added to the inhibition process to account for texture masking [74]:

λ² = (1/card(Lλ)) Σ_{k′∈Lλ} (XCSF[k′, b] − μ)²,

where Lλ denotes the code block and μ the mean of Lλ. In (6.23), Wδ, for δ ∈ {LL, 1, 2, ..., 5}, represents six CSF weights, one for each resolution level plus an additional weight for the isotropic (LL) band, derived from the base visibility thresholds determined by the spatiotemporal CSF [76].

In [82], a simple squared-error (or ℓ2-norm-squared) function is used to detect the visual difference between the visual masking responses of the reference, XRefz[b, k], and of the processed, XProz[b, k], CSF-weighted DWT coefficients, respectively, to form a perceptual distortion measure, PDM, as

PDM = DLL + Σ_{z∈{Θ,ϒ}} gz Σ_b Σ_k ( XProz[b, k] − XRefz[b, k] )².    (6.24)

Here, gz is the gain factor associated with inter-orientation (z = Θ) and intra-frequency (z = ϒ) masking. In (6.24), the LL-band term DLL is computed separately, since the LL band contains a substantial portion of the image energy in the transform domain, exhibiting a higher level of sensitivity to changes than that of all oriented bands at all resolution levels:

DLL = gLL Σ_k ( XProCSF[k, bLL] − XRefCSF[k, bLL] )².    (6.25)

Here, gLL is a scaling constant, XProCSF[k, bLL] and XRefCSF[k, bLL] are, respectively, the processed and reference visually weighted DWT coefficients for the LL band of the lowest resolution level, and the masking responses XRefz and XProz are as defined in (6.18).
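
The masking-response-and-pooling chain of (6.18)–(6.24) can be sketched for a single domain as follows; the exponents, gain, and saturation values are illustrative placeholders, not the calibrated parameters of [82]:

```python
import numpy as np

def cgc_response(x_csf, p=2.4, q=2.0, rho=1.0, sat=0.1, neighborhood=1):
    """Masking response in the spirit of (6.18)-(6.20) for one domain:
    excitation is a pointwise |.|^p nonlinearity of the CSF-weighted
    coefficients; inhibition pools |.|^q over a (2n+1)^2 neighborhood.
    The exponents satisfy p > q > 0, per [78]; values are illustrative."""
    mag = np.abs(np.asarray(x_csf, dtype=float))
    excitation = mag ** p
    padded = np.pad(mag ** q, neighborhood, mode="edge")
    inhibition = np.zeros_like(mag)
    n = neighborhood
    for dy in range(-n, n + 1):               # sliding-window inhibition pool
        for dx in range(-n, n + 1):
            inhibition += padded[n + dy:n + dy + mag.shape[0],
                                 n + dx:n + dx + mag.shape[1]]
    return rho * excitation / (sat + inhibition)

rng = np.random.default_rng(5)
band = rng.normal(0, 5, size=(16, 16))        # hypothetical CSF-weighted sub-band
resp_ref = cgc_response(band)
resp_pro = cgc_response(band + rng.normal(0, 1, size=band.shape))
print("Squared-error pooling as in (6.24):", np.sum((resp_pro - resp_ref) ** 2).round(3))
```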

6.2.3.3 Supra-threshold Vision Models

A wide range of picture processing and compression applications require cost-effective solutions and operate, more often than not, in the so-called supra-threshold domain, where processing distortions or compression artifacts are visible. A supra-threshold wavelet coefficient quantization experiment reported that the first three visible differences (relative to the original image) are well predicted by an exponential function of the sub-band standard deviation, and that the regression lines with respect to JND2 and JND3 are parallel to that of JND1 [29].

The Most Apparent Distortion (MAD) measure quantifies supra-threshold distortion using a detection model and an appearance model, in the form of [30]

MAD = (Ddetection)^α (Dappearance)^(1−α),    (6.26)

where Ddetection is the perceived distortion due to visual detection, which is formulated in a similar way to JND models, and Dappearance is a visual appearance-based distortion measure based on changes in log-Gabor statistics, such as the standard deviation, skewness, and kurtosis of sub-band coefficients. The weight α adapts to the severity of the distortion as measured by Ddetection:

α = 1 / (1 + β1 (Ddetection)^β2),    (6.27)

with β1 = 0.467 and β2 = 0.130.
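
The adaptive combination of (6.26) and (6.27) is compact enough to sketch directly; the distortion value pairs below are hypothetical:

```python
import numpy as np

def mad_score(d_detection, d_appearance, beta1=0.467, beta2=0.130):
    """Adaptive combination per (6.26)-(6.27): the blending weight alpha leans
    on the detection model for near-threshold (small d_detection) distortions
    and on the appearance model for clearly supra-threshold ones [30]."""
    alpha = 1.0 / (1.0 + beta1 * d_detection ** beta2)
    return (d_detection ** alpha) * (d_appearance ** (1.0 - alpha))

# Hypothetical (detection, appearance) distortion pairs, from near-threshold
# to clearly visible; note how the dominant model shifts with severity.
for d_det, d_app in [(0.05, 0.2), (5.0, 3.0), (50.0, 20.0)]:
    print(f"d_det={d_det:>5}: MAD = {mad_score(d_det, d_app):.3f}")
```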

6.2.4 Lightweight Bit-Stream-Based Models [9]

In real-time visual communications, broadcasting, and entertainment services, QoE assessment and monitoring tasks face various constraints, such as the availability of full or partial information on reference pictures, computation power, and time. While no-reference picture quality metrics provide feasible solutions [25, 88], investigations have been prompted into lightweight QoE methods and associated standardization activities. There are at least three identifiable models: the parametric model, the packet-layer model, and the bit-stream-layer model. With very limited information acquired or extracted from the transmission payload, stringent transmission delay constraints, and limited computational resources, these models share a common technique, that is, optimization of perceptual quality or distortion predictors via, for example, regression or algorithms of a similar nature, using ground-truth subjective test data (e.g., MOS or DMOS) and optimization criteria such as the Pearson linear correlation, the Spearman rank-order correlation, the outlier ratio, and the Root Mean Square Error (RMSE) [18, 25].
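
The validation criteria named above can be computed as in the following sketch; the predicted scores and MOS values are hypothetical, and the fixed outlier band is a simplification of the standard practice of using per-stimulus confidence intervals:

```python
import numpy as np
from scipy import stats

def metric_performance(predicted, mos, outlier_band=0.5):
    """Common validation criteria for objective predictors against MOS:
    Pearson linear correlation, Spearman rank-order correlation, RMSE, and
    an outlier ratio (here: fraction of errors beyond a fixed +/-0.5 band)."""
    predicted, mos = np.asarray(predicted, float), np.asarray(mos, float)
    plcc = stats.pearsonr(predicted, mos)[0]
    srocc = stats.spearmanr(predicted, mos)[0]
    rmse = np.sqrt(np.mean((predicted - mos) ** 2))
    outlier_ratio = np.mean(np.abs(predicted - mos) > outlier_band)
    return plcc, srocc, rmse, outlier_ratio

mos = [4.2, 3.8, 2.5, 1.9, 3.1, 4.6]           # hypothetical subjective scores
pred = [4.0, 3.5, 2.8, 2.2, 3.0, 4.4]          # hypothetical metric outputs
print("PLCC={:.3f} SROCC={:.3f} RMSE={:.3f} OR={:.2f}".format(*metric_performance(pred, mos)))
```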

6.2.4.1 Parametric Model

Relying on Key Performance Indicators (KPIs) collected by network equipment via statistical analysis, a crude prediction of perceived picture quality or distortion is formulated using the (bit) Rate (R) and Packet Loss Rate (PLR), along with side information (e.g., codec type and video resolution), which may be used to assist the adaptation of model parameters to differently coded pictures. Since the bit rate does not correlate well with MOS data for pictures of varying content, and packet loss occurring at different locations in a bit stream may have significantly different impacts on perceived picture quality [3], the quality estimation accuracy of this model is limited, although the computation required is usually trivial.
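
A parametric predictor of this kind can be sketched as a least-squares fit of MOS against log bit rate and PLR; the KPI data, model form, and resulting coefficients below are synthetic illustrations, not a standardized model:

```python
import numpy as np

# Parametric (KPI-driven) MOS prediction: fit against log bit rate and PLR.
# Real models additionally condition on side information such as codec type
# and resolution; everything below is synthetic.
rng = np.random.default_rng(6)
rate_kbps = rng.uniform(200, 4000, size=40)
plr = rng.uniform(0.0, 0.05, size=40)
mos = np.clip(1.0 + 0.9 * np.log(rate_kbps / 100.0) - 30.0 * plr
              + 0.2 * rng.normal(size=40), 1.0, 5.0)

X = np.column_stack([np.log(rate_kbps), plr, np.ones_like(plr)])
coef, *_ = np.linalg.lstsq(X, mos, rcond=None)
predict = lambda r, p: float(np.clip(coef @ [np.log(r), p, 1.0], 1.0, 5.0))
print("Predicted MOS at 1 Mbit/s, 1% loss:", round(predict(1000.0, 0.01), 2))
```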

6.2.4.2 Packet-Layer Model

With more information available via packet-header analysis, distortions at the picture frame level can be estimated better, using information on coding parameters such as the frame type and bit rate per frame, the frame rate and the positions of lost packets, as well as the PLR. The temporal complexity of video content can be estimated using ratios between the bit rates of different types of frames. This enables temporal pooling for better quality or distortion prediction.
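
For example, a crude temporal-complexity indicator derived from packet-header information might be sketched as follows; the frame sizes and the decision threshold are hypothetical:

```python
# Packet-layer sketch: estimate temporal complexity from the ratio of average
# P-frame to I-frame sizes recovered from packet headers. Low-motion content
# yields small P/I ratios. All values here are hypothetical.
frame_sizes = {"I": [42_000, 40_500, 43_100], "P": [6_200, 5_800, 7_000, 6_500]}

def temporal_complexity(sizes):
    """Mean P-frame bits over mean I-frame bits; larger => more motion/change."""
    mean = lambda xs: sum(xs) / len(xs)
    return mean(sizes["P"]) / mean(sizes["I"])

tc = temporal_complexity(frame_sizes)
print(f"P/I size ratio = {tc:.2f} -> "
      f"{'high' if tc > 0.3 else 'low'} temporal complexity (hypothetical threshold)")
```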

6.2.4.3 Bit-Stream-Layer Model

By accessing the media payload as well as packet-layer information, this model allows picture quality estimation either with or without pixel information [21, 88].

6.3 Offline and Online Evaluation

Evaluation of QoP for visual communication, broadcasting, and entertainment services may be conducted at different points between the source and the receiver [6, 9, 25, 88]: for product, system, or service provider quality control, monitoring, regulation/optimization, and performance benchmarking; for QoP monitoring and regulation/optimization along transmission path(s) within the network (e.g., at nodes) [89]; and for QoP advisory and feedback at the receiver. The suitability of QoP measures based on the various models and approaches to software or hardware, online or offline, performance evaluation depends on the measurement point/location in the encoding, transmission, and decoding chain; the availability of the reference video sequence(s); the obtainable hardware and/or software computing resources; and the computational complexity of the QoP metrics. Table 6.1 shows the feasibility of QoP measures based on the various models and approaches for online or offline assessments.

6.4 Remarks

To conclude this chapter, a number of observations can be made with respect to the current state of play in QoE for visual signal compression and transmission.

First, HVS model-based quality metrics have higher computational complexity than feature-driven, NSS-based, or lightweight quality measures, which makes software online solutions to QoE assessment all but impractical with current computing technologies, if not entirely impossible, for most quality monitoring applications. Hardware online solutions have been demonstrated for full-reference quality assessments, but they have a higher degree of system complexity and incur considerably more cost compared with alternative approaches.

Second, existing IQA and VQA metrics [16] have demonstrated their ability and success in grading the quality of pictures, corresponding to traditional ACR subjective test data [11, 16]. However, it remains a challenge whether these metrics can be equally effective and able to produce accurate and robust values corresponding to JNND, JND1, JND2, and so on, respectively, for quality-driven perceptually lossless and/or perceptual quality-regulated coding and transmission applications.

Table 6.1 Feasibility of QoP measures for online or offline assessment. Each cell gives On-/Off- suitability (On-: online evaluation; Off-: offline evaluation) for hardware (HW) or software (SW) implementation. Y: suitable; N: unsuitable; –: not indicated.

                                      Encoding                            Network nodes      Decoding
                                      Coder R-DO        Coder evaluation  evaluation         evaluation       Computational
Type of metric                        HW      SW        HW      SW        HW      SW         HW      SW       complexity

HVS model
  JND model based                     Y/Y     N/Y       Y/Y     Y/Y       –       –          –       –        Moderate to high
  Multichannel model based            Y/Y     N/Y       Y/Y     N/Y       –       –          –       –        High
  Supra-threshold vision model based  –       –         –       –         –       –          –       –        Moderate to high
Feature driven
  PQS                                 Y/Y     N/Y       Y/Y     N/Y       –       –          –       –        Moderate to high
  s-hat                               Y/Y     N/Y       Y/Y     N/Y       –       –          –       –        Moderate
  VQM                                 Y/Y     N/Y       Y/Y     N/Y       –       –          –       –        Moderate
NSS model
  SSIM                                Y/Y     Y/Y       Y/Y     Y/–       –       –          –       –        Moderate
  VIF                                 Y/Y     Y/Y       Y/Y     –         –       –          –       –        High
  STSIM                               Y/Y     Y/Y       Y/Y     –         –       –          –       –        Moderate
Lightweight
  Parametric model                    –       –         –       –         Y/Y     Y/Y        –       –        Low
  Packet-layer model                  –       –         –       –         Y/Y     Y/Y        –       –        Low
  Bit-stream-layer model              –       –         –       –         Y/Y     Y/Y        Y/Y     Y/Y      Low to moderate

Third, there has been an obvious lack of reports on HVS modeling and perceptual distortion measures which capture 3-D video coding artifacts and distortions for 3-D visual signal coding and transmission applications.

Fourth, there have been very limited investigations into QoE assessment which integrates audio and visual components beyond preliminary work based on human perception and integrated human audiovisual system modeling [6, 7].

Significant theoretical and practical contributions to QoE research and development are required to complete the ongoing transition of audiovisual communications, broadcasting, and entertainment systems and applications from best-effort, rate-driven, technology-centric services to quality-driven, user-centric, quality-assured experiences [3].

Acknowledgments

H.R. Wu is indebted to all his past and present collaborators and co-authors of joint publications relevant to the subject matter for their invaluable contributions to the material this chapter is sourced from and based on. Special thanks go to Professor W. Lin of Nanyang Technological University, Singapore, Dr. A.R. Reibman of AT&T Research, USA, Professor F. Pereira of Instituto Superior Técnico - Instituto de Telecomunicações, Portugal, Professor S.W. Hemami of Northeastern University, USA, Professor F. Yang of Xidian University, China, Professor S. Wan of Northwestern Polytechnical University, China, Professor L.J. Karam of Arizona State University, USA, Professor K.R. Rao of University of Texas at Arlington, USA, Dr. D.M. Tan of HD2 Technologies Pty Ltd, Australia, Dr. D. Wu of HD2 Technologies Pty Ltd, Australia, Dr. T. Ferguson of Flexera Software, Australia, and Dr. C.J. van den Branden Lambrecht.

References

  1. International Telecommunication Union, Telecommunication Standardization Sector (ITU-T), ‘Vocabulary for performance and quality of service, Amendment 2: New definitions for inclusion in Recommendation ITU-T P.10/G.100.’ Rec. P.10/G.100, July 2008.
  2. International Telecommunication Union, Telecommunication Standardization Sector (ITU-T), ‘Vocabulary for performance and quality of service, Amendment 3: New definitions for inclusion in Recommendation ITU-T P.10/G.100.’ Rec. P.10/G.100, December 2011.
  3. Wu, H.R., Reibman, A., Lin, W., Pereira, F., and Hemami, S., ‘Perception-based visual signal compression and transmission’ (invited paper). Special Issue on Perception-Based Media Processing. Proceedings of the IEEE, 101(9), 2013, 2025–2043.
  4. Rouse, D.M., Hemami, S.S., Pépion, R., and Le Callet, P., ‘Estimating the usefulness of distorted natural images using an image contour degradation measure.’ Journal of the Optical Society of America A, 28(2), 2011, 157–188.
  5. International Telecommunication Union, Telecommunication Standardization Sector (ITU-T), ‘Opinion model for video-telephony applications.’ Rec. G.1070, April 2007.
  6. Coverdale, P., Möller, S., Raake, A., and Takahashi, A., ‘Multimedia quality assessment standards in ITU-T SG12.’ IEEE Signal Processing Magazine, 28(6), 2011, 91–97.
  7. Pinson, M.H., Ingram, W., and Webster, A., ‘Audiovisual quality components.’ IEEE Signal Processing Magazine, 28(6), 2011, 60–67.
  8. International Telecommunication Union, Telecommunication Standardization Sector (ITU-T), ‘P.NBAMS ToR.’ SG12 Doc. TD-379, September 2010.
  9. Yang, F. and Wan, S., ‘Bitstream-based quality assessment for networked video: A review.’ IEEE Communications Magazine, 50(11), 2012, 203–209.
  10. International Telecommunication Union, Radiocommunication Sector (ITU-R), ‘Methodology for the subjective assessment of the quality of television pictures.’ Rec. BT.500-13, January 2012.
  11. International Telecommunication Union, Telecommunication Standardization Sector (ITU-T), ‘Subjective video quality assessment methods for multimedia applications.’ Rec. P.910, April 2008.
  12. Corriveau, P., ‘Video quality testing.’ In Wu, H.R. and Rao, K.R. (eds), Digital Video Image Quality and Perceptual Coding. CRC Press, Boca Raton, FL, 2006, pp. 125–153.
  13. International Telecommunication Union, Radiocommunication Sector (ITU-R), ‘Studio encoding parameters of digital television for standard 4:3 and wide-screen 16:9 aspect ratios.’ Rec. BT.601-7, March 2011.
  14. International Telecommunication Union, Radiocommunication Sector (ITU-R), ‘Parameter values for the HDTV standards for production and international programme exchange.’ Rec. BT.709-5, April 2002.
  15. Wandell, B.A., Foundations of Vision. Sinauer, Sunderland, MA, 1995.
  16. International Telecommunication Union, Telecommunication Standardization Sector (ITU-T), ‘Objective perceptual video quality measurement techniques for digital cable television in the presence of a full reference.’ Rec. J.144, March 2004.
  17. Zhang, X. and Wandell, B.A., ‘Color image fidelity metrics evaluated using image distortion maps.’ Signal Processing, 70, 1998, 201–214.
  18. Yu, Z., Wu, H., Winkler, S., and Chen, T., ‘Objective assessment of blocking artifacts for digital video with a vision model’ (invited paper). Proceedings of the IEEE, 90(1), 2002, 154–169.
  19. Watson, A.B. (ed.), Digital Images and Human Vision. MIT Press, Cambridge, MA, 1993.
  20. van den Branden Lambrecht, C. (ed.), Special Issue on Image and Video Quality Metrics. Signal Processing, 70(3), 1998.
  21. Wu, H.R. and Rao, K.R. (eds), Digital Video Image Quality and Perceptual Coding. CRC Press, Boca Raton, FL, 2006.
  22. Muntean, G.-M., Ghinea, G., Frossard, P., Etoh, M., Speranza, F., and Wu, H.R. (eds), Special Issue on Quality Issues on Multimedia Broadcasting. IEEE Transactions on Broadcasting, 54(3), 2008.
  23. Karam, L.J., Ebrahimi, T., Hemami, S., et al. (eds), Special Issue on Visual Media Quality Assessment. IEEE Journal on Selected Topics in Signal Processing, 3(2), 2009.
  24. Lin, W., Ebrahimi, T., Loizou, P.C., Moller, S., and Reibman, A.R. (eds), Special Issue on New Subjective and Objective Methodologies for Audio and Visual Signal Processing. IEEE Journal on Selected Topics in Signal Processing, 6(6), 2012.
  25. Hemami, S.S. and Reibman, A.R., ‘No-reference image and video quality estimation: Applications and human-motivated design.’ Signal Processing: Image Communication, 25, 2010, 469–481.
  26. Lin, W. and Jay Kuo, C.-C., ‘Perceptual visual quality metrics: A survey.’ Journal of Visual Communication and Image Representation, 22(4), 2011, 297–312.
  27. Bovik, A.C., ‘Automatic prediction of perceptual image and video quality’ (invited paper). Special Issue on Perception-Based Media Processing. Proceedings of the IEEE, 101(9), 2013, 2008–2024.
  28. Lin, W., ‘Computational models for just-noticeable difference.’ In Wu, H.R. and Rao, K.R. (eds), Digital Video Image Quality and Perceptual Coding. CRC Press, Boca Raton, FL, 2006, pp. 281–303.
  29. Ramos, M.G. and Hemami, S.S., ‘Suprathreshold wavelet coefficient quantization in complex stimuli: Psychophysical evaluation and analysis.’ Journal of the Optical Society of America A, 18(10), 2001, 2385–2397.
  30. Larson, E.C. and Chandler, D.M., ‘Most apparent distortion: Full-reference image quality assessment and the role of strategy.’ Journal of Electronic Imaging, 19(1), 2010, 011006.
  31. Corriveau, P., Gojmerac, C., Hughes, B., and Stelmach, L., ‘All subjective scales are not created equal: The effect of context on different scales.’ Signal Processing, 77(1), 1999, 1–9.
  32. Plompen, R., Motion video coding for visual telephony. PTT Research Neher Laboratories, 1989.
  33. ANSI T1.801.02-1995, Digital Transport of Video Teleconferencing/Video Telephony Signals – Performance Terms, Definitions, and Examples. American National Standard for Telecommunications, ANSI, 1995.
  34. Yuen, M. and Wu, H.R., ‘A survey of hybrid MC/DPCM/DCT video coding distortions.’ Signal Processing, 70, 1998, 247–278.
  35. Chandler, D.M., ‘Seven challenges in image quality assessment: Past, present, and future research.’ ISRN Signal Processing, 2013, 2013, 1–53.
  36. Wu, H.R., Yu, Z., and Qiu, B., ‘Multiple reference impairment scale subjective assessment method for digital video.’ Proceedings of the 14th International Conference on Digital Signal Processing, Santorini, Greece, July 2002, Vol. 1, pp. 185–189.
  37. Wu, D., Tan, D.M., Baird, M., DeCampo, J., White, C., and Wu, H.R., ‘Perceptually lossless medical image coding.’ IEEE Transactions on Medical Imaging, 25(3), 2006, 335–344.
  38. Oh, H., Bilgin, A., and Marcellin, M.W., ‘Visually lossless encoding for JPEG2000.’ IEEE Transactions on Image Processing, 22(1), 2013, 189–201.
  39. Peli, E., ‘Contrast in complex images.’ Journal of the Optical Society of America A, 7(10), 1990, 2032–2040.
  40. Chandler, D.M. and Hemami, S.S., ‘Effects of natural images on the detectability of simple and compound wavelet subband quantization distortions.’ Journal of the Optical Society of America A, 20(7), 2003, 1164–1180.
  41. Budrikis, Z.L., ‘Visual fidelity criterion and modeling.’ Proceedings of the IEEE, 60(7), 1972, 771–779.
  42. Mannos, J.L. and Sakrison, D.J., ‘The effects of a visual fidelity criterion on the encoding of images.’ IEEE Transactions on Information Theory, 20(4), 1974, 525–536.
  43. Pica, A., Isnardi, M., and Lubin, J., ‘HVS based perceptual video encoders.’ In Wu, H.R. and Rao, K.R. (eds), Digital Video Image Quality and Perceptual Coding. CRC Press, Boca Raton, FL, 2006, pp. 337–360.
  44. Shannon, C.E., ‘Coding theorems for a discrete source with a fidelity criterion.’ IRE National Convention Record, Vol. 7, Part 4, 1959, pp. 142–163.
  45. Berger, T. and Gibson, J.D., ‘Lossy source coding.’ IEEE Transactions on Information Theory, 44(6), 1998, 2693–2723.
  46. Jayant, N.S. and Noll, P., Digital Coding of Waveforms: Principles and Applications to Speech and Video. Prentice-Hall, Englewood Cliffs, NJ, 1984.
  47. Wang, Z. and Bovik, A.C., ‘Mean squared error: Love it or leave it.’ IEEE Signal Processing Magazine, 26(1), 2009, 98–117.
  48. Jayant, N.S., Johnston, J., and Safranek, R., ‘Signal compression based on models of human perception.’ Proceedings of the IEEE, 81(10), 1993, 1385–1422.
  49. Shannon, C.E., ‘A mathematical theory of communication.’ Bell System Technical Journal, 27, 1948, 379–423.
  50. Clarke, R.J., Transform Coding of Images. Academic, New York, 1985.
  51. Miyahara, M., ‘Quality assessments for visual service.’ IEEE Communications Magazine, 26, 1988, 51–60.
  52. Miyahara, M., Kotani, K., and Algazi, V.R., ‘Objective picture quality scale (PQS) for image coding.’ IEEE Transactions on Communications, 46(9), 1998, 1215–1226.
  53. Miyahara, M. and Kawada, R., ‘Philosophy of picture quality scale.’ In Wu, H.R. and Rao, K.R. (eds), Digital Video Image Quality and Perceptual Coding. CRC Press, Boca Raton, FL, 2006, pp. 181–223.
  54. Webster, A.A., Jones, C.T., Pinson, M.H., Voran, S.D., and Wolf, S., ‘An objective video quality assessment system based on human perception.’ SPIE Proceedings: Human Vision, Visual Processing, and Digital Display IV, 1913, 1993, 15–26.
  55. Pinson, M.H. and Wolf, S., ‘A new standardized method for objectively measuring video quality.’ IEEE Transactions on Broadcasting, 50(3), 2004, 312–322.
  56. Párraga, C.A., Troscianko, T., and Tolhurst, D.J., ‘The effects of amplitude-spectrum statistics on foveal and peripheral discrimination of changes in natural images, and a multi-resolution model.’ Vision Research, 45(25&26), 2005, 3145–3168.
  57. Wang, Z., Bovik, A.C., Sheikh, H.R., and Simoncelli, E.P., ‘Image quality assessment: From error visibility to structural similarity.’ IEEE Transactions on Image Processing, 13(4), 2004, 600–612.
  58. Sheikh, H. and Bovik, A.C., ‘Image information and visual quality.’ IEEE Transactions on Image Processing, 15(2), 2006, 430–444.
  59. Pappas, T.N., Neuhoff, D.L., de Ridder, H., and Zujovic, J., ‘Image analysis: Focus on texture similarity’ (invited paper). Special Issue on Perception-Based Media Processing. Proceedings of the IEEE, 101(9), 2013, 2044–2057.
  60. Hassan, M. and Bhagvati, C., ‘Structural similarity measure for color images.’ International Journal of Computer Applications, 43(14), 2012, 7–12.
  61. Wang, Z. and Li, Q., ‘Video quality assessment using a statistical model of human visual speed perception.’ Journal of the Optical Society of America A, 24(12), 2007, B61–B69.
  62. Wainwright, M.J., Simoncelli, E.P., and Willsky, A.S., ‘Random cascades on wavelet trees and their use in analyzing and modeling natural images.’ Applied and Computational Harmonic Analysis, 11, 2001, 89–123.
  63. Sekuler, R. and Blake, R. Perception, 3rd edn. McGraw-Hill, New York, 1994.
  64. Ahumada, A.J., ‘Luminance-model-based DCT quantization for color image compression.’ SPIE Proceedings: Human Vision, Visual Processing and Digital Display III, 1666, 1992, 365–374.
  65. Peterson, H.A., Ahumada, A.J., and Watson, A.B., ‘Improved detection model for DCT coefficient quantization.’ SPIE Proceedings: Human Vision, Visual Processing and Digital Display, 1913, 1993, 191–201.
  66. Watson, A.B., ‘DCTune: A technique for visual optimization of DCT quantization matrices for individual images.’ Society for Information Display Digest of Technical Papers XXIV, 1993, pp. 946–949.
  67. Safranek, R.J., ‘A JPEG compliant encoder utilizing perceptually based quantization.’ SPIE Proceedings: Human Vision, Visual Processing and Digital Display V, 2179, 1994, 117–126.
  68. Safranek, R.J. and Johnston, J.D., ‘A perceptually tuned subband image coder with image dependent quantization and post-quantization.’ Proceedings of IEEE ICASSP, 1989, pp. 1945–1948.
  69. Chou, C.-H. and Li, Y.-C., ‘A perceptually tuned subband image coder based on the measure of just-noticeable distortion profile.’ IEEE Transactions on Circuits and Systems for Video Technology, 5, 1995, 467–476.
  70. Höntsch, I. and Karam, L.J., ‘Locally adaptive perceptual image coding.’ IEEE Transactions on Image Processing, 9(9), 2000, 1472–1483.
  71. Liu, Z., Karam, L.J., and Watson, A.B., ‘JPEG2000 encoding with perceptual distortion control.’ IEEE Transactions on Image Processing, 15(7), 2006, 1763–1778.
  72. Yang, X.K., Lin, W.S., Lu, Z.K., Ong, E.P., and Yao, S.S., ‘Just noticeable distortion model and its applications in video coding.’ Signal Processing: Image Communication, 20(7), 2005, 662–680.
  73. Liu, A., Lin, W., Paul, M., Deng, C., and Zhang, F., ‘Just noticeable difference for image with decomposition model for separating edge and textured regions.’ IEEE Transactions on Circuits and Systems for Video Technology, 20(11), 2010, 1648–1652.
  74. Daly, S., ‘Engineering observations from spatiovelocity and spatiotemporal visual models.’ In van den Branden Lambrecht, C.J. (ed.), Vision Models and Applications to Image and Video Processing. Kluwer, Norwell, MA, 2001.
  75. Chou, C. and Liu, K., ‘A perceptually tuned watermarking scheme for color images.’ IEEE Transactions on Image Processing, 19(11), 2010, 2966–2982.
  76. Chou, C.-H. and Chen, C.-W., ‘A perceptually optimized 3-D subband image codec for video communication over wireless channels.’ IEEE Transactions on Circuits and Systems for Video Technology, 6(2), 1996, 143–156.
  77. Chandler, D.M. and Hemami, S.S., ‘VSNR: A wavelet-based visual signal-to-noise ratio for natural images.’ IEEE Transactions on Image Processing, 16(9), 2007, 2284–2298.
  78. Watson, A.B. and Solomon, J.A., ‘A model of visual contrast gain control and pattern masking.’ Journal of the Optical Society of America A, 14(9), 1997, 2379–2391.
  79. van den Branden Lambrecht, C.J., ‘Perceptual models and architectures for video coding applications.’ Ph.D. dissertation, Swiss Federal Institute of Technology, Lausanne, 1996.
  80. Winkler, S., ‘A perceptual distortion metric for digital color video.’ SPIE Proceedings: Human Vision and Electronic Imaging IV, 3644, 1999, 175–184.
  81. Tan, D.M., Wu, H.R., and Yu, Z., ‘Perceptual coding of digital monochrome images.’ IEEE Signal Processing Letters, 11(2), 2004, 239–242.
  82. Tan, D.M., Tan, C.-S., and Wu, H.R., ‘Perceptual colour image coder with JPEG2000.’ IEEE Transactions on Image Processing, 19(2), 2010, 374–383.
  83. Visual Information Systems Research Group, ‘A methodology for imaging system design and evaluation.’ Sarnoff Corporation, 1995.
  84. Visual Information Systems Research Group, ‘Sarnoff JND vision model algorithm description and testing.’ Sarnoff Corporation, 1997.
  85. Antonini, M., Barlaud, M., Mathieu, P., and Daubechies, I., ‘Image coding using wavelet transform.’ IEEE Transactions on Image Processing, 1(2), 1992, 205–220.
  86. Mallat, S.G., ‘A theory for multiresolution signal decomposition: The wavelet representation.’ IEEE Transactions on Pattern Analysis and Machine Intelligence, 11(7), 1989, 674–693.
  87. International Telecommunication Union, Radiocommunication Sector (ITU-R), ‘Objective perceptual video quality measurement techniques for standard definition digital broadcast television in the presence of a full reference.’ Rec. BT.1683, June 2004.
  88. Yang, F., Wan, S., Xie, Q., and Wu, H.R., ‘No-reference quality assessment for networked video via primary analysis of bit stream.’ IEEE Transactions on Circuits and Systems for Video Technology, 20(11), 2010, 1544–1554.
  89. Hioki, W., Telecommunications, 2nd edn. Prentice-Hall, Englewood Cliffs, NJ, 1995.
  90. Wang, Z., Lu, L., and Bovik, A.C., ‘Video quality assessment based on structural distortion measurement.’ Signal Processing: Image Communication, 19(2), 2004, 121–132.
  91. Wu, H.R., Lin, W., and Ngan, K.N., ‘Rate-perceptual-distortion optimization (RpDO) based picture coding – issues and challenges.’ Proceedings of 19th International Conference on Digital Signal Processing, Hong Kong, August 2014, pp. 777–782.

Acronyms

DWT

Discrete Wavelet Transform

HVS

Human Visual System

IQA

Image Quality Assessment

MRIS

Multiple Reference Impairment Scale

NSS

Natural Scene Statistics

PDA

Principal Decomposition Analysis

PQS

objective Picture Quality Scale

QoE

Quality of Experience

QoP

perceived Quality of Picture

QoS

Quality of Service

REC

Recognition Equivalence Class

RT

Recognition Threshold

UoP

perceived Utility of Picture

VDU

Visual Distortion Unit

VQA

Video Quality Assessment
