Anish Mittal1, Anush K. Moorthy2 and Alan C. Bovik3
1Nokia Research Center, USA
2Qualcomm Inc., USA
3University of Texas at Austin, USA
Visual quality assessment as a field has gained tremendous importance in the past decade, as evinced by the flurry of research activity that has been conducted in leading universities and commercial corporations on topics that fall under its umbrella. The reason for this is the exponential growth of visual data that is being captured, stored, transmitted, and viewed across the world. Driving the previously unfathomable growth in communications, images and videos now form a major chunk of transmitted data. This is not surprising since, from the dawn of time, humans have been visual animals who have preferred images over the written word. One need only look at the amount of area devoted to visual signal processing in the human brain to surmise that vision and its perception form a large part of neurological processing [1, 2]. Hence, researchers have attempted to decode human vision processing and have used models of the visual system for image-processing applications [3–7].
While an image can convey more than a thousand words, transmission of visual content occupies an equivalently large amount of bandwidth in modern communication systems, hence images and videos are compressed before storage or transmission. With increasing resolutions and user expectations, increasingly scarce bandwidth is being strained. While the user is concerned only with the final quality of the image or the video, the service provider attempts to provide said quality with the least possible expense of bandwidth. With increasing quality expectations, algorithm developers attempt to improve image/video-processing algorithms so that visual quality at the output is enhanced. In all cases, however, the quantity being addressed is a highly subjective one, that of “visual quality” as perceived by a human observer. The goal of automated Quality Assessment (QA) is to attempt to produce an estimate of this human-perceived quality, so that such a quantitative measure may be used in place of subjective human perception. Our discussion here centers on quality assessment algorithms and their applications in modern image/video-processing systems.
Typically, QA algorithms are classified on the basis of the amount of information that is available to the algorithm. Full-Reference (FR) quality assessment algorithms require the distorted image/video whose quality needs to be assessed as well as the clean, pristine reference image/video for comparison [8–17], whereas Reduced-Reference (RR) approaches only use limited information regarding the reference image/video in lieu of the actual reference content itself, together with the distorted image [18–21]. Blind or No-Reference (NR) QA refers to automatic quality assessment of an image/video using an algorithm which only utilizes the distorted image/video whose quality is being assessed [22–45].
While tremendous progress has been made in understanding human vision, our current knowledge is far from complete. In the face of such incomplete knowledge, it is not surprising that researchers have focused on simpler FR algorithms [14, 15, 18]. FR algorithms themselves have many applications, especially in the field of image and video compression where the original image/video is available. Apart from applications, FR algorithms also provide researchers with a fundamental foundation on which NR algorithms can be built. The tools that proved useful in the FR framework have been modified suitably and later used for NR algorithms. For example, Natural-Scene-Statistics (NSS) models of images[16, 17] have been used successfully in recent NR algorithms [35–41, 43–45]. Since FR research has been covered in this compendium and elsewhere [7, 46, 47], in this chapter we shall focus on NR algorithms.
One important aspect of quality assessment research is evaluating the performance of an algorithm. Since QA algorithms attempt to reproduce human opinion scores, it is obvious that a good algorithm is one that correlates well with human opinion of quality. In order to estimate human opinion of quality, large databases spanning a wide range of contents and visual impairments (such as those we detail later in this chapter) are created, and large-scale human studies are conducted. The human opinion scores thus produced represent the ground truth and are used in evaluating algorithm performance. Apart from evaluating performance, these scores serve an additional, very important function in the case of NR algorithms.
NR QA approaches can be classified on the basis of whether the algorithm has access to subjective/human opinion prior to deployment. Algorithms could use machine learning techniques along with human judgments of quality during a “training” phase and then could attempt to reproduce human opinion during the “testing” phase. Such algorithms, which first learn human behavior from subjective quality data, are referred to as Opinion-Aware (OA) NR algorithms. Opinion-aware algorithms are the first step toward building a completely blind algorithm (i.e., one that is not only blind to the reference image, but also to the human opinion of quality). Such completely blind algorithms, which do not use subjective data on quality to perform blind quality assessment, are termed Opinion-Unaware (OU) algorithms. While both OA and OU algorithms have practical applications, OU NR algorithms hold more practical relevance. This is because it is impossible to anticipate all of the different distortions that may occur in a practical system. Further, no controlled database can possibly span all distortions and quality ranges well enough. Recent research in NR QA has spanned both OA and OU algorithms. While OA algorithms perform as well as FR algorithms, recent OU approaches have been catching up and perform exceedingly well without access to human opinion [39–45].
In this chapter, we shall detail recent no-reference approaches to image and video quality assessment. Specifically, we shall cover both opinion-aware and opinion-unaware models. Most of the approaches that we shall cover are based on understanding and modeling the underlying statistics of natural images and/or distortions using perceptual principles. These approaches measure deviations from statistical regularities and quantify such deviations, leading to estimates of quality. In this chapter, we shall analyze the motivation and the principles underlying such statistical descriptions of quality and describe the algorithms in detail. We shall provide exhaustive comparative analysis of these approaches and discuss the potential applications of no-reference algorithms. Specifically, we shall cover the case of distortion-unaware perceptual image repair and quality assessment of tone-mapped images. We then segue into a discussion of the challenges that lie ahead for the field to gain maturity and other practical application scenarios that we envision for these algorithms. We conclude the chapter with a discussion of some philosophical predictions of future directions that the field of automatic quality assessment of images and videos may take.
Early research on no-reference quality assessment focused on predicting the quality of images afflicted with specific distortions [22–29]. Such approaches aim to model distortion-specific artifacts that relate well to the loss in visual quality. For example, JPEG compressed images can be evaluated for their visual quality using the strength of edges at block boundaries [23, 24, 48, 49]. Such algorithms are restricted to the distortions they are designed for, limiting their scalability and usage in real scenarios, since the distortion type afflicting the image is almost never a known quantity. Having said that, these algorithms represented the first steps toward truly blind NR algorithms and hence form an important landmark in the field of quality assessment.
The next leap in NR algorithms was that of developing distortion-agnostic approaches to QA. These algorithms can predict the quality of the image without information on the distortion afflicting the image [30–45]. Some of these models are also capable of identifying the distortion afflicting the image [35, 39]; information that could potentially be used for varied applications [50].
Early general-purpose, distortion-unaware NR QA algorithms were developed using models that can learn to predict human judgments of image quality from databases of human-judged distorted images [30–39]. Such OA models, which use distorted images with co-registered human scores, have been shown to deliver high performance on images corrupted with different kinds of distortions and severities [51]. As we have mentioned before, while these algorithms are tremendously useful, they may be limited in their application since they are limited by the distortions that they are trained on, as well as bound by the quality ranges that the controlled training database has on offer. Since no controlled database can completely offer the variety of practical distortions, the algorithms trained on these databases remain partially crippled. This is not to say that the underlying model used to “train” these algorithms is at fault; in fact, as with the FR-to-NR transition, the models used for OA NR algorithms have been tweaked and utilized for opinion-unaware approaches.
OU approaches predict the visual quality of images without any access to human judgments during the training phase [39–45]. During the training phase, these algorithms may or may not have access to the distorted images. OU approaches which are trained on distorted images are limited by the distortion types and are referred to as Distortion-Aware (DA) algorithms. OU DA algorithms may be viewed as close cousins of the OA approaches, since both of them are limited by the training data available. OU Distortion-Unaware (DU) algorithms are those that do not utilize distorted images during the training phase. Since OU DU algorithms have no access to any distorted image or human opinion a priori, they represent a final frontier in NR QA research. OU DU algorithms find applications in uncontrolled environments such as highly unpredictable wireless networks or quality assessment of user-captured photographs and so on. It may seem that these algorithms have almost no information to make judgments on quality, but researchers find motivation in the fact that leading FR IQA models (such as the Structural SIMilarity index, SSIM [14]) are both opinion and distortion-unaware. Since one of the goals of NR QA research is to develop algorithms that could replace FR algorithms, OU DU NR algorithm development has caught the fancy of researchers in recent years [43–45]. A summary of NR algorithm classification is given in Figure 5.1.
OA DA approaches make use of both distorted images and associated human judgments to develop QA models. Depending on the types of features they extract from the image/video, they can be categorized into codebook-based, ensemble-based, and Natural Scene Statistics (NSS)-based approaches.
The authors of [31, 32] use local appearance descriptors based on Gabor filter responses, which are quantized to form a visual codebook. The codebook feature space is then used to yield an estimate of quality. This is accomplished in one of two ways: (a) an example-based method or (b) a Support Vector Regression (SVR)-based method. The example-based method estimates the quality score of the test image using a weighted average of the quality scores of training images, where the authors assume that there exists a linear relationship between codeword histograms and quality scores. In contrast, the SVR-based method “learns” the mapping between the codeword histograms and the quality scores during the “training” phase. The approach is competitive in performance with other general-purpose NR IQA measures. However, its computational complexity limits its use in practical applications.
Tang et al. [34] proposed an approach that learns an ensemble of regressors trained on three different groups of features – natural image statistics, distortion texture statistics, and blur/noise statistics. These regressors learn the mapping from feature space to quality and, when deployed during the test phase, the algorithm reproduces the quality of the image using a combination of the learned regressors. Another approach is based on a hybrid of curvelet, wavelet, and cosine transforms [52]. Although these approaches work on a variety of distortions, each set of features (in the first approach) and transforms (in the second approach) caters only to certain kinds of distortion processes, thereby limiting the deployment of these algorithms.
NSS-based approaches work on the rationale that all natural scenes obey statistical laws that are independent of the content of the scene being imaged. For instance, local quantities such as contrast are scale invariant in nature and follow heavy-tailed distributions [53]. In the case of distorted images, however, such scene statistics deviate from natural distributions, rendering them unnatural. These deviations, when quantified appropriately, can be utilized to evaluate the quality of the distorted image. This strategy of approaching the problem of NR QA from a natural scene perspective instead of a distortion-based perspective makes NSS-based NR approaches much less dependent on distortion-specific characteristics such as blocking. NSS models have proved to be very powerful tools for quality assessment in general, and have been used successfully for developing FR QA algorithms [16, 17, 25] and RR algorithms [54] in the past as well.
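The idea of quantifying deviations from statistical regularity can be made concrete. As an illustrative sketch (not any specific published algorithm), the fragment below estimates the shape parameter of a Generalized Gaussian Distribution (GGD) fit to bandpass coefficients using the standard moment-matching method; for natural scenes this parameter falls in a characteristic range, and distortions push it away from that range:

```python
import numpy as np
from scipy.special import gamma

def ggd_shape_estimate(coeffs):
    """Estimate the GGD shape parameter of a set of bandpass
    coefficients via moment matching (ratio of squared mean absolute
    value to variance), inverted by a simple grid search."""
    coeffs = np.asarray(coeffs, dtype=np.float64).ravel()
    sigma_sq = np.mean(coeffs ** 2)
    e_abs = np.mean(np.abs(coeffs))
    rho = sigma_sq / (e_abs ** 2 + 1e-12)
    # r(a) = gamma(1/a) * gamma(3/a) / gamma(2/a)^2 is monotone in a
    alphas = np.arange(0.2, 10.0, 0.001)
    r = gamma(1.0 / alphas) * gamma(3.0 / alphas) / gamma(2.0 / alphas) ** 2
    return float(alphas[np.argmin((r - rho) ** 2)])

rng = np.random.default_rng(0)
print(ggd_shape_estimate(rng.normal(size=100000)))   # close to 2 (Gaussian)
print(ggd_shape_estimate(rng.laplace(size=100000)))  # close to 1 (Laplacian)
```

A heavier-tailed (more Laplacian-like) distribution yields a smaller shape parameter, which is why such fits serve as quality-aware features.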
Recently, three successful blind/NR QA approaches to Image Quality Assessment (IQA) based on NSS were proposed [35–37], which exploit different NSS regularities in the wavelet, DCT, and spatial domains, respectively. An NR IQA model developed in the wavelet domain, dubbed the Distortion Identification-based Image Verity and INtegrity Evaluation (DIIVINE) index, makes use of a series of statistical features derived from an NSS wavelet coefficient model. These features are subsequently used in a two-stage framework for QA, where the first stage identifies the distortion type and the second stage performs distortion-specific quality assessment [35]. While the approach performs on a par with some of the successful FR IQA algorithms, the expensive computation of spatial correlation-based features makes it impractical for use in real-world applications. Replacing these features with the pairwise product-based features proposed in the Blind/Referenceless Image Spatial QUality Evaluator (BRISQUE) model alleviates this problem [37].
The DCT domain-based approach – the BLind Image Integrity Notator using DCT Statistics (BLIINDS-II) index – computes a small number of features from an NSS model of block DCT coefficients [36]. Such NSS features, once calculated, are supplied to a regression function that predicts human judgments of visual quality. In comparison with DIIVINE, BLIINDS-II is a single-stage algorithm: instead of training multiple distortion-specific QA models, it makes use of a single NSS model that is able to deliver highly competitive QA prediction power. Although BLIINDS-II uses a small number of features (4), the non-linear sorting of features makes the approach computationally complex.
The third approach – BRISQUE – was developed with the express purpose of efficiency [37]. BRISQUE explores the possibility of a transform-free approach and operates directly on spatial pixel data. BRISQUE is based on the spatial NSS model of Ruderman [55] and uses pointwise statistics of locally normalized luminance signals and distribution of pairwise products of neighboring locally normalized luminance signals as features. Once these features are computed, a mapping from features to human judgment is learned using a regression module, yielding a measure of image quality.
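The BRISQUE front end can be sketched compactly. The fragment below is a simplified approximation: BRISQUE itself uses a 7 × 7 circularly symmetric Gaussian weighting window, which we approximate here with a separable Gaussian filter. It computes the mean-subtracted contrast-normalized (MSCN) coefficients of Ruderman's model and one of the pairwise-product neighbor maps:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7.0 / 6.0):
    """Mean-subtracted contrast-normalized (MSCN) coefficients:
    locally normalized luminance, the front end of spatial NSS models.
    `image` is a 2-D array of gray-scale luminance values."""
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)                 # local mean
    var = gaussian_filter(image * image, sigma) - mu * mu
    sigma_map = np.sqrt(np.maximum(var, 0.0))          # local std-dev
    return (image - mu) / (sigma_map + 1.0)            # C = 1 stabilizer

def horizontal_products(m):
    """Pairwise products of horizontally neighboring MSCN coefficients;
    BRISQUE also uses vertical and two diagonal orientations."""
    return m[:, :-1] * m[:, 1:]
```

For natural images the MSCN coefficients are well modeled by a generalized Gaussian, and the pairwise products by an asymmetric generalized Gaussian; the fitted parameters form the feature vector fed to the regressor.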
This section summarizes approaches to NR QA that do not use human opinion scores to design QA models - OU DA NR QA algorithms. The advantage of such a scheme is that these approaches are not limited by the size of the databases with human judgments, thereby increasing their versatility. OU DA algorithms could either use large databases of reference and distorted images along with the corresponding FR algorithm scores as a proxy for human judgments or, in the ideal case, use only a set of pristine and distorted images together with the associated distortion categories.
Approaches based on visual words first decompose images using an energy-compacting filter bank and then divisively normalize the responses, yielding outputs that are well modeled using NSS models [41, 43]. Once such a representation is obtained, the image is divided into patches, and perceptually relevant NSS features are computed at each image patch. Features are computed from image patches obtained from both reference and distorted images to create distributions over visual words. Quality prediction is then accomplished by computing the Kullback–Leibler (KL) divergence between the visual word distribution of the distorted image and the signature visual word distribution of the space of exemplar images. One drawback of such an approach is that the creation of these visual word distributions from the features is lossy owing to the quantization involved in the process and could potentially affect predictions. Further, the approach is only as good as the diversity in the chosen training set of images and distortions and may not generalize to other distortions.
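The final comparison step of the visual-word approach reduces to a divergence between two histograms over the codebook. A minimal sketch:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """Discrete Kullback-Leibler divergence D(p || q) between two
    visual-word histograms; eps avoids log(0) for empty bins."""
    p = np.asarray(p, dtype=np.float64) + eps
    q = np.asarray(q, dtype=np.float64) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# A distorted image whose word distribution drifts away from the
# pristine signature distribution receives a larger divergence.
pristine = [0.25, 0.25, 0.25, 0.25]
distorted = [0.70, 0.10, 0.10, 0.10]
print(kl_divergence(distorted, pristine))  # positive
print(kl_divergence(pristine, pristine))   # approximately 0
```

The quantization noted above enters before this step: assigning each patch feature to its nearest codeword discards within-cell detail, which is the lossy part of the pipeline.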
Algorithms based on topic models work on the principle that distorted images have certain latent characteristics that differ from those of “pristine” images [40]. These latent characteristics are explored through application of a “topic model” to visual words extracted from an assorted set of pristine and distorted images. Visual words, which are obtained using quality-aware NSS features, are used to determine the correct choice of latent characteristics, which are in turn capable of discriminating between pristine and distorted image content. The similarity between the probability of occurrence of the different topics in an unseen image and the distribution of latent topics averaged over a large number of pristine natural images is indicative of the image quality. The advantage of this approach is that it not only predicts the visual quality of the image, but also discovers groupings amongst artifacts of distortions in the corrupted images without any supervision. Unfortunately, in its current form, the approach does not perform as well as general-purpose OA models.
These approaches use quality scores produced by applying an FR IQA algorithm to each distorted-reference image pair in a training database as a proxy for human judgments of quality [42]. Distorted images and their reference versions are first partitioned into patches, and a percentile pooling strategy is used to estimate the quality of each patch. The patches are then grouped by quality level using clustering techniques [56], and quality-aware clustering is applied to each group to learn quality-aware centroids. During the testing stage, each patch of the distorted image is compared with the learned quality-aware centroids and assigned a score based on a simple weighted average; the patch scores, once obtained, are pooled to obtain the quality of the image. This approach shows high correlation with human judgments of image quality, as well as high efficiency. As with all OU DA models, the approach is limited by the database of distortions that it is trained on and by the quality estimates of the FR algorithm used as the proxy for human judgments.
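The test-time lookup in such centroid-based schemes can be sketched as a nearest-centroid assignment. This is a simplified illustration, not the exact weighting of [42]; the centroid quality levels here are hypothetical values that would come from the FR-proxy training stage:

```python
import numpy as np

def assign_patch_quality(patch_feats, centroids, centroid_quality):
    """Each test patch inherits the quality level of its closest
    quality-aware centroid; patch scores are averaged into an image
    score. `patch_feats` is (P, d), `centroids` is (K, d),
    `centroid_quality` is (K,)."""
    d = np.linalg.norm(patch_feats[:, None, :] - centroids[None, :, :], axis=2)
    nearest = d.argmin(axis=1)              # index of closest centroid per patch
    return float(centroid_quality[nearest].mean())

# Toy example: two 1-D centroids with quality levels 1.0 (good) and 0.0 (bad)
centroids = np.array([[0.0], [10.0]])
centroid_quality = np.array([1.0, 0.0])
patches = np.array([[0.2], [9.7]])
print(assign_patch_quality(patches, centroids, centroid_quality))  # 0.5
```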
A holy grail of the blind/NR IQA problem is that of the design of perceptual models that can predict the quality of distorted images with as little prior knowledge of the images or their distortions as possible [41, 43, 44]. This section discusses the first steps toward completely blind approaches to NR QA (i.e., those approaches that do not make use of any prior distortion knowledge or subjective opinion of quality to predict the quality of images – OU DU models).
This approach is an extension of the OU DA visual word-based NR QA model of [41, 43]. While the crux of the approach remains the same as that summarized above, the DU extension differs in the way the visual words are formed during the training stage of the algorithm. In the DU case, instead of using both the distorted and reference images, the model uses only the natural undistorted reference images to form the visual codebook.
The feature-to-visual-word-distribution conversion is a lossy process due to the quantization involved and affects the accuracy of human judgment prediction. The approach described below overcomes this shortcoming and delivers performance on a par with the top-performing FR and NR IQA models that require training on human-judged databases of distorted images.
This NR OU-DU IQA model [44], dubbed the Naturalness Image Quality Evaluator (NIQE), is based on constructing a collection of quality-aware features. This approach is based on the principle that all “natural” images captured are distorted in some form or another. For instance, during image capture, there is always a loss of resolution due to the low-pass nature of the lens; further, there exists defocus blur in different parts of the image depending on the associated depth of field. Since humans appear to more heavily weight their judgments of image quality from image regions which are in focus and hence appear sharper, salient quality measurements can be made from “sharp” patches in an image. From amongst a collection of natural patches, this approach uses data only from those patches that are richest in information (i.e., those that are less likely to have been subjected to a limiting distortion such as blur). The algorithm extracts quality-aware perceptual features from these patches to construct a Multivariate Gaussian (MVG) model. The quality of a given test image is then expressed as the distance between an MVG fit of the NSS features extracted from the test image whose quality is to be assessed, and an MVG model of the quality-aware features extracted from the corpus of natural images. Experimental results demonstrate that this approach performs as well as top-performing FR IQA models that require corresponding reference images and NR IQA models that require training on large databases of human opinions of distorted images (i.e., OA DA models).
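The final quality computation in NIQE is a closed-form, Mahalanobis-like distance between the two MVG models. A sketch, with the NSS feature extraction omitted:

```python
import numpy as np

def mvg_distance(mu1, cov1, mu2, cov2):
    """Distance between two multivariate Gaussian models of NSS
    features: sqrt((mu1 - mu2)^T ((cov1 + cov2) / 2)^-1 (mu1 - mu2)).
    A pseudo-inverse guards against a singular pooled covariance."""
    d = np.asarray(mu1, dtype=np.float64) - np.asarray(mu2, dtype=np.float64)
    pooled = (np.asarray(cov1) + np.asarray(cov2)) / 2.0
    return float(np.sqrt(d @ np.linalg.pinv(pooled) @ d))
```

In use, one model (mean and covariance, fit with `np.mean` and `np.cov` over patch features) would come from the test image and the other from the pristine natural-image corpus; a larger distance indicates lower predicted quality.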
While there has been a lot of activity in the area of distortion-agnostic image quality assessment, the field of distortion-agnostic No-Reference Video Quality Assessment (NR VQA) has been relatively quiet. This is not to say that NR VQA algorithms do not exist, but most existing models are specific to the application that they were designed for and hence do not find widespread use. The popularity of distortion-specific measures for NR VQA can be attributed to two factors. First, VQA is a far more complex subject than IQA, and given that distortion-unaware models for NR IQA have just started to appear, such models for NR VQA are still forthcoming. Second, the lack of a universal measure for NR VQA and the need for blind quality assessment in many applications necessitates the development of such distortion-specific measures. Since this chapter has focused on distortion-agnostic NR IQA measures, and we believe that distortion-agnostic NR VQA algorithms are just around the corner, in this section we do not expressly summarize each proposed approach. Instead, we point the reader to relevant work in the area and to popular surveys which describe the algorithms in far more detail.
One of the most popular distortions that has been evaluated in the area of NR VQA is that of compression. Since video compression has been a popular subject for research and has tremendous practical applications, it comes as no surprise that many NR VQA metrics are geared toward compression. One artifact of compression that is commonly quantified is that of blocking [57–60], where edge strength at block boundaries is measured and correlated with perceptual quality. Techniques used include harmonic analysis based on Sobel edge detection [57], looking for correlations at 8 × 8 block boundaries in MPEG video [58], and using luminance masking along with edge-strength measures [59]. Another distortion that manifests due to compression is that of blur [61]. Jerkiness, which may be an artifact of the encoder, has also been evaluated for its effect on quality using several techniques [62–66].
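The boundary edge-strength idea behind such blocking measures can be sketched in a few lines. This is an illustrative indicator, not any specific published metric:

```python
import numpy as np

def blockiness(frame, block=8):
    """Simple blocking indicator: ratio of mean absolute luminance
    jumps across block boundaries to jumps inside blocks. Values well
    above 1 suggest visible blocking artifacts (horizontal direction
    only; a full measure would also scan vertically)."""
    f = frame.astype(np.float64)
    diffs = np.abs(np.diff(f, axis=1))          # horizontal gradients
    cols = np.arange(diffs.shape[1])
    boundary = (cols % block) == (block - 1)    # gradients crossing a boundary
    return float(diffs[:, boundary].mean() / (diffs[:, ~boundary].mean() + 1e-12))
```

On a frame reconstructed from coarsely quantized 8 × 8 blocks the boundary jumps dominate the interior ones, driving the ratio up; on smooth natural content the ratio stays near 1.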
Researchers have also studied the quality effect of multiple coincident distortions on videos. The combination of blocking, ringing, and sharpness is evaluated in [67], where each distortion is evaluated using a technique that quantifies the distortion indicator. Packet-loss artifacts and their effect on visual quality, along with that of blocking due to compression, were studied in [68]. Another measure that evaluated the effect of blockiness, blurriness, and noisiness on visual quality was proposed in [69]. Apart from blocking and blurring, the authors of [70] evaluated the effects of motion artifacts on visual quality for 2.5G/3G systems. Bit-error impairments, noise, and blocking have been studied in [71], as has a motion-compensated approach [72].
Apart from spatial measures, researchers have also evaluated the effect of temporal quality degradations on visual perception. For example, motion information to evaluate the effect of frame-drop severity can be extracted from time-stamp information as in [65]. The authors segment the video and determine the motion activity for each of these segments, which is then used to quantify the significance of frame drops in the segments. Another measure of frame-drop quality is that in [73], where the discontinuity along time is computed using the mean-squared error between neighboring frames. The authors of [74] studied the effect of error concealment on visual quality. Channel distortion was modeled in [75], where information from macroblocks was used to quantify loss and its effect on visual quality. Jerkiness between frames was modeled using absolute frame differences between adjacent frames in [76].
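The frame-difference idea underlying several of these temporal measures can be sketched directly:

```python
import numpy as np

def temporal_discontinuity(frames):
    """Mean-squared error between adjacent frames, in the spirit of
    frame-difference measures of jerkiness and frame drops.
    `frames` is (T, H, W); returns T-1 per-transition MSE values,
    where spikes indicate temporal discontinuities."""
    f = frames.astype(np.float64)
    return ((f[1:] - f[:-1]) ** 2).mean(axis=(1, 2))
```

A dropped or repeated frame shows up as a near-zero value followed by a spike in this sequence; published measures go further by weighting such events by local motion activity.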
The authors of [77] evaluate a multitude of factors, such as blur, blockiness, and activity, to quantify overall quality. While this method still uses distortion-specific indicators of quality, it is one of the few NR VQA algorithms that does not limit itself to a particular application. A survey of such distortion-specific measures appears in [78].
The only truly distortion-agnostic measure for video quality was recently proposed in [79]. This algorithm is based on the algorithm for blind image quality assessment proposed in [36] and uses the principle of natural video statistics. As in the case of images, the assumption is that natural videos have certain underlying statistics that are destroyed in the presence of distortion. The deviation from naturalness is quantified to produce a quality measure. In [79], the authors use DCT coefficient differences between adjacent frames as the basis, and extract block-motion estimates to quantify motion characteristics. Features are then extracted by weighting DCT differences by the amount of motion and its effect on visual perception. The method is generic in nature and hence could be applied to a wide variety of distortions. The authors demonstrate that the algorithm quantifies quality for compression and packet loss using a popular video quality assessment database [80].
The performance of any IQA/VQA model is gauged by its correlation with human subjective judgments of quality, since the human is the ultimate receiver of the visual signal. Such human opinions of visual quality are generally obtained by conducting large-scale human studies, referred to as subjective quality assessment, where human observers rate a large number of distorted (and possibly reference) signals. When the individual opinions are averaged across the subjects, a Mean Opinion Score (MOS) or Differential Mean Opinion Score (DMOS) is obtained for each of the visual signals in the study, where the MOS/DMOS is representative of the perceptual quality of the visual signal. The goal of an objective QA algorithm is to predict quality scores for these signals such that the scores produced by the algorithm correlate well with human opinions of signal quality (MOS/DMOS). Practical application of QA algorithms requires that these algorithms compute perceptual quality efficiently. In this section, we summarize the available image and video databases.
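The MOS/DMOS computation described above can be sketched as follows. Real subjective studies additionally apply subject rejection and per-subject score normalization (e.g., Z-scores) before averaging, which this sketch omits:

```python
import numpy as np

def mos(ratings):
    """Mean Opinion Score: ratings averaged across subjects.
    `ratings` is (num_subjects, num_signals)."""
    return np.asarray(ratings, dtype=np.float64).mean(axis=0)

def dmos(test_ratings, ref_ratings):
    """Differential MOS: per-subject difference between the rating of
    the reference and that of the distorted signal, averaged across
    subjects, so a larger DMOS indicates worse quality."""
    diff = np.asarray(ref_ratings, np.float64) - np.asarray(test_ratings, np.float64)
    return diff.mean(axis=0)

print(mos([[4, 2], [2, 4]]))      # [3. 3.]
print(dmos([[3], [4]], [[5], [5]]))  # [1.5]
```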
A comparison of many of the databases mentioned here, as well as others, appears in [94]. The author of [94] also maintains a comprehensive list of image and video quality databases [95].
In this section we summarize the performance of some of the image quality assessment algorithms discussed in this chapter and compare their performance to that of leading full-reference algorithms. We do not report NR VQA performance, since there exists only one truly blind, distortion-agnostic NR VQA algorithm [79].
We use the LIVE IQA database [82], which consists of 29 reference images and 779 distorted images spanning five distortion categories - JPEG and JPEG2000 (JP2K) compression, additive white Gaussian noise (WN), Gaussian blur (blur), and a Rayleigh fast-fading channel distortion (FF) - as the test bench. The database provides an associated DMOS for each distorted image, representing its subjective quality. We list the performances of three FR indices: Peak Signal-to-Noise Ratio (PSNR), the single-scale Structural SIMilarity index (SSIM) [14], and the Multi-Scale Structural SIMilarity index (MS-SSIM) [15]. While PSNR is often criticized for its poor performance, it forms a baseline for QA algorithm performance and is still widely used to quantify visual quality, while SSIM and MS-SSIM are leading FR algorithms with high correlation with human judgments - a level of performance that all NR algorithms attempt to achieve.
The NR algorithms evaluated are: CBIQ [31], LBIQ [34], BLIINDS-II [36], DIIVINE [35], and BRISQUE [37], all OA-DA algorithms; TMIQ [40], an OU-DA approach; and NIQE [44], an OU-DU algorithm. The correlations for CBIQ [31] and LBIQ [34] were provided by the authors.
Since all of the OA IQA approaches that we compare require a training procedure to calibrate the regressor module, we divided the LIVE database randomly into training and testing subsets. While the OU approaches do not require such training (they are trained on a separate database of distorted and natural images, or of natural images only, as summarized before) and FR approaches need no training at all, to ensure a fair comparison across methods the correlations of predicted scores with subjective opinions of visual quality are reported only on the test set. The dataset was divided into 80% training and 20% testing such that no overlap occurred between training and test content. This train–test procedure was repeated 1000 times to ensure that there was no bias due to the spatial content used for training, and we report the median performance across all iterations.
We use Spearman's Rank-Ordered Correlation Coefficient (SROCC) and Pearson's (Linear) Correlation Coefficient (LCC) to evaluate the models. SROCC and LCC measure the correlation between algorithm and subjective scores, where a value of 1 indicates perfect correlation. Since LCC measures linear correlation, all algorithm scores are first mapped to DMOS space by passing them through a logistic non-linearity [82] before computing LCC. This is a standard procedure used to report the performance of QA algorithms. The SROCC and LCC values of the algorithms summarized in this chapter are tabulated in Tables 5.1 and 5.2, respectively.
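The evaluation protocol can be sketched as follows. The five-parameter logistic below is a commonly used variant of the mapping in [82] (the exact functional form and the parameter initialization here are assumptions of this sketch):

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr
from scipy.optimize import curve_fit

def logistic(x, b1, b2, b3, b4, b5):
    # Logistic with an added linear term, used to map objective
    # scores into DMOS space before computing LCC
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (x - b3)))) + b4 * x + b5

def evaluate(alg_scores, dmos_scores):
    """SROCC on raw scores; LCC after fitting the logistic mapping."""
    alg_scores = np.asarray(alg_scores, dtype=np.float64)
    dmos_scores = np.asarray(dmos_scores, dtype=np.float64)
    srocc, _ = spearmanr(alg_scores, dmos_scores)
    p0 = [np.max(dmos_scores), 1.0, np.mean(alg_scores),
          1.0, np.mean(dmos_scores)]                      # heuristic start
    params, _ = curve_fit(logistic, alg_scores, dmos_scores,
                          p0=p0, maxfev=20000)
    lcc, _ = pearsonr(logistic(alg_scores, *params), dmos_scores)
    return float(srocc), float(lcc)
```

Note that SROCC is invariant to any monotone mapping of the scores, which is why it is computed on the raw values while LCC requires the fitted non-linearity.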
The correlations demonstrate that the OA-DA algorithms correlate well with human perception, approaching the performance of leading full-reference algorithms such as MS-SSIM and besting popular measures such as PSNR. While the performance of TMIQ (the OU-DA measure) is not as good, the results are encouraging, especially for certain distortions such as JP2K and JPEG compression, where its performance is close to that of PSNR. The OU-DU measure NIQE produces performance comparable to that of the OA-DA measures, and hence to that of successful FR IQA algorithms. While there remains room for improvement, the tables demonstrate that state-of-the-art NR algorithm performance is comparable to that of FR algorithms, and that NR approaches can be used successfully in place of FR approaches.
Table 5.1 Median SROCC across 1000 train–test combinations on the LIVE IQA database. Italics indicate (OA/OU)-DA no-reference algorithms and bold face indicates OU-DU model algorithms
 | JP2K | JPEG | WN | Blur | FF | All |
PSNR | 0.8646 | 0.8831 | 0.9410 | 0.7515 | 0.8736 | 0.8636 |
SSIM | 0.9389 | 0.9466 | 0.9635 | 0.9046 | 0.9393 | 0.9129 |
MS-SSIM | 0.9627 | 0.9785 | 0.9773 | 0.9542 | 0.9386 | 0.9535 |
CBIQ | 0.8935 | 0.9418 | 0.9582 | 0.9324 | 0.8727 | 0.8954 |
LBIQ | 0.9040 | 0.9291 | 0.9702 | 0.8983 | 0.8222 | 0.9063 |
BLIINDS-II | 0.9323 | 0.9331 | 0.9463 | 0.8912 | 0.8519 | 0.9124 |
DIIVINE | 0.9123 | 0.9208 | 0.9818 | 0.9373 | 0.8694 | 0.9250 |
BRISQUE | 0.9139 | 0.9647 | 0.9786 | 0.9511 | 0.8768 | 0.9395 |
TMIQ | 0.8412 | 0.8734 | 0.8445 | 0.8712 | 0.7656 | 0.8010 |
NIQE | 0.9172 | 0.9382 | 0.9662 | 0.9341 | 0.8594 | 0.9135 |
This section summarizes some of the possible applications of NR QA that have been explored in the recent past. These applications are only meant to be representative and are in no way exhaustive. The section serves as a reference for the reader and is only a starting point in understanding visual quality applications.
One topic that has received a large amount of interest from the image-processing community is image denoising, where the goal is to “remove” the noise from an image, thereby making it “cleaner” [97]. Although considerable progress has been made in the development of sophisticated denoising models, blind image denoising algorithms [96, 98] that denoise the image
Table 5.2 Median LCC across 1000 train–test combinations on the LIVE IQA database. Italics indicate (OA/OU)-DA no-reference algorithms and bold face indicates OU-DU model algorithms
 | JP2K | JPEG | WN | Blur | FF | All |
PSNR | 0.8762 | 0.9029 | 0.9173 | 0.7801 | 0.8795 | 0.8592 |
SSIM | 0.9405 | 0.9462 | 0.9824 | 0.9004 | 0.9514 | 0.9066 |
MS-SSIM | 0.9746 | 0.9793 | 0.9883 | 0.9645 | 0.9488 | 0.9511 |
CBIQ | 0.8898 | 0.9454 | 0.9533 | 0.9338 | 0.8951 | 0.8955 |
LBIQ | 0.9103 | 0.9345 | 0.9761 | 0.9104 | 0.8382 | 0.9087 |
BLIINDS-II | 0.9386 | 0.9426 | 0.9635 | 0.8994 | 0.8790 | 0.9164 |
DIIVINE | 0.9233 | 0.9347 | 0.9867 | 0.9370 | 0.8916 | 0.9270 |
BRISQUE | 0.9229 | 0.9734 | 0.9851 | 0.9506 | 0.9030 | 0.9424 |
TMIQ | 0.8730 | 0.8941 | 0.8816 | 0.8530 | 0.8234 | 0.7856 |
NIQE | 0.9370 | 0.9564 | 0.9773 | 0.9525 | 0.9128 | 0.9147 |
without knowledge of the noise severity remain relatively underexplored. Such blind denoising approaches generally combine blind parameter estimation with denoising algorithms to yield completely blind image denoising algorithms.
Initial approaches to blind denoising made use of empirical strategies such as L-curve methods [99–102], the discrepancy principle [103], cross validation [103–108], and risk-based estimates [109–113] of the reference image for parameter optimization. The use of perceptual optimization functions has been shown to yield better parameter estimates [114].
NSS-based blind image denoising approaches seek to reduce the amount of noise in a corrupted image without knowledge of the noise strength [98, 115]. In these approaches, the parameter being estimated is the noise variance, since most denoising algorithms assume that the noise is Gaussian in nature with zero mean and unknown variance. Although the approaches in [98, 115] discuss the estimation of the noise variance parameter only, they can be used to estimate other parameters, depending on the underlying noise model assumed in the image denoising algorithm used.
In [115], the authors exploit content-based statistical regularities of the image. While the approach works well, it is exhaustive and computationally intensive: the image is denoised multiple times using different values of the noise variance, the quality of each denoised image is estimated using a no-reference content-evaluation algorithm, and the best image from this set is chosen as the output denoised image.
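The exhaustive search of [115] can be sketched as the loop below. Two stand-ins keep the sketch self-contained: Gaussian smoothing replaces a real denoiser, and, since the no-reference content-evaluation algorithm of [115] is not reproduced here, candidates are scored by full-reference MSE against the clean image (which a truly blind method would of course not have):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(1)

# Toy scene plus white Gaussian noise of unknown strength (sigma = 0.1
# here, but the search below never uses that value directly).
x, y = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64))
clean = np.sin(4 * np.pi * x) * np.cos(4 * np.pi * y)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

# Denoise with each candidate parameter, score every result, keep the
# best.  Lower MSE is better; [115] would instead maximize a
# no-reference quality score at this step.
candidates = [0.25, 0.5, 1.0, 1.5, 2.0, 3.0]
errors = {s: np.mean((gaussian_filter(noisy, s) - clean) ** 2)
          for s in candidates}
best_sigma = min(errors, key=errors.get)
denoised = gaussian_filter(noisy, best_sigma)
```

The cost is one full denoising pass per candidate, which is exactly the computational burden the feature-based parameter estimation of [98] avoids.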
The approach in [98] uses a different strategy. Here, the input parameter is estimated using statistical properties of natural scenes: statistical features identical to those in [96] are extracted and then mapped onto an estimate of the noise variance. One interesting observation that the authors make is that the denoised image produced when the algorithm is given the accurate noise variance estimate can have lower perceptual quality (as measured by an algorithm) than one produced using a different (although incorrect) noise variance. The extracted features were hence designed so that the denoised image has the highest visual quality. During the training stage, given a large set of noisy images, each image was denoised with various values of the input noise variance parameter using [116], and its visual quality was evaluated using MS-SSIM [15]. The noise variance parameter corresponding to the image with the highest perceptual quality, as gauged by MS-SSIM, was used as the training target for the blind parameter estimation algorithm. This scheme produces denoised images of higher visual quality without knowledge of the actual noise variance, even during the training stage. A natural extension would be to evaluate how the approach performs when estimating parameters of other distortions, such as blur or compression [50].
Most of the discussion in this chapter has focused on the quality of “regular” images/videos (i.e., those images and videos that have a limited dynamic range). In the recent past, High Dynamic Range (HDR) imaging techniques have gained tremendous popularity, both in academia and in commercial products [117]. An HDR image is typically created from multiple “low” dynamic range images. For instance, a camera varies its exposure time so that different portions of the scene's luminance range are captured separately and then combined to form an HDR image. The saturation seen at the bright or dark extremes of regular images is thereby overcome. To render the combined exposures on conventional displays, an algorithm called tone mapping is applied. The goal of a tone-mapping algorithm is to produce an image that not only has high perceptual quality, but also looks “natural.” One way to measure this is through a tone-mapping-specific IQA algorithm. Since tone-mapped images have no traditionally defined “reference,” the problem is blind in nature. Recently, the authors of [118] proposed such an NR IQA algorithm, called the Tone-Mapped image Quality Index (TMQI).
TMQI compares the low and high dynamic range images using a modification of the “structure” term of SSIM [14] that accounts for the non-linear response of the human visual system. This computation is undertaken at multiple scales, drawing inspiration from [15]. A naturalness term is also computed, based on previously established statistics of natural low dynamic range images [119, 120]. A combination of the two measures yields the TMQI. The authors demonstrate that the index correlates well with human perception of quality, based on findings from previous subjective studies on HDR quality.
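The final combination step has a power-weighted form; in the sketch below the weight and exponents are illustrative placeholders, not the constants published in [118]:

```python
def tmqi_combine(S, N, a=0.8, alpha=0.3, beta=0.7):
    """Power-weighted combination of structural fidelity S and statistical
    naturalness N (both assumed to lie in [0, 1]).  The weight a and
    exponents alpha, beta are illustrative, not the values of [118]."""
    return a * S ** alpha + (1 - a) * N ** beta
```

With both terms normalized to [0, 1], the combined score also stays in [0, 1] and increases monotonically in each component.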
Apart from discussing the algorithm and evaluating its performance, the authors also demonstrate how the TMQI measure could be used to aid parameter tuning for tone-mapping algorithms. Another interesting use of the measure is its application in adaptive fusion of tone-mapped images, which takes advantage of multiple tone-mapping operators to produce a high-quality HDR image. The work in [118] is a pioneering effort in the field of tone-mapped quality assessment and demonstrates how useful NR QA algorithms could be in real-life applications.
Photo capture is growing at a very fast pace with the launch of new hand-held devices and smartphones. Americans captured 80 billion digital photographs in 2011 [121], and this number is increasing annually; more than 250 million photographs are posted daily on Facebook. Consumers are becoming overwhelmed by the amount of available digital visual content, and manually reviewing and controlling the quality of digital photographs is becoming almost impossible. With this objective in mind, a photo-quality helper app [122] for Android phones was recently designed as a senior design project by an undergraduate team at the University of Texas at Austin; it automatically judges photo quality and enables photos that do not meet the user's quality threshold to be discarded.
While visual quality assessment has made giant leaps in the recent past – so much so that researchers have quickly moved on from intense research in FR QA to opinion- and distortion-unaware NR QA – there remain multiple challenges that must be addressed for the field to attain maturity. In this section, we discuss a handful of such challenges and speculate on possible research directions that may yield positive results.
In this chapter, we have summarized recent research in the field of no-reference or blind image and video quality assessment. The field of NR QA has recently witnessed a host of activity, and several high-performance algorithms have been developed to solve the problem of predicting human perceptions of quality without the need for a reference. The methods that we explored were broken down by the amount of information available to the algorithm during the “training” stage prior to deployment. As we summarized, even in the face of almost no information regarding the distortion type or human opinion on quality, there exist algorithms that predict visual quality with a fair degree of accuracy. This is an extremely encouraging fact, and we expect high-performing NR QA algorithms to approach the level of full-reference correlation in the near future. While great progress has been made on the image quality front, no-reference video quality assessment has received less attention. As we summarized, the complexity of motion and its interaction with distortion make NR VQA a difficult problem to solve.
While the field is certainly maturing, it is far from mature. We summarized a handful of challenges that the field of quality assessment needs to address. Foremost among these challenges is the deployment of QA algorithms in practical applications. While QA algorithms have been researched painstakingly, applying them to traditional image-processing problems remains underexplored. This is true of FR QA algorithms [114], but even more so of NR QA algorithms [50]. As NR QA algorithm performance peaks, we hope that traditional measures of error, such as mean-squared error, will be replaced by far more meaningful perceptual quality measures. Of course, such a replacement requires demonstration of tangible success – a task that needs the concentrated involvement of researchers in the field of quality assessment.
The future is still bright for visual quality measures, especially in areas that have not been explored much before – such as interactions between visual quality and visual tasks. It is but natural to posit separation of measurements of quality impairment (from capture, processing, compression, transmission, post-processing) from scene-dependent factors, so their effects on detection, recognition, or other tasks can be identified and mitigated. This is particularly true in high-distortion environments, such as the increasingly crowded wireless/mobile environment. A principled, ground-up approach is needed whereby the effects of blindly measured video quality degradations on visual tasks can be established. This is of particular importance in forthcoming wireless vision applications where severe distortions occur, and in security applications such as human tracking, which have taken on an increasingly important role in modern-day systems.
BLIINDS: BLind Image Integrity Notator using DCT Statistics
BRISQUE: Blind/Referenceless Image Spatial Quality Evaluator
CSIQ: Categorical Subjective Image Quality
DA: Distortion Aware
DIIVINE: Distortion Identification-based Image Verity and INtegrity Evaluation
DMOS: Differential Mean Opinion Score
DU: Distortion Unaware
FR: Full Reference
HDR: High Dynamic Range
IQA: Image Quality Assessment
LCC: Linear Correlation Coefficient
MOS: Mean Opinion Score
MVG: Multi-Variate Gaussian
NIQE: Naturalness Image Quality Evaluator
NR: No Reference
NSS: Natural Scene Statistics
OA: Opinion Aware
OU: Opinion Unaware
PSNR: Peak-Signal-to-Noise Ratio
QA: Quality Assessment
RR: Reduced Reference
SROCC: Spearman Rank-Ordered Correlation Coefficient
SSIM: Structural SIMilarity Index
SVR: Support Vector Regression
TMQI: Tone-Mapped Quality Index
VQA: Video Quality Assessment