5 No-Reference Approaches to Image and Video Quality Assessment

Anish Mittal1, Anush K. Moorthy2 and Alan C. Bovik3

1Nokia Research Center, USA

2Qualcomm Inc., USA

3University of Texas at Austin, USA

5.1 Introduction

Visual quality assessment as a field has gained tremendous importance in the past decade, as evinced by the flurry of research activity that has been conducted in leading universities and commercial corporations on topics that fall under its umbrella. The reason for this is the exponential growth of visual data that is being captured, stored, transmitted, and viewed across the world. Driving the previously unfathomable growth in communications, images and videos now form a major chunk of transmitted data. This is not surprising since, from the dawn of time, humans have been visual animals who have preferred images over the written word. One need only look at the amount of area devoted to visual signal processing in the human brain to surmise that vision and its perception account for a major share of neurological processing [1, 2]. Hence, researchers have attempted to decode human vision processing and have used models of the visual system for image-processing applications [3–7].

While an image can convey more than a thousand words, transmission of visual content occupies an equivalently large amount of bandwidth in modern communication systems; hence, images and videos are compressed before storage or transmission. With increasing resolutions and user expectations, increasingly scarce bandwidth is being strained. While the user is concerned only with the final quality of the image or the video, the service provider attempts to provide said quality with the least possible expense of bandwidth. With increasing quality expectations, algorithm developers attempt to improve image/video-processing algorithms so that visual quality at the output is enhanced. In all cases, however, the quantity being addressed is a highly subjective one, that of “visual quality” as perceived by a human observer. The goal of automated Quality Assessment (QA) is to produce an estimate of this human-perceived quality, so that such a quantitative measure may be used in place of subjective human perception. Our discussion here centers on quality assessment algorithms and their applications in modern image/video-processing systems.

Typically, QA algorithms are classified on the basis of the amount of information that is available to the algorithm. Full-Reference (FR) quality assessment algorithms require the distorted image/video whose quality needs to be assessed as well as the clean, pristine reference image/video for comparison [8–17], whereas Reduced-Reference (RR) approaches only use limited information regarding the reference image/video in lieu of the actual reference content itself, together with the distorted image/video [18–21]. Blind or No-Reference (NR) QA refers to automatic quality assessment of an image/video using an algorithm which only utilizes the distorted image/video whose quality is being assessed [22–45].

While tremendous progress has been made in understanding human vision, our current knowledge is far from complete. In the face of such incomplete knowledge, it is not surprising that researchers have focused on simpler FR algorithms [14, 15, 18]. FR algorithms themselves have many applications, especially in the field of image and video compression where the original image/video is available. Apart from applications, FR algorithms also provide researchers with a fundamental foundation on which NR algorithms can be built. The tools that proved useful in the FR framework have been modified suitably and later used for NR algorithms. For example, Natural-Scene-Statistics (NSS) models of images [16, 17] have been used successfully in recent NR algorithms [35–41, 43–45]. Since FR research has been covered in this compendium and elsewhere [7, 46, 47], in this chapter we shall focus on NR algorithms.

One important aspect of quality assessment research is evaluating the performance of an algorithm. Since QA algorithms attempt to reproduce human opinion scores, it is obvious that a good algorithm is one that correlates well with human opinion of quality. In order to estimate human opinion of quality, large databases spanning a wide range of contents and visual impairments (such as those we detail later in this chapter) are created, and large-scale human studies are conducted. The human opinion scores thus produced represent the ground-truth and are used in evaluating algorithm performance. Apart from evaluating performance, these scores serve an additional, very important function in the case of NR algorithms.

NR QA approaches can be classified on the basis of whether the algorithm has access to subjective/human opinion prior to deployment. Algorithms could use machine learning techniques along with human judgments of quality during a “training” phase and then could attempt to reproduce human opinion during the “testing” phase. Such algorithms, which first learn human behavior from subjective quality data, are referred to as Opinion-Aware (OA) NR algorithms. Opinion-aware algorithms are the first step toward building a completely blind algorithm (i.e., one that is not only blind to the reference image, but also to the human opinion of quality). Such completely blind algorithms, which do not use subjective data on quality to perform blind quality assessment, are termed Opinion-Unaware (OU) algorithms. While both OA and OU algorithms have practical applications, OU NR algorithms hold more practical relevance. This is because it is impossible to anticipate all of the different distortions that may occur in a practical system. Further, no controlled database can possibly span all distortions and quality ranges well enough. Recent research in NR QA has spanned both OA and OU algorithms. While OA algorithms perform as well as FR algorithms, recent OU approaches have been catching up and perform exceedingly well without access to human opinion [39–45].

In this chapter, we shall detail recent no-reference approaches to image and video quality assessment. Specifically, we shall cover both opinion-aware and opinion-unaware models. Most of the approaches that we shall cover are based on understanding and modeling the underlying statistics of natural images and/or distortions using perceptual principles. These approaches measure deviations from statistical regularities and quantify such deviations, leading to estimates of quality. In this chapter, we shall analyze the motivation and the principles underlying such statistical descriptions of quality and describe the algorithms in detail. We shall provide exhaustive comparative analysis of these approaches and discuss the potential applications of no-reference algorithms. Specifically, we shall cover the case of distortion-unaware perceptual image repair and quality assessment of tone-mapped images. We then segue into a discussion of the challenges that lie ahead for the field to gain maturity and other practical application scenarios that we envision for these algorithms. We conclude the chapter with a discussion of some philosophical predictions of future directions that the field of automatic quality assessment of images and videos may take.

5.2 No-Reference Quality Assessment

Early research on no-reference quality assessment focused on predicting the quality of images afflicted with specific distortions [22–29]. Such approaches aim to model distortion-specific artifacts that can relate well to the loss in visual quality. For example, JPEG compressed images could be evaluated for their visual quality using the strength of edges at block boundaries [23, 24, 48, 49]. Such algorithms are restricted to the distortions they are designed for, limiting their scalability and usage in real scenarios, since the distortion type afflicting the image is almost never a known quantity. Having said that, these algorithms represented the first steps toward truly blind NR algorithms and hence form an important landmark in the field of quality assessment.

The next leap in NR algorithms was the development of distortion-agnostic approaches to QA. These algorithms can predict the quality of an image without information on the distortion afflicting it [30–45]. Some of these models are also capable of identifying the distortion [35, 39] – information that could potentially be used for varied applications [50].

Early general-purpose (distortion-agnostic) NR QA algorithms were developed using models that can learn to predict human judgments of image quality from databases of human-judged distorted images [30–39]. Such OA models, which use distorted images with co-registered human scores, have been shown to deliver high performance on images corrupted with different kinds of distortions and severities [51]. As we have mentioned before, while these algorithms are tremendously useful, they may be limited in their application since they are limited by the distortions that they are trained on, as well as bound by the quality ranges that the controlled training database has on offer. Since no controlled database can completely offer the variety of practical distortions, the algorithms trained on these databases remain partially crippled. This is not to say that the underlying model used to “train” these algorithms is at fault; in fact, as with the FR-to-NR transition, the models used for OA NR algorithms have been tweaked and utilized for opinion-unaware approaches.

Figure 5.1 Blind quality assessment models requiring different amounts of prior information

OU approaches predict the visual quality of images without any access to human judgments during the training phase [39–45]. During the training phase, these algorithms may or may not have access to distorted images. OU approaches which are trained on distorted images are limited by the distortion types they see and are referred to as Distortion-Aware (DA) algorithms. OU DA algorithms may be viewed as close cousins of the OA approaches, since both are limited by the training data available. OU Distortion-Unaware (DU) algorithms are those that do not utilize distorted images during the training phase. Since OU DU algorithms have no access to any distorted image or human opinion a priori, they represent a final frontier in NR QA research. OU DU algorithms find applications in uncontrolled environments such as highly unpredictable wireless networks and quality assessment of user-captured photographs. It may seem that these algorithms have almost no information to make judgments on quality, but researchers find motivation in the fact that leading FR IQA models (such as the Structural SIMilarity index, SSIM [14]) are both opinion- and distortion-unaware. Since one of the goals of NR QA research is to develop algorithms that could replace FR algorithms, OU DU NR algorithm development has caught the fancy of researchers in recent years [43–45]. A summary of NR algorithm classification is given in Figure 5.1.

5.2.1 Opinion-Aware Distortion-Aware Models

OA DA approaches make use of both distorted images and associated human judgments to develop QA models. Depending on the types of features they extract from the image/video, they can be categorized into codebook-based, ensemble-based, and Natural Scene Statistics (NSS)-based approaches.

5.2.1.1 Codebook-Based Approaches

The authors of [31, 32] use Gabor-filter-based local appearance descriptors, which are quantized to form a visual codebook. The codebook feature space is then used to yield an estimate of quality. This is accomplished in one of two ways: (a) an example-based method or (b) a Support Vector Regression (SVR)-based method. The example-based method estimates the quality score of the test image using a weighted average of the quality scores of training images, where the authors assume that there exists a linear relationship between codeword histograms and quality scores. In contrast, the SVR-based method “learns” the mapping between the codeword histograms and the quality scores during the “training” phase. The approach is competitive in performance with other general-purpose NR IQA measures. However, its computational complexity limits its use in practical applications.

5.2.1.2 Ensemble-Based Approaches

Tang et al. [34] proposed an approach that learns an ensemble of regressors trained on three different groups of features – natural image statistics, distortion texture statistics, and blur/noise statistics. These regressors learn the mapping from feature space to quality and, when deployed during the test phase, the algorithm reproduces the quality of the image using a combination of the learned regressors. Another approach is based on a hybrid of curvelet, wavelet, and cosine transforms [52]. Although these approaches work on a variety of distortions, each set of features (in the first approach) and transforms (in the second approach) caters only to certain kinds of distortion processes, thereby limiting the deployment of these algorithms.

5.2.1.3 Natural Scene Statistics-Based Approaches

NSS-based approaches work on the rationale that all natural scenes obey statistical laws that are independent of the content of the scene being imaged. For instance, local quantities such as contrast are scale invariant in nature and follow heavy-tailed distributions [53]. In the case of distorted images, however, such scene statistics deviate from natural distributions, rendering them unnatural. These deviations, when quantified appropriately, can be utilized to evaluate the quality of the distorted image. This strategy of approaching the problem of NR QA from a natural scene perspective instead of a distortion-based perspective makes NSS-based NR approaches much less dependent on distortion-specific characteristics such as blocking. NSS models have proved to be very powerful tools for quality assessment in general, and have been used successfully for developing FR QA algorithms [16, 17, 25] and RR algorithms [54] in the past as well.
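To make the idea of quantifying statistical deviations concrete, the sketch below fits a zero-mean Generalized Gaussian Distribution (GGD) to bandpass coefficients using the moment-matching estimator of Sharifi and Leon-Garcia, the kind of fitting procedure employed by several of the NSS models discussed next. This is an illustrative sketch under simplified assumptions, not code from any of the cited papers.

```python
import numpy as np
from scipy.special import gamma

def fit_ggd(coeffs):
    """Fit a zero-mean generalized Gaussian, p(x) ~ exp(-|x/beta|^alpha),
    to bandpass coefficients by moment matching. Returns (alpha, sigma)."""
    coeffs = np.asarray(coeffs, dtype=np.float64).ravel()
    sigma_sq = np.mean(coeffs ** 2)
    e_abs = np.mean(np.abs(coeffs))
    rho = sigma_sq / (e_abs ** 2 + 1e-12)  # sample value of r(alpha)
    # Theoretical ratio r(alpha) = Gamma(1/a) Gamma(3/a) / Gamma(2/a)^2,
    # matched against the sample ratio over a grid of candidate shapes.
    alphas = np.arange(0.2, 10.0, 0.001)
    r = gamma(1.0 / alphas) * gamma(3.0 / alphas) / gamma(2.0 / alphas) ** 2
    alpha = alphas[np.argmin((r - rho) ** 2)]
    return alpha, np.sqrt(sigma_sq)
```

Bandpass coefficients of natural images are heavy tailed; distortions tend to push the fitted (α, σ) pair away from the ranges observed on natural content, so the fitted parameters can serve directly as quality-aware features.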

Recently, three successful blind/NR QA approaches to Image Quality Assessment (IQA) based on NSS were proposed [35–37], which exploit NSS regularities in the wavelet, DCT, and spatial domains, respectively. An NR IQA model developed in the wavelet domain, dubbed the Distortion Identification-based Image Verity and INtegrity Evaluation (DIIVINE) index, makes use of a series of statistical features derived from an NSS wavelet coefficient model. These features are subsequently used in a two-stage framework for QA, where the first stage identifies the distortion type and the second stage performs distortion-specific quality assessment [35]. While the approach performs on par with some of the successful FR IQA algorithms, the expensive computation of spatial correlation-based features makes it impractical for use in real-world applications. A modification along the lines of the pairwise product-based features proposed in the Blind/Referenceless Image Spatial QUality Evaluator (BRISQUE) model alleviates this problem [37].

The DCT domain-based approach – the BLind Image Integrity Notator using DCT Statistics (BLIINDS-II) index – computes a small number of features from an NSS model of block DCT coefficients [36]. Such NSS features, once calculated, are supplied to a regression function that predicts human judgments of visual quality. In comparison with DIIVINE, BLIINDS-II is a single-stage algorithm; instead of training multiple distortion-specific QA models, it makes use of a single NSS model that is able to deliver highly competitive QA prediction power. Although BLIINDS-II uses a small number of features (4), the non-linear sorting of features makes the approach computationally complex.

The third approach – BRISQUE – was developed with the express purpose of efficiency [37]. BRISQUE explores the possibility of a transform-free approach and operates directly on spatial pixel data. BRISQUE is based on the spatial NSS model of Ruderman [55] and uses pointwise statistics of locally normalized luminance signals and distribution of pairwise products of neighboring locally normalized luminance signals as features. Once these features are computed, a mapping from features to human judgment is learned using a regression module, yielding a measure of image quality.
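As an illustration of the spatial NSS model underlying BRISQUE, the following sketch computes Mean-Subtracted Contrast-Normalized (MSCN) coefficients and the pairwise products of neighboring coefficients. The Gaussian window and the stabilizing constant are assumptions chosen for simplicity, not necessarily the published settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(image, sigma=7.0 / 6.0):
    """Mean-subtracted contrast-normalized coefficients of a grayscale image."""
    image = image.astype(np.float64)
    mu = gaussian_filter(image, sigma)                  # local mean
    var = gaussian_filter(image ** 2, sigma) - mu ** 2  # local variance
    sd = np.sqrt(np.maximum(var, 0.0))                  # local standard deviation
    return (image - mu) / (sd + 1.0)                    # C = 1 stabilizer

def pairwise_products(m):
    """Products of neighboring MSCN coefficients along four orientations."""
    return {
        "horizontal": m[:, :-1] * m[:, 1:],
        "vertical": m[:-1, :] * m[1:, :],
        "main_diagonal": m[:-1, :-1] * m[1:, 1:],
        "secondary_diagonal": m[:-1, 1:] * m[1:, :-1],
    }
```

In BRISQUE, the MSCN histogram is fit with a GGD (e.g., via fit_ggd above) and each product map with an asymmetric generalization of the GGD; repeating these fits at two scales yields the feature vector that the regression module maps to quality.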

5.2.2 Opinion-Unaware Distortion-Aware Models

This section summarizes approaches to NR QA that do not use human opinion scores to design QA models – OU NR QA algorithms. The advantage of such a scheme is that these approaches are not limited by the size of the databases with human judgments, thereby increasing their versatility. OU DA algorithms could either use large databases of reference and distorted images along with the corresponding FR algorithm scores as a proxy for human judgments or, in the ideal case, use only a set of pristine and distorted images together with the associated distortion categories.

5.2.2.1 Visual Words-Based Quality Assessment

Approaches based on visual words first decompose images using an energy-compacting filter bank and then divisively normalize the responses, yielding outputs that are well modeled using NSS models [41, 43]. Once such a representation is obtained, the image is divided into patches, and perceptually relevant NSS features are computed at each image patch. Features are computed from image patches obtained from both reference and distorted images to create distributions over visual words. Quality prediction is then accomplished by computing the Kullback–Leibler (KL) divergence between the visual word distribution of the distorted image and the signature visual word distribution of the space of exemplar images. One drawback of such an approach is that the creation of these visual word distributions from the features is lossy, owing to the quantization involved, which could adversely affect predictions. Further, the approach is only as good as the diversity in the chosen training set of images and distortions and may not generalize to other distortions.
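The test stage just described can be sketched as follows: patch features are quantized to their nearest codeword, and the resulting visual-word histogram is compared against the signature “natural” distribution via the KL divergence. The codebook and signature distribution are assumed to have been learned offline, and all names here are hypothetical.

```python
import numpy as np

def codeword_histogram(features, codebook):
    """Assign each patch feature (row) to its nearest codeword and return
    the normalized visual-word histogram."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    counts = np.bincount(d2.argmin(axis=1), minlength=len(codebook))
    return counts / counts.sum()

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) between two visual-word distributions."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Quality estimate: divergence of the test image's word distribution from
# the signature distribution over exemplar images (larger = more distorted).
# score = kl_divergence(codeword_histogram(test_features, codebook), natural_dist)
```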

5.2.2.2 Topic Model-Based Learning for Quality Assessment

Algorithms based on topic models work on the principle that distorted images have certain latent characteristics that differ from those of “pristine” images [40]. These latent characteristics are explored through application of a “topic model” to visual words extracted from an assorted set of pristine and distorted images. Visual words, which are obtained using quality-aware NSS features, are used to determine the correct choice of latent characteristics, which are in turn capable of discriminating between pristine and distorted image content. The similarity between the probability of occurrence of the different topics in an unseen image and the distribution of latent topics averaged over a large number of pristine natural images is indicative of the image quality. The advantage of this approach is that it not only predicts the visual quality of the image, but also discovers groupings amongst artifacts of distortions in the corrupted images without any supervision. Unfortunately, in its current form, the approach does not perform as well as general-purpose OA models.
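A minimal sketch of this idea, using scikit-learn's Latent Dirichlet Allocation as a stand-in topic model and cosine similarity as a stand-in for the similarity measure actually used in [40]:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

def train_topic_model(word_counts, n_topics=10):
    """Fit a topic model to visual-word counts from a mixed corpus of
    pristine and distorted images. word_counts: (n_images, n_words)."""
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(word_counts)
    return lda

def topic_quality(lda, pristine_counts, test_counts):
    """Similarity between the test image's latent-topic mixture and the
    average mixture over pristine images; higher suggests better quality."""
    pristine = lda.transform(pristine_counts).mean(axis=0)
    t = lda.transform(test_counts.reshape(1, -1))[0]
    return float(t @ pristine / (np.linalg.norm(t) * np.linalg.norm(pristine)))
```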

5.2.2.3 Full-Reference Image Quality Assessment-Based Learning for Quality Assessment

These approaches use quality scores produced by the application of an FR IQA algorithm on each distorted/reference image pair in a training database as a proxy for human judgments of quality [42]. Distorted images and their reference versions are first partitioned into patches, and a percentile pooling strategy is used to estimate the quality of each patch. The patches are then grouped by quality level using clustering techniques [56], and quality-aware clustering is applied to each group to learn quality-aware centroids. During the testing stage, each patch of the distorted image is compared with the learned quality-aware centroids and assigned a score by a simple weighted average; the patch scores are then pooled to obtain the quality of the image. This approach shows high correlation with respect to human judgments of image quality, and also high efficiency. As with all OU DA models, the approach is limited by the database of distortions that it is trained on and by the quality estimates of the FR algorithm used as the proxy for human judgments.
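A sketch of this test stage, under the assumption of Gaussian similarity weights (the actual weighting scheme of [42] may differ):

```python
import numpy as np

def predict_quality(patch_feats, centroids, centroid_quality):
    """Each patch votes for the quality levels of nearby quality-aware
    centroids, weighted by feature similarity; patch scores are then
    pooled by averaging. patch_feats: (n, d); centroids: (k, d);
    centroid_quality: (k,) learned quality level per centroid."""
    d2 = ((patch_feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * d2.mean()))      # soft similarity weights
    w /= w.sum(axis=1, keepdims=True)
    patch_scores = w @ centroid_quality      # weighted average per patch
    return float(patch_scores.mean())        # pool patches into one score
```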

5.2.3 Opinion-Unaware Distortion-Unaware Models

A holy grail of the blind/NR IQA problem is that of the design of perceptual models that can predict the quality of distorted images with as little prior knowledge of the images or their distortions as possible [41, 43, 44]. This section discusses the first steps toward completely blind approaches to NR QA (i.e., those approaches that do not make use of any prior distortion knowledge or subjective opinion of quality to predict the quality of images – OU DU models).

5.2.3.1 Visual Words-Based Quality Assessment

This approach is an extension of the OU DA visual word-based NR QA model of [41, 43]. While the crux of the approach remains the same as that summarized above, the DU extension differs in the way the visual words are formed during the training stage of the algorithm. In the DU case, instead of using both the distorted and reference images, the model uses only the natural undistorted reference images to form the visual codebook.

The feature-to-visual-word-distribution conversion is a lossy process due to the quantization involved and affects the accuracy of human judgment prediction. The approach described below overcomes these shortcomings and delivers performance on a par with the top-performing FR and NR IQA models that require training on human-judged databases of distorted images.

5.2.3.2 Multi-Variate Gaussian (MVG) Model-Based Quality Assessment

This NR OU-DU IQA model [44], dubbed the Natural Image Quality Evaluator (NIQE), is based on constructing a collection of quality-aware features. The approach rests on the principle that all “natural” images captured are distorted in some form or another. For instance, during image capture there is always a loss of resolution due to the low-pass nature of the lens; further, there exists defocus blur in different parts of the image depending on the associated depth of field. Since humans appear to more heavily weight their judgments of image quality from image regions which are in focus, and hence appear sharper, salient quality measurements can be made from “sharp” patches in an image. From amongst a collection of natural patches, this approach uses data only from those patches that are richest in information (i.e., those that are less likely to have been subjected to a limiting distortion such as blur). The algorithm extracts quality-aware perceptual features from these patches to construct an MVG model. The quality of a given test image is then expressed as the distance between an MVG fit of the NSS features extracted from the test image and the MVG model of the quality-aware features extracted from the corpus of natural images. Experimental results demonstrate that this approach performs as well as top-performing FR IQA models that require corresponding reference images and NR IQA models that require training on large databases of human opinions of distorted images (i.e., OA DA models).
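The final comparison step can be sketched as follows: fit an MVG to the NSS features of the natural corpus and of the test image, then compute the distance between the two models. The distance below follows the Mahalanobis-like form used by NIQE [44], with the two covariances averaged; the feature extraction and patch selection steps are omitted.

```python
import numpy as np

def mvg_fit(features):
    """Fit a multivariate Gaussian to quality-aware NSS features
    (rows are patches or images, columns are features)."""
    return features.mean(axis=0), np.cov(features, rowvar=False)

def niqe_distance(mu1, cov1, mu2, cov2):
    """Distance between the natural and test MVG models; larger
    distances indicate stronger departures from naturalness."""
    d = mu1 - mu2
    pooled = (cov1 + cov2) / 2.0
    return float(np.sqrt(max(d @ np.linalg.pinv(pooled) @ d, 0.0)))
```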

5.2.4 No-Reference Video Quality Assessment

While there has been a lot of activity in the area of distortion-agnostic image quality assessment, the field of distortion-agnostic No-Reference Video Quality Assessment (NR VQA) has been relatively quiet. This is not to say that NR VQA algorithms do not exist, but most existing models are specific to the application that they were designed for and hence do not find widespread use. The popularity of distortion-specific measures for NR VQA can be attributed to two factors. First, VQA is a far more complex subject than IQA, and given that distortion-unaware models for NR IQA have only just started to appear, such models for NR VQA are still forthcoming. Second, the lack of a universal measure for NR VQA and the need for blind quality assessment in many applications necessitate the development of such distortion-specific measures. Since this chapter focuses on distortion-agnostic NR measures, and we believe that distortion-agnostic NR VQA algorithms are just around the corner, in this section we do not expressly summarize each proposed approach. Instead, we point the reader to relevant work in the area and to popular surveys which describe the algorithms in far more detail.

One of the most popular distortions that has been evaluated in the area of NR VQA is that of compression. Since video compression has been a popular subject for research and has tremendous practical applications, it comes as no surprise that many NR VQA metrics are geared toward compression. One artifact of compression that is commonly quantified is that of blocking [57–60], where edge strength at block boundaries is measured and correlated with perceptual quality. Techniques used include harmonic analysis based on Sobel edge detection [57], looking for correlations at 8 × 8 block boundaries in MPEG video [58], and using luminance masking along with edge-strength measures [59]. Another distortion that manifests due to compression is that of blur [61]. Jerkiness, which may be an artifact of the encoder, has also been evaluated for its quality using several techniques [62–66].

Researchers have also studied the quality effect of multiple coincident distortions on videos. The combination of blocking, ringing, and sharpness is evaluated in [67], where each distortion is evaluated using a technique that quantifies the distortion indicator. Packet-loss artifacts and their effect on visual quality, along with that of blocking due to compression, were studied in [68]. Another measure that evaluated the effect of blockiness, blurriness, and noisiness on visual quality was proposed in [69]. Apart from blocking and blurring, the authors of [70] evaluated the effects of motion artifacts on visual quality for 2.5G/3G systems. Bit-error impairments, noise, and blocking have been studied in [71], as has a motion-compensated approach [72].

Apart from spatial measures, researchers have also evaluated the effect of temporal quality degradations on visual perception. For example, motion information to evaluate the effect of frame-drop severity can be extracted from time-stamp information as in [65]. The authors segment the video and determine the motion activity for each of these segments, which is then used to quantify the significance of frame drops in the segments. Another measure of frame-drop quality is that in [73], where the discontinuity along time is computed using the mean-squared error between neighboring frames. The authors of [74] studied the effect of error concealment on visual quality. Channel distortion was modeled in [75], where information from macroblocks was used to quantify loss and its effect on visual quality. Jerkiness between frames was modeled using absolute frame differences between adjacent frames in [76].

The authors of [77] evaluate a multitude of factors – such as blur, blockiness, and activity – to quantify overall quality. While this method still uses distortion-specific indicators of quality, it is one of the few NR VQA algorithms that does not limit itself to a particular application. A survey of such distortion-specific measures appears in [78].

The only truly distortion-agnostic measure for video quality was recently proposed in [79]. This algorithm is based on the algorithm for blind image quality assessment proposed in [36] and uses the principle of natural video statistics. As in the case of images, the assumption is that natural videos have certain underlying statistics that are destroyed in the presence of distortion. The deviation from naturalness is quantified to produce a quality measure. In [79], the authors use DCT coefficient differences between adjacent frames as the basis, and extract block-motion estimates to quantify motion characteristics. Features are then extracted by weighting DCT differences by the amount of motion and its effect on visual perception. The method is generic in nature and hence could be applied to a wide variety of distortions. The authors demonstrate that the algorithm quantifies quality for compression and packet loss using a popular video quality assessment database [80].
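To give a flavor of such natural video statistics, the sketch below computes block DCT coefficients of frame differences, the representation on which [79] builds; the block size, the pooled statistic, and the omission of motion weighting are simplifications here, not the published design.

```python
import numpy as np
from scipy.fftpack import dct

def frame_diff_dct_features(prev_frame, curr_frame, block=16):
    """Mean absolute block-DCT coefficient of the frame difference,
    one value per block (a crude stand-in for the features of [79])."""
    diff = curr_frame.astype(np.float64) - prev_frame.astype(np.float64)
    h, w = diff.shape
    feats = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            b = diff[i:i + block, j:j + block]
            coeffs = dct(dct(b, axis=0, norm="ortho"), axis=1, norm="ortho")
            feats.append(np.abs(coeffs).mean())
    return np.array(feats)
```

In [79], statistics of such coefficients are further weighted by block-motion estimates and their effect on visual perception before being mapped to quality.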

5.3 Image and Video Quality Databases

The performance of any IQA/VQA model is gauged by its correlation with human subjective judgments of quality, since the human is the ultimate receiver of the visual signal. Such human opinions of visual quality are generally obtained by conducting large-scale human studies, referred to as subjective quality assessment, where human observers rate a large number of distorted (and possibly reference) signals. When the individual opinions are averaged across the subjects, a Mean Opinion Score (MOS) or Differential Mean Opinion Score (DMOS) is obtained for each of the visual signals in the study, where the MOS/DMOS is representative of the perceptual quality of the visual signal. The goal of an objective QA algorithm is to predict quality scores for these signals such that the scores produced by the algorithm correlate well with human opinions of signal quality (MOS/DMOS). Practical application of QA algorithms requires that these algorithms compute perceptual quality efficiently. In this section, we summarize the available image and video databases.
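As a simple illustration of how such scores are formed, the sketch below computes MOS and DMOS from matrices of raw subject ratings; real studies additionally z-score ratings per subject and reject outlier subjects, which is omitted here.

```python
import numpy as np

def mos_and_dmos(ratings_distorted, ratings_reference):
    """MOS is the mean rating per distorted signal; DMOS is the mean
    per-subject difference between the reference and distorted ratings.
    Both inputs are (n_subjects, n_signals) arrays aligned by content."""
    mos = ratings_distorted.mean(axis=0)
    dmos = (ratings_reference - ratings_distorted).mean(axis=0)
    return mos, dmos
```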

5.3.1 Image Quality Assessment Databases

  • LIVE Multiply Distorted Image Quality Database. A subjective study [81] was conducted in two parts to obtain human judgments on images corrupted under two multiple distortion scenarios. The two scenarios considered are: (1) image storage, where images are first blurred and then compressed by a JPEG encoder; (2) camera image acquisition process, where images are first blurred due to a narrow depth of field or other defocus and then corrupted by white Gaussian noise to simulate sensor noise.
  • LIVE Image Quality Database. The LIVE database [82] developed at the University of Texas at Austin, TX, contains 29 reference images and 779 distorted images at different image resolutions ranging from 634 × 438 to 768 × 512 pixels. Reference images are simulated with five different types of distortions to varying degrees – JPEG compression, JPEG2000 compression, additive Gaussian white noise, Gaussian blurring, and fast fading distortion, where the JPEG2000 compression bit stream is passed through a simulated Rayleigh fading channel. Human judgments were obtained from 29 subjects.
  • IRCCyN/IVC Image Quality Database (IVC). The IRCCyN/IVC database [83] developed at the Institut de Recherche en Communications et Cybernétique de Nantes (IRCCyN), France contains 10 reference images and 185 distorted images at an image resolution of 512 × 512 pixels. There are five distortion types in this database: JPEG compression, JPEG compression of only the luminance component, JPEG2000 compression, locally adaptive resolution coding, and Gaussian blurring. Each type of distortion was generated with five different amounts of distortion. Human judgments were collected from 15 subjects.
  • Tampere Image Quality Database. The Tampere database [84] developed at the Tampere University of Technology, Finland contains 25 reference images and 1700 distorted versions of them at a resolution of 384 × 512 pixels. There are 17 different distortion types in the database, including different types of noise, blur, denoising, JPEG and JPEG2000 compression, transmission of JPEG and JPEG2000 images with errors, local distortions, and luminance and contrast changes. Each type of distortion was generated with four different amounts. Human opinions were obtained from 838 subjects.
  • Categorical Subjective Image Quality (CSIQ) Database. The CSIQ database [85] developed at Oklahoma State University, OK contains 30 reference images and 866 distorted images at a resolution of 512 × 512 pixels. Six distortion types are present in this database: JPEG compression, JPEG2000 compression, additive Gaussian white noise, additive Gaussian pink noise, Gaussian blurring, and global contrast decrements where each distortion type was generated with four or five varying degrees. The ratings were obtained from 35 subjects.
  • The Real Blur Image Database (RBID). The RBID [86] developed at the Universidade Federal do Rio de Janeiro, Brazil, contains 585 blurred images captured from a real camera device with image sizes ranging from 1280 × 960 to 2272 × 1704 pixels. The images in this database are categorized into five different blur classes: unblurred, out of focus, simple motion, complex motion, and others. The ratings were collected from 20 subjects.

5.3.2 Video Quality Assessment Databases

  • LIVE Video Quality Database. The LIVE VQA database [80] consists of 150 distorted videos created from 10 reference videos and spans distortions such as MPEG-2 and H.264 compression, and simulated transmission of H.264 streams over IP and wireless channels. The videos in this database are at a resolution of 768 × 432 pixels and, as of this writing, the LIVE VQA database is a de facto standard database for testing the performance of any new VQA algorithm.
  • EPFL-PoliMI VQA Database. Consisting of 156 video streams at CIF and 4CIF resolutions, this database incorporates packet-loss distortions due to transmission of H.264 streams over error-prone networks [87, 88]. With 40 subjects taking part in the study, this database can be used to measure the performance of VQA algorithms.
  • IRCCyN/IVC HD Video Database. This database consists of a total of 192 distorted videos of 9–12 s duration at 1080i @ 50 fps [89]. The distorted videos were created using H.264 compression at different bit rates and were rated by 28 subjects using the Absolute Category Rating (ACR) scale [90]. The database also includes human opinion scores from a SAMVIQ test methodology [89].
  • MMSP Scalable Video Database. Developed by researchers at EPFL, this scalable video database consists of compressed videos created by using two different scalable video codecs [91, 92]. Three HD videos were processed at three spatial and four temporal resolutions to create a total of 72 distorted videos, which were rated using both paired-comparison and single-stimulus methodologies [90].
  • LIVE Mobile VQA Database. Consisting of 200 distorted videos from 10 RAW HD (720p) reference videos, the LIVE Mobile VQA database is the first of its kind where temporal distortions such as time-varying video quality and frame-freezes of varying duration accompany the traditional compression and packet-loss distortions [93]. The videos were viewed and rated by human subjects on two mobile devices – a cellphone and a tablet – to produce the DMOS. With over 50 subjects, and 5300 summary subjective scores and time-sampled traces of quality, the LIVE Mobile VQA database is one of the largest and newest publicly available databases as of today.

A comparison of many of the databases mentioned here, as well as others, appears in [94]. The author of [94] also maintains a comprehensive list of image and video quality databases [95].

5.4 Performance Evaluation

In this section we summarize the performance of some of the image quality assessment algorithms discussed in this chapter and compare their performance to that of leading full-reference algorithms. We do not report NR VQA performance, since there exists only one truly blind, distortion-agnostic NR VQA algorithm [79].

We use the LIVE IQA database [82], which consists of 29 reference images and 779 distorted images spanning five different distortion categories – JPEG and JPEG2000 (JP2K) compression, additive white Gaussian noise (WN), Gaussian blur (blur), and a Rayleigh fast-fading channel distortion (FF) – as the test bench. The database provides an associated DMOS for each distorted image, representing its subjective quality. We list the performances of three FR indices: the Peak Signal-to-Noise Ratio (PSNR), the single-scale Structural SIMilarity index (SSIM) [14], and the Multi-Scale Structural SIMilarity index (MS-SSIM) [15]. While the PSNR is often criticized for its poor performance, it forms a baseline for QA algorithm performance and is still widely used to quantify visual quality; SSIM and MS-SSIM are leading FR algorithms with high correlation with human judgments – a level of performance that all NR algorithms attempt to achieve.

The NR algorithms evaluated are: CBIQ [31], LBIQ [34], BLIINDS-II [36], DIIVINE [35], and BRISQUE [96], all OA-DA algorithms; TMIQ [40], an OU-DA approach and NIQE [44], an OU-DU algorithm. The correlations for CBIQ [31] and LBIQ [34] were provided by the authors.

Since all of the IQA approaches that we compare require a training procedure to calibrate the regressor module, we divided the LIVE database randomly into training and testing subsets. While the OU approaches do not require such training (they are trained on an alternate database of distorted + natural images or natural images only, as summarized before), and FR approaches do not need any training at all, to ensure a fair comparison across methods, the correlations of predicted scores with subjective opinion of visual quality are reported only on the test set. The dataset was divided into 80% training and 20% testing such that no overlap occurs between training and testing content. This train–test procedure was repeated 1000 times to ensure that there was no bias due to the spatial content used for training. We report the median performance across all iterations.

We use Spearman's Rank-Ordered Correlation Coefficient (SROCC) and Pearson's (Linear) Correlation Coefficient (LCC) to evaluate the models. SROCC and LCC represent the correlation between algorithm scores and subjective scores, such that a value of 1 indicates perfect correlation. Since LCC is a linear correlation measure, all algorithm scores are first passed through a logistic non-linearity [82] that maps them to DMOS space before LCC is computed. This is a standard procedure used to report the performance of QA algorithms. The SROCC and LCC values of the algorithms summarized in this chapter are tabulated in Tables 5.1 and 5.2, respectively.
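A sketch of this evaluation protocol, using the five-parameter logistic commonly employed with the LIVE database [82]; the initialization values below are illustrative guesses.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr
from scipy.optimize import curve_fit

def logistic5(x, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping algorithm scores to DMOS space."""
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (x - b3)))) + b4 * x + b5

def evaluate(scores, dmos):
    """Return (SROCC, LCC); only LCC needs the non-linear mapping,
    since rank correlation is invariant to monotonic transforms."""
    srocc = spearmanr(scores, dmos)[0]
    p0 = [np.max(dmos), 0.1, np.mean(scores), 0.1, np.mean(dmos)]
    params, _ = curve_fit(logistic5, scores, dmos, p0=p0, maxfev=20000)
    lcc = pearsonr(logistic5(scores, *params), dmos)[0]
    return srocc, lcc
```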

The correlations demonstrate that OA-DA algorithms correlate well with human perception, approaching the performance of leading full-reference algorithms such as MS-SSIM and besting popular measures such as PSNR. While the performance of TMIQ (the OU-DA measure) is not as good, the results are encouraging, especially for certain distortions such as JP2K and JPEG compression, where its performance is close to that of PSNR. The OU-DU measure NIQE produces performance comparable with that of the OA-DA measures, and hence with that of successful FR IQA algorithms. While there remains room for improvement, the tables demonstrate that state-of-the-art NR algorithm performance is comparable with that of FR algorithms, and NR approaches can be used successfully in place of FR approaches.

Table 5.1 Median SROCC across 1000 train–test combinations on the LIVE IQA database. Italics indicate (OA/OU)-DA no-reference algorithms and bold face indicates OU-DU model algorithms

JP2K JPEG WN Blur FF All
PSNR 0.8646 0.8831 0.9410 0.7515 0.8736 0.8636
SSIM 0.9389 0.9466 0.9635 0.9046 0.9393 0.9129
MS-SSIM 0.9627 0.9785 0.9773 0.9542 0.9386 0.9535
CBIQ 0.8935 0.9418 0.9582 0.9324 0.8727 0.8954
LBIQ 0.9040 0.9291 0.9702 0.8983 0.8222 0.9063
BLIINDS-II 0.9323 0.9331 0.9463 0.8912 0.8519 0.9124
DIIVINE 0.9123 0.9208 0.9818 0.9373 0.8694 0.9250
BRISQUE 0.9139 0.9647 0.9786 0.9511 0.8768 0.9395
TMIQ 0.8412 0.8734 0.8445 0.8712 0.7656 0.8010
NIQE 0.9172 0.9382 0.9662 0.9341 0.8594 0.9135

5.5 Applications

This section summarizes some of the possible applications of NR QA that have been explored in the recent past. These applications are only meant to be representative and are in no way exhaustive. The section serves as a reference for the reader and is only a starting point in understanding visual quality applications.

5.5.1 Image Denoising

One topic that has received a large amount of interest from the image-processing community is image denoising, where the goal is to “remove” the noise from the image, thereby making it “cleaner” [97]. Although a lot of progress has been made in the development of sophisticated denoising models, blind image denoising algorithms [96, 98], which denoise the image without knowledge of the noise severity, remain relatively underexplored. Such blind denoising approaches generally combine blind parameter estimation with denoising algorithms to yield completely blind image denoising algorithms.

Table 5.2 Median LCC across 1000 train–test combinations on the LIVE IQA database. Italics indicate (OA/OU)-DA no-reference algorithms and bold face indicates OU-DU model algorithms

JP2K JPEG WN Blur FF All
PSNR 0.8762 0.9029 0.9173 0.7801 0.8795 0.8592
SSIM 0.9405 0.9462 0.9824 0.9004 0.9514 0.9066
MS-SSIM 0.9746 0.9793 0.9883 0.9645 0.9488 0.9511
CBIQ 0.8898 0.9454 0.9533 0.9338 0.8951 0.8955
LBIQ 0.9103 0.9345 0.9761 0.9104 0.8382 0.9087
BLIINDS-II 0.9386 0.9426 0.9635 0.8994 0.8790 0.9164
DIIVINE 0.9233 0.9347 0.9867 0.9370 0.8916 0.9270
BRISQUE 0.9229 0.9734 0.9851 0.9506 0.9030 0.9424
TMIQ 0.8730 0.8941 0.8816 0.8530 0.8234 0.7856
NIQE 0.9370 0.9564 0.9773 0.9525 0.9128 0.9147

Initial approaches to blind denoising made use of empirical strategies such as L-curve methods [99–102], the discrepancy principle [103], cross validation [103–108], and risk-based estimates [109–113] of the reference image for parameter optimization. The use of perceptual optimization functions has been shown to yield better parameter estimates [114].

NSS-based blind image denoising approaches seek to reduce the amount of noise in a corrupted image without knowledge of the noise strength [98, 115]. In these approaches, the parameter being estimated is the noise variance, since most denoising algorithms assume that the noise is Gaussian in nature with zero mean and unknown variance. Although the approaches in [98, 115] discuss the estimation of the noise variance parameter only, they can be used to estimate other parameters, depending on the underlying noise model assumed in the image denoising algorithm used.

In [115], the authors exploit content-based statistical regularities of the image. While the approach works well, it is exhaustive and computationally intensive: the image is denoised multiple times using different values of the noise variance, the quality of each denoised image is estimated using a no-reference content-evaluation algorithm, and the best image from this set is chosen as the output denoised image.
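A sketch of this exhaustive strategy follows; denoise and nr_quality are placeholders for the user's denoising algorithm and no-reference quality measure, and the candidate range is an arbitrary assumption.

```python
import numpy as np

def blind_denoise(noisy, denoise, nr_quality, sigmas=np.linspace(2.0, 30.0, 15)):
    """Denoise with several candidate noise-variance parameters, score each
    output with a no-reference quality measure, and keep the best result.
    Assumes a higher nr_quality score indicates better quality."""
    candidates = [denoise(noisy, sigma) for sigma in sigmas]
    scores = [nr_quality(img) for img in candidates]
    best = int(np.argmax(scores))
    return candidates[best], sigmas[best]
```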

The approach in [98] uses a different strategy. Here, the input parameter is estimated using statistical properties of natural scenes, where statistical features identical to those in [96] are extracted and then mapped onto an estimate of noise variance. One interesting observation that the authors make is that the denoised image produced when the algorithm is provided with the accurate noise variance estimate has lower perceptual quality (as measured by an algorithm) than one that is produced using a different (although incorrect) noise variance. The features extracted were hence designed so that the denoised image has the highest visual quality. During the training stage, given a large set of noisy images, each image was denoised with various values of the input noise variance parameter using [116], and its visual quality was evaluated using MS-SSIM [15]. The image having the highest perceptual quality as gauged by MS-SSIM [15] was selected, and the corresponding noise variance parameter was set as training input to the blind parameter estimation algorithm. This scheme is able to produce denoised images of higher visual quality without knowledge of the actual noise variance, even during the training stage. An extension would be to evaluate how the approach performs when estimating parameters of other distortions, such as blur or compression [50].

5.5.2 Tone Mapping

Most of the discussion in this chapter has focused on the quality of “regular” images/videos (i.e., those images and videos which have a limited dynamic range). In the recent past, High Dynamic Range (HDR) imaging techniques have gained tremendous popularity both in academia and in commercial products [117]. An HDR image is typically created from multiple “low” dynamic range images. For instance, in a camera, the exposure time is varied so that different parts of the scene's luminance range are captured separately and then combined to form an HDR image. The saturation that is seen at the high or low end of the luminance range in regular images is overcome by using the HDR technique. In order to combine multiple exposures for display, an algorithm called tone mapping is applied. The goal of a tone-mapping algorithm is to produce a tone-mapped image that not only has a high perceptual quality, but also looks “natural.” One way to measure this is through a tone-mapping-specific IQA algorithm. Since HDR images have no traditionally defined “reference,” the problem is blind in nature. Recently, the authors of [118] proposed such a tone-mapping NR IQA algorithm, called the Tone-Mapped image Quality Index (TMQI).

TMQI compares the low and high dynamic range images using a modification of the “structure” term of SSIM [14] to account for the non-linear response of the human visual system. This computation is undertaken at multiple scales, drawing inspiration from [15]. A naturalness term is also computed, based on previously unearthed statistics of natural low-dynamic-range images [119, 120]. A combination of the two measures yields the TMQI. The authors demonstrate that the index correlates well with human perception of quality, based on findings from previous subjective studies on HDR quality.
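Denoting the multi-scale structural fidelity term by S and the statistical naturalness term by N, both scaled to [0, 1], the combination reported in [118] takes the form

Q = a·S^α + (1 − a)·N^β,

where a ∈ [0, 1] weights the two terms against each other and the exponents α and β control their relative sensitivities.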

Apart from discussing the algorithm and evaluating its performance, the authors also demonstrate how the TMQI measure could be used to aid parameter tuning for tone-mapping algorithms. Another interesting use of the measure is its application in adaptive fusion of tone-mapped images, which takes advantage of multiple tone-mapping operators to produce a high-quality HDR image. The work in [118] is a pioneering effort in the field of tone-mapped quality assessment and demonstrates how useful NR QA algorithms could be in real-life applications.

5.5.3 Photo Selection

Photo capture is picking up at a very fast pace with the launch of new hand-held devices and smartphones. Americans captured 80 billion digital photographs in 2011 [121], and this number is increasing annually. More than 250 million photographs are posted daily on Facebook. Consumers are becoming overwhelmed by the amount of available digital visual content, and finding ways to review and control the quality of digital photographs is becoming almost impossible. With this objective in mind, a photo quality helper app [122] for Android phones was recently designed as part of a senior design project by an undergraduate team at the University of Texas at Austin; it automatically judges photo quality and enables photos which do not meet the user's quality threshold to be discarded.

5.6 Challenges and Future Directions

While visual quality assessment has made giant leaps in the recent past, so much so that researchers have quickly moved on from intense research in FR QA to opinion and distortion-unaware NR QA, there remain multiple challenges that need to be addressed in order for the field to attain maturity. In this section, we discuss a handful of such challenges and speculate on possible research directions that may yield positive results.

  • Content Bias. The discussion in this chapter has assumed that all content is equal. Algorithms for quality assessment are almost always content blind and only look at low-level feature descriptions such as texture to make quality judgments. Humans, however, relate to images at a much higher level. While distortions in texture and smooth regions definitely influence user perceptions of quality, the content of the image has an important role to play in the overall quality rating. Images that a subject likes, for example, a picture of the subject's baby, may be rated as higher quality even if the distortion on the image is unacceptable. Likes and dislikes may not always be personal. Sunsets, beaches, good-looking faces, etc. are universally liked. Current databases for visual quality assessment attempt to remove this bias by selecting content that does not necessarily invoke strong feelings of like or dislike. While this helps in understanding human perceptions of quality, real-life content seldom tends to be “boring.” It is hence of interest to try to ferret out content preferences and use this as a seed in producing quality ratings. If the goal of visual quality assessment is to replace the human observer, human biases need to be modeled and the area of content bias needs to be explored.
  • Aesthetics. Closely related to the content bias is the notion of aesthetics [123–128]. The aesthetics of an image or video capture how appealing it is to a user. While there has been some research in the field of visual aesthetics [123, 124, 126–128], a lot more remains to be done. Current aesthetics algorithms more often than not can only classify images as pleasing or not-pleasing. Those which produce aesthetics ratings do not correlate with human perception as well as quality assessment algorithms do. Apart from improving this performance, it is of interest to study the joint effect of aesthetics and distortions on an image. Is a pleasingly shot image as bad as a non-pleasingly shot one under the same distortion regime? This question ties in closely with content bias, for while aesthetics may be evaluated independently of content, an ideal algorithm that reproduces human behavior needs to understand and model human opinion on aesthetics, content, and quality, and all interactions between the three.
  • No-Reference Video Quality Assessment. In comparison with advances in NR IQA, algorithms for NR VQA remain scarce. This is not without reason. The addition of motion information in the visual content complicates the problem of NR QA drastically. Indeed, even in the slightly easier problem of FR VQA, successful algorithms on one database do not necessarily translate into successful algorithms on another [80, 93]. The nature of the distortion in question, and its effect on motion information, needs to be studied in detail before any modeling effort for motion is undertaken. As can be imagined, this is not an easy task. One possible direction of effort that may yield results is that of task-specific quality assessment. For example, in the video teleconferencing use case, one is interested in producing high-quality facial images, where the face occupies a major portion of the screen. VQA algorithms could target the face and use it as the region-of-interest to gauge quality. Obviously, this will not translate to other use cases. However, given the lack of progress made in understanding motion models with respect to distortions, it may be worthwhile investigating application-specific models.
  • Human Behavior Modeling. One aspect of video quality assessment that remains relatively unexplored is that of human behavioral modeling [129, 130]. When humans view a video, the overall satisfaction with the video is a function of the temporal dynamics that the human sees as the video plays out. The temporal variation in the quality as the user views the video, and its effect on overall quality perception, is an interesting field of study [93, 129]. For instance, one could ask the question: If a low-quality segment precedes a high-quality segment, is the overall satisfaction higher than when the case is reversed? This is of course one of the many questions that can be asked and, in order to answer these questions, one needs databases that have been developed to study human responses to time-varying quality. It is only recently that such databases have started to appear [93]. Once human responses to temporal quality variation are decoded, models need to be designed to account for such human behavioral responses and incorporated into video quality assessment algorithms. The design of the databases that may answer these questions is in itself a pretty arduous task, and we do not expect the modeling to be any easier. However, once such models are available, the applications are tremendous. Apart from the obvious improvements to existing VQA algorithms, such models will allow for scheduling of video packets to be transmitted based on current and predicted future channel conditions. Scheduling visual information to maximize perceptual quality is a field that is still nascent, and remains of tremendous future interest.

5.7 Conclusion

In this chapter, we have summarized recent research in the field of no-reference or blind image and video quality assessment. The field of NR QA has recently witnessed a host of activity, and several high-performance algorithms have been developed to solve the problem of predicting human perceptions of quality without the need for a reference. The methods that we explored were broken down by the amount of information available to the algorithm during the “training” stage prior to deployment. As we summarized, even in the face of almost no information regarding the distortion type or human opinion on quality, there exist algorithms that predict visual quality with a fair degree of accuracy. This is an extremely encouraging fact, and we expect high-performing NR QA algorithms to approach the level of full-reference correlation in the near future. While great progress has been made on the image quality front, no-reference video quality assessment has received less attention. As we summarized, the complexity of motion and its interaction with distortion make NR VQA a difficult problem to solve.

While the field is certainly maturing, it is far from mature. We summarized a handful of challenges that the field of quality assessment needs to address. Primary amongst the challenges is that of deployment of QA algorithms in practical applications. While QA algorithms have been researched painstakingly, applying QA algorithms to traditional image-processing problems is still underexplored. This is true in the case of FR QA algorithms [114], but even more so in the case of NR QA algorithms [50]. As NR QA algorithm performance peaks, we hope that the traditional measures of error, such as mean-squared error, will be replaced by far more meaningful perceptual quality measures. Of course, such a replacement requires demonstration of tangible success – a task that needs concentrated involvement of researchers in the field of quality assessment.

The future is still bright for visual quality measures, especially in areas that have not been explored much before – such as interactions between visual quality and visual tasks. It is but natural to posit separation of measurements of quality impairment (from capture, processing, compression, transmission, post-processing) from scene-dependent factors, so their effects on detection, recognition, or other tasks can be identified and mitigated. This is particularly true in high-distortion environments, such as the increasingly crowded wireless/mobile environment. A principled, ground-up approach is needed whereby the effects of blindly measured video quality degradations on visual tasks can be established. This is of particular importance in forthcoming wireless vision applications where severe distortions occur, and in security applications such as human tracking, which have taken on an increasingly important role in modern-day systems.

References

  1. Ullman, S. and Poggio, T., Vision: A computational investigation into the human representation and processing of visual information. MIT Press, Cambridge, MA, 2010.
  2. Grady, D., ‘The vision thing: Mainly in the brain.’ Discover, 14(6), 1993, 56–66.
  3. Bovik, A.C., Handbook of Image and Video Processing. Academic Press, New York, 2010.
  4. Heeger, D.J. and Simoncelli, E.P., ‘Model of visual motion sensing.’ Spatial Vision in Humans and Robots, 19, 1993, 367–392.
  5. Jayant, N., Johnston, J., and Safranek, R., ‘Signal compression based on models of human perception.’ Proceedings of the IEEE, 81(10), 1993, 1385–1422.
  6. Park, J., Seshadrinathan, K., Lee, S., and Bovik, A.C., ‘VQ pooling: Video quality pooling adaptive to perceptual distortion severity.’ IEEE Transactions on Image Processing, 22(2), 2013, 610–620.
  7. Seshadrinathan, K. and Bovik, A.C., ‘Automatic prediction of perceptual quality of multimedia signals – a survey.’ Multimedia Tools and Applications, 51(1), 2011, 163–186.
  8. Chandler, D.M. and Hemami, S.S., ‘VSNR: A wavelet-based visual signal-to-noise ratio for natural images.’ IEEE Transactions on Image Processing, 16(9), 2007, 2284–2298.
  9. Daly, S., ‘The visible differences predictor: An algorithm for the assessment of image fidelity.’ In Watson, A.B. (ed.), Digital Images and Human Vision. MIT Press, Cambridge, MA, 1993, pp. 179–206.
  10. Lubin, J., ‘The use of psychophysical data and models in the analysis of display system performance.’ In Watson, A.B. (ed.), Digital Images and Human Vision. MIT Press, Cambridge, MA, 1993, pp. 163–178.
  11. Teo, P.C. and Heeger, D.J., ‘Perceptual image distortion.’ IEEE International Conference on Image Processing, Vol. 2, 1994, pp. 982–986.
  12. Egiazarian, K., Astola, J., Ponomarenko, N., Lukin, V., Battisti, F., and Carli, M., ‘New full-reference quality metrics based on HVS.’ International Workshop on Video Processing and Quality Metrics, 2006.
  13. Wang, Z. and Bovik, A.C., ‘A universal image quality index.’ IEEE Signal Processing Letters, 9(3), 2002, 81–84.
  14. Wang, Z., Bovik, A.C., Sheikh, H.R., and Simoncelli, E.P., ‘Image quality assessment: From error visibility to structural similarity.’ IEEE Transactions on Image Processing, 13(4), 2004, 600–612.
  15. Wang, Z., Simoncelli, E.P., and Bovik, A.C., ‘Multiscale structural similarity for image quality assessment.’ Asilomar Conference on Signals, Systems and Computers, Vol. 2, 2003, pp. 1398–1402.
  16. Sheikh, H.R., Bovik, A.C., and De Veciana, G., ‘An information fidelity criterion for image quality assessment using natural scene statistics.’ IEEE Transactions on Image Processing, 14(12), 2005, 2117–2128.
  17. Sheikh, H.R. and Bovik, A.C., ‘Image information and visual quality.’ IEEE Transactions on Image Processing, 15(2), 2006, 430–444.
  18. Soundararajan, R. and Bovik, A.C., ‘RRED indices: Reduced reference entropic differencing for image quality assessment.’ IEEE Transactions on Image Processing, 21(2), 2012, 517–526.
  19. Li, Q. and Wang, Z., ‘Reduced-reference image quality assessment using divisive normalization-based image representation.’ IEEE Journal of Selected Topics in Signal Processing, 3(2), 2009, 202–211.
  20. Chono, K., Lin, Y.C., Varodayan, D., Miyamoto, Y., and Girod, B., ‘Reduced-reference image quality assessment using distributed source coding.’ IEEE International Conference on Multimedia and Expo, 2008, pp. 609–612.
  21. Engelke, U., Kusuma, M., Zepernick, H.J., and Caldera, M., ‘Reduced-reference metric design for objective perceptual quality assessment in wireless imaging.’ Signal Processing: Image Communication, 24(7), 2009, 525–547.
  22. Barland, R. and Saadane, A., ‘Reference free quality metric using a region-based attention model for JPEG-2000 compressed images.’ Proceedings of SPIE, 6059, 2006, 605905-1–605905-10.
  23. Chen, J., Zhang, Y., Liang, L., Ma, S., Wang, R., and Gao, W., ‘A no-reference blocking artifacts metric using selective gradient and plainness measures.’ Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing. Springer-Verlag, Berlin, 2008, pp. 894–897.
  24. Suthaharan, S., ‘No-reference visually significant blocking artifact metric for natural scene images.’ Signal Processing, 89(8), 2009, 1647–1652.
  25. Sheikh, H.R., Bovik, A.C., and Cormack, L.K., ‘No-reference quality assessment using natural scene statistics: JPEG2000.’ IEEE Transactions on Image Processing, 14(11), 2005, 1918–1927.
  26. Varadarajan, S. and Karam, L.J., ‘An improved perception-based no-reference objective image sharpness metric using iterative edge refinement.’ IEEE International Conference on Image Processing, 2008, pp. 401–404.
  27. Ferzli, R. and Karam, L.J., ‘A no-reference objective image sharpness metric based on the notion of just noticeable blur (JNB).’ IEEE Transactions on Image Processing, 18(4), 2009, 717–728.
  28. Narvekar, N.D. and Karam, L.J., ‘A no-reference perceptual image sharpness metric based on a cumulative probability of blur detection.’ IEEE International Workshop on Quality of Multimedia Expo, 2009, pp. 87–91.
  29. Sadaka, N.G., Karam, L.J., Ferzli, R., and Abousleman, G.P., ‘A no-reference perceptual image sharpness metric based on saliency-weighted foveal pooling.’ IEEE International Conference on Image Processing, 2008, pp. 369–372.
  30. Li, X., ‘Blind image quality assessment.’ IEEE International Conference on Image Processing, Vol. 1, 2002, pp. 449–452.
  31. Ye, P. and Doermann, D., ‘No-reference image quality assessment using visual codebook.’ IEEE International Conference on Image Processing, 2011.
  32. Ye, P. and Doermann, D., ‘No-reference image quality assessment using visual codebooks.’ IEEE Transactions on Image Processing, 21(7), 2012, 3129–3138.
  33. Gabarda, S. and Cristóbal, G., ‘Blind image quality assessment through anisotropy.’ Journal of the Optical Society of America A, 24(12), 2007, B42–B51.
  34. Tang, H., Joshi, N., and Kapoor, A., ‘Learning a blind measure of perceptual image quality.’ IEEE International Conference on Computer Vision Pattern Recognition, 2011.
  35. Moorthy, A.K. and Bovik, A.C., ‘Blind image quality assessment: From natural scene statistics to perceptual quality.’ IEEE Transactions on Image Processing, 20(12), 2011, 3350–3364.
  36. Saad, M., Bovik, A.C., and Charrier, C., ‘Blind image quality assessment: A natural scene statistics approach in the DCT domain.’ IEEE Transactions on Image Processing, 21(8), 2012, 3339–3352.
  37. Mittal, A., Moorthy, A.K., and Bovik, A.C., ‘No-reference image quality assessment in the spatial domain.’ IEEE Transactions on Image Processing, 21(12), 2012, 4695–4708.
  38. Mittal, A., Moorthy, A.K., and Bovik, A.C., ‘Making image quality assessment robust.’ Asilomar Conference on Signals, Systems and Computers, 2012, pp. 1718–1722.
  39. Mittal, A., Moorthy, A.K., and Bovik, A.C., ‘Blind/referenceless image spatial quality evaluator.’ Asilomar Conference on Signals, Systems and Computers, 2011, pp. 723–727.
  40. Mittal, A., Muralidhar, G.S., Ghosh, J., and Bovik, A.C., ‘Blind image quality assessment without human training using latent quality factors.’ IEEE Signal Processing Letters, 19(2), 2012, 75–78.
  41. Mittal, A., Soundararajan, R., Muralidhar, G., Ghosh, J., and Bovik, A.C., ‘Un-naturalness modeling of image distortions.’ Vision Sciences Society, 2012.
  42. Xue, W., Zhang, L., and Mou, X., ‘Learning without human scores for blind image quality assessment.’ IEEE International Conference on Computer Vision Pattern Recognition, 2011.
  43. Mittal, A., Soundararajan, R., Muralidhar, G.S., Bovik, A.C., and Ghosh, J., ‘Blind image quality assessment without training on human opinion scores.’ In IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2013, p. 86510T.
  44. Mittal, A., Soundararajan, R., and Bovik, A.C., ‘Making a “completely blind” image quality analyzer.’ IEEE Signal Processing Letters, 20(3), 2013, 209–212.
  45. Mittal, A., Soundararajan, R., and Bovik, A.C., ‘Prediction of image naturalness and quality.’ Journal of Vision, 13(9), 2013, 1056.
  46. Soundararajan, R. and Bovik, A.C., ‘Survey of information theory in visual quality assessment.’ Signal, Image and Video Processing, 7(3), 2013, 391–401.
  47. Moorthy, A.K. and Bovik, A.C., ‘Visual quality assessment algorithms: What does the future hold?’ Multimedia Tools and Applications, 51(2), 2011, 675–696.
  48. Meesters, L. and Martens, J.B., ‘A single-ended blockiness measure for JPEG-coded images.’ Signal Processing, 82(3), 2002, 369–387.
  49. Wang, Z., Sheikh, H.R., and Bovik, A.C., ‘No-reference perceptual quality assessment of JPEG compressed images.’ International Conference on Image Processing, 1, 2002, 477–480.
  50. Moorthy, A.K., Mittal, A., and Bovik, A.C., ‘Perceptually optimized blind repair of natural images.’ Signal Processing: Image Communication, 28, 2013, 1478–1493.
  51. Wang, Z. and Bovik, A.C., ‘Reduced- and no-reference image quality assessment.’ IEEE Signal Processing Magazine, 28(6), 2011, 29–40.
  52. Shen, J., Li, Q., and Erlebacher, G., ‘Hybrid no-reference natural image quality assessment of noisy, blurry, JPEG2000, and JPEG images.’ IEEE Transactions on Image Processing, 20(8), 2011, 2089–2098.
  53. Ruderman, D.L. and Bialek, W., ‘Statistics of natural images: Scaling in the woods.’ Physical Review Letters, 73, 1994, 814–817.
  54. Soundararajan, R. and Bovik, A.C., ‘RRED indices: Reduced reference entropic differencing for image quality assessment.’ IEEE Transactions on Image Processing, 21(2), 2012, 517–526.
  55. Ruderman, D.L., ‘The statistics of natural images.’ Network Computation in Neural Systems, 5(4), 1994, 517–548.
  56. Chen, X. and Cai, D., ‘Large scale spectral clustering with landmark-based representation.’ Advances in Artificial Intelligence, 2011.
  57. Tan, K.T. and Ghanbari, M., ‘Blockiness detection for MPEG-2 coded video.’ IEEE Signal Processing Letters, 7(8), 2000, 213–215.
  58. Vlachos, T., ‘Detection of blocking artifacts in compressed video.’ Electronics Letters, 36(13), 2000, 1106–1108.
  59. Suthaharan, S., ‘Perceptual quality metric for digital video coding.’ Electronics Letters, 39(5), 2003, 431–433.
  60. Muijs, R. and Kirenko, I., ‘A no-reference blocking artifact measure for adaptive video processing.’ European Signal Processing Conference, 2005.
  61. Lu, J., ‘Image analysis for video artifact estimation and measurement.’ Photonics West-Electronic Imaging, 2001, pp. 166–174.
  62. Huynh-Thu, Q. and Ghanbari, M., ‘Impact of jitter and jerkiness on perceived video quality.’ International Workshop on Video Processing and Quality Metrics for Consumer Electronics, 2006.
  63. Huynh-Thu, Q. and Ghanbari, M., ‘Temporal aspect of perceived quality in mobile video broadcasting.’ IEEE Transactions on Broadcasting, 54(3), 2008, 641–651.
  64. Mei, T., Hua, X.S., Zhu, C.Z., Zhou, H.Q., and Li, S., ‘Home video visual quality assessment with spatiotemporal factors.’ IEEE Transactions on Circuits and Systems for Video Technology, 17(6), 2007, 699–706.
  65. Yang, K., Guest, C.C., El-Maleh, K., and Das, P.K., ‘Perceptual temporal quality metric for compressed video.’ IEEE Transactions on Multimedia, 9(7), 2007, 1528–1535.
  66. Ou, Y.F., Ma, Z., Liu, T., and Wang, Y., ‘Perceptual quality assessment of video considering both frame rate and quantization artifacts.’ IEEE Transactions on Circuits and Systems for Video Technology, 21(3), 2011, 286–298.
  67. Caviedes, J.E. and Oberti, F., ‘No-reference quality metric for degraded and enhanced video.’ Visual Communication and Image Processing, 2003, pp. 621–632.
  68. Babu, R.V., Bopardikar, A.S., Perkis, A., and Hillestad, O.I., ‘No-reference metrics for video streaming applications.’ International Workshop on Packet Video, 2004.
  69. Farias, M.C.Q. and Mitra, S.K., ‘No-reference video quality metric based on artifact measurements.’ IEEE International Conference on Image Processing, Vol. 3, 2005, pp. III–141.
  70. Massidda, F., Giusto, D.D., and Perra, C., ‘No reference video quality estimation based on human visual system for 2.5/3G devices.’ Electronic Imaging, 2005, pp. 168–179.
  71. Dosselmann, R. and Yang, X.D., ‘A prototype no-reference video quality system.’ Fourth Canadian Conference on Computer and Robot Vision, 2007, pp. 411–417.
  72. Yang, F., Wan, S., Chang, Y., and Wu, H., ‘A novel objective no-reference metric for digital video quality assessment.’ IEEE Signal Processing Letters, 12(10), 2005, 685–688.
  73. Pastrana-Vidal, R.R. and Gicquel, J.C., ‘Automatic quality assessment of video fluidity impairments using a no-reference metric.’ International Workshop on Video Processing and Quality Metrics for Consumer Electronics, 2006.
  74. Yamada, T., Miyamoto, Y., and Serizawa, M., ‘No-reference video quality estimation based on error-concealment effectiveness.’ Packet Video, 2007, pp. 288–293.
  75. Naccari, M., Tagliasacchi, M., Pereira, F., and Tubaro, S., ‘No-reference modeling of the channel-induced distortion at the decoder for H.264/AVC video coding.’ IEEE International Conference on Image Processing, 2008, pp. 2324–2327.
  76. Ong, E.P., Wu, S., Loke, M.H., et al., ‘Video quality monitoring of streamed videos.’ IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, pp. 1153–1156.
  77. Keimel, C., Oelbaum, T., and Diepold, K., ‘No-reference video quality evaluation for high-definition video.’ IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, pp. 1145–1148.
  78. Hemami, S.S. and Reibman, A.R., ‘No-reference image and video quality estimation: Applications and human-motivated design.’ Signal Processing: Image Communication, 25(7), 2010, 469–481.
  79. Saad, M.A. and Bovik, A.C., ‘Blind quality assessment of videos using a model of natural scene statistics and motion coherency.’ Asilomar Conference on Signals, Systems and Computers, 2012.
  80. Seshadrinathan, K., Soundararajan, R., Bovik, A., and Cormack, L., ‘Study of subjective and objective quality assessment of video.’ IEEE Transactions on Image Processing, 19(6), 2010, 1427–1441.
  81. Jayaraman, D., Mittal, A., Moorthy, A.K., and Bovik, A.C., ‘Objective image quality assessment of multiply distorted images.’ Asilomar Conference on Signals, Systems and Computers, 2012, pp. 1693–1697.
  82. Sheikh, H.R., Sabir, M.F., and Bovik, A.C., ‘A statistical evaluation of recent full reference image quality assessment algorithms.’ IEEE Transactions on Image Processing, 15(11), 2006, 3440–3451.
  83. Le Callet, P. and Autrusseau, F., ‘Subjective quality assessment IRCCYN/IVC database.’ http://www.irccyn.ec-nantes.fr/ivcdb/.
  84. Ponomarenko, N., Lukin, V., Zelensky, A., Egiazarian, K., Carli, M., and Battisti, F., ‘TID2008 – A database for evaluation of full-reference visual quality assessment metrics.’ Advances in Modern Radioelectronics, 10, 2009, 30–45.
  85. Larson, E.C. and Chandler, D.M., ‘Most apparent distortion: Full-reference image quality assessment and the role of strategy.’ Journal of Electronic Imaging, 19(1), 2010, 011006-1–011006-21.
  86. Ciancio, A.G., da Costa, A.L.N.T., da Silva, E.A.B., Said, A., Samadani, R., and Obrador, P., ‘No-reference blur assessment of digital pictures based on multifeature classifiers.’ IEEE Transactions on Image Processing, 20(1), 2011, 64–75.
  87. De Simone, F., Naccari, M., Tagliasacchi, M., Dufaux, F., Tubaro, S., and Ebrahimi, T., ‘Subjective assessment of H.264/AVC video sequences transmitted over a noisy channel.’ International Workshop on Quality of Multimedia Experience, 2009, pp. 204–209.
  88. De Simone, F., Tagliasacchi, M., Naccari, M., Tubaro, S., and Ebrahimi, T., ‘An H.264/AVC video database for the evaluation of quality metrics.’ IEEE International Conference on Acoustics, Speech and Signal Processing, 2010, pp. 2430–2433.
  89. Péchard, S., Pépion, R., and Le Callet, P., ‘Suitable methodology in subjective video quality assessment: A resolution dependent paradigm.’ International Workshop on Image Media Quality and its Applications, 2008.
  90. Rec. ITU-R BT.500-11, ‘Methodology for the subjective assessment of the quality of television pictures.’ http://www.dii.unisi.it/menegaz/DoctoralSchool2004/papers/ITU-R_BT.500-11.pdf.
  91. Lee, J., De Simone, F., Ramzan, N., et al., ‘Subjective evaluation of scalable video coding for content distribution.’ International Conference on Multimedia, 2010, pp. 65–72.
  92. Lee, J., De Simone, F., and Ebrahimi, T., ‘Subjective quality evaluation via paired comparison: Application to scalable video coding.’ IEEE Transactions on Multimedia, 13(5), 2011, 882–893.
  93. Moorthy, A.K., Choi, L., Bovik, A.C., and De Veciana, G., ‘Video quality assessment on mobile devices: Subjective, behavioral and objective studies.’ IEEE Journal of Selected Topics in Signal Processing, 6(6), 2012, 652–671.
  94. Winkler, S., ‘Analysis of public image and video databases for quality assessment.’ IEEE Journal of Selected Topics in Signal Processing, 6(6), 2012, 616–625.
  95. Winkler, S., ‘List of image and video quality databases.’ http://stefan.winkler.net/resources.html.
  96. Mittal, A., Moorthy, A.K., and Bovik, A.C., ‘No-reference image quality assessment in the spatial domain.’ IEEE Transactions on Image Processing, 21(12), 2012, 4695–4708.
  97. Buades, A., Coll, B., and Morel, J.M., ‘A review of image denoising algorithms, with a new one.’ Multiscale Modeling & Simulation, 4(2), 2005, 490–530.
  98. Mittal, A., Moorthy, A.K., and Bovik, A.C., ‘Automatic parameter prediction for image denoising algorithms using perceptual quality features.’ IS&T/SPIE Electronic Imaging. International Society for Optics and Photonics, 2012, p. 82910G.
  99. Hansen, P.C., ‘Analysis of discrete ill-posed problems by means of the L-curve.’ SIAM Review, 34(4), 1992, 561–580.
  100. Hansen, P.C. and O’Leary, D.P., ‘The use of the L-curve in the regularization of discrete ill-posed problems.’ SIAM Journal on Scientific Computing, 14(6), 1993, 1487–1503.
  101. Regińska, T., ‘A regularization parameter in discrete ill-posed problems.’ SIAM Journal on Scientific Computing, 17(3), 1996, 740–749.
  102. Oraintara, S., Karl, W., Castanon, D., and Nguyen, T., ‘A method for choosing the regularization parameter in generalized Tikhonov regularized linear inverse problems.’ International Conference on Image Processing, Vol. 1, 2000, pp. 93–96.
  103. Karl, W.C., ‘Regularization in image restoration and reconstruction.’ In Bovik, A.C. (ed.), Handbook of Image and Video Processing. Academic Press, New York, 2000, pp. 141–160.
  104. Craven, P. and Wahba, G., ‘Smoothing noisy data with spline functions.’ Numerische Mathematik, 31(4), 1978, 377–403.
  105. Golub, G.H., Heath, M., and Wahba, G., ‘Generalized cross-validation as a method for choosing a good ridge parameter.’ Technometrics, 21(2), 1979, 215–223.
  106. Nychka, D., ‘Bayesian confidence intervals for smoothing splines.’ Journal of the American Statistical Association, 83, 1988, 1134–1143.
  107. Thompson, A.M., Brown, J.C., Kay, J.W., and Titterington, D.M., ‘A study of methods of choosing the smoothing parameter in image restoration by regularization.’ IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(4), 1991, 326–339.
  108. Galatsanos, N.P. and Katsaggelos, A.K., ‘Methods for choosing the regularization parameter and estimating the noise variance in image restoration and their relation.’ IEEE Transactions on Image Processing, 1(3), 1992, 322–336.
  109. Ramani, S., Blu, T., and Unser, M., ‘Monte-Carlo SURE: A black-box optimization of regularization parameters for general denoising algorithms.’ IEEE Transactions on Image Processing, 17(9), 2008, 1540–1554.
  110. Blu, T. and Luisier, F., ‘The SURE-LET approach to image denoising.’ IEEE Transactions on Image Processing, 16(11), 2007, 2778–2786.
  111. Luisier, F., Blu, T., and Unser, M., ‘A new SURE approach to image denoising: Interscale orthonormal wavelet thresholding.’ IEEE Transactions on Image Processing, 16(3), 2007, 593–606.
  112. Zhang, X. and Desai, M., ‘Adaptive denoising based on SURE risk.’ IEEE Signal Processing Letters, 5(10), 1998, 265–267.
  113. Donoho, D. and Johnstone, I., ‘Adapting to unknown smoothness via wavelet shrinkage.’ Journal of the American Statistical Association, 90, 1995, 1200–1224.
  114. Channappayya, S.S., Bovik, A.C., and Heath, R.W., ‘A linear estimator optimized for the structural similarity index and its application to image denoising.’ International Conference on Image Processing, 2006, pp. 2637–2640.
  115. Zhu, X. and Milanfar, P., ‘Automatic parameter selection for denoising algorithms using a no-reference measure of image content.’ IEEE Transactions on Image Processing, 19(12), 2010, 3116–3132.
  116. Dabov, K., Foi, A., Katkovnik, V., and Egiazarian, K., ‘Image denoising by sparse 3-D transform-domain collaborative filtering.’ IEEE Transactions on Image Processing, 16(8), 2007, 2080–2095.
  117. Reinhard, E., Heidrich, W., Debevec, P., Pattanaik, S., Ward, G., and Myszkowski, K., High Dynamic Range Imaging: Acquisition, display, and image-based lighting. Morgan Kaufmann, Burlington, MA, 2010.
  118. Yeganeh, H. and Wang, Z., ‘Objective quality assessment of tone mapped images.’ IEEE Transactions on Image Processing, 22(2), 2013, 657–667.
  119. UCID – Uncompressed Colour Image Database. http://www.staff.lboro.ac.uk/cogs/datasets/UCID/ucid.html, 2004.
  120. Computer vision test images. http://www.2.cs.cmu.edu/afs/cs/project/cil/www/v-images.html, 2005.
  121. ‘Image obsessed.’ National Geographic, 221, 2012, p. 35.
  122. Johnson, D., Chen, B., Chen, N., Pan, J., and Huynh, J., ‘Photo quality helper.’ https://play.google.com/store/apps/details?id=net.dorianj.rockout&feature=search_result#?t=W251bGwsMSwxLDEsIm5ldC5kb3JpYW5qLnJvY2tvdXQiXQ.
  123. Li, C. and Chen, T., ‘Aesthetic visual quality assessment of paintings.’ IEEE Journal of Selected Topics in Signal Processing, 3(2), 2009, 236–252.
  124. Datta, R., Joshi, D., Li, J., and Wang, J., ‘Studying aesthetics in photographic images using a computational approach.’ Lecture Notes in Computer Science, 3953, 2006, 288.
  125. Datta, R., Li, J., and Wang, J., ‘Algorithmic inferencing of aesthetics and emotion in natural images: An exposition.’ IEEE International Conference on Image Processing, 2008, pp. 105–108.
  126. Ke, Y., Tang, X., and Jing, F., ‘The design of high-level features for photo quality assessment.’ IEEE Conference on Computer Vision Pattern Recognition, Vol. 1, 2006.
  127. Luo, Y. and Tang, X., ‘Photo and video quality evaluation: Focusing on the subject.’ European Conference on Computer Vision, 2008, pp. 386–399.
  128. Moorthy, A.K., Obrador, P., Oliver, N., and Bovik, A.C., ‘Towards computational models of visual aesthetic appeal of consumer videos.’ European Conference on Computer Vision, 2010.
  129. Seshadrinathan, K. and Bovik, A.C., ‘Temporal hysteresis model of time varying subjective video quality.’ IEEE International Conference on Acoustics, Speech and Signal Processing, 2011, pp. 1153–1156.
  130. Park, J., Seshadrinathan, K., Lee, S., and Bovik, A., ‘Spatio-temporal quality pooling accounting for transient severe impairments and egomotion.’ IEEE International Conference on Image Processing, 2010.

Acronyms

BLIINDS   BLind Image Integrity Notator using DCT Statistics
BRISQUE   Blind/Referenceless Image Spatial Quality Evaluator
CSIQ      Categorical Subjective Image Quality
DA        Distortion Aware
DIIVINE   Distortion Identification-based Image Verity and INtegrity Evaluation
DMOS      Differential Mean Opinion Score
DU        Distortion Unaware
FR        Full Reference
HDR       High Dynamic Range
IQA       Image Quality Assessment
LCC       Linear Correlation Coefficient
MOS       Mean Opinion Score
MVG       Multivariate Gaussian
NIQE      Natural Image Quality Evaluator
NR        No Reference
NSS       Natural Scene Statistics
OA        Opinion Aware
OU        Opinion Unaware
PSNR      Peak Signal-to-Noise Ratio
QA        Quality Assessment
RR        Reduced Reference
SROCC     Spearman Rank-Ordered Correlation Coefficient
SSIM      Structural SIMilarity Index
SVR       Support Vector Regression
TMQI      Tone-Mapped Quality Index
VQA       Video Quality Assessment
