8.1. Overview of Chapter

An overview of the basic components of the content-based image retrieval scheme is given in Figure 8.1, and the corresponding dataflow process is shown in Figure 8.3. The sections in this chapter follow the data as they flow from one computational component to the next:

  • Interactive query formulation: Interactive query formulation is offered either through example images and subimages or through a pattern of feature values and weights. To formulate a query, an image is sketched, recorded, or selected from an image repository. The aim of the query is to search for particular images in the database. The mode of search falls into one of three categories: search by association, target search, and category search. In search by association, the user browses through a large collection of images without a specific aim; the search tries to find interesting images and is often applied iteratively by means of relevance feedback. Target search aims to find similar (target) images in the image database. Note that a similar image may be a (partially) identical image or may contain a (partially) identical object. The third class is category search, where the aim is to retrieve an arbitrary image that is typical of a specific class or genre (e.g., indoor images, portraits, city views). Because many image retrieval systems are assembled around one of these three search modes, it is important to have insight into these categories and their structure. Search modes are discussed in Section 8.2.1.

  • Image domains: The definition of image features depends on the repertoire of images under consideration. This repertoire can be ordered along the complexity of variations imposed by the imaging conditions, such as illumination and viewing geometry, going from narrow domains to broad domains. Images from a narrow domain exhibit restricted variability in their pictorial content; examples of narrow domains are stamp collections and face databases. In broad domains, images may be taken of objects from unknown viewpoints and under unknown illumination. For example, two recordings of the same object from different viewpoints will yield different shadowing, shading, and highlighting cues, changing the intensity fields considerably. Moreover, large differences in illumination color will drastically change the photometric content of images even when they are taken of the same scene. Hence, images from broad domains show a large pictorial variety, called the sensory gap, discussed in Section 8.2.2. Furthermore, low-level image features are often too restricted to describe images on a conceptual or semantic level. This semantic gap is a well-known problem in content-based image retrieval and is discussed in Section 8.2.3.

  • Image features: Image feature extraction is an important step for image indexing and search. Image feature extraction modules should take into account whether the image domain is narrow or broad. In fact, they should consider to which imaging conditions the features should be invariant, such as changes in viewpoint, object pose, and illumination. Further, image features should be concise and complete and, at the same time, have high discriminative power. In general, a tradeoff exists between the amount of invariance and selectivity. In Section 8.3, a taxonomy of feature extraction modules is given from an image processing perspective. The taxonomy can be used to select the proper feature extraction method for a specific application, based on whether the images come from a broad domain and which search goal is at hand (target, category, or associative search). In Section 8.3.1, we first focus on color content descriptors derived from image processing technology. Various color-based image search methods are discussed, based on different representation schemes such as color histograms, color moments, color edge orientation, and color correlograms. These representation schemes are built on RGB and other color systems such as HSI and CIE L*a*b*. For example, the L*a*b* space has been designed to conform to the human perception of color similarity. If a human observer's appreciation of an object is based on the perception of certain conspicuous items in the image, it is natural to direct the computation of broad domain features to these points and regions. Similarly, a biologically plausible architecture [84] of center-surround processing units is likely to select regions that humans would also focus on first. Further, we discuss color models that are robust to changes in viewing direction, object geometry, and illumination (a small sketch of a color histogram and an invariant color model is given at the end of this overview). Image processing for shape is outlined in Section 8.3.2. We focus on local shapes, which are image descriptors capturing salient details in images. Finally, in Section 8.3.3, we discuss texture, reviewing texture features that describe local color characteristics and their spatial layout.

  • Representation and indexing: Representation and indexing are discussed in Section 8.4. In general, the image feature set is represented by vector space, probabilistic, or logical models. For example, in the vector space model, weights can be assigned according to feature frequency, giving the well-known histogram form. Further, for accurate image search, it is often desirable to assign weights in accordance with the importance of the image features. The feature weights used for both images and queries can be computed as the product of the feature frequency and an inverse collection frequency factor, analogous to tf-idf weighting in text retrieval. In this way, features that occur frequently within an image but rarely across the collection are emphasized (see the weighting sketch at the end of this overview). Feature accumulation and representation are further discussed in Section 8.4.2. In addition to feature representation, indexing is required to speed up the search process. Indexing techniques include adaptive histogram binning, signature files, and hashing. Further, tree-based indexing schemes, such as the k-d tree, R*-tree, and SS-tree [69], have been developed for indexing the stored images so that similar images can be identified efficiently, at some additional cost in memory.

    Throughout the chapter, a distinction is made between weak and strong segmentation. Weak segmentation is a local grouping approach, usually focusing on conspicuous regions such as edges, corners, and higher-order junctions (a sketch of corner detection as one way to find such regions is given at the end of this overview). In Section 8.4.4, various methods to achieve weak segmentation are discussed. Strong segmentation is the extraction of the complete contour of an object in an image. Obviously, strong segmentation is far more difficult than weak segmentation and is hard to achieve, if not impossible, for broad domains.

  • Similarity and search: The actual matching process can be seen as a search for the images in the stored image set closest to the query specification. As both the query and the image data set are captured in feature form, the similarity function operates between the weighted feature sets. To make the query effective, close attention has to be paid to the selection of the similarity function. A proper similarity function should be robust to object fragmentation, occlusion, and clutter caused by the presence of other objects in the view. For example, it is known that the mean square and the Euclidean similarity measures provide accurate retrieval when there is no object clutter [59, 162]. A detailed overview of similarity and search is given in Section 8.5 (two common similarity measures are sketched at the end of this overview).

  • Interaction and learning: Visualization of the feature matching results gives the user insight into the importance of the different features. Windowing and information display techniques can be used to establish communication between system and user. In particular, new visualization techniques, such as 3D virtual image clouds, can be used to designate certain images as relevant to the user's requirements. These relevant images are then used by the system to construct subsequent (improved) queries. Relevance feedback is an automatic process designed to produce improved query formulations following an initial retrieval operation. It is needed in image retrieval because users find it difficult to formulate pictorial queries. For example, without a specific query image, the user might find it difficult to formulate a query (e.g., to retrieve an image of a car) by image sketch or by offering a pattern of feature values and weights. Hence, a first search is performed with an initial query formulation, and a (new) improved query formulation is constructed from the search results, with the goal of retrieving more relevant images in the next search operations. From the user's negative/positive feedback, the method can automatically learn which image features are most important, and the system uses the resulting feature weighting to find the images in the database that are optimal with respect to it (a sketch of such a query update is given at the end of this overview). For example, search by association allows users to iteratively refine the query definition, the similarity function, or the examples with which the search was started. Systems in this category are therefore highly interactive. Interaction, relevance feedback, and learning are discussed in Section 8.6.

  • Testing: In general, image search systems are assessed in terms of precision, recall, query-processing time, and the reliability of a negative answer. Further, relevance feedback methods are assessed in terms of the number of iterations needed to approach the ground truth. Today, more and more images are archived, yielding a very large range of complex pictorial information. In fact, the average number of images used for experimentation, as reported in the literature, has grown from a few in 1995 to over a hundred thousand today. It is important that the data set have ground truths, that is, images that are relevant and images that are nonrelevant to a given query. In general, it is hard to obtain these ground truths, especially for very large data sets. A discussion of system performance is given in Section 8.6 (a sketch of the precision and recall computation is given below).
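To make the color feature discussion concrete, the following minimal sketch (pure NumPy; the function names are ours, not from this chapter) computes a joint RGB histogram and the normalized rgb transform, a simple color model that cancels intensity variations caused by shading and viewpoint:

```python
import numpy as np

def color_histogram(image, bins=8):
    """Joint RGB histogram: quantize each channel into `bins` levels and
    count joint occurrences, giving a bins**3-dimensional descriptor.
    Assumes an H x W x 3 uint8 image."""
    q = (image.astype(np.int32) * bins) // 256          # per-channel bin index in [0, bins)
    codes = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()                            # normalize to unit mass

def normalized_rgb(image):
    """Normalized rgb: r = R/(R+G+B), etc. The division cancels a common
    intensity factor, making the result robust to shading and shadows."""
    rgb = image.astype(np.float64)
    s = rgb.sum(axis=-1, keepdims=True)
    return np.divide(rgb, s, out=np.zeros_like(rgb), where=s > 0)
```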
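The feature weighting described under representation and indexing can be sketched as follows; this is our own minimal formulation of the frequency-times-inverse-collection-frequency product, with an assumed array layout:

```python
import numpy as np

def weighted_features(feature_freqs):
    """Weight feature frequencies by inverse collection frequency.

    feature_freqs: N x M array; one row per image, one column per feature.
    Features frequent in an image but rare in the collection get high weight."""
    freqs = np.asarray(feature_freqs, dtype=np.float64)
    n_images = freqs.shape[0]
    coll_freq = np.count_nonzero(freqs > 0, axis=0)     # images containing each feature
    icf = np.log(n_images / np.maximum(coll_freq, 1))   # inverse collection frequency
    return freqs * icf
```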
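As one illustration of the conspicuous regions that weak segmentation relies on, the sketch below computes the Harris corner response, a standard detector used here purely for illustration; this chapter does not prescribe it. It assumes a grayscale image and uses SciPy for the local averaging:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def harris_response(gray, k=0.04, window=5):
    """Harris corner response map; large positive values mark corners,
    negative values mark edges -- conspicuous regions for weak segmentation."""
    iy, ix = np.gradient(gray.astype(np.float64))
    # average the entries of the second-moment matrix over a local window
    sxx = uniform_filter(ix * ix, size=window)
    syy = uniform_filter(iy * iy, size=window)
    sxy = uniform_filter(ix * iy, size=window)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2
```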
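Two similarity measures can be sketched as follows (function names are ours). Histogram intersection is included alongside the Euclidean measure because unmatched mass in the target does not penalize its score, which helps under partial occlusion and clutter:

```python
import numpy as np

def euclidean_similarity(q, t):
    """Similarity as negative Euclidean distance between feature vectors."""
    return -np.linalg.norm(np.asarray(q, float) - np.asarray(t, float))

def histogram_intersection(q, t):
    """Sum of bin-wise minima of two normalized histograms (score in [0, 1])."""
    return float(np.minimum(q, t).sum())

def best_matches(query, database, sim=histogram_intersection, k=5):
    """Return indices of the k stored feature vectors most similar to the query."""
    scores = np.array([sim(query, feats) for feats in database])
    return np.argsort(scores)[::-1][:k]
```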
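The query refinement loop of relevance feedback can be illustrated with the classic Rocchio update from text retrieval; the chapter does not prescribe this particular rule, and it is used here only as a minimal example. The query vector is moved toward the mean of the images the user marked relevant and away from those marked nonrelevant:

```python
import numpy as np

def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """One relevance feedback iteration on a weighted feature vector.

    relevant / nonrelevant: lists of feature vectors marked by the user."""
    query = np.asarray(query, dtype=np.float64)
    rel = np.mean(relevant, axis=0) if len(relevant) else 0.0
    nonrel = np.mean(nonrelevant, axis=0) if len(nonrelevant) else 0.0
    # move toward relevant feedback, away from nonrelevant feedback
    return alpha * query + beta * rel - gamma * nonrel
```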
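Finally, the evaluation measures used in testing are simple to state. The sketch below computes precision and recall for a single query against a ground-truth set (the identifier and set representation are our assumptions):

```python
def precision_recall(retrieved_ids, relevant_ids):
    """Precision: fraction of retrieved images that are relevant.
    Recall: fraction of relevant images that were retrieved."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```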
