8.2. Image Domains

In this section, we discuss patterns in image search applications, the repertoire of images, the influence of the image formation process, and the semantic gap between image descriptors and the user.

8.2.1. Search modes

We distinguish three broad categories of search modes when using a content-based image retrieval system; see Figure 8.4.

  • Search by association encompasses a broad variety of methods and systems designed to browse through a large set of images from unspecified sources. At the start, users of search by association have no specific aims other than to find interesting images. Search by association often implies iterative refinement of the search, the similarity, or the examples with which the search was initiated. Systems in this category are highly interactive; the query may be specified by a sketch [28] or by example images. The oldest realistic example of such a system is probably [91]. The result of the search can be manipulated interactively by relevance feedback [76]. To support the quest for relevant results, sources other than images are also employed, for example [163].

  • The purpose of target search is to find a specific image. The search may be for a precise copy of the image in mind, as in searching art catalogs, for example [47]. Target search may also be for another image of the same object of which the user already has an image; this is target search by example. Target search may also be applied when the user has a specific image in mind and the target is interactively specified as similar to a group of given examples, for instance [29]. These systems are suited to searching for stamps, paintings, industrial components, textile patterns, and catalogs in general.

  • Category search is aimed at retrieving an arbitrary image representative of a specific class. This is the case when the user has an example and the search is for other elements of the same class or genre. Categories may be derived from labels or may emerge from the database [164, 105]. In category search, the user may have available a group of images and the search is for additional images of the same class [25]. A typical application of category search is catalogs of varieties. In [82, 88], systems are designed for classifying trademarks. Systems in this category are usually interactive with a domain-specific definition of similarity.

Figure 8.4. Three patterns in the purpose of content-based retrieval systems.
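Search by association and category search both refine the query interactively through relevance feedback. One classic way to do this, borrowed from text retrieval, is Rocchio's query-point movement: shift the query feature vector toward the images the user marks as relevant and away from those marked as non-relevant. The sketch below is not claimed to be the method of [76]; it assumes every image is already described by a fixed-length feature vector, and the weights alpha, beta, gamma, the toy 2D features, and the helper names are illustrative assumptions.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def mean(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio query-point movement: toward relevant, away from non-relevant examples.
    alpha, beta, gamma are illustrative weights, not values from this chapter."""
    new_query = [alpha * q for q in query]
    if relevant:
        new_query = [v + beta * m for v, m in zip(new_query, mean(relevant))]
    if nonrelevant:
        new_query = [v - gamma * m for v, m in zip(new_query, mean(nonrelevant))]
    return new_query

def rank(query, database):
    """Image ids sorted by Euclidean distance of their feature vector to the query."""
    return sorted(database, key=lambda image_id: dist(query, database[image_id]))

# Hypothetical 2D feature vectors, e.g. (redness, blueness).
database = {"a": [0.9, 0.1], "b": [0.1, 0.9], "c": [0.5, 0.5]}
query = [0.5, 0.5]
# After the first round, the user marks "a" relevant and "b" non-relevant.
refined = rocchio(query, relevant=[database["a"]], nonrelevant=[database["b"]])
print(rank(refined, database))  # ['a', 'c', 'b']
```

With alpha = 0, beta = 1, and no non-relevant examples, the refined query is simply the mean of the marked images, which is one simple way to form a query from a group of example images, as in category search.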


8.2.2. The sensory gap

In the repertoire of images under consideration (the image domain), there is a gradual distinction between narrow and broad domains [154]. At one end of the spectrum, we have the narrow domain:

A narrow domain has a limited and predictable variability in all relevant aspects of its appearance.

Hence, in a narrow domain, we find images with a reduced diversity in their pictorial content. Usually, the image formation process is similar for all recordings. When the object's appearance has limited variability, the semantic description of the image is generally well-defined and largely unique. An example of a narrow domain is a set of frontal views of faces recorded against a clear background. Although each face is unique and has large variability in the visual details, there are obvious geometrical, physical, and illumination constraints governing the pictorial domain. The domain would be wider if the faces had been photographed from a crowd or from an outdoor scene. In that case, variations in illumination, clutter in the scene, occlusion, and viewpoint will have a major impact on the analysis.

On the other end of the spectrum, we have the broad domain:

A broad domain has an unlimited and unpredictable variability in its appearance even for the same semantic meaning.

In broad domains, images are polysemic and their semantics are described only partially. It might be the case that there are conspicuous objects in the scene for which the object class is unknown or even that the interpretation of the scene is not unique. The broadest class available today is the set of images available on the Internet.

Many problems of practical interest have an image domain in between these extreme ends of the spectrum. The notions of broad and narrow are helpful in characterizing patterns of use, in selecting features, and in designing systems. In a broad image domain, the gap between the feature description and the semantic interpretation is generally wide. For narrow, specialized image domains, the gap between features and their semantic interpretation is usually smaller, so domain-specific models may be of help.

For broad image domains in particular, we must resort to generally valid principles. Is the illumination of the domain white or colored? Does it assume fully visible objects, or may the scene contain clutter and occluded objects as well? Is it a 2D recording of a 2D scene or a 2D recording of a 3D scene? The given characteristics of illumination, presence or absence of occlusion, clutter, and differences in camera viewpoint determine the demands on the methods of retrieval.
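The illumination question has a direct effect on feature design: a feature computed from raw RGB values changes whenever the light gets brighter, even though the object itself has not changed. One standard remedy, used here only as an illustration, is normalized rgb (chromaticity), r = R/(R+G+B) and likewise for g and b. The sketch assumes a simple diagonal model in which the illumination multiplies each color channel by a gain. Under white light of varying intensity the gains are equal and the chromaticity is unchanged, whereas colored light changes it. The helper names and pixel values are hypothetical.

```python
def normalized_rgb(pixel):
    """Chromaticity coordinates (r, g, b); they sum to 1 for any non-black pixel."""
    total = sum(pixel)
    if total == 0:
        return (0.0, 0.0, 0.0)  # chromaticity is undefined for black; this is a convention
    return tuple(c / total for c in pixel)

def illuminate(pixel, gains):
    """Assumed diagonal model: each channel is multiplied by the light's gain for that channel."""
    return tuple(c * g for c, g in zip(pixel, gains))

def close(a, b, tol=1e-9):
    """Element-wise comparison with a floating-point tolerance."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))

surface = (120.0, 60.0, 20.0)
# White light at a different intensity: equal gains on all channels.
brighter_white = illuminate(surface, (1.5, 1.5, 1.5))
# Reddish light: unequal gains per channel.
reddish = illuminate(surface, (1.5, 1.0, 0.8))

print(close(normalized_rgb(surface), normalized_rgb(brighter_white)))  # True
print(close(normalized_rgb(surface), normalized_rgb(reddish)))  # False
```

The second comparison shows why the color of the illumination matters: normalized rgb removes intensity changes but not changes in the spectrum of the light source.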

The sensory gap is the gap between the object in the world and the information in a (computational) description derived from a recording of that scene.

The sensory gap makes the description of objects an ill-posed problem: it yields uncertainty in what is known about the state of the object. The sensory gap is particularly acute when precise knowledge of the recording conditions is missing. The 2D records of different 3D objects can be identical; without further knowledge, one has to decide that they might represent the same object. Also, a 2D recording of a 3D scene contains information that is accidental to that particular scene and that particular sensing, but one does not know which part of the information is scene-related. The uncertainty due to the sensory gap holds not only for the viewpoint, but also for occlusion (where essential parts telling two objects apart may be out of sight), clutter, and illumination.

8.2.3. The semantic gap

As stated in the previous sections, content-based image retrieval relies on multiple low-level features (e.g., color, shape, and texture) describing the image content. To cope with the sensory gap, these low-level features should be consistent and invariant so that they remain representative of the repertoire of images in the database. In query-by-example retrieval, the online process starts from a query image supplied by the user, from which low-level image features are extracted. These features are used to find the images in the database that are most similar to the query image. A drawback, however, is that low-level image features are often too restricted to describe images on a conceptual or semantic level. It is our opinion that ignoring the existence of the semantic gap is the cause of many disappointments with the performance of early image retrieval systems.
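The query-by-example loop just described can be sketched in a few lines. The choice of feature (a coarse joint RGB color histogram) and of similarity measure (histogram intersection) are illustrative assumptions, as are the function names; images are represented as plain lists of 8-bit (R, G, B) tuples.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram; assumes 8-bit channels and that
    bins_per_channel divides 256."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + g // step) * bins_per_channel + b // step
        hist[index] += 1
    return [count / len(pixels) for count in hist]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def query_by_example(query_pixels, database, k=3):
    """Return the ids of the k database images whose histograms best match the query's."""
    q = color_histogram(query_pixels)
    scores = {image_id: intersection(q, color_histogram(px)) for image_id, px in database.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy "images" given as flat lists of pixels.
database = {
    "sunset": [(240, 20, 20)] * 80 + [(20, 20, 240)] * 20,
    "sea": [(20, 20, 240)] * 100,
    "forest": [(20, 200, 20)] * 100,
}
query = [(250, 10, 10)] * 90 + [(10, 10, 250)] * 10
print(query_by_example(query, database, k=2))  # ['sunset', 'sea']
```

A real system would compute and index the database histograms offline rather than per query. The toy example also shows the limitation discussed here: "sunset" wins purely on its color distribution, with no notion of what the image depicts.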

The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation.

A user wants to search for images on a conceptual level (e.g., images containing particular objects [target search] or conveying a certain message or genre [category search]). Image descriptions, on the other hand, are derived by low-level, data-driven methods. The user's semantic search and the low-level syntactic image descriptors may therefore be disconnected. Associating a complete semantic system with image data would entail, at the least, solving the general object recognition problem. Since this problem is still unsolved and is likely to remain unsolved in its entirety, research focuses on alternative methods for associating higher-level semantics with data-driven observables.

Indeed, the most reasonable tool for semantic image characterization is annotation by keywords or captions. This converts content-based image access into (textual) information retrieval [134]. Common objections to the practice of labeling are cost and coverage. On the cost side, labeling thousands of images is such a cumbersome and expensive job that it is likely to undermine the economic viability of the database. To address this problem, the systems presented in [140, 139] use a program that explores the Internet, collecting images and inserting them into a predefined taxonomy on the basis of the text surrounding them. A similar approach for digital libraries is taken by [19]. On the coverage side, labeling is seldom complete and is context-sensitive. In any case, there is a significant fraction of requests whose semantics cannot be captured by labeling alone [7, 72]. Both manual labeling and automatic labeling from surrounding text therefore bridge the semantic gap only in isolated cases.
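Treating annotation as text retrieval can be sketched with an inverted index that maps each keyword to the images labeled with it; a conjunctive (AND) query then returns the images carrying every query term. This is a generic illustration, not the indexing scheme of any system cited above, and the image identifiers and labels are made up. The sketch also makes the coverage objection concrete: an image that was never labeled with a term cannot be found by that term, whatever its visual content.

```python
from collections import defaultdict

def build_index(annotations):
    """Map each lowercase keyword to the set of image ids annotated with it."""
    index = defaultdict(set)
    for image_id, keywords in annotations.items():
        for kw in keywords:
            index[kw.lower()].add(image_id)
    return index

def search(index, query):
    """Return ids of images annotated with every query term (boolean AND), sorted."""
    terms = [t.lower() for t in query.split()]
    if not terms:
        return []
    # index.get does not insert missing keys into the defaultdict.
    result = set(index.get(terms[0], set()))
    for t in terms[1:]:
        result &= index.get(t, set())
    return sorted(result)

# Hypothetical annotations.
annotations = {
    "img001": ["beach", "sunset", "people"],
    "img002": ["beach", "dog"],
    "img003": ["sunset", "mountains"],  # also shows a beach, but nobody labeled it so
}
index = build_index(annotations)
print(search(index, "beach sunset"))  # ['img001']
```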

8.2.4. Discussion

We have discussed three broad search modes: target search, category search, and search by association. Target search is related to classical methods in pattern matching and computer vision, such as object recognition and image matching. However, image retrieval differs from traditional pattern matching in the sheer size of the image collections involved. The new challenges in content-based retrieval therefore lie in the huge number of images to search among, the query specification by multiple images, and the variability of imaging conditions and object states. Category search connects to statistical pattern recognition methods. Compared to traditional pattern recognition, however, the new challenges are the interactive manipulation of results, the usually very large number of object classes, and the absence of an explicit training phase for feature and classifier tuning (active learning). Search by association is the most distant from the classical field of computer vision and is severely hampered by the semantic gap. As long as the gap remains, content-based retrieval for browsing will not be within the grasp of the general public, as humans are accustomed to relying on the immediate semantic imprint an image makes the moment they see it.

An important distinction we have discussed is that between broad and narrow domains. The broader the domain, the more browsing or search by association should be considered during system setup. The narrower the domain, the more target search should be taken as search mode.

The major discrepancy in content-based retrieval is that the user wants to retrieve images on a semantic level, but the image characterizations can provide similarity only on a low-level syntactic level. This is called the semantic gap. A second discrepancy lies between the properties recorded in an image and the properties of the object itself. This is called the sensory gap. Both the semantic and the sensory gap play a serious limiting role in the retrieval of images based on their content.
