Chapter 3. Indexing

This chapter is all about indexes and how to create them in Sphinx. Indexes are the most important component when using Sphinx.

In this chapter we shall:

  • See what indexes are and how they help in searching, and learn how they are created using Sphinx's indexer utility.
  • Learn what data sources are and which types are available in Sphinx.

So let's get on with it...

What are indexes?

Wikipedia defines a database index as follows:

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space.

Let's use an example to understand this. A library has a catalog of all the books at its disposal. If you want to look for a particular book, you will quickly search through the catalog instead of searching through every aisle or shelf for that book. The catalog acts as an index of all the books.

In the computing world an index is something similar. It saves you the trouble of having to search through every record in the database. Instead, you speed up your query by searching a subset of data that is highly optimized for quick reference. This set of data is called an index and it is separate from the original data stored in the database.

To give you a better picture of this, the following table relates a library to a database.

Library                                                               Database
-------                                                               --------
A library is a collection of books                                    A database is a collection of data
To find a book, you go through every row of the shelves               To find a match, you go through every record in the database table
To facilitate searching, a library maintains a catalog                To facilitate searching, a database maintains indexes
It is easy to refer to a catalog to figure out where to find a book   It is easy to refer to an index to locate a record
When a new book is added, the librarian has to update the catalog     When a new record is inserted, the index has to be updated

The drawback of an index is that it takes additional space to store and additional time to create and keep up to date. However, the speed we gain while searching usually far outweighs these costs.
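
To make the catalog analogy concrete, here is a minimal Python sketch that compares scanning every record with looking a word up in a prebuilt index. The sample records and the function names (scan_for_word, build_index, add_record) are made up for this illustration and do not come from any database or from Sphinx.

```python
# Toy illustration of a full scan versus an index lookup.
# The records and helper names are hypothetical, chosen only for this example.

books = [
    {"id": 1, "title": "Learning PHP"},
    {"id": 2, "title": "MySQL Basics"},
    {"id": 3, "title": "Learning MySQL"},
]


def scan_for_word(records, word):
    """Look at every record, like walking past every shelf in the library."""
    word = word.lower()
    return [r["id"] for r in records if word in r["title"].lower().split()]


def build_index(records):
    """Build the 'catalog': a map from each word to the ids of matching records."""
    index = {}
    for r in records:
        for w in r["title"].lower().split():
            index.setdefault(w, []).append(r["id"])
    return index


def add_record(records, index, record):
    """Inserting a record also means updating the index (the extra write cost)."""
    records.append(record)
    for w in record["title"].lower().split():
        index.setdefault(w, []).append(record["id"])


index = build_index(books)

# Both approaches give the same answer; the index avoids touching every record.
print(scan_for_word(books, "mysql"))  # [2, 3]
print(index.get("mysql", []))         # [2, 3]

# Adding a book requires a catalog update as well.
add_record(books, index, {"id": 4, "title": "MySQL Cookbook"})
print(index.get("mysql", []))         # [2, 3, 4]
```

The index dictionary is the extra storage the text mentions, and add_record shows why writes become slower: every insert has to touch the index as well as the data.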

Indexes in Sphinx

Indexes in Sphinx are a bit different from indexes we have in databases. The data that Sphinx indexes is a set of structured documents and each document has the same set of fields. This is very similar to SQL, where each row in the table corresponds to a document and each column to a field.

Sphinx builds a special data structure that is optimized for answering full-text search queries. This structure is called an index and the process of creating an index from the data is called indexing.

The indexes in Sphinx can also contain attributes that are highly optimized for filtering. These attributes are not full-text indexed and do not contribute to matching. However, they are very useful for narrowing the results down to the ones we want, based on attribute values.
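
The difference between fields and attributes can be sketched with a small Python model. This is only an illustration of the idea, not Sphinx's API or its actual index format; the sample documents, the field and attribute names, and the functions build_fulltext_index and search are all assumptions made for this example.

```python
# Toy model: fields are full-text indexed, attributes are only used for filtering.
# Everything here is hypothetical and unrelated to Sphinx's real implementation.

documents = [
    {"id": 1,
     "fields": {"title": "Sphinx search basics", "content": "How to index data"},
     "attrs": {"author_id": 10, "year": 2010}},
    {"id": 2,
     "fields": {"title": "Advanced search", "content": "Filtering results in Sphinx"},
     "attrs": {"author_id": 11, "year": 2011}},
    {"id": 3,
     "fields": {"title": "Database indexes", "content": "Indexes speed up search"},
     "attrs": {"author_id": 10, "year": 2011}},
]

FIELDS = ("title", "content")  # every document has the same set of fields


def build_fulltext_index(docs):
    """Map each word found in any field to the set of document ids containing it."""
    index = {}
    for doc in docs:
        for field in FIELDS:
            for word in doc["fields"][field].lower().split():
                index.setdefault(word, set()).add(doc["id"])
    return index


def search(docs, index, word, **filters):
    """Match on full-text fields first, then filter the matches by attribute values."""
    matched = index.get(word.lower(), set())
    results = []
    for doc in docs:
        if doc["id"] not in matched:
            continue
        if all(doc["attrs"].get(name) == value for name, value in filters.items()):
            results.append(doc["id"])
    return results


fulltext_index = build_fulltext_index(documents)

print(search(documents, fulltext_index, "search"))                # [1, 2, 3]
print(search(documents, fulltext_index, "search", author_id=10))  # [1, 3]
print(search(documents, fulltext_index, "sphinx", year=2011))     # [2]
print(search(documents, fulltext_index, "2011"))                  # []
```

The last call returns nothing because attribute values never enter the full-text index: they can narrow a result set but cannot produce a match on their own.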

There can be different types of indexes suited for different tasks. The index type currently implemented in Sphinx is designed for maximum indexing and searching speed.

The indexes are stored in files on the file system, at the path specified in the Sphinx configuration file. In the previous chapter it was /usr/local/sphinx/var/data/test1.
