Index attributes

Attributes in an index are used to perform additional filtering and sorting during search. They are basically additional values linked to each document in the index.

Let's try to understand attributes using an example. Suppose you want to search through a catalog of books stored in the index, and you want the results sorted by the date on which each book was published and then by its cost. For this you need not put the date and cost of the book in the full-text index. You can specify these two values as attributes and then sort the results of your search query by them. These attributes play no role in matching the query, but they play a major role in sorting the search results.
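The catalog example can be modelled in a few lines of plain Python. This is only an illustrative sketch, not Sphinx code: the `books` list, the toy `search` function, and the field names `published` and `cost` are all assumptions made for the example. It shows the division of labour described above, where the title alone is matched against the query and the two attributes only decide the order of the results.

```python
from datetime import date

# Hypothetical catalog. Only "title" stands in for full-text indexed data;
# "published" and "cost" stand in for attributes.
books = [
    {"id": 1, "title": "Learning PHP", "published": date(2009, 5, 1), "cost": 30.0},
    {"id": 2, "title": "PHP Cookbook", "published": date(2010, 3, 15), "cost": 45.0},
    {"id": 3, "title": "Advanced PHP", "published": date(2010, 3, 15), "cost": 25.0},
    {"id": 4, "title": "Python Basics", "published": date(2011, 1, 10), "cost": 20.0},
]

def search(query):
    """Toy full-text match: look for the query word in the title only."""
    word = query.lower()
    return [b for b in books if word in b["title"].lower().split()]

def sort_by_attributes(matches):
    """Order matches by publication date, then by cost (both ascending)."""
    return sorted(matches, key=lambda b: (b["published"], b["cost"]))

result_ids = [b["id"] for b in sort_by_attributes(search("php"))]
print(result_ids)
```

Books 2 and 3 share a publication date, so the cost attribute decides which of them comes first.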

Note

Attributes play some role in relevancy when the SPH_SORT_EXPR sort mode is used.

Another use of attributes would be to filter the search results. You can filter your results for the specified date range so that only those books that were published in the given time period are returned.
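A date-range filter of this kind can be sketched as follows. Again this is plain Python rather than the Sphinx API. The `filter_range` helper and the sample data are hypothetical, and treating both ends of the range as inclusive is an assumption of the sketch.

```python
from datetime import date

# Hypothetical matches returned by a search, each carrying a date attribute.
matches = [
    {"id": 1, "published": date(2009, 5, 1)},
    {"id": 2, "published": date(2010, 3, 15)},
    {"id": 3, "published": date(2011, 1, 10)},
]

def filter_range(docs, attr, low, high):
    """Keep documents whose attribute lies in [low, high] (inclusive, by assumption)."""
    return [d for d in docs if low <= d[attr] <= high]

in_2010 = filter_range(matches, "published", date(2010, 1, 1), date(2010, 12, 31))
print([d["id"] for d in in_2010])
```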

Another good example for understanding attributes is a blogging system. Typically only the title and content of a blog post need to be full-text searchable, yet on many occasions we want the search to be limited to a certain author or category. In such cases we can use attributes to filter the search results and return only those posts whose author (or category) attribute in the index matches the selected author or category filter.

So full-text search results can be processed not only by how well documents match the query, but also by the values of their attributes. It is even possible to sort the results purely on attributes.

One other important characteristic of attributes is that they are returned in search results, while the actual indexed data is not. When displaying search results, you may use the returned attribute values as is, but to display the full-text data you need to fetch it from the original source.
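The following sketch shows what this means for the code that renders a results page. Everything in it is hypothetical: `source_rows` stands in for the original source (for example a database table), and `search_result` imitates a result set that carries only document IDs and attribute values. The attribute names `author_id` and `date_added` are assumptions for illustration.

```python
# Hypothetical original source, keyed by document ID.
source_rows = {
    10: {"title": "Intro to Sphinx", "content": "Sphinx is a full-text search engine."},
    11: {"title": "Using attributes", "content": "Attributes allow filtering and sorting."},
}

# Hypothetical search result: IDs plus attribute values, but no indexed text.
search_result = [
    {"id": 11, "attrs": {"author_id": 3, "date_added": 1262304000}},
    {"id": 10, "attrs": {"author_id": 7, "date_added": 1230768000}},
]

def build_display_rows(result, rows):
    """Combine returned attributes with text looked up in the original source."""
    display = []
    for match in result:
        original = rows[match["id"]]  # the text has to come from the source
        display.append({
            "id": match["id"],
            "title": original["title"],
            "author_id": match["attrs"]["author_id"],  # attribute used as returned
        })
    return display

display_rows = build_display_rows(search_result, source_rows)
for row in display_rows:
    print(row)
```

The attribute value is used directly from the result, while the title requires a second lookup by document ID.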

Types of attributes

The data on the basis of which the documents should be filtered can be of various types. To cater to this and for more efficient filtering, attributes can be of the following types:

  • Unsigned integers (1 bit to 32 bit wide)
  • Floating point values (32 bit, IEEE 754 single precision)
  • String ordinals --enable-id64
  • UNIX timestamps
  • Multi-value attributes (MVA)

Attribute names are always case-insensitive. Attributes are stored in the index but cannot be searched as full-text.

Multi-value attributes (MVA)

MVAs are a special type of attribute in Sphinx that makes it possible to attach multiple values to every document. These attributes are especially useful in cases where each document can have multiple values for the same property (field).

In our previous example of a blog post, each post can have multiple tags associated with it. If you want to filter the search based on tags, MVAs can be used. For example, if a post has php, programming, and opensource as tags and we use an MVA to hold these values, then filtering a search by any of those three tags would return that post (along with any other posts carrying the chosen tag).

MVAs are specified as lists, and their entries are limited to unsigned 32-bit integers. The list itself is not limited in length, and an MVA can hold any number of entries for each document, as long as RAM permits.

Note

Search results can be filtered or grouped by MVA but cannot be sorted by MVA.
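The tag example can be sketched in plain Python as well. This is a simplified model, not Sphinx code: the `TAG_IDS` mapping, the sample posts, and both helper functions are hypothetical. Because MVA entries are unsigned 32-bit integers, the sketch assumes each tag name has been mapped to an integer ID. Letting a post appear in one group per tag it carries is likewise an assumption of this model.

```python
from collections import defaultdict

# Hypothetical mapping from tag names to integer IDs.
TAG_IDS = {"php": 1, "programming": 2, "opensource": 3, "databases": 4}

# Each post carries a list of tag IDs, standing in for an MVA.
posts = [
    {"id": 1, "tags": [1, 2, 3]},
    {"id": 2, "tags": [2, 4]},
    {"id": 3, "tags": [4]},
]

def filter_by_tag(docs, tag_name):
    """Return IDs of posts whose tag list contains the given tag."""
    tag_id = TAG_IDS[tag_name]
    return [d["id"] for d in docs if tag_id in d["tags"]]

def group_by_tag(docs):
    """Group post IDs by tag ID; a post lands in one group per tag it has."""
    groups = defaultdict(list)
    for d in docs:
        for tag_id in d["tags"]:
            groups[tag_id].append(d["id"])
    return dict(groups)

print(filter_by_tag(posts, "programming"))
print(group_by_tag(posts))
```

Filtering by php, programming, or opensource returns post 1 in every case, which mirrors the behaviour described for the blog example.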
