Time for action - Adding an MVA to the index

  1. Once again modify the sphinx-blog.conf file to add/modify the options as highlighted next:
    source blog
    {
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass =
    sql_db = myblog
    sql_query = 
    SELECT id, title, content, UNIX_TIMESTAMP(publish_date) 
    AS publish_date, author_id FROM posts
    sql_attr_uint = author_id
    sql_attr_multi = uint category_id from query; 
    SELECT post_id, category_id FROM posts_categories
    sql_attr_timestamp = publish_date
    sql_query_info = SELECT id, title FROM posts WHERE ID=$id
    }
    index posts
    {
    source = blog
    path = /usr/local/sphinx/var/data/blog
    docinfo = extern
    charset_type = sbcs
    }
    indexer
    {
    mem_limit = 32M
    }
    
  2. Run the indexer again to re-index the data:
    $ /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx-blog.conf all
    
  3. Search for all posts containing the term "language" and having the category "Search" (category_id = 5):
    $ /usr/local/sphinx/bin/ search --config /usr/local/sphinx/etc/sphinx-blog.conf -f category_id 5 language
    
    Time for action - Adding an MVA to the index

What just happened?

We added a new option in source configuration called as sql_attr_multi. This option is used to declare the Multi valued attribute (MVA). This attribute only applies to SQL source types.

sql_attr_multi allows us to attach more than one value to an attribute. The format in which this option is declared is as follows:

sql_attr_multi = ATTR-TYPE ATTR-NAME 'from' SOURCE-TYPE [;QUERY] [;RANGE-QUERY]

The option's parameters are as follows:

  • ATTR-TYPE: Is uint or timestamp
  • SOURCE-TYPE: Is 'field', 'query', or 'ranged-query'
  • QUERY: Is an SQL query executed to fetch all docid, attribute value pairs
  • RANGE-QUERY: Is SQL query used to fetch min and max ID values, similar to sql_query_range, which we will see later on

We declared sql_attr_multi as:

sql_attr_multi = uint category_id from query; 
SELECT post_id, category_id FROM posts_categories

This means that category_id is a uint attribute and will hold multiple values as returned by the query. The query that follows will get all the categories associated with the current post, such as the current document ID.

Let's search for all posts with any of the words from the string "web games php" having category "Programming" (category_id 1) and written by Rita Chouhan (author_id 3).

$ /usr/local/sphinx/bin/ search --config /usr/local/sphinx/etc/sphinx-blog.conf -a -f category_id 1 -f author_id 3 web games php
What just happened?

The previous search query returned the document ID 2 as the result. In the search query we used filters on two fields: author_id and category_id, and also used an option a that specifically searched for any word from the given phrase.

The search result also shows us the number of documents Sphinx found for each word and the number of times (hits) each word appeared in those documents.

Filtering without searching for a specific phrase

At times we may want to only filter the results without performing a full-text search. For example: Find all posts with category PHP having any title or content, and written by any author. In such a case we don't have a specific search term, but we only want to filter by the category_id attribute. To achieve this we can issue a search command, as demonstrated in the following screenshot, without passing any search term:

$ /usr/local/sphinx/bin/ search --config /usr/local/sphinx/etc/sphinx-blog.conf -f category_id 4
Filtering without searching for a specific phrase

As we wanted to search for all posts having category PHP, we didn't pass the search term and just filtered the results by category_id. This gave us two documents with category_id = 4.

xmlpipe data source

xmlpipe data source enables users to implement their own data source drivers. At times we cannot use an SQL data source because the data might be coming from sources such as text files, mailboxes, RSS feeds, and so on. In such cases we can implement a custom driver using xmlpipe data source.

Note

xmlpipe data source is limited to two fixed fields and two fixed attributes, and above all, it is deprecated in favor of the xmlpipe2 data source explained in the next section.

We won't be going into the nitty-gritty of xmlpipe data source, instead, we will see how xmlpipe2 data source works in greater detail.

xmlpipe2 data source

xmlpipe2 is preferred over the older xmlpipe data source. It lets you pass arbitrary full-text and attribute data to Sphinx in a custom XML format. When the data source is configured to use xmlpipe2, indexer executes the given command and opens a pipe to its stdout. A well formed XML stream is then expected as the output of the command.

The XML schema (set of fields and attributes) can be defined either in the configuration file or the XML stream itself.

Indexing with schema defined in XML stream

Let's see how the xmlpipe2 data source is configured and what XML the indexer expects at the stdout of the given command.

Note

We are going to continue with the same blog example as used in the SQL data sources section earlier in this chapter.

Firstly, let's see how to index the posts table without any attributes.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset