Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Time for action - Adding an MVA to the index

Once again modify the sphinx-blog.conf file to add/modify the options as highlighted next:

source blog
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass =
sql_db = myblog
sql_query = 
SELECT id, title, content, UNIX_TIMESTAMP(publish_date) 
AS publish_date, author_id FROM posts
sql_attr_uint = author_id
sql_attr_multi = uint category_id from query; 
SELECT post_id, category_id FROM posts_categories
sql_attr_timestamp = publish_date
sql_query_info = SELECT id, title FROM posts WHERE ID=$id
}
index posts
{
source = blog
path = /usr/local/sphinx/var/data/blog
docinfo = extern
charset_type = sbcs
}
indexer
{
mem_limit = 32M
}

Run the indexer again to re-index the data:

$ /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx-blog.conf all

Search for all posts containing the term "language" and having the category "Search" (category_id = 5):

$ /usr/local/sphinx/bin/ search --config /usr/local/sphinx/etc/sphinx-blog.conf -f category_id 5 language

Time for action - Adding an MVA to the index

What just happened?

We added a new option in source configuration called as sql_attr_multi. This option is used to declare the Multi valued attribute (MVA). This attribute only applies to SQL source types.

sql_attr_multi allows us to attach more than one value to an attribute. The format in which this option is declared is as follows:

sql_attr_multi = ATTR-TYPE ATTR-NAME 'from' SOURCE-TYPE [;QUERY] [;RANGE-QUERY]

The option's parameters are as follows:

ATTR-TYPE: Is uint or timestamp
SOURCE-TYPE: Is 'field', 'query', or 'ranged-query'
QUERY: Is an SQL query executed to fetch all docid, attribute value pairs
RANGE-QUERY: Is SQL query used to fetch min and max ID values, similar to sql_query_range, which we will see later on

We declared sql_attr_multi as:

sql_attr_multi = uint category_id from query; 
SELECT post_id, category_id FROM posts_categories

This means that category_id is a uint attribute and will hold multiple values as returned by the query. The query that follows will get all the categories associated with the current post, such as the current document ID.

Let's search for all posts with any of the words from the string "web games php" having category "Programming" (category_id 1) and written by Rita Chouhan (author_id 3).

$ /usr/local/sphinx/bin/ search --config /usr/local/sphinx/etc/sphinx-blog.conf -a -f category_id 1 -f author_id 3 web games php

The previous search query returned the document ID 2 as the result. In the search query we used filters on two fields: author_id and category_id, and also used an option a that specifically searched for any word from the given phrase.

The search result also shows us the number of documents Sphinx found for each word and the number of times (hits) each word appeared in those documents.

Filtering without searching for a specific phrase

At times we may want to only filter the results without performing a full-text search. For example: Find all posts with category PHP having any title or content, and written by any author. In such a case we don't have a specific search term, but we only want to filter by the category_id attribute. To achieve this we can issue a search command, as demonstrated in the following screenshot, without passing any search term:

$ /usr/local/sphinx/bin/ search --config /usr/local/sphinx/etc/sphinx-blog.conf -f category_id 4

Filtering without searching for a specific phrase

As we wanted to search for all posts having category PHP, we didn't pass the search term and just filtered the results by category_id. This gave us two documents with category_id = 4.

xmlpipe data source

xmlpipe data source enables users to implement their own data source drivers. At times we cannot use an SQL data source because the data might be coming from sources such as text files, mailboxes, RSS feeds, and so on. In such cases we can implement a custom driver using xmlpipe data source.

Note

xmlpipe data source is limited to two fixed fields and two fixed attributes, and above all, it is deprecated in favor of the xmlpipe2 data source explained in the next section.

We won't be going into the nitty-gritty of xmlpipe data source, instead, we will see how xmlpipe2 data source works in greater detail.

xmlpipe2 data source

xmlpipe2 is preferred over the older xmlpipe data source. It lets you pass arbitrary full-text and attribute data to Sphinx in a custom XML format. When the data source is configured to use xmlpipe2, indexer executes the given command and opens a pipe to its stdout. A well formed XML stream is then expected as the output of the command.

The XML schema (set of fields and attributes) can be defined either in the configuration file or the XML stream itself.

Indexing with schema defined in XML stream

Let's see how the xmlpipe2 data source is configured and what XML the indexer expects at the stdout of the given command.

Note

We are going to continue with the same blog example as used in the SQL data sources section earlier in this chapter.

Firstly, let's see how to index the posts table without any attributes.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Time for action - Adding an MVA to the index

Create new playlist

Sign In

Sign Up

Time for action - Adding an MVA to the index

What just happened?

Filtering without searching for a specific phrase

xmlpipe data source

Note

xmlpipe2 data source

Indexing with schema defined in XML stream

Note

Table of Contents for
Time for action - Adding an MVA to the index