sphinx-blog.conf
file to add/modify the options as highlighted next:source blog
{
type = mysql
sql_host = localhost
sql_user = root
sql_pass =
sql_db = myblog
sql_query =
SELECT id, title, content, UNIX_TIMESTAMP(publish_date)
AS publish_date, author_id FROM posts
sql_attr_uint = author_id
sql_attr_multi = uint category_id from query;
SELECT post_id, category_id FROM posts_categories
sql_attr_timestamp = publish_date
sql_query_info = SELECT id, title FROM posts WHERE ID=$id
}
index posts
{
source = blog
path = /usr/local/sphinx/var/data/blog
docinfo = extern
charset_type = sbcs
}
indexer
{
mem_limit = 32M
}
indexer
again to re-index the data:$ /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx-blog.conf all
(category_id = 5):
$ /usr/local/sphinx/bin/ search --config /usr/local/sphinx/etc/sphinx-blog.conf -f category_id 5 language
We added a new option in source configuration called as sql_attr_multi
. This option is used to declare the Multi valued attribute (MVA). This attribute only applies to SQL source types.
sql_attr_multi
allows us to attach more than one value to an attribute. The format in which this option is declared is as follows:
sql_attr_multi = ATTR-TYPE ATTR-NAME 'from' SOURCE-TYPE [;QUERY] [;RANGE-QUERY]
The option's parameters are as follows:
ATTR-TYPE:
Is uint
or timestamp
SOURCE-TYPE:
Is 'field', 'query', or 'ranged-query'QUERY:
Is an SQL query executed to fetch all docid, attribute
value pairsRANGE-QUERY:
Is SQL query used to fetch min and max ID values, similar to sql_query_range
, which we will see later onWe declared sql_attr_multi
as:
sql_attr_multi = uint category_id from query; SELECT post_id, category_id FROM posts_categories
This means that category_id
is a uint
attribute and will hold multiple values as returned by the query. The query that follows will get all the categories associated with the current post, such as the current document ID.
Let's search for all posts with any of the words from the string "web games php" having category "Programming" (category_id 1)
and written by Rita Chouhan (author_id 3)
.
$ /usr/local/sphinx/bin/ search --config /usr/local/sphinx/etc/sphinx-blog.conf -a -f category_id 1 -f author_id 3 web games php
The previous search query returned the document ID 2
as the result. In the search query we used filters on two fields: author_id
and category_id
, and also used an option a that specifically searched for any word from the given phrase.
The search result also shows us the number of documents Sphinx found for each word and the number of times (hits) each word appeared in those documents.
At times we may want to only filter the results without performing a full-text search. For example: Find all posts with category PHP having any title or content, and written by any author. In such a case we don't have a specific search term, but we only want to filter by the category_id
attribute. To achieve this we can issue a search command, as demonstrated in the following screenshot, without passing any search term:
$ /usr/local/sphinx/bin/ search --config /usr/local/sphinx/etc/sphinx-blog.conf -f category_id 4
As we wanted to search for all posts having category PHP, we didn't pass the search term and just filtered the results by category_id
. This gave us two documents with category_id = 4
.
xmlpipe data source enables users to implement their own data source drivers. At times we cannot use an SQL data source because the data might be coming from sources such as text files, mailboxes, RSS feeds, and so on. In such cases we can implement a custom driver using xmlpipe data source.
We won't be going into the nitty-gritty of xmlpipe data source, instead, we will see how xmlpipe2 data source works in greater detail.
xmlpipe2 is preferred over the older xmlpipe data source. It lets you pass arbitrary full-text and attribute data to Sphinx in a custom XML format. When the data source is configured to use xmlpipe2, indexer
executes the given command and opens a pipe to its stdout. A well formed XML stream is then expected as the output of the command.
The XML schema (set of fields and attributes) can be defined either in the configuration file or the XML stream itself.