CREATE TABLE `items` ( `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY , `title` VARCHAR( 255 ) NOT NULL , `content` TEXT NOT NULL , `created` DATETIME NOT NULL ) ENGINE = MYISAM ; CREATE TABLE `last_indexed` ( `id` INT NOT NULL ) ENGINE = MYISAM ; INSERT INTO `last_indexed` ( `id` ) VALUES ( '0' );
/usr/local/sphinx/etc/sphinx-src-opt.conf
with the following content:source items { type = mysql sql_host = localhost sql_user = root sql_pass = sql_db = sphinx_conf # Set the charset of returned data to utf8 sql_query_pre = SET NAMES utf8 # Turn of the query cache sql_query_pre = SET SESSION query_cache_type = OFF sql_query_range = SELECT MIN(id), MAX(id) FROM items WHERE id >= (SELECT id FROM last_indexed) sql_range_step = 200 # Pause for 1000 millisecond (1 sec) between each ranged fetch sql_ranged_throttle = 1000 sql_query = SELECT id, title, content, created FROM items WHERE id > (SELECT id FROM last_indexed) AND id >= $start AND id <= $end # Update the last indexed which will be $maxid sql_query_post_index = UPDATE last_indexed SET id = $maxid sql_attr_timestamp = created } index items { source = items path = /usr/local/sphinx/var/data/items charset_type = utf-8 }
indexer
command:$ /usr/local/sphinx/bin/indexer c /usr/local/sphinx/etc/sphinx-src-opt.conf --all
We just created two database tables:
items—
to hold the actual datalast_indexed—
to hold the id of last indexed itemWe then created a configuration file with options, so that main data is fetched in steps of 200, and there is a pause of 1,000 milliseconds (1 second) between each step. This can be easily verified by the output of the indexer
command, which shows the time taken as 2.013 seconds. In my items
table I had 201 rows, so the indexer
paused once before starting for the first time and then after the first 200 records.
We used pre-query to set the encoding that the server will use for the rows it returns. We also turned off query caching in pre-query since indexer
is not going to run frequently.
Lastly, we used sql_query_post_index
to increment the id of the last indexed document in the last_indexed
database table. This is useful for the next indexing, where the main query only fetches those rows which are not already indexed.
The following are a few MS SQL server-specific options
This option specifies whether or not to use the account credentials of a currently logged in windows user. This is an optional Boolean option and its default value is 0.