Time for action - creating a configuration with advanced source options

  1. Create a database (or use an existing one) with the following structure and data:
    CREATE TABLE `items` (
    `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
    `title` VARCHAR( 255 ) NOT NULL ,
    `content` TEXT NOT NULL ,
    `created` DATETIME NOT NULL
    ) ENGINE = MYISAM ;
    CREATE TABLE `last_indexed` (
    `id` INT NOT NULL
    ) ENGINE = MYISAM ;
    INSERT INTO `last_indexed` (
    `id`
    )
    VALUES (
    '0'
    );
    
  2. Add a few rows to the items table so that we get some data to index.
  3. Create the Sphinx configuration file /usr/local/sphinx/etc/sphinx-src-opt.conf with the following content:
    source items
    {
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass =
    sql_db = sphinx_conf
    # Set the charset of returned data to utf8
    sql_query_pre = SET NAMES utf8
    # Turn of the query cache
    sql_query_pre = SET SESSION query_cache_type = OFF
    sql_query_range = SELECT MIN(id), MAX(id) FROM items 
    WHERE id >= (SELECT id FROM last_indexed)
    sql_range_step = 200
    # Pause for 1000 millisecond (1 sec) between each ranged fetch
    sql_ranged_throttle = 1000
    sql_query = SELECT id, title, content, created FROM 
    items WHERE id > (SELECT id FROM last_indexed) 
    AND id >= $start AND id <= $end
    # Update the last indexed which will be $maxid
    sql_query_post_index = UPDATE last_indexed SET id = $maxid
    sql_attr_timestamp = created
    }
    index items
    {
    source = items
    path = /usr/local/sphinx/var/data/items
    charset_type = utf-8
    }
    
  4. Run the indexer command:
    $ /usr/local/sphinx/bin/indexer c /usr/local/sphinx/etc/sphinx-src-opt.conf --all
    
    Time for action - creating a configuration with advanced source options

What just happened?

We just created two database tables:

  • items—to hold the actual data
  • last_indexed—to hold the id of last indexed item

We then created a configuration file with options, so that main data is fetched in steps of 200, and there is a pause of 1,000 milliseconds (1 second) between each step. This can be easily verified by the output of the indexer command, which shows the time taken as 2.013 seconds. In my items table I had 201 rows, so the indexer paused once before starting for the first time and then after the first 200 records.

We used pre-query to set the encoding that the server will use for the rows it returns. We also turned off query caching in pre-query since indexer is not going to run frequently.

Lastly, we used sql_query_post_index to increment the id of the last indexed document in the last_indexed database table. This is useful for the next indexing, where the main query only fetches those rows which are not already indexed.

MS SQL specific options

The following are a few MS SQL server-specific options

mssql_winauth

This option specifies whether or not to use the account credentials of a currently logged in windows user. This is an optional Boolean option and its default value is 0.

mssql_unicode

This option specifies whether to ask for Unicode or single byte data when querying the MS SQL server. This option must be same as specified in charset_type option in the index section of the configuration file.

Note

To index Unicode data, you must set charset_type in the index section to utf-8 and mssql_unicode to 1 in the source section.

This is an optional Boolean option and its default value is 0.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset