Time for action - creating indexes for distributed searching

  1. Create the configuration file on the first server (192.168.1.1) at /usr/local/sphinx/etc/sphinx-distributed.conf with the following content:
    source items
    {
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass =
    sql_db = sphinx_conf
    # Query to set MySQL variable @total
    #which holds total num of rows
    sql_query_pre = SELECT @total := count(id) FROM items
    # Set a variable to hold the sql query
    # We are using CONCAT to use a variable in limit clause
    # which is not possible in direct query execution
    sql_query_pre = SET @sql = CONCAT('SELECT * FROM items 
    limit 0,', CEIL(@total/2))
    # Prepare the sql statement
    sql_query_pre = PREPARE stmt FROM @sql
    # Execute the prepared statement. This will return rows
    sql_query = EXECUTE stmt
    # Once documents are fetched, drop the prepared statement
    sql_query_post = DROP PREPARE stmt
    sql_attr_timestamp = created
    }
    index items
    {
    source = items
    path = /usr/local/sphinx/var/data/items-distributed
    charset_type = utf-8
    }
    
  2. Run the indexer command to index the data:
    $ /usr/local/sphinx/bin/indexer c /usr/local/sphinx/etc/sphinx-distributed.conf items
    
    Time for action - creating indexes for distributed searching
  3. Create the sphinx configuration on the second server (192.168.1.2) at /usr/local/sphinx/etc/sphinx-distributed-2.conf with the following content:
    source items
    {
    type = mysql
    # we will use remote host (first server)
    sql_host = 192.168.1.1
    sql_user = root
    sql_pass =
    sql_db = sphinx_conf
    # Query to set MySQL variable @total
    # which holds total num of rows
    sql_query_pre = SELECT @total := count(id) FROM items
    # Set a variable to hold the sql query
    # We are using CONCAT to use a variable in limit clause
    # which is not possible in direct query execution
    sql_query_pre = SET @sql = CONCAT('SELECT * FROM items 
    limit ', CEIL(@total/2), ',', CEIL(@total/2))
    # Prepare the sql statement
    sql_query_pre = PREPARE stmt FROM @sql
    # Execute the prepared statement. This will return rows
    sql_query = EXECUTE stmt
    # Once documents are fetched, drop the prepared statement
    sql_query_post = DROP PREPARE stmt
    sql_attr_timestamp = created
    }
    index items-2
    {
    source = items
    path = /usr/local/sphinx/var/data/items-2-distributed
    charset_type = utf-8
    }
    
  4. Run the indexer on the second server:
    $ /usr/local/sphinx/bin/indexer c /usr/local/sphinx/etc/sphinx-distributed-2.conf items-2
    
    Time for action - creating indexes for distributed searching

What just happened?

Firstly, we created a configuration on our primary server. This same server has the database and will also host the distributed searching index. However, initially we defined a normal index which uses an SQL source for indexing.

In our source definition we fetched only half the rows from the items table. For this we fired a few pre-queries to set the limit and prepare an SQL statement. The following are an explanation of each pre-query:

  • First pre-query: Selects the total number of rows from the items table and puts the count in a variable @total.
  • Second pre-query: Sets a string variable holding the query to be executed. The @total variable is concatenated after the limit. The query string will look like—SELECT * FROM items LIMIT 0,101.
  • Third pre-query: Prepares the statement to be executed based on the query string formed above.

The statement is then executed in the main query, returning the required number of rows. In my case, I had 201 rows in the items table. The index created on the primary (192.168.1.1) server will contain the first 101 rows (as shown in the output of the indexer command).

We then ran the indexer thus creating an index with half the amount of data. The remaining half of the rows will be indexed on the second server.

Next, we created a similar configuration file on the second server (192.168.1.2). Since both servers will use the same database, we configured the source on the second server to use the database on the primary server . We did this by specifying 192.168.1.1 as the value to the host option.

Note

To connect to the MySQL server on 192.168.1.1 from 192.168.1.2, proper privileges should be assigned on 192.168.1.1. This can easily be done using a tool such as phpMyAdmin.

We used the same trick again(as in the configuration of the first server) to get the second half of the rows. The final query that got executed for fetching the data was SELECT * FROM items LIMIT 101, 101.

Tip

Use MySQL stored procedure

Instead of writing so many pre-queries, we could have created a stored procedure to fetch the data in the main query. The stored procedure would set the @total variable and prepare the statement, execute it, and return the results.

So, we created indexes on two different servers. The first server has half the data indexed and the second server has the remaining data. Now let's proceed and create a distributed index that will use two indexes to perform a search.

Set up the distributed index on the primary server

The next step is to add a distributed index to the configuration file of the primary server. Let's do it.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset