192.168.1.1
) at /usr/local/sphinx/etc/sphinx-distributed.conf
with the following content:source items { type = mysql sql_host = localhost sql_user = root sql_pass = sql_db = sphinx_conf # Query to set MySQL variable @total #which holds total num of rows sql_query_pre = SELECT @total := count(id) FROM items # Set a variable to hold the sql query # We are using CONCAT to use a variable in limit clause # which is not possible in direct query execution sql_query_pre = SET @sql = CONCAT('SELECT * FROM items limit 0,', CEIL(@total/2)) # Prepare the sql statement sql_query_pre = PREPARE stmt FROM @sql # Execute the prepared statement. This will return rows sql_query = EXECUTE stmt # Once documents are fetched, drop the prepared statement sql_query_post = DROP PREPARE stmt sql_attr_timestamp = created } index items { source = items path = /usr/local/sphinx/var/data/items-distributed charset_type = utf-8 }
indexer
command to index the data:$ /usr/local/sphinx/bin/indexer c /usr/local/sphinx/etc/sphinx-distributed.conf items
/usr/local/sphinx/etc/sphinx-distributed-2.conf
with the following content:source items { type = mysql # we will use remote host (first server) sql_host = 192.168.1.1 sql_user = root sql_pass = sql_db = sphinx_conf # Query to set MySQL variable @total # which holds total num of rows sql_query_pre = SELECT @total := count(id) FROM items # Set a variable to hold the sql query # We are using CONCAT to use a variable in limit clause # which is not possible in direct query execution sql_query_pre = SET @sql = CONCAT('SELECT * FROM items limit ', CEIL(@total/2), ',', CEIL(@total/2)) # Prepare the sql statement sql_query_pre = PREPARE stmt FROM @sql # Execute the prepared statement. This will return rows sql_query = EXECUTE stmt # Once documents are fetched, drop the prepared statement sql_query_post = DROP PREPARE stmt sql_attr_timestamp = created } index items-2 { source = items path = /usr/local/sphinx/var/data/items-2-distributed charset_type = utf-8 }
indexer
on the second server:$ /usr/local/sphinx/bin/indexer c /usr/local/sphinx/etc/sphinx-distributed-2.conf items-2
Firstly, we created a configuration on our primary server. This same server has the database and will also host the distributed searching index. However, initially we defined a normal index which uses an SQL source for indexing.
In our source definition we fetched only half the rows from the items
table. For this we fired a few pre-queries to set the limit and prepare an SQL statement. The following are an explanation of each pre-query:
items
table and puts the count in a variable @total
. @total
variable is concatenated after the limit. The query string will look like—SELECT * FROM items LIMIT 0,101
.The statement is then executed in the main query, returning the required number of rows. In my case, I had 201 rows in the items
table. The index created on the primary (192.168.1.1) server will contain the first 101 rows (as shown in the output of the indexer
command).
We then ran the indexer
thus creating an index with half the amount of data. The remaining half of the rows will be indexed on the second server.
Next, we created a similar configuration file on the second server (192.168.1.2
). Since both servers will use the same database, we configured the source on the second server to use the database on the primary server . We did this by specifying 192.168.1.1
as the value to the host
option.
We used the same trick again(as in the configuration of the first server) to get the second half of the rows. The final query that got executed for fetching the data was SELECT * FROM items LIMIT 101, 101
.
So, we created indexes on two different servers. The first server has half the data indexed and the second server has the remaining data. Now let's proceed and create a distributed index that will use two indexes to perform a search.