/usr/local/sphinx/etc/properties.conf
with the following content:source properties-source { type = mysql sql_host = localhost sql_user = root sql_pass = sql_db = sphinx_properties sql_query_range = SELECT MIN(id), MAX(id) FROM properties sql_range_step = 1000 sql_query = SELECT id, type, transaction_type, description, price, city_id, bedrooms, bathrooms, area, address, zip_code, built_year, (latitude * PI() / 180) AS latitude, (longitude * PI() / 180) AS longitude, UNIX_TIMESTAMP(date_added) AS date_added FROM properties WHERE id >= $start AND id <= $end sql_attr_uint = type sql_attr_uint = transaction_type sql_attr_uint = price sql_attr_uint = city_id sql_attr_uint = bedrooms sql_attr_uint = bathrooms sql_attr_float = area sql_attr_uint = built_year sql_attr_float = latitude sql_attr_float = longitude sql_attr_timestamp = date_added sql_attr_multi = uint amenity_id from query; SELECT property_id, amenity_id FROM amenities_properties } index properties { source = properties-source path = /usr/local/sphinx/var/data/properties charset_type = utf-8 } indexer { mem_limit = 32M } searchd { listen = localhost:9312 log = /usr/local/sphinx/var/log/searchd.log query_log = /usr/local/sphinx/var/log/query.log max_children = 30 pid_file = /usr/local/sphinx/var/log/searchd.pid }
indexer
command to create the index (as root):/usr/local/sphinx/bin/indexer -c /usr/local/sphinx/etc/properties.conf --all
We created the Sphinx configuration file and defined the source
and indexer
in it. We are using a MySQL database for this application and defined the source
options accordingly.
For the first time ever, we used the following options to configure the source. Let's have a look at what each of these options are:
sql_query_range:
Is used for ranged document queries. Ranged queries are very useful when indexing lots of data, that is, when the number of records to index goes into millions. This option takes a query that must fetch the minimum and maximum document IDs which will be used as range boundaries. It must return exactly two integer fields in the same order.It is because of the ranged query that Sphinx populates two macros, $start
and $end
, and we need to use: - in sql_query
, which fetches the actual data. These two macros help in setting up the right conditions to fetch the documents.
sql_range_step:
Specifies the steps in document IDs interval. For example, if min and max IDs returned by the sql_query_range
are 55 and 150 respectively, and if the sql_range_step
is 40, indexer
will call sql_query
three times with the following substitutions for the $start
and $end
macros:$start
= 55, $end
= 94$start
= 95, $end
= 134$start
= 135, $end
= 150So our data gets fetched in three queries instead of one. This is useful if you have lots of records and fetching them all at once may lock MyISAM tables for longer periods, or reduce the performance because of big result sets.
We fetched all data related to properties in sql_query.
The geo coordinates, latitude and longitude, are converted from decimal to radians in the query itself. We divide the decimal value by 180/pi
. Sphinx needs them in radians to perform a geo location search (explained more clearly in Geo distance search later in this chapter).
sql_attr_float
was used to specify the attribute with floating point value. We defined amenity as an MVA because each property can have one or more associated amenities.The remaining blocks in the configuration file are pretty straightforward and we had used the same settings in our earlier applications.