The source
section is used to define the data source in the configuration file. We learned about data sources in Chapter 3, Indexing. Now let's see different configuration options that can be specified in the source
section of the configuration file.
In this chapter, we will only see those options that are used more often than others and were not already covered in earlier chapters. For complete reference please visit http://sphinxsearch.com/docs/manual-0.9.9.html#conf-reference.
We have already seen how to use the basic options; such as sql_host, sql_user, sql_pass
, and sql_db
. There are a few more options that you may need sooner or later.
The following options can be used to establish the database connection.
If you are using a non-standard port for your database server, then this option is used to specify that port. The default values are; 3306 for mysql source type and 5432 for pgsql type.
For local database server, sql_host = localhost
, you can specify a UNIX socket name. The default value is empty
, which means that it will take the value from client library settings.
The DSN string used in ODBC connections. This only applies to odbc source types and is mandatory if that source type is being used.
Let's create a few sample configuration files with different connection settings.
Here's a sample source
configuration for MySQL with local server:
source src { type = mysql sql_host = localhost sql_user = abbas sql_pass = passwd sql_db = mydb sql_sock = /my/custom/path/mysql.sock }
As shown in the example, at times we may need to explicitly direct Sphinx to where we want it to look for the socket file.
Here is a sample with remote host:
source src { type = mysql sql_host = 192.168.1.5 sql_user = abbas sql_pass = passwd sql_db = mydb sql_port = 3006 }
We used a remote host and specified the IP address. In addition, the remote SQL server is running on a non-default port and, as a result, we were needed to specify the port in sql_port
.
In earlier chapters, we have seen the following options to fetch the data when using an SQL data source:
Now let's see some other advanced options.
The query specified against this option is executed before executing the main sql_query
. This query is called as pre-fetch query.
There can be multiple sql_query_pre
in a source definition. If more than one pre-fetch query is specified, then they are executed in the order of their appearance in the configuration file.
All results returned by sql_query_pre
are ignored. pre
query is generally used to set encoding that the server will use for the rows that it returns, and this encoding must be the same as specified in the index
section of the Sphinx configuration file.
Another common use of the pre
query is to mark the records that are going to be indexed, or to update some internal counter. If any errors are returned by the pre-fetch query, they are reported as errors and indexing is terminated.
This query gets executed immediately after the main sql_query
completes successfully. This query is called a post-fetch query. If any errors are returned by this query, they are reported as warnings and indexing is not terminated.
As with sql_query_pre
, post-fetch query's results are also ignored.
This query is executed after indexing is completed successfully. This query is called a post-index query. Any errors produced by this query are reported as warnings and indexing is not terminated. Results returned by this query are ignored.
A macro, $maxid
, can be used in this query, and this macro expands to the maximum document ID that was fetched during the indexing.
This is the time period (in milliseconds) for which the indexer
should sleep between ranged fetches. This option comes into play only when using sql_range_query
. By default no throttling is done.
This option is particularly useful in those cases where the indexer
may impose too much load on the database server. A carefully selected throttle period will cause the indexer
to sleep for that period of time after each ranged query step.