Data source configuration

The source section is used to define the data source in the configuration file. We learned about data sources in Chapter 3, Indexing. Now let's see different configuration options that can be specified in the source section of the configuration file.

Note

In this chapter, we will only see those options that are used more often than others and were not already covered in earlier chapters. For complete reference please visit http://sphinxsearch.com/docs/manual-0.9.9.html#conf-reference.

SQL related options

We have already seen how to use the basic options; such as sql_host, sql_user, sql_pass, and sql_db. There are a few more options that you may need sooner or later.

Connection options

The following options can be used to establish the database connection.

sql_port

If you are using a non-standard port for your database server, then this option is used to specify that port. The default values are; 3306 for mysql source type and 5432 for pgsql type.

sql_sock

For local database server, sql_host = localhost, you can specify a UNIX socket name. The default value is empty, which means that it will take the value from client library settings.

odbc_dsn

The DSN string used in ODBC connections. This only applies to odbc source types and is mandatory if that source type is being used.

Let's create a few sample configuration files with different connection settings.

Here's a sample source configuration for MySQL with local server:

source src {
type = mysql
sql_host = localhost
sql_user = abbas
sql_pass = passwd
sql_db = mydb
sql_sock = /my/custom/path/mysql.sock
}

As shown in the example, at times we may need to explicitly direct Sphinx to where we want it to look for the socket file.

Note

The MySQL client library will only connect over a UNIX socket if the host is specified as "localhost". If you have specified a remote host or 127.0.0.1, then the connection is established over TCP/IP and sql_sock is ignored. A UNIX socket connection is faster than TCP/IP.

Here is a sample with remote host:

source src {
type = mysql
sql_host = 192.168.1.5
sql_user = abbas
sql_pass = passwd
sql_db = mydb
sql_port = 3006
}

We used a remote host and specified the IP address. In addition, the remote SQL server is running on a non-default port and, as a result, we were needed to specify the port in sql_port.

Note

sql_port is not mandatory if using remote host. It should only be used if the host is running on a non-default (3306) port.

Options to fetch data (SQL data source)

In earlier chapters, we have seen the following options to fetch the data when using an SQL data source:

  • sql_query: Main query to fetch the data to be indexed.
  • sql_query_range: Used for ranged document fetches. This query must return the maximum and minimum document IDs.
  • sql_range_step: Specified the range query steps.

Now let's see some other advanced options.

sql_query_pre

The query specified against this option is executed before executing the main sql_query. This query is called as pre-fetch query.

There can be multiple sql_query_pre in a source definition. If more than one pre-fetch query is specified, then they are executed in the order of their appearance in the configuration file.

All results returned by sql_query_pre are ignored. pre query is generally used to set encoding that the server will use for the rows that it returns, and this encoding must be the same as specified in the index section of the Sphinx configuration file.

Another common use of the pre query is to mark the records that are going to be indexed, or to update some internal counter. If any errors are returned by the pre-fetch query, they are reported as errors and indexing is terminated.

sql_query_post

This query gets executed immediately after the main sql_query completes successfully. This query is called a post-fetch query. If any errors are returned by this query, they are reported as warnings and indexing is not terminated.

As with sql_query_pre, post-fetch query's results are also ignored.

Note

sql_query_post should not be used to make any permanent updates to the database. This is because, when this query is executed, indexing is still not complete and it may fail at a later point.

sql_query_post_index

This query is executed after indexing is completed successfully. This query is called a post-index query. Any errors produced by this query are reported as warnings and indexing is not terminated. Results returned by this query are ignored.

A macro, $maxid, can be used in this query, and this macro expands to the maximum document ID that was fetched during the indexing.

sql_ranged_throttle

This is the time period (in milliseconds) for which the indexer should sleep between ranged fetches. This option comes into play only when using sql_range_query. By default no throttling is done.

This option is particularly useful in those cases where the indexer may impose too much load on the database server. A carefully selected throttle period will cause the indexer to sleep for that period of time after each ranged query step.

Configuration file using advanced options

Let's see an example configuration file using the options that we discussed.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset