In the earlier chapters we dealt with Sphinx and learnt how it works. We created several indexes and wrote different types of search applications. While doing so we saw the most frequently used Sphinx configuration options.
In this chapter, we will see some more configuration options that will allow you to tailor Sphinx to your needs. There are numerous configuration options available to make Sphinx work exactly the way you want it to. All these are defined in the heart of Sphinx, that is, its configuration file.
Sphinx has to be configured before we can start using it to create indexes or search. This is done by creating a special configuration file that Sphinx reads while creating an index and searching. The configuration file can be placed anywhere in the file system. The file contains options written in a special format as follows:
section_type1 name
{
    option11 = value11
    option12 = value12
    option13 = value13
}

section_type2 name
{
    option21 = value21
    option22 = value22
    option23 = value23
}
Each section has a name and some options, as seen in the previous code snippet. A configuration file can have the following types of sections:
The source section is used to define the data source in the configuration file. We learned about data sources in Chapter 3, Indexing. Now let's see different configuration options that can be specified in the source section of the configuration file.
Note
In this chapter, we will only look at options that are used frequently and were not covered in earlier chapters. For a complete reference, please visit http://sphinxsearch.com/docs/manual-0.9.9.html#conf-reference.
We have already seen how to use the basic options, such as sql_host, sql_user, sql_pass, and sql_db. There are a few more options that you may need sooner or later.
1. Create a database (or use an existing one) with the following structure and data:
CREATE TABLE `items` (
    `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `title` VARCHAR(255) NOT NULL,
    `content` TEXT NOT NULL,
    `created` DATETIME NOT NULL
) ENGINE = MYISAM;

CREATE TABLE `last_indexed` (
    `id` INT NOT NULL
) ENGINE = MYISAM;

INSERT INTO `last_indexed` (`id`) VALUES ('0');
2. Add a few rows to the items table so that we get some data to index.
3. Create the Sphinx configuration file /usr/local/sphinx/etc/sphinx-src-opt.conf with the following content:

source items
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass =
    sql_db = sphinx_conf

    # Set the charset of returned data to utf8
    sql_query_pre = SET NAMES utf8
    # Turn off the query cache
    sql_query_pre = SET SESSION query_cache_type = OFF

    sql_query_range = SELECT MIN(id), MAX(id) FROM items \
        WHERE id >= (SELECT id FROM last_indexed)
    sql_range_step = 200
    ...
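The effect of sql_query_range together with sql_range_step can be sketched in a few lines of Python. This is only an illustration, assuming the indexer walks the ID span returned by the range query in inclusive windows of sql_range_step IDs and substitutes each window into the main query through the $start and $end macros, as the Sphinx manual describes for ranged queries. The function name and the printed WHERE clause are hypothetical, since the main query is not shown in full above.

```python
def ranged_query_steps(min_id, max_id, step):
    """Yield inclusive (start, end) document-ID windows covering min_id..max_id."""
    if step <= 0:
        raise ValueError("step must be positive")
    start = min_id
    while start <= max_id:
        # The last window is clipped so it never runs past max_id.
        end = min(start + step - 1, max_id)
        yield start, end
        start = end + 1


if __name__ == "__main__":
    # Pretend the range query returned MIN(id) = 1 and MAX(id) = 450.
    for start, end in ranged_query_steps(1, 450, 200):
        # Illustrative only: the real main query is elided in the text.
        print(f"... WHERE id >= {start} AND id <= {end}")
```

With a step of 200, a table whose IDs run from 1 to 450 would be fetched in three smaller queries rather than one large one, which keeps each result set small.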
The next mandatory section of the configuration file is the index section. This section defines how to index the data and identifies certain properties to look for before indexing the data.
There can be multiple indexes in a single configuration file, and an index can extend another index, as was done in Chapter 5, Feed Search, when we created the main and delta indexing and searching scheme.
There is another powerful searching scheme that should be used if you are indexing billions of records and terabytes of data. This scheme is called distributed searching.
Distributed searching is useful for searching through a large amount of data which, if kept in one single index, would cause high query latency (search time) and allow fewer queries to be served per second.
In Sphinx, the distribution is done horizontally, that is, a search is performed across different nodes and processing is done in parallel.
To enable distributed searching you need to use type...
1. Create the configuration file on the first server (192.168.1.1) at /usr/local/sphinx/etc/sphinx-distributed.conf with the following content:

source items
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass =
    sql_db = sphinx_conf

    # Query to set MySQL variable @total
    # which holds total num of rows
    sql_query_pre = SELECT @total := count(id) FROM items

    # Set a variable to hold the sql query
    # We are using CONCAT to use a variable in limit clause
    # which is not possible in direct query execution
    sql_query_pre = SET @sql = CONCAT('SELECT * FROM items \
        limit 0,', CEIL(@total/2))

    # Prepare the sql statement
    sql_query_pre = PREPARE stmt FROM @sql

    # Execute the prepared statement. This will return rows
    sql_query = EXECUTE stmt

    # Once documents are fetched, drop the prepared statement
    sql_query_post = DROP PREPARE stmt

    sql_attr_timestamp = created
}

index items
{
    source = items
    path = /usr/local/sphinx/var/data/items-distributed...
1. Modify /usr/local/sphinx/etc/sphinx-distributed.conf on the primary server (192.168.1.1) and add a new index definition as shown:

index master
{
    type = distributed
    # Local index to be searched
    local = items
    # Remote agent (index) to be searched
    agent = 192.168.1.2:9312:items-2
}
2. Modify the configuration files on both 192.168.1.1 and 192.168.1.2 servers, and add the searchd section as shown:
searchd
{
    log = /usr/local/sphinx/var/log/searchd-distributed.log
    query_log = /usr/local/sphinx/var/log/query-distributed.log
    max_children = 30
    pid_file = /usr/local/sphinx/var/log/searchd-distributed.pid
}
3. Start the searchd daemon on the primary server (make sure to stop any previous instance):

$ /usr/local/sphinx/bin/searchd -c /usr/local/sphinx/etc/sphinx-distributed.conf

4. Start the searchd daemon on the second server (make sure to stop any previous instance):

$ /usr/local/sphinx/bin/searchd -c /usr/local/sphinx/etc/sphinx-distributed...
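The idea behind the master index, where one query fans out to a local index and a remote agent and the partial results are then combined, can be sketched conceptually in Python. This is not how searchd is implemented; it is a toy scatter-gather model with two hypothetical in-memory shards, a made-up term-count weight, and threads standing in for the two servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "shards" (document id -> text), standing in for
# the local index and the remote agent's index.
LOCAL_SHARD = {1: "sphinx search basics", 2: "mysql query cache"}
REMOTE_SHARD = {3: "distributed search with sphinx", 4: "stemming english words"}


def search_shard(shard, term):
    """Return (doc_id, weight) pairs for documents in one shard containing term."""
    hits = []
    for doc_id, text in shard.items():
        weight = text.split().count(term)  # toy weight: number of occurrences
        if weight:
            hits.append((doc_id, weight))
    return hits


def distributed_search(shards, term):
    """Query every shard in parallel, then merge hits by descending weight."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = list(pool.map(lambda s: search_shard(s, term), shards))
    merged = [hit for hits in partial_results for hit in hits]
    # Ties on weight are broken by document id to keep the order stable.
    return sorted(merged, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    print(distributed_search([LOCAL_SHARD, REMOTE_SHARD], "sphinx"))
```

Because each shard holds only part of the data, each partial search touches fewer documents, which is where the latency benefit of horizontal distribution comes from.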
1. Create the Sphinx configuration file /path/to/sphinx-stem.conf as follows:

source items
{
    type = mysql
    sql_host = localhost
    sql_user = root
    sql_pass =
    sql_db = sphinx_conf
    sql_query = SELECT id, title, content, created FROM items
    sql_attr_timestamp = created
}

index items
{
    source = items
    path = /usr/local/sphinx/var/data/items-morph
    charset_type = utf-8
    morphology = stem_en
}
2. Run the indexer command:

$ /usr/local/sphinx/bin/indexer -c /path/to/sphinx-stem.conf items

3. Search for the word run using the command line search utility:

$ /usr/local/sphinx/bin/search -c /path/to/sphinx-stem.conf run

4. Search for the word running:

$ /usr/local/sphinx/bin/search -c /path/to/sphinx-stem.conf running
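Why searching for run and running can reach the same documents once stemming is enabled can be illustrated with a small Python sketch. The stemmer below is a deliberately crude toy written for this example and is not the algorithm behind stem_en; the function names and sample documents are hypothetical. The point is only that the same reduction is applied to indexed words and to query words.

```python
def toy_stem(word):
    """Reduce a word to a crude stem. A toy, NOT the stemmer used by stem_en."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Undo consonant doubling, for example "runn" -> "run".
        if len(word) > 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
            word = word[:-1]
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return word


def build_index(docs):
    """Map each stemmed token to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(toy_stem(token), set()).add(doc_id)
    return index


def search(index, term):
    """Stem the query term the same way, then look it up in the index."""
    return sorted(index.get(toy_stem(term), set()))


if __name__ == "__main__":
    docs = {1: "he likes to run", 2: "running is fun"}
    index = build_index(docs)
    print(search(index, "run"), search(index, "running"))
```

Since both the index and the query pass through the same stemming step, either form of the word finds both sample documents.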
When searching the index from your application, you will be using the Sphinx Client API. This API talks to the search daemon, searchd, that comes bundled with the Sphinx package. We have seen some basic searchd options in earlier chapters. Now let's elaborate on them.
As we have previously observed, the listen option lets you specify the IP address and port that searchd will listen on. The syntax for listen is:
listen = ( address ":" port | path ) [ ":" protocol ]
Let's understand this with a few examples:
listen = localhost
listen = 192.168.1.1
listen = 9313
listen = domain:9315
listen = /var/run/sphinx.s
In the first example only the hostname was specified. In this case searchd will listen on the default port, that is, 9312.
The second example is similar, except that the hostname is replaced with an IP address.
In the third example we specified only the port number. Thus searchd will listen on port 9313 on all available interfaces.
In the fourth example, searchd will listen only...
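The different listen forms can be told apart mechanically, which a short Python sketch makes concrete. This is an illustrative parser, not the one searchd uses. It assumes the default port 9312 mentioned above, treats a value starting with / as a socket path, and assumes the optional protocol suffix is one of the names listed in the Sphinx manual (sphinx or mysql41). The function name and the returned dictionary layout are hypothetical.

```python
DEFAULT_PORT = 9312
# Assumption: protocol names as listed in the Sphinx manual.
KNOWN_PROTOCOLS = ("sphinx", "mysql41")


def parse_listen(value):
    """Split a listen value into address, port, path, and protocol."""
    parts = value.strip().split(":")
    protocol = "sphinx"
    # An optional trailing ":protocol" is only recognised after another part.
    if len(parts) > 1 and parts[-1] in KNOWN_PROTOCOLS:
        protocol = parts.pop()
    result = {"address": None, "port": None, "path": None, "protocol": protocol}
    if parts[0].startswith("/"):
        result["path"] = ":".join(parts)       # UNIX socket path
    elif len(parts) == 1 and parts[0].isdigit():
        result["port"] = int(parts[0])         # port only: all interfaces
    elif len(parts) == 1:
        result["address"] = parts[0]           # host only: default port
        result["port"] = DEFAULT_PORT
    elif len(parts) == 2 and parts[1].isdigit():
        result["address"] = parts[0]
        result["port"] = int(parts[1])
    else:
        raise ValueError(f"cannot parse listen value: {value!r}")
    return result


if __name__ == "__main__":
    for example in ("localhost", "192.168.1.1", "9313", "domain:9315",
                    "/var/run/sphinx.s"):
        print(example, "->", parse_listen(example))
```

An address of None in this sketch stands for "all available interfaces", matching the port-only example.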
This set of options is used when running the indexer command to create indexes.
The mem_limit option specifies the maximum amount of RAM that the indexer may use. The default value is 32M.
The memory limit can be specified in bytes, kilobytes, or megabytes. Here's an example (in bytes):
mem_limit = 33554432 # 32 MB
Here's an example (in kilobytes):
mem_limit = 32768K
Here's an example (in megabytes):
mem_limit = 32M
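That the three notations above describe the same limit can be checked with a small Python sketch. The helper below is hypothetical and written only for this illustration, assuming K means 1024 bytes and M means 1024 * 1024 bytes. Accepting a lowercase suffix is a convenience of the sketch, not a statement about the indexer.

```python
def parse_size(value):
    """Convert a size such as '33554432', '32768K', or '32M' to bytes."""
    text = value.strip().upper()
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)  # no suffix: the value is already in bytes


if __name__ == "__main__":
    for example in ("33554432", "32768K", "32M"):
        print(example, "=", parse_size(example), "bytes")
```

All three example values come out as 33554432 bytes, that is, 32 MB.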
The max_iops option specifies the maximum number of I/O calls per second, for I/O throttling. The default value is 0, which means unlimited:
max_iops = 50
The max_iosize option specifies the maximum I/O call size in bytes, for I/O throttling. The default value is 0, which means unlimited:
max_iosize = 1048576
The max_xmlpipe2_field option specifies the maximum length of an xmlpipe2 field. The default value is 2M:
max_xmlpipe2_field = 4M
With this we come to the end of configuration options. We left out some options intentionally and those can be referred to in the Sphinx manual (http...
In this chapter we learned:
The basics of creating a Sphinx configuration file
How to configure the data source to use SQL as well as xmlpipe2 sources
How to configure Sphinx for distributed searching
How to use morphology, wordforms, and other data processing options
How to configure the search daemon and get the most out of it