Sphinx Search Beginner's Guide

By Abbas Ali
About this book

Sphinx is an open-source full-text search server, designed from the ground up with a focus on performance, relevance, and integration simplicity. With Sphinx, you can either batch index and search data stored in an SQL database, NoSQL storage, or just files quickly and easily — or index and search data on the fly, working with Sphinx pretty much as a database server.

Sphinx Search Beginner's Guide serves as a guide to everything you need to know about running a Sphinx Search Engine. In today's world, search is an integral part of any application and having a reliable search engine like Sphinx Search can be the difference between running a successful and unsuccessful business. What good is being on the Web if no one knows you are there? It is easy to build a proficient search engine with Sphinx Search Beginner's Guide to hand.

This practical guide provides insight into one of the most popular open source search engines, Sphinx. You will learn how to set up Sphinx on your own server, with the basics of how a search engine works explained in brief. You will learn how to create an index using Sphinx and then perform a search on that index using the client API, as well as learning how to configure Sphinx to get the most out of it. You will also be shown how Sphinx gives quality search results by relevance ranking. The book will help a beginner in all possible ways to create both simple and complex search forms in their applications. It's easy to use Sphinx Search engine, when you have the Sphinx Search Beginner's Guide to hand.

Publication date:
March 2011
Publisher
Packt
Pages
244
ISBN
9781849512541

 

Chapter 1. Setting Up Sphinx

Search is by far the most important feature of an application where data is stored and retrieved. If it hadn't been for search, Google wouldn't exist, so we can imagine the importance of search in the computing world.

Search can be found in the following types of applications:

  • Desktop applications: Where you are the primary, and most often the only, user

  • Web applications: Where the application or website is used and visited by many users

For desktop applications, search is a quick way of locating files. Most desktop applications are not data-oriented, that is, they are not meant to organize and display information. They are rather meant to perform certain tasks, making search a secondary feature.

When using a web application, more often than not, the search becomes a means to navigate the website and look for things that we are interested in, things which are otherwise hidden deep inside the site's structure. Search becomes more important if the web application is full of rich-text content such as blogs, articles, knowledge bases, and so on; where a user needs the search functionality to find a particular piece of information.

In this chapter we will:

  • Discuss different ways to search for data

  • See how Sphinx helps us in achieving our goal

  • Learn how to install Sphinx

So let's get on with it...

 

What you need to know


For this chapter, it is important that you know basic Linux commands (if you intend to install Sphinx on a Linux machine). If you use Windows, you should have a basic idea of how to install programs on Windows.

 

Different ways of performing a search


Searching can be done in different ways but here we will take a look at the two most commonly used methods.

Searching on a live database

Whenever your application is dealing with some kind of data, a database is generally involved. There are many databases (both free and commercial) available in the market. Here are a few of the free and open source database servers available:

  • MySQL

  • PostgreSQL

  • SQLite

    Note

    We will be using MySQL throughout this book since Sphinx supports MySQL by default, and it's also the most popular database when it comes to web development.

A live database is one that is actively updated with the latest version of data. At times you may use one database for reading and another for writing, and in such cases you will sync both the databases occasionally. We cannot call such a database 'live', because when reading from one database, while data is being written to the other database, you won't be reading the latest data.

On the other hand, whenever reading from and writing to the database takes place in real-time, we call it a live database.

Let's take an example to understand how search works in the case of a live database.

Assume that we have two database tables in our MySQL database:

  • users

  • addresses

The users table holds data such as each user's name, e-mail, and password. The addresses table holds the addresses belonging to users. Each user can have multiple addresses, so the users and addresses tables are related to each other.

Let's say we want to search for users based on their name and address. The entered search term can be either the name or part of the address. While performing a search directly on the database, our MySQL query would look something like:

SELECT u.id, u.name
FROM users AS u
LEFT JOIN addresses AS a ON u.id = a.user_id
WHERE u.name LIKE '%search_term%'
   OR a.address LIKE '%search_term%'
GROUP BY u.id;

The given query will directly search the specified database tables and get the results. The main advantage of using this approach is that we are always performing a search on the latest version of the available data. Hence, if a new user's data has been inserted just before you initiated the search, you will see that user's data in your search results if it matches your search query.

However, one major disadvantage of this approach is that an SQL query to perform such a search is fired every time a search request comes in, and this becomes an issue when the number of records in the users table increases. With each search query, two tables are joined. This adds overhead and further hinders the performance of the query.
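The live-database approach can be tried out with a small, self-contained sketch. It is only an illustration under stated assumptions: SQLite (through Python's built-in sqlite3 module) stands in for MySQL so that the code runs without a database server, and the column layout and sample rows are invented for this example.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs anywhere.
# The column layout and the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (id INTEGER PRIMARY KEY, user_id INTEGER, address TEXT);
    INSERT INTO users (id, name) VALUES
        (1, 'Alice Stone'), (2, 'Bob Hill'), (3, 'Carol King');
    INSERT INTO addresses (user_id, address) VALUES
        (1, '12 Hill Road'), (1, '4 Lake View'), (2, '9 Market Street');
""")


def live_search(connection, term):
    """Search users by name or address directly on the live tables."""
    pattern = f"%{term}%"
    return connection.execute(
        """
        SELECT u.id, u.name
        FROM users AS u
        LEFT JOIN addresses AS a ON u.id = a.user_id
        WHERE u.name LIKE ? OR a.address LIKE ?
        GROUP BY u.id
        ORDER BY u.id
        """,
        (pattern, pattern),
    ).fetchall()


# Every call runs the join and the LIKE comparison again.
results = live_search(conn, "hill")
print(results)

# A row inserted just before the search is visible immediately.
conn.execute("INSERT INTO users (id, name) VALUES (4, 'Dan Hillman')")
print(live_search(conn, "hill"))
```

The first search finds Alice Stone (through her address) and Bob Hill (through his name). The second search also returns the user inserted a moment earlier, which is the advantage described above. The cost is that the join and the pattern matching are repeated on every request.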

Searching an index

In this approach, a query is not fired directly on a database table. Rather, an index is created from the data stored in the database. This index contains data from all the related tables. The index can itself be stored in a database or on a file system.

The advantage of using this approach is that we need not join tables in SQL queries each time a search request comes in, and the search request would not scan every row stored in the database. The search request is directed towards the index which is highly optimized for searching.

The disadvantage would be the additional storage required to store the index and the time required to build the index. However, these are traded off for the time saved during an actual search request.
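The idea of searching an index can be sketched with a toy inverted index held in a Python dictionary. This is a simplified illustration, not how Sphinx builds or stores its indexes. The sample rows and the function names (tokenize, build_index, search_index) are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical rows, as if already fetched from the users and addresses tables.
users = {1: "Alice Stone", 2: "Bob Hill", 3: "Carol King"}
addresses = [(1, "12 Hill Road"), (1, "4 Lake View"), (2, "9 Market Street")]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(users, addresses):
    """Map every word to the set of user ids whose name or address contains it."""
    index = defaultdict(set)
    for user_id, name in users.items():
        for word in tokenize(name):
            index[word].add(user_id)
    for user_id, address in addresses:
        for word in tokenize(address):
            index[word].add(user_id)
    return index


def search_index(index, term):
    """Look the term up in the index; no join or table scan is involved."""
    return sorted(index.get(term.lower(), set()))


index = build_index(users, addresses)   # built once, ahead of time
print(search_index(index, "hill"))      # a single dictionary lookup per search
```

Building the index takes time and memory up front, and it has to be rebuilt or updated when the underlying data changes, but each search is then a lookup rather than a join. Note that this toy index matches whole words only, unlike the substring matching of LIKE '%search_term%'.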

 

Sphinx—a full-text search engine


No, we will not discuss the Great Sphinx of Giza here. We're talking about the other Sphinx, popular in the computing world. Sphinx stands for SQL Phrase Index.

Sphinx is a full-text search engine (generally standalone) which provides fast, relevant, and efficient full-text search functionality to third-party applications. It was created specifically to facilitate searches on SQL databases, and it integrates very well with languages such as PHP, Python, Perl, Ruby, and Java.

At the time of writing this book, the latest stable release of Sphinx was v0.9.9.

Features

Some of the major features of Sphinx include (taken from http://sphinxsearch.com):

  • High indexing speed (up to 10 MB/sec on modern CPUs)

  • High search speed (average query is under 0.1 sec on 2 to 4 GB of text collection)

  • High scalability (up to 100 GB of text, up to 100 Million documents on a single CPU)

  • Supports distributed searching (since v.0.9.6)

  • Supports MySQL (MyISAM and InnoDB tables are both supported) and PostgreSQL natively

  • Supports phrase searching

  • Supports phrase proximity ranking, providing good relevance

  • Supports English and Russian stemming

  • Supports any number of document fields (weights can be changed on the fly)

  • Supports document groups

  • Supports stopwords, that is, words from a given list (typically very common words) are left out of the index

  • Supports different search modes ("match extended", "match all", "match phrase" and "match any" as of v.0.9.5)

  • Generic XML interface which greatly simplifies custom integration

  • Pure-PHP (that is, NO module compiling and so on) search client API
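Two of the features listed above, stopwords and the "match all" and "match any" search modes, can be illustrated with a toy example. It is only a sketch of the concepts and does not use Sphinx or its client API. The stopword list, the sample documents, and the function names are all invented for this illustration.

```python
# Hypothetical stopword list and documents, invented for this illustration.
STOPWORDS = {"a", "and", "of", "the", "to"}

DOCUMENTS = {
    1: "The history of the Great Sphinx",
    2: "A guide to full-text search",
    3: "Sphinx search and the art of indexing",
}


def keywords(text):
    """Split text into lowercase words and drop the stopwords."""
    return [word for word in text.lower().split() if word not in STOPWORDS]


# Only the non-stopword keywords of each document are kept for matching.
INDEXED = {doc_id: set(keywords(text)) for doc_id, text in DOCUMENTS.items()}


def match_all(query):
    """Return ids of documents containing every keyword of the query."""
    wanted = keywords(query)
    if not wanted:
        return []
    return sorted(doc_id for doc_id, words in INDEXED.items()
                  if all(word in words for word in wanted))


def match_any(query):
    """Return ids of documents containing at least one keyword of the query."""
    wanted = keywords(query)
    return sorted(doc_id for doc_id, words in INDEXED.items()
                  if any(word in words for word in wanted))


print(match_all("the sphinx search"))
print(match_any("the sphinx search"))
```

Because "the" is a stopword, it plays no part in either search. Only document 3 contains both remaining keywords, while all three documents contain at least one of them.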

A brief history

Back in 2001, there weren't many good solutions for searching in web applications. Andrew Aksyonoff, a Russian developer, was facing difficulties in finding a search engine with features such as good search quality (relevance), high searching speed, and low resource requirements - for example, disk usage and CPU.

He tried a few available solutions and even modified them to suit his needs, but in vain. Eventually he decided to come up with his own search engine, which he later named Sphinx.

After the first few releases of Sphinx, Andrew received good feedback from users. Over a period of time, he decided to continue developing Sphinx and founded Sphinx Technologies Inc.

Today Andrew is the primary developer for Sphinx, along with a few others who have since joined him. At the time of writing, Sphinx was under heavy development, with regular releases.

License

Sphinx is free and open source software which can be distributed or modified under the terms of the GNU General Public License (GPL) as published by the Free Software Foundation, either version 2 or any later version.

However, if you intend to use or embed Sphinx in a project but do not want to disclose the source code as required by GPL, you will need to obtain a commercial license by contacting Sphinx Technologies Inc. at http://sphinxsearch.com/contacts.html

 

Installation


Enough talking, let's get on to some real action. The first step is to install Sphinx itself.

System requirements

Sphinx was developed and tested mostly on UNIX-based systems. All modern UNIX-based operating systems with an ANSI-compliant compiler should be able to compile and run Sphinx without any issues. Sphinx is known to run on the following operating systems:

  • Linux (Kernel 2.4.x and 2.6.x of various distributions)

  • Microsoft Windows 2000 and XP

  • FreeBSD 4.x, 5.x, 6.x

  • NetBSD 1.6, 3.0

  • Solaris 9, 11

  • Mac OS X

Note: The Windows version of Sphinx is not meant to be used on production servers. It should only be used for testing and debugging. This is the primary reason that all examples given in this book will be for Linux-based systems.

Sphinx on a Unix-based system

If you intend to install Sphinx on a UNIX based system, then you need to check the following:

  • C++ compiler (GNU GCC works fine)

  • A make program (GNU make works fine)

  • The XML libraries libexpat1 and libexpat1-dev, if you intend to use the xmlpipe2 data source (the package names may differ on non-Ubuntu distributions)
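The first two items on this checklist can be verified with a short script. This is a minimal sketch which assumes that the compiler and the make program are the GNU tools, installed under the command names g++ and make. It does not check for the XML libraries.

```python
import shutil

# Assumption: the GNU tools are installed under these command names.
# Other compilers or make programs would need different names.
REQUIRED_TOOLS = ("g++", "make")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing build tools:", ", ".join(missing))
else:
    print("All build tools found.")
```

If anything is reported as missing, install it with the package manager of your distribution before running the configure utility.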

 

Time for action - installation on Linux


  1. Download the latest stable version of the Sphinx source from http://sphinxsearch.com/downloads.html.

  2. Extract it anywhere on your file system and go inside the extracted sphinx directory:

    $ tar -xzvf sphinx-0.9.9.tar.gz
    $ cd sphinx-0.9.9

  3. Run the configure utility:

    $ ./configure --prefix=/usr/local/sphinx

  4. Build from the source:

    $ make

    Note

    It will take a while after you run the make command as it builds the binaries from the source code.

  5. Install the application (run as root):

    $ make install

What just happened?

We downloaded the latest release of Sphinx and extracted it using the tar command. We then ran the configure command, which gets the details of our machine and also checks for all dependencies. If any dependency is missing, it will throw an error. We will take a look at possible dependency issues in a while.

Once we are done with configure, the make command will build (compile) the source code. After that, make install will install the binaries to the location specified in the --prefix option passed to configure.

Options to the configure command

There are many options that can be passed to the configure command but we will take a look at a few important ones:

  • --prefix=/path: This option specifies the path to install the sphinx binaries. In this book it is assumed that sphinx was configured with --prefix=/usr/local/sphinx so it is recommended that you configure your path with the same prefix.

  • --with-mysql=/path: Sphinx needs to know where to find MySQL's include and library files. It auto-detects this most of the time but if for any reason it fails, you can supply the path here.

  • --with-pgsql=/path: Same as --with-mysql but for PostgreSQL.

Most of the common errors you will come across while configuring Sphinx are related to missing MySQL include files.

This happens either because Sphinx's auto-detection of the MySQL include path failed, or because MySQL's development package is not installed on your machine. If the development package is not installed, you can install it using the package manager (apt or yum) of your operating system. On Ubuntu, the package is called libmysqlclient16-dev.

Note

If you intend to use Sphinx without MySQL then you can use the configure option --without-mysql.

You need to follow pretty much the same steps if PostgreSQL include files are missing. In this book we will be primarily using MySQL for all examples.
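The configure options described above can be combined programmatically, for example in a build script. The following sketch only assembles the argument list and does not run anything. The function name configure_command and the example paths are hypothetical, while the option names are the ones listed in this section.

```python
def configure_command(prefix="/usr/local/sphinx", mysql_path=None,
                      pgsql_path=None, use_mysql=True):
    """Build the argument list for ./configure from the options described above."""
    args = ["./configure", f"--prefix={prefix}"]
    if not use_mysql:
        # Build Sphinx without MySQL support.
        args.append("--without-mysql")
    elif mysql_path:
        # Only needed when auto-detection of the MySQL files fails.
        args.append(f"--with-mysql={mysql_path}")
    if pgsql_path:
        args.append(f"--with-pgsql={pgsql_path}")
    return args


# The paths below are examples only; adjust them to your machine.
print(" ".join(configure_command()))
print(" ".join(configure_command(mysql_path="/usr/local/mysql")))
print(" ".join(configure_command(use_mysql=False, pgsql_path="/usr/local/pgsql")))
```

The returned list could be passed to subprocess.run from inside the extracted source directory, but printing the command first makes it easy to review before running it.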

Known issues during installation

Listed next are a few errors or issues that may arise during Sphinx's installation.

The make command can sometimes fail with the following error:

/bin/sh: g++: command not found
make[1]: *** [libsphinx_a-sphinx.o] Error 127

This may be because of a missing gcc-c++ package. Try installing it.

At times you might get compile-time errors like:

sphinx.cpp:67: error: invalid application of `sizeof' to
incomplete type `Private::SizeError<false>'

To fix the above error, try editing sphinx.h and replacing off_t with DWORD in the typedef for SphOffset_t:

#define STDOUT_FILENO fileno(stdout)
#else
typedef DWORD SphOffset_t;
#endif

One drawback of doing this would be that you won't be able to use full-text indexes larger than 2 GB.

Sphinx on Windows

Installing on a Windows system is easier than on a Linux system as you can use the pre-compiled binaries.

 

Time for action - installation on Windows


  1. Download the Win32 binaries of Sphinx from http://www.sphinxsearch.com/downloads.html. Choose the binary depending on whether you want MySQL support, or PostgreSQL support, or both.

  2. Extract the downloaded ZIP to any suitable location. Let's assume it is extracted to C:\sphinx.

  3. Install the searchd system as a Windows service by issuing the following command in the Command Prompt:

    C:\sphinx\bin\searchd --install --config C:\sphinx\sphinx.conf --servicename SphinxSearch
    

    This will install searchd as a service but it won't be started yet. Before starting the Sphinx service we need to create the sphinx.conf file and create indexes. This will be done in the next few chapters.

What just happened?

Installing Sphinx on Windows is a straightforward task. We have pre-compiled binaries for the Windows platform, which can be used directly.

After extracting the ZIP, we installed the Sphinx service. We need not install anything else since binaries for indexer and search are readily available in the C:\sphinx\bin directory.

The use of binaries to create indexes and the use of the searchd service to search will be covered in the next few chapters.

Note

At the time of writing this book, the Windows version of Sphinx is not meant to be used in a production environment. It is highly recommended to use the Linux version of Sphinx in your production environment.

Sphinx on Mac OS X

Installation on a Mac is very similar to how it is done on Linux systems. You need to build it from source and then install the generated binaries.

 

Time for action - installation on a Mac


  1. Download the latest stable version of the Sphinx source from http://sphinxsearch.com/downloads.html, then extract it and go inside the extracted sphinx directory:

    $ tar -xzvf sphinx-0.9.9.tar.gz
    $ cd sphinx-0.9.9
    
  2. Run the configure utility:

    $ ./configure --prefix=/usr/local/sphinx
    
  3. If you are on a 64 bit Mac then use the following command to configure instead:

    $ LDFLAGS="-arch x86_64" ./configure --prefix=/usr/local/sphinx
    
  4. Next, run the make command:

    $ make
    
  5. Finally, run the following command to complete your installation:

    $ sudo make install
    

What just happened?

We downloaded the Sphinx source and extracted it using the tar command. We then configured Sphinx, built it using the make command, and installed it using make install. The options to configure are the same as those we used while installing Sphinx on Linux.

The only notable difference between installation on Linux and Mac is that if your Mac is 64 bit, your configure command is changed slightly as given above.

Other supported systems

Above we learned how to install Sphinx on Linux, Windows, and Mac. However, these are not the only systems on which Sphinx can be installed. Sphinx is also supported on the following systems:

  • FreeBSD 4.x, 5.x, 6.x

  • NetBSD 1.6, 3.0

  • Solaris 9, 11

    Note

    The installation procedure for the above-mentioned systems is more or less the same as on a Linux system.

 

Summary


In this chapter:

  • We saw the different ways to perform search

  • We got to know about Sphinx and how it helps in performing searches

  • We took a look at some of Sphinx's features and its brief history

  • We learned how to install Sphinx on different operating systems

By now you should have installed Sphinx on your system and laid the foundation for Chapter 2, Getting Started, where we will get started with Sphinx and some basic usage.

About the Author
  • Abbas Ali

    Abbas Ali has over 15 years of experience in Web Development and is a Zend Certified PHP 5 Engineer. A Mechanical Engineer by education, Abbas turned to software development just after finishing his engineering degree. He is a member of the core development team for the Coppermine Photo Gallery, an open source project which is one of the most popular photo gallery applications in the world. Fascinated with both machines and knowledge, Abbas is always learning new programming techniques and technologies. Amongst various technologies, some of his favorites are Laravel, VueJS, and Sphinx. He got acquainted with Sphinx in 2009 and has been using it in most of his commercial projects ever since. He loves open source and believes in contributing back to the community. Abbas is married to Tasneem and has two daughters, Munira and Zahra. He has lived in Nagpur (India) all his life and is in no rush to move to any other city in the world. In his free time he loves to watch movies and television. He is also an amateur photographer and cricketer. Abbas founded Ranium Systems, a web and mobile development company, in 2012 and is currently working as its Chief Executive Officer. The company specializes in the development of enterprise-level, high-performance, and scalable web and mobile applications.
