Sphinx Search Beginner's Guide

Sphinx Search Beginner's Guide: Implement full-text search with lightning speed and accuracy using Sphinx

By Abbas Ali
Book Mar 2011 244 pages 1st Edition
Product Details


Publication date : Mar 16, 2011
Length 244 pages
Edition : 1st Edition
Language : English
ISBN-13 : 9781849512541

Chapter 1. Setting Up Sphinx

Search is by far the most important feature of an application where data is stored and retrieved. If it hadn't been for search, Google wouldn't exist, so we can imagine the importance of search in the computing world.

Search can be found in the following types of applications:

  • Desktop applications: Where you are the primary, and most often, the only user

  • Web applications: Where the application or website is used and visited by many users

For desktop applications, search is a quick way of locating files. Most desktop applications are not data-oriented, that is, they are not meant to organize and display information. They are rather meant to perform certain tasks, making search a secondary feature.

When using a web application, more often than not, the search becomes a means to navigate the website and look for things that we are interested in, things which are otherwise hidden deep inside the site's structure. Search becomes more important if the web application is full of rich-text content such as blogs, articles, knowledge bases, and so on; where a user needs the search functionality to find a particular piece of information.

In this chapter we will:

  • Discuss different ways to search for data

  • See how Sphinx helps us in achieving our goal

  • Learn how to install Sphinx

So let's get on with it...

What you need to know


For this chapter, it is important that you know basic Linux commands (if you intend to install Sphinx on a Linux machine). If you use Windows, you should have a basic idea of how to install programs in Windows.

Different ways of performing a search


Searching can be done in different ways but here we will take a look at the two most commonly used methods.

Searching on a live database

Whenever your application is dealing with some kind of data, a database is generally involved. There are many databases (both free and commercial) available in the market. Here are a few of the free and open source database servers available:

  • MySQL

  • PostgreSQL

  • SQLite

    Note

    We will be using MySQL throughout this book since Sphinx supports MySQL by default, and it's also the most popular database when it comes to web development.

A live database is one that is actively updated with the latest version of data. At times you may use one database for reading and another for writing, and in such cases you will sync both the databases occasionally. We cannot call such a database 'live', because when reading from one database, while data is being written to the other database, you won't be reading the latest data.

On the other hand, whenever reading from and writing to the database takes place in real-time, we call it a live database.

Let's take an example to understand how search works in the case of a live database.

Assume that we have two database tables in our MySQL database:

  • users

  • addresses

The users table holds data such as each user's name, e-mail, and password. The addresses table holds the addresses belonging to users. Each user can have multiple addresses, so the users and addresses tables are related to each other.

Let's say we want to search for users based on their name and address. The entered search term can be either the name or part of the address. While performing a search directly on the database, our MySQL query would look something like:

SELECT u.id, u.name
FROM users AS u
LEFT JOIN addresses AS a ON u.id = a.user_id
WHERE u.name LIKE '%search_term%'
   OR a.address LIKE '%search_term%'
GROUP BY u.id;

The given query will directly search the specified database tables and get the results. The main advantage of using this approach is that we are always performing a search on the latest version of the available data. Hence, if a new user's data has been inserted just before you initiated the search, you will see that user's data in your search results if it matches your search query.

However, one major disadvantage of this approach is that an SQL query to perform such a search is fired every time a search request comes in, and this becomes an issue when the number of records in the users table increases. With each search query, two tables are joined. This adds overhead and further hinders the performance of the query.
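The live-database approach can be tried out with a small, self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server; the sample rows and the helper names build_sample_db and search_users are assumptions made for illustration, not part of the book's examples.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with the users and addresses tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE addresses (id INTEGER PRIMARY KEY, user_id INTEGER, address TEXT);
        INSERT INTO users VALUES (1, 'Alice Smith', 'alice@example.com');
        INSERT INTO users VALUES (2, 'Bob Jones', 'bob@example.com');
        INSERT INTO addresses VALUES (1, 1, '12 Baker Street, London');
        INSERT INTO addresses VALUES (2, 2, '99 Smith Road, Leeds');
        INSERT INTO addresses VALUES (3, 2, '4 Elm Avenue, York');
    """)
    return conn


def search_users(conn, term):
    """Return (id, name) rows whose name or any address contains term."""
    pattern = f"%{term}%"
    query = """
        SELECT u.id, u.name
        FROM users AS u
        LEFT JOIN addresses AS a ON u.id = a.user_id
        WHERE u.name LIKE ? OR a.address LIKE ?
        GROUP BY u.id
        ORDER BY u.id
    """
    # The join and the LIKE scan run again on every single search request.
    return conn.execute(query, (pattern, pattern)).fetchall()


conn = build_sample_db()
print(search_users(conn, "Smith"))  # [(1, 'Alice Smith'), (2, 'Bob Jones')]
```

A row inserted just before the call shows up in the results straight away, which is the advantage described above. The cost is that every call repeats the join and the pattern scan.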

Searching an index

In this approach, a query is not fired directly on a database table. Rather, an index is created from the data stored in the database. This index contains data from all the related tables. The index can itself be stored in a database or on a file system.

The advantage of using this approach is that we need not join tables in SQL queries each time a search request comes in, and the search request would not scan every row stored in the database. The search request is directed towards the index which is highly optimized for searching.

The disadvantage would be the additional storage required to store the index and the time required to build the index. However, these are traded off for the time saved during an actual search request.
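To make the trade-off concrete, here is a minimal sketch of an in-memory inverted index in Python. It is not how Sphinx stores its indexes; the function names and the sample documents (each one combining a user's name and addresses) are assumptions chosen only to illustrate building an index once and then answering searches from it.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it.

    documents is a dict of doc_id -> text. Building the index costs time
    and memory up front, which is the disadvantage noted above.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


# One document per user: the name plus all addresses, joined ahead of time.
documents = {
    1: "Alice Smith 12 Baker Street London",
    2: "Bob Jones 99 Smith Road Leeds 4 Elm Avenue York",
}
index = build_index(documents)
print(search(index, "smith"))        # [1, 2]
print(search(index, "smith leeds"))  # [2]
```

Each search is a handful of dictionary lookups and set intersections, with no table join and no row scan. Unlike the LIKE query, this sketch matches whole words only, and it must be rebuilt or updated when the underlying data changes.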

Sphinx—a full-text search engine


No, we will not discuss The Great Sphinx of Giza here. We're talking about the other Sphinx, popular in the computing world. Sphinx stands for SQL Phrase Index.

Sphinx is a full-text search engine (generally standalone) which provides fast, relevant, efficient full-text search functionality to third-party applications. It was especially created to facilitate searches on SQL databases and integrates very well with programming languages such as PHP, Python, Perl, Ruby, and Java.

At the time of writing this book, the latest stable release of Sphinx was v0.9.9.

Features

Some of the major features of Sphinx include (taken from http://sphinxsearch.com):

  • High indexing speed (up to 10 MB/sec on modern CPUs)

  • High search speed (average query is under 0.1 sec on 2 to 4 GB of text collection)

  • High scalability (up to 100 GB of text, up to 100 Million documents on a single CPU)

  • Supports distributed searching (since v.0.9.6)

  • Supports MySQL (MyISAM and InnoDB tables are both supported) and PostgreSQL natively

  • Supports phrase searching

  • Supports phrase proximity ranking, providing good relevance

  • Supports English and Russian stemming

  • Supports any number of document fields (weights can be changed on the fly)

  • Supports document groups

  • Supports stopwords, that is, words from a given list (typically very common words) are left out of the index so that only the more relevant words are indexed

  • Supports different search modes ("match extended", "match all", "match phrase" and "match any" as of v.0.9.5)

  • Generic XML interface which greatly simplifies custom integration

  • Pure-PHP (that is, NO module compiling and so on) search client API
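Two of the listed features, stopwords and search modes, can be illustrated with a toy sketch in Python. This is not Sphinx's implementation or API: the stopword list and the function names are hypothetical, and only the "match all" and "match any" modes are modeled.

```python
import re

# Hypothetical stopword list; a real list would normally come from a file.
STOPWORDS = {"a", "an", "and", "of", "the", "to"}


def terms(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in stopwords]


def match_all(doc, query):
    """True if every query term occurs in the document."""
    return set(terms(query)) <= set(terms(doc))


def match_any(doc, query):
    """True if at least one query term occurs in the document."""
    return bool(set(terms(query)) & set(terms(doc)))


doc = "The riddle of the Sphinx of Giza"
print(terms(doc))                       # ['riddle', 'sphinx', 'giza']
print(match_all(doc, "sphinx cairo"))   # False
print(match_any(doc, "sphinx cairo"))   # True
```

Stopwords shrink what has to be indexed, while the search mode decides how strictly the remaining query terms must be present.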

A brief history

Back in 2001, there weren't many good solutions for searching in web applications. Andrew Aksyonoff, a Russian developer, was facing difficulties in finding a search engine with features such as good search quality (relevance), high searching speed, and low resource requirements - for example, disk usage and CPU.

He tried a few available solutions and even modified them to suit his needs, but in vain. Eventually he decided to come up with his own search engine, which he later named Sphinx.

After the first few releases of Sphinx, Andrew received good feedback from users. Over a period of time, he decided to continue developing Sphinx and founded Sphinx Technologies Inc.

Today Andrew is the primary developer for Sphinx, along with a few others who have joined the project. At the time of writing, Sphinx was under heavy development, with regular releases.

License

Sphinx is free and open source software which can be distributed or modified under the terms of the GNU General Public License (GPL) as published by the Free Software Foundation, either version 2 or any later version.

However, if you intend to use or embed Sphinx in a project but do not want to disclose the source code as required by GPL, you will need to obtain a commercial license by contacting Sphinx Technologies Inc. at http://sphinxsearch.com/contacts.html

Installation


Enough talking, let's get on to some real action. The first step is to install Sphinx itself.

System requirements

Sphinx was developed and tested mostly on UNIX-based systems. All modern UNIX-based operating systems with an ANSI-compliant compiler should be able to compile and run Sphinx without any issues. Sphinx has been found running without any issues on the following operating systems:

  • Linux (Kernel 2.4.x and 2.6.x of various distributions)

  • Microsoft Windows 2000 and XP

  • FreeBSD 4.x, 5.x, 6.x

  • NetBSD 1.6, 3.0

  • Solaris 9, 11

  • Mac OS X

Note: The Windows version of Sphinx is not meant to be used on production servers. It should only be used for testing and debugging. This is the primary reason that all examples given in this book will be for Linux-based systems.

Sphinx on a Unix-based system

If you intend to install Sphinx on a UNIX-based system, then you need to make sure the following are available:

  • C++ compiler (GNU GCC works fine)

  • A make program (GNU make works fine)

  • The XML libraries libexpat1 and libexpat1-dev, if you intend to use the xmlpipe2 data source (the package names may differ on non-Ubuntu distributions)
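Before starting the build, the presence of the compiler and the make program can be checked with a few lines of Python. This is only a convenience sketch: the function name missing_tools is hypothetical, and the check covers the g++ and make executables only, not the expat libraries.

```python
import shutil


def missing_tools(tools=("g++", "make")):
    """Return the names in tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Install these before building Sphinx:", ", ".join(missing))
else:
    print("Compiler and make program found.")
```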

Time for action - installation on Linux


  1. Download the latest stable version of the Sphinx source from http://sphinxsearch.com/downloads.html.

  2. Extract it anywhere on your file system and go inside the extracted sphinx directory:

    $ tar -xzvf sphinx-0.9.9.tar.gz
    $ cd sphinx-0.9.9

  3. Run the configure utility:

    $ ./configure --prefix=/usr/local/sphinx

  4. Build from the source:

    $ make

    Note

    The make command takes a while to finish because it builds the binaries from the source code.

  5. Install the application (run as root):

    $ make install

What just happened?

We downloaded the latest release of Sphinx and extracted it using the tar command. We then ran the configure command, which gathers details about our machine and checks for all dependencies. If any dependency is missing, it reports an error. We will take a look at possible dependency issues in a while.

Once we are done with configure, the make command builds (compiles) the source code. After that, make install installs the binaries to their respective locations under the path given in the --prefix option to configure.

Options to the configure command

There are many options that can be passed to the configure command but we will take a look at a few important ones:

  • --prefix=/path: This option specifies the path under which the Sphinx binaries are installed. This book assumes that Sphinx was configured with --prefix=/usr/local/sphinx, so it is recommended that you use the same prefix.

  • --with-mysql=/path: Sphinx needs to know where to find MySQL's include and library files. It auto-detects this most of the time but if for any reason it fails, you can supply the path here.

  • --with-pgsql=/path: Same as --with-mysql but for PostgreSQL.

Most of the common errors you will come across while configuring Sphinx are related to missing MySQL include files.

This happens either because Sphinx's auto-detection of the MySQL include path failed, or because MySQL's devel package has not been installed on your machine. If MySQL's devel package is not installed, you can install it using the package manager (apt or yum) of your operating system. In the case of Ubuntu, the package is called libmysqlclient16-dev.

Note

If you intend to use Sphinx without MySQL then you can use the configure option --without-mysql.

You need to follow pretty much the same steps if PostgreSQL include files are missing. In this book we will be primarily using MySQL for all examples.
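The options discussed in this section can be assembled programmatically, which helps when the same build is repeated on several machines. The following Python sketch only builds the command line and does not run it; the function name configure_command and the example MySQL path are assumptions, while the flags themselves are the ones described above.

```python
import shlex


def configure_command(prefix="/usr/local/sphinx", mysql=None, pgsql=None,
                      use_mysql=True):
    """Build the argument list for Sphinx's configure script.

    mysql and pgsql are optional paths for --with-mysql and --with-pgsql.
    Setting use_mysql to False adds --without-mysql instead.
    """
    args = ["./configure", f"--prefix={prefix}"]
    if not use_mysql:
        args.append("--without-mysql")
    elif mysql:
        args.append(f"--with-mysql={mysql}")
    if pgsql:
        args.append(f"--with-pgsql={pgsql}")
    return args


# /usr/local/mysql is a hypothetical location used only for this example.
print(shlex.join(configure_command(mysql="/usr/local/mysql")))
print(shlex.join(configure_command(use_mysql=False)))
```

The returned list can be passed to subprocess.run from inside the extracted source directory if you want the script to run configure for you.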

Known issues during installation

Listed next are a few errors or issues that may arise during Sphinx's installation.

The make command can sometimes fail with the following error:

/bin/sh: g++: command not found
make[1]: *** [libsphinx_a-sphinx.o] Error 127

This may be because the C++ compiler is missing. Try installing the gcc-c++ package (called g++ on Debian and Ubuntu).

At times you might get compile-time errors like:

sphinx.cpp:67: error: invalid application of `sizeof' to
incomplete type `Private::SizeError<false>'

To fix the above error, try editing sphinx.h and replacing off_t with DWORD in the typedef for SphOffset_t:

#define STDOUT_FILENO fileno(stdout)
#else
typedef DWORD SphOffset_t;
#endif

One drawback of doing this would be that you won't be able to use full-text indexes larger than 2 GB.

Sphinx on Windows

Installing on a Windows system is easier than on a Linux system as you can use the pre-compiled binaries.

Time for action - installation on Windows


  1. Download the Win32 binaries of Sphinx from http://www.sphinxsearch.com/downloads.html. Choose the binary depending on whether you want MySQL support, or PostgreSQL support, or both.

  2. Extract the downloaded ZIP to any suitable location. Let's assume it is extracted to C:\sphinx.

  3. Install the searchd system as a Windows service by issuing the following command in the Command Prompt:

    C:\sphinx\bin\searchd --install --config C:\sphinx\sphinx.conf --servicename SphinxSearch

    This will install searchd as a service but it won't be started yet. Before starting the Sphinx service we need to create the sphinx.conf file and create indexes. This will be done in the next few chapters.

What just happened?

Installing Sphinx on Windows is a straightforward task. We have pre-compiled binaries for the Windows platform, which can be used directly.

After extracting the ZIP, we installed the Sphinx service. We need not install anything else since binaries for indexer and search are readily available in the C:\sphinx\bin directory.

The use of binaries to create indexes and the use of the searchd service to search will be covered in the next few chapters.

Note

At the time of writing this book, the Windows version of Sphinx is not meant to be used in production environment. It is highly recommended to use the Linux version of Sphinx in your production environment.

Sphinx on Mac OS X

Installation on a Mac is very similar to how it is done on Linux systems. You need to build it from source and then install the generated binaries.

Time for action - installation on a Mac


  1. Download the latest stable version of the Sphinx source from http://sphinxsearch.com/downloads.html, then extract it and go inside the extracted sphinx directory:

    $ tar -xzvf sphinx-0.9.9.tar.gz
    $ cd sphinx-0.9.9

  2. Run the configure utility:

    $ ./configure --prefix=/usr/local/sphinx

  3. If you are on a 64 bit Mac then use the following command to configure instead:

    $ LDFLAGS="-arch x86_64" ./configure --prefix=/usr/local/sphinx

  4. Next, run the make command:

    $ make

  5. Finally, run the following command to complete your installation:

    $ sudo make install

What just happened?

We downloaded the Sphinx source and extracted it using the tar command. We then configured Sphinx and built it using the make command. The options to configure are the same as we used while installing Sphinx in Linux.

The only notable difference between installation on Linux and Mac is that if your Mac is 64 bit, your configure command is changed slightly as given above.
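The choice between the two configure invocations can be automated. The Python sketch below prepares the environment for configure and adds the LDFLAGS setting only on a 64 bit machine; the function name mac_configure_env is hypothetical, and it assumes that platform.machine() reports "x86_64" on such a Mac.

```python
import os
import platform


def mac_configure_env(machine=None, base_env=None):
    """Return the environment to use when running ./configure.

    machine defaults to platform.machine(); base_env defaults to a copy
    of the current environment. Any existing LDFLAGS value is replaced.
    """
    machine = machine or platform.machine()
    env = dict(os.environ if base_env is None else base_env)
    if machine == "x86_64":
        env["LDFLAGS"] = "-arch x86_64"
    return env


env = mac_configure_env()
print("LDFLAGS for configure:", env.get("LDFLAGS", "(not set)"))
```

The resulting dictionary can be passed as the env argument of subprocess.run when calling ./configure --prefix=/usr/local/sphinx from the source directory.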

Other supported systems

Above we learned how to install Sphinx on Linux, Windows, and Mac. However, these are not the only systems on which Sphinx can be installed. Sphinx is also supported on the following systems:

  • FreeBSD 4.x, 5.x, 6.x

  • NetBSD 1.6, 3.0

  • Solaris 9, 11

    Note

    Installation procedure for the above mentioned systems is more or less similar to how it is done on a Linux system.

Summary


In this chapter:

  • We saw the different ways to perform search

  • We got to know about Sphinx and how it helps in performing searches

  • We took a look at some of Sphinx's features and its brief history

  • We learned how to install Sphinx on different operating systems

By now you should have installed Sphinx on your system and laid the foundation for Chapter 2, Getting Started, where we will get started with Sphinx and some basic usage.
