
You're reading from Apache Solr PHP Integration

Product type: Book
Published in: Nov 2013
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781782164920
Edition: 1st Edition
Author: Jayant Kumar

Jayant Kumar is an experienced software professional with a bachelor of engineering degree in computer science and more than 14 years of experience in architecting and developing large-scale web applications. Jayant is an expert on search technologies and PHP and has been working with Lucene and Solr for more than 11 years now. He is the key person responsible for introducing Lucene as a search engine on www.naukri.com, the most successful job portal in India. Jayant is also the author of the book Apache Solr PHP Integration, Packt Publishing, which has been very successful. Jayant has played many different important roles throughout his career, including software developer, team leader, project manager, and architect, but his primary focus has been on building scalable solutions on the Web. Currently, he is associated with the digital division of HT Media as the chief architect responsible for the job site www.shine.com. Jayant is an avid blogger and his blog can be visited at http://jayant7k.blogspot.in. His LinkedIn profile is available at http://www.linkedin.com/in/jayantkumar.


Chapter 2. Inserting, Updating, and Deleting Documents from Solr

We will start this chapter by discussing the Solr schema and exploring the default schema provided with Solr. We will then cover:

  • Pushing sample data into Solr

  • Adding sample documents to the Solr index

  • Using PHP to add documents to the Solr index

  • Updating documents in Solr using PHP

  • Deleting documents in Solr using PHP

  • Using commit, rollback, and index optimization

The Solr schema


The Solr schema consists mostly of fields and field types. It defines the fields that are to be stored in the Solr index and the processing that should happen on data being indexed or searched in those fields. Internally, Solr uses the schema to assign properties to the fields of each document that it indexes through the Lucene API. The default schema shipped with Solr is located at <solr_home>/example/solr/collection1/conf/schema.xml, where collection1 is the name of the core.

Note

A Solr server can have multiple cores and each core can have its own schema.

Let us open the schema.xml file and go through it. In the XML file, we can see a fields section containing multiple field entries, and a separate types section. The types section contains different fieldType entries, which define how a field of that type is processed during indexing and during querying. Let us understand how to create...
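To make the relationship between the two sections concrete, the following PHP sketch parses a small, hand-written fragment in the shape of schema.xml with SimpleXML and maps each field to the class of its fieldType. The fragment is illustrative only: the field names (id, name, price), the type definitions, and the helper fieldClasses() are assumptions, not a copy of the default schema.

```php
<?php
// Illustrative schema fragment with a <fields> section and a <types> section.
// Names and attributes are assumptions, not the real default schema.xml.
$schemaXml = <<<XML
<schema name="example" version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="name" type="text_general" indexed="true" stored="true"/>
    <field name="price" type="float" indexed="true" stored="true"/>
  </fields>
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0"/>
    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"/>
  </types>
</schema>
XML;

/**
 * Hypothetical helper: map each field name to the Solr class of its fieldType.
 */
function fieldClasses(string $xml): array
{
    $schema = simplexml_load_string($xml);
    if ($schema === false) {
        throw new RuntimeException('Could not parse schema XML');
    }

    // Index the fieldType definitions by name so each field can look up its type.
    $typeClass = [];
    foreach ($schema->types->fieldType as $type) {
        $typeClass[(string) $type['name']] = (string) $type['class'];
    }

    $result = [];
    foreach ($schema->fields->field as $field) {
        $typeName = (string) $field['type'];
        $result[(string) $field['name']] = $typeClass[$typeName] ?? 'unknown';
    }
    return $result;
}

foreach (fieldClasses($schemaXml) as $name => $class) {
    echo "$name => $class\n";
}
```

The lookup goes through the type name because, in the schema, a field only names its type; how the data is processed is defined once on the fieldType and shared by every field that uses it.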

Adding sample documents to the Solr index


Let us push some sample data into Solr. Go to <solr_dir>/example/exampledocs and execute the following commands to add all the sample documents to our Solr index:

java -Durl=http://localhost:8080/solr/update -Dtype=application/csv -jar post.jar books.csv
java -Durl=http://localhost:8080/solr/update -jar post.jar *.xml
java -Durl=http://localhost:8080/solr/update -Dtype=application/json -jar post.jar books.json

To check how many documents have been indexed, go to the following URL:

http://localhost:8080/solr/collection1/select/?q=*:*

This query asks Solr to return all the documents in the index. The numFound attribute in the XML output gives the number of documents in our Solr index.
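Reading numFound by eye is fine for a quick check; from PHP you might read it programmatically. The sketch below builds the same select URL with http_build_query() and extracts numFound from an XML response using SimpleXML. The helpers selectUrl() and numFound() are hypothetical, and the response string is a hand-written sample in the general shape of Solr's XML output, not real output from this index; the count in it is a placeholder.

```php
<?php
// Hypothetical helper: build a select URL for the given core and query.
function selectUrl(string $base, string $query): string
{
    return $base . '/select/?' . http_build_query(['q' => $query]);
}

// Hypothetical helper: read numFound from the <result name="response"> element.
function numFound(string $responseXml): int
{
    $response = simplexml_load_string($responseXml);
    if ($response === false) {
        throw new RuntimeException('Invalid XML response');
    }
    foreach ($response->result as $result) {
        if ((string) $result['name'] === 'response') {
            return (int) $result['numFound'];
        }
    }
    throw new RuntimeException('No result element found');
}

// Hand-written sample response; the numbers are placeholders.
$sample = <<<XML
<response>
  <lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
  <result name="response" numFound="42" start="0"></result>
</response>
XML;

echo selectUrl('http://localhost:8080/solr/collection1', '*:*'), "\n";
echo numFound($sample), "\n";
```

Against a running server, and assuming the default XML response format, you could pass file_get_contents(selectUrl(...)) to numFound() instead of the sample string. Note that http_build_query() percent-encodes `*:*` as `%2A%3A%2A`, which Solr decodes back to the match-all query.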

We are working with the default schema. To check the schema, go to the following URL:

http://localhost:8080/solr/#/collection1/schema

The following screenshot shows the content of a sample schema file schema.xml:

We can see that there are multiple fields: id...

Using PHP to add documents to the Solr index


Let us look at the code for adding documents to Solr using the Solarium library. If we execute the following query, we can see that our Solr index contains three books by the author George R.R. Martin:

http://localhost:8080/solr/collection1/select/?q=martin

Let us add to our index the two remaining books in the series that have also been published:

  1. Create a Solarium client using the following code:

    $client = new Solarium\Client($config);
  2. Create an instance of the update query using the following code:

    $updateQuery = $client->createUpdate();
  3. Create the documents you want to add and set their fields:

    $doc1 = $updateQuery->createDocument();
    $doc1->id = 112233445;
    $doc1->cat = 'book';
    $doc1->name = 'A Feast For Crows';
    $doc1->price = 8.99;
    $doc1->inStock = 'true';
    $doc1->author = 'George R.R. Martin';
    $doc1->series_t = 'A Song of Ice and Fire';
    $doc1->sequence_i = 4;
    $doc1->genre_s = 'fantasy';
  4. Similarly, another document $doc2...
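An add request like this reaches Solr's /update handler as an add command. As a rough illustration of that message (not Solarium's own serialization code), the following sketch uses DOMDocument to build an XML `<add>` message carrying field values taken from $doc1 above. The helper buildAddXml() is hypothetical.

```php
<?php
/**
 * Hypothetical helper: serialize documents (field => value arrays) into a
 * Solr XML <add> message. Booleans are written as "true"/"false".
 */
function buildAddXml(array $docs): string
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    $add = $dom->createElement('add');
    $dom->appendChild($add);

    foreach ($docs as $fields) {
        $doc = $dom->createElement('doc');
        foreach ($fields as $name => $value) {
            if (is_bool($value)) {
                $value = $value ? 'true' : 'false';
            }
            $field = $dom->createElement('field');
            $field->setAttribute('name', $name);
            // createTextNode() escapes characters such as & and <.
            $field->appendChild($dom->createTextNode((string) $value));
            $doc->appendChild($field);
        }
        $add->appendChild($doc);
    }
    // Serialize only the <add> element, without the XML declaration.
    return $dom->saveXML($dom->documentElement);
}

// Field values taken from $doc1 above.
$xml = buildAddXml([[
    'id' => 112233445,
    'cat' => 'book',
    'name' => 'A Feast For Crows',
    'price' => 8.99,
    'inStock' => true,
    'author' => 'George R.R. Martin',
    'series_t' => 'A Song of Ice and Fire',
    'sequence_i' => 4,
    'genre_s' => 'fantasy',
]]);
echo $xml, "\n";
```

Each array element becomes one `<doc>`, so adding $doc1 and $doc2 in a single update corresponds to one `<add>` containing two `<doc>` elements.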

Updating documents in Solr using PHP


Let us see how we can use PHP code along with the Solarium library to update documents in Solr.

  1. First, check whether any documents in our index contain the word smith:

    http://localhost:8080/solr/collection1/select/?q=smith
  2. We can see numFound=0, which means that there are no such documents. Let us add a book to our index whose author's last name is Smith:

    $updateQuery = $client->createUpdate();
    $testdoc = $updateQuery->createDocument();
    $testdoc->id = 123456789;
    $testdoc->cat = 'book';
    $testdoc->name = 'Test book';
    $testdoc->price = 5.99;
    $testdoc->author = 'Hello Smith';
    $updateQuery->addDocument($testdoc);
    $updateQuery->addCommit();
    $client->update($updateQuery);
  3. If we run the same select query again, we can see that there is now one document in our index whose author is Smith. Let us now update the author's name to Jack Smith and the price to 7.59:

    $testdoc = $updateQuery->createDocument();
    $testdoc->id = 123456789...
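Step 3 works because the id field is the schema's unique key: adding a document whose id already exists replaces the stored document as a whole rather than merging new fields into it. The toy model below (a plain PHP array standing in for the index, not Solr code) shows the practical consequence: the re-added document must carry every field you want to keep. The function addToToyIndex() is hypothetical.

```php
<?php
// Toy model only: an array keyed by id stands in for the Solr index.
// Adding a document with an existing id replaces the old one entirely,
// mirroring Solr's default overwrite behaviour on the unique key.
function addToToyIndex(array &$index, array $doc): void
{
    $index[(string) $doc['id']] = $doc;
}

$index = [];
addToToyIndex($index, [
    'id' => 123456789, 'cat' => 'book', 'name' => 'Test book',
    'price' => 5.99, 'author' => 'Hello Smith',
]);

// Re-adding with only the changed fields silently drops cat and name.
addToToyIndex($index, [
    'id' => 123456789, 'author' => 'Jack Smith', 'price' => 7.59,
]);

print_r($index['123456789']);
```

In other words, when updating with a plain add, set all the fields again (cat, name, and so on), not only the ones that changed.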

Deleting documents in Solr using PHP


Now let us go ahead and delete this document from Solr.

$deleteQuery = $client->createUpdate();
$deleteQuery->addDeleteQuery('author:Smith');
$deleteQuery->addCommit();
$client->update($deleteQuery);

Now, if we run the following query on Solr, the document is no longer found:

http://localhost:8080/solr/collection1/select/?q=smith

Here we built a query matching all documents whose author field contains the word smith and passed it to Solr as a delete query.

We can add multiple delete queries at once via the addDeleteQueries() method, which lets us delete several sets of documents in a single call.

$deleteQuery->addDeleteQueries(array('author:Burst', 'author:Alexander'));

When this update is executed, all documents whose author field matches either Burst or Alexander are deleted from the index.

In addition to deleting by a query, we can also delete by ID. Each book that we have added to our index has an id field, which we...
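Whether we delete by one query, by several queries, or by ID, the request that reaches Solr's update handler is a single XML `<delete>` command containing `<query>` and/or `<id>` elements. The sketch below builds that command with PHP's XMLWriter to show its shape; it is an illustration, not the code Solarium runs, and the helper buildDeleteXml() is hypothetical. The id value reuses the test book from the previous section.

```php
<?php
// Hypothetical helper: build one Solr XML <delete> command that removes
// documents having any of the given ids or matching any of the given queries.
function buildDeleteXml(array $queries, array $ids = []): string
{
    $w = new XMLWriter();
    $w->openMemory();
    $w->startElement('delete');
    foreach ($ids as $id) {
        $w->writeElement('id', (string) $id);
    }
    foreach ($queries as $query) {
        // writeElement() escapes special characters in the query text.
        $w->writeElement('query', $query);
    }
    $w->endElement();
    return $w->outputMemory();
}

echo buildDeleteXml(['author:Smith']), "\n";
echo buildDeleteXml(['author:Burst', 'author:Alexander']), "\n";
echo buildDeleteXml([], [123456789]), "\n";
```

Packing several queries into one command is what makes the multi-query delete a single call; as with adds, the deletions only become visible to searches after a commit.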

Commit, rollback, and index optimization


The commitWithin parameter, which we have been passing as an argument to our addDocument() function, specifies the time (in milliseconds) within which this add operation must be committed. This leaves the decision of when to commit to Solr itself: Solr keeps the number of commits to a minimum while still meeting the update latency requirement.
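Why commitWithin reduces the number of commits can be shown with a simplified model. In the sketch below, the first uncommitted add schedules a commit for its deadline, and every add that arrives before that commit fires rides along with it. This is an assumption-laden illustration of the idea, not Solr's actual commit scheduler, and the function commitTimes() is hypothetical.

```php
<?php
/**
 * Simplified model (not Solr's code): given the times (ms) at which adds
 * arrive, all with the same commitWithin window, return the times at which
 * commits happen.
 */
function commitTimes(array $addTimes, int $commitWithinMs): array
{
    sort($addTimes);
    $commits = [];
    $deadline = null; // time of the currently scheduled commit, if any
    foreach ($addTimes as $t) {
        if ($deadline !== null && $t > $deadline) {
            // The scheduled commit fired before this add arrived.
            $commits[] = $deadline;
            $deadline = null;
        }
        if ($deadline === null) {
            // First uncommitted add: schedule a commit for its deadline.
            $deadline = $t + $commitWithinMs;
        }
    }
    if ($deadline !== null) {
        $commits[] = $deadline;
    }
    return $commits;
}

// Three adds within five seconds share one commit; a later add gets its own.
print_r(commitTimes([0, 1000, 2000, 9000], 5000));
```

With an explicit commit after every add, the same four adds would cost four commits; in this model they cost two, and no add waits longer than its 5000 ms window.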

The rollback option is exposed via the addRollback() function. A rollback discards all changes made since the last commit; once a commit has been done, those changes can no longer be rolled back.

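$rollbackQuery = $client->createUpdate();
$rollbackQuery->addRollback();
// Send the rollback to Solr; it discards all uncommitted changes.
$client->update($rollbackQuery);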
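The rule that only uncommitted changes can be rolled back can be pictured as a buffer: adds go into a pending set, a commit moves them into the searchable set, and a rollback empties whatever is still pending. The class ToyIndex below is a hypothetical illustration of that rule for adds only; it is not how Solr implements rollback.

```php
<?php
// Hypothetical toy model of commit/rollback semantics (adds only), not Solr internals.
class ToyIndex
{
    private $committed = []; // documents visible to searches
    private $pending = [];   // adds made since the last commit

    public function add(array $doc): void
    {
        $this->pending[(string) $doc['id']] = $doc;
    }

    public function commit(): void
    {
        // Pending documents replace committed ones with the same id.
        $this->committed = array_replace($this->committed, $this->pending);
        $this->pending = [];
    }

    public function rollback(): void
    {
        // Only changes made since the last commit are discarded.
        $this->pending = [];
    }

    public function count(): int
    {
        return count($this->committed);
    }
}

$toy = new ToyIndex();
$toy->add(['id' => 1, 'name' => 'kept']);
$toy->commit();
$toy->add(['id' => 2, 'name' => 'discarded']);
$toy->rollback();
echo $toy->count(), "\n";
```

The first document survives because it was committed before the rollback; the second never becomes visible.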

Index optimization is not strictly required, but an optimized index generally performs better than a non-optimized one. To optimize an index using PHP code, we can use the addOptimize(boolean $softCommit, boolean $waitSearcher, int $maxSegments) function. It has parameters to enable soft commit, wait until a new...
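On the wire, an optimize request is an `<optimize/>` command sent to the update handler, and the three parameters of addOptimize() should correspond to the softCommit, waitSearcher, and maxSegments attributes of that element. The sketch below builds the element with DOMDocument to show its shape; the helper buildOptimizeXml() is hypothetical, and the values in the example call (soft commit off, wait for a new searcher, merge down to one segment) are only illustrative.

```php
<?php
// Hypothetical helper: build a Solr XML <optimize/> command.
// Null arguments are omitted so that Solr's defaults apply.
function buildOptimizeXml(?bool $softCommit, ?bool $waitSearcher, ?int $maxSegments): string
{
    $dom = new DOMDocument();
    $optimize = $dom->createElement('optimize');
    if ($softCommit !== null) {
        $optimize->setAttribute('softCommit', $softCommit ? 'true' : 'false');
    }
    if ($waitSearcher !== null) {
        $optimize->setAttribute('waitSearcher', $waitSearcher ? 'true' : 'false');
    }
    if ($maxSegments !== null) {
        $optimize->setAttribute('maxSegments', (string) $maxSegments);
    }
    $dom->appendChild($optimize);
    return $dom->saveXML($optimize);
}

echo buildOptimizeXml(false, true, 1), "\n";
```

Merging down to a single segment (maxSegments=1) is the most thorough form of optimization but also the most I/O-intensive, which is one reason optimization is usually run occasionally rather than after every update.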

Summary


In this chapter, we started off by discussing the Solr schema and got a basic understanding of how it works. We then added some sample documents to our Solr index and saw several pieces of code to add, update, and delete documents in the index. We also saw how to use cURL to delete documents. We discussed how commit and rollback work on the Solr index and saw an example of how to use rollback in our code. Finally, we discussed index optimization using PHP code and the benefits of optimizing the Solr index.

In the next chapter, we will see how to execute search queries on Solr using PHP code and explore the different query modes available in Solr.
