You're reading from Lucene 4 Cookbook

Product type: Book
Published in: Jun 2015
Reading level: Expert
ISBN-13: 9781782162285
Edition: 1st Edition
Authors (2):
Edwood Ng

Edwood Ng is a technologist with over a decade of experience in building scalable solutions, from proprietary implementations to client-facing web-based applications. Currently, he's the director of DevOps at Wellframe, leading infrastructure and DevOps operations. His background in search engines began at Endeca Technologies in 2004, where he was a technical consultant helping numerous clients to architect and implement faceted search solutions. After Endeca, he drew on his knowledge and began designing and building Lucene-based solutions. His first Lucene implementation that went to production was the search engine behind http://UpDown.com. From there on, he continued to create search applications using Lucene extensively to deliver robust and scalable systems for his clients. Edwood is a supporter of open source software. He has also contributed the plugin sfI18NGettextPluralPlugin to the Symphony project.

Vineeth Mohan

Vineeth Mohan is an architect and developer. He currently works as the CTO at Factweavers Technologies and is also an Elasticsearch-certified trainer. He loves to spend time studying emerging technologies and applications related to data analytics, data visualizations, machine learning, natural language processing, and developments in search analytics. He began coding during his high school days, which later ignited his interest in computer science, and he pursued engineering at Model Engineering College, Cochin. He was recruited by the search giant Yahoo! during his college days. After 2 years of work at Yahoo! on various big data projects, he joined a start-up that dealt with search and analytics. Finally, he started his own big data consulting company, Factweavers. Under his leadership and technical expertise, Factweavers is one of the early adopters of Elasticsearch and has been engaged with projects related to end-to-end big data solutions and analytics for the last few years. There, he got the opportunity to learn various big-data-based technologies, such as Hadoop, and high-performance data ingress systems and storage. Later, he moved to a start-up in his hometown, where he chose Elasticsearch as the primary search and analytic engine for the project assigned to him. Later in 2014, he founded Factweavers Technologies along with Jalaluddeen; it is a consultancy that aims at providing Elasticsearch-based solutions. He is also an Elasticsearch-certified corporate trainer who conducts trainings in India. To date, he has worked on numerous projects that are mostly based on Elasticsearch and has trained numerous multinationals on Elasticsearch.

Enumerating results


The previous sample code already gave a preview of how results are enumerated. You might have noticed that the central object in the search results is TopDocs. Now, we will show you how to use this object to paginate results. Lucene does not provide pagination functionality, but we can still build pagination easily using what's available in TopDocs.

How to do it...

Here is a sample implementation of pagination:

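// from is the zero-based offset of the first hit on the page; size is the page length.
// parser.parse() can throw ParseException and the searcher calls can throw IOException.
public List<Document> getPage(int from, int size) throws ParseException, IOException {
  List<Document> documents = new ArrayList<Document>();
  Query query = parser.parse(searchTerm);
  // collect the top hits; only ranked references are returned here
  TopDocs hits = searcher.search(query, maxNumberOfResults);
  // stop at the end of the page or at the last collected hit, whichever comes first
  int end = Math.min(hits.scoreDocs.length, from + size);
  for (int i = from; i < end; i++) {
    int docId = hits.scoreDocs[i].doc;
    // load the stored document by its ID
    Document doc = searcher.doc(docId);
    documents.add(doc);
  }
  return documents;
}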

How it works...

When we perform a search in Lucene, the actual results are not preloaded. In TopDocs, we only get back an array of ranked pointers. They are called ranked pointers because they are not actual documents, but a list of references (DocId). By default, results are ranked by Lucene's scoring mechanism. We will look at scoring in detail in the Introduction section of Chapter 7, Flexible Scoring. For paging, we can calculate the position offset, apply pagination ourselves, and use something like the sample code to return results by page. The Lucene developers actually recommend re-executing the search for every page, instead of storing the initial search results (refer to http://wiki.apache.org/lucene-java/LuceneFAQ#How_do_I_implement_paging.2C_i.e._showing_result_from_1-10.2C_11-20_etc.3F). The reasoning is that users are usually only interested in the top results, and the developers are confident in Lucene's performance.
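To make the offset arithmetic concrete, here is a minimal sketch of paging by re-executing the search for every page. It deliberately uses no Lucene classes so that it runs on its own: topN is a hypothetical stand-in for a search call that returns at most n doc IDs in ranked order, and PagingSketch, fetchPage, and fakeSearch are names made up for this illustration. In a real application, the ranked IDs would come from the scoreDocs array of TopDocs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.IntFunction;

public class PagingSketch {

    /**
     * Returns the doc IDs for one page.
     * topN is a hypothetical stand-in for a search call: given n,
     * it returns at most n doc IDs, best match first.
     */
    static List<Integer> fetchPage(IntFunction<int[]> topN, int from, int size) {
        if (from < 0 || size <= 0) {
            throw new IllegalArgumentException("from must be >= 0 and size must be > 0");
        }
        // Re-run the search, asking only for as many hits as this page needs.
        int[] ranked = topN.apply(Math.addExact(from, size));
        // The search may return fewer hits than requested.
        int end = Math.min(ranked.length, from + size);
        List<Integer> page = new ArrayList<>();
        for (int i = from; i < end; i++) {
            page.add(ranked[i]);
        }
        return page;
    }

    /** Fake index for the demo: every search matches these five IDs, best first. */
    static int[] fakeSearch(int n) {
        int[] all = {42, 7, 19, 3, 88};
        return Arrays.copyOf(all, Math.min(n, all.length));
    }

    public static void main(String[] args) {
        System.out.println(fetchPage(PagingSketch::fakeSearch, 0, 2)); // [42, 7]
        System.out.println(fetchPage(PagingSketch::fakeSearch, 4, 2)); // [88]
        System.out.println(fetchPage(PagingSketch::fakeSearch, 6, 2)); // []
    }
}
```

Each page asks for from + size hits, so deeper pages cost more than the first few. This fits the reasoning that most users only look at the top results. A request past the last hit simply returns an empty page.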

Note

This code assumes that parser (QueryParser), searcher (IndexSearcher), searchTerm (the query string), and maxNumberOfResults are already initialized. Note that this sample is for illustrative purposes only and is not optimized.
