You're reading from Apache Solr Search Patterns

Product type: Book
Published in: Apr 2015
Reading level: Intermediate
Edition: 1st Edition
ISBN-13: 9781783981847

Author: Jayant Kumar

Jayant Kumar is an experienced software professional with a bachelor of engineering degree in computer science and more than 14 years of experience in architecting and developing large-scale web applications. Jayant is an expert on search technologies and PHP and has been working with Lucene and Solr for more than 11 years now. He is the key person responsible for introducing Lucene as a search engine on www.naukri.com, the most successful job portal in India. Jayant is also the author of the book Apache Solr PHP Integration, Packt Publishing, which has been very successful. Jayant has played many different important roles throughout his career, including software developer, team leader, project manager, and architect, but his primary focus has been on building scalable solutions on the Web. Currently, he is associated with the digital division of HT Media as the chief architect responsible for the job site www.shine.com. Jayant is an avid blogger and his blog can be visited at http://jayant7k.blogspot.in. His LinkedIn profile is available at http://www.linkedin.com/in/jayantkumar.

Chapter 4. Solr for Big Data

In the previous chapter, we learned about Solr internals and the creation of custom queries. We examined the algorithms behind AND and OR clauses in Solr and the internals of the eDisMax parser, implemented our own Solr plugin for running proximity searches using SWAN queries, and looked at how filters work internally.

In this chapter, we will discuss how and why Solr is an appropriate choice for churning out analytical reports. We will understand the concept of big data and how Solr can be used to solve the problems that come along with running queries on big data. We will discuss different faceting concepts and see how distributed pivot faceting works.

The topics that we will cover in this chapter are:

  • Introduction to big data

  • Getting data points using facets

  • Radius faceting for location-based data

  • Data analysis using pivot faceting

  • Introduction to graphical representation of analytical reports

Introduction to big data


Big data can simply be defined as data too large to be processed by a single machine. Suppose we have 1 TB of data and the reports that need to be generated from it cannot be produced on a single machine within an acceptable time span. Take clickstream analysis as an example. Internet companies such as Yahoo and Google track user activity by capturing every click a user makes on their websites; sometimes the complete page-by-page flow is captured as well. Google, for example, records the position of each result on a search result page for a particular keyword or phrase. The amount of data generated and captured is huge and may run into exabytes every day. This data needs to be processed on a day-to-day basis for analytical purposes, and the reports generated from it are used to improve the experience of users visiting the website.

Is it possible to process an exabyte of data? Of course...

Getting data points using facets


Let us refresh our memory about facets. Simply put, faceting refers to the method of categorizing data. A facet on a search result contains categories and the number of documents in each category. The purpose of facets is to help the user narrow down the search results on the basis of those categories. Let us take an example to understand this better.

A search for a mobile phone on the Amazon website would bring up facets such as the following:

  • Facet for Brand: We can see a facet for Brand in the following screenshot:

The brand facet is purely intended to help the user shortlist his or her preferences. The count of cell phones for each brand is not displayed, although this information is readily available and can be used for display.

  • Facet for display size: We can see the facet for display size in the following image:

The display size category shows facets based on the range of display sizes. Phones having sizes of less than 3.9 inches are grouped together...
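As a sketch of how facets like these are requested, the following Python snippet builds a Solr facet query URL. The host, core name (`phones`), and field names (`brand`, `display_size`) are hypothetical placeholders, not values from the book:

```python
from urllib.parse import urlencode

# Hypothetical core and field names for a phone catalogue.
params = [
    ("q", "mobile phone"),
    ("rows", "0"),                 # only facet counts are needed, no documents
    ("facet", "true"),
    ("facet.field", "brand"),      # one bucket per brand, with a document count
    # Range buckets for display size, grouping phones into size intervals.
    ("facet.range", "display_size"),
    ("facet.range.start", "2.9"),
    ("facet.range.end", "6.9"),
    ("facet.range.gap", "1.0"),
]
url = "http://localhost:8983/solr/phones/select?" + urlencode(params)
print(url)
```

Sending this request to a running Solr instance would return the brand buckets and display-size ranges alongside the (empty) result list.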

Radius faceting for location-based data


Location-based data can be represented in Solr using latitudes and longitudes. Applications can combine other data with location information to provide more insight into the data pertaining to a certain location. In analytics, location-based data is very important. Whether we are dealing with sales information, statistical information of any kind, or information pertaining to visits to a website, having a location in addition to the numbers that we already have provides an additional insight with a regional perspective.

We will delve into how geospatial searches happen in Solr in Chapter 6, Solr for Spatial Search. For the current chapter, let us understand the different types of location filters available with Solr.

For spatial filters, the following parameters are used in Solr:

  • d: The radial distance, in kilometers

  • pt: The center point, in the format latitude,longitude

  • sfield: The name of the spatially indexed field to filter on
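A minimal sketch of these three parameters in use with Solr's geofilt filter follows. The field name `store` and the coordinates are hypothetical; a real query would be sent to a running Solr instance:

```python
from urllib.parse import urlencode

# Filter documents to within d kilometers of the point pt,
# using the spatial field named by sfield.
params = {
    "q": "*:*",
    "fq": "{!geofilt}",        # Solr's radial distance filter
    "sfield": "store",         # hypothetical spatially indexed field
    "pt": "28.6139,77.2090",   # latitude,longitude of the center point
    "d": "10",                 # radius in kilometers
}
query_string = urlencode(params)
print(query_string)
```

Appending this query string to the core's select handler URL would restrict the results to documents within the 10 km radius.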

Tip

In order to run queries, we would need...

Data analysis using pivot faceting


As per the definition in the Solr wiki, pivoting is a summarization tool that lets you automatically sort, count, total, or average data stored in a table. Pivot faceting lets you create a summary table of the results from a query across numerous documents.

The output of pivot faceting can be thought of as a decision tree: a hierarchy of sub-facets under each facet, with counts for both the facets and their sub-facets. We can constrain a facet by one of its values and get counts for the sub-facets nested inside it. Let us see an example to understand pivot faceting.

Suppose facet A has the constraints X and Y, with count M for X and count N for Y. We can then constrain facet A by X and get a new sub-facet B with constraints W and Z, and counts O for W and P for Z.
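The A → B hierarchy above can be sketched as the kind of nested structure Solr returns under facet_pivot. The values and counts here are hypothetical placeholders standing in for M, N, O, and P:

```python
# Hypothetical facet_pivot response for the field pair "A,B": each node
# carries its field, value, count, and an optional "pivot" list of sub-facets.
pivot = [
    {"field": "A", "value": "X", "count": 10,          # M documents match A=X
     "pivot": [
         {"field": "B", "value": "W", "count": 6},     # O docs with A=X and B=W
         {"field": "B", "value": "Z", "count": 4},     # P docs with A=X and B=Z
     ]},
    {"field": "A", "value": "Y", "count": 7},          # N documents match A=Y
]

def flatten(nodes, path=()):
    """Yield (path-of-values, count) for every node in the pivot tree."""
    for node in nodes:
        here = path + (node["value"],)
        yield here, node["count"]
        yield from flatten(node.get("pivot", []), here)

for values, count in flatten(pivot):
    print(" > ".join(values), count)
```

Walking the tree this way makes it easy to feed each level of the hierarchy into a report or chart.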

To understand better how pivot faceting works and hence how it could be helpful in analytics, let us see an example. Our index contains some...

Graphs for analytics


Once we know which queries to execute for getting the facets and hierarchical information, we need a graphical representation of the same. There are a few open source graph engines, mostly JavaScript based, that can be used for this. Most of these engines take JSON data and use it to display the graphs. Let us see some of the engines:

  • Chart.js: This is an HTML5-based charting library. It can be downloaded from http://www.chartjs.org.

  • D3.js: This is another JavaScript library that brings data to life using HTML and CSS. D3 can be used to generate an HTML table from an array of numbers or the same numbers can be used to draw an interactive bar chart. It is available for download at http://d3js.org.

  • Google Charts: This is another library, provided by Google, that can be used to draw graphs based on data from Solr. It provides a large range of charts, from simple line charts to complex hierarchical treemaps. Most of the charts are ready to use. Google Charts can be downloaded...
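Most of these libraries expect labels and values as plain JSON. The snippet below reshapes Solr's facet_fields output, which is a flat alternating name,count list, into that form; the brand names and counts are hypothetical sample data:

```python
import json

# Solr returns facet_fields as a flat alternating [name, count, ...] list.
facet_fields = {"brand": ["Samsung", 120, "Apple", 95, "Nokia", 40]}

flat = facet_fields["brand"]
chart_data = {
    "labels": flat[0::2],   # every even index is a category name
    "values": flat[1::2],   # every odd index is its document count
}
print(json.dumps(chart_data))
```

The resulting JSON can be handed directly to a bar or pie chart in any of the libraries listed above.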

Summary


In this chapter, we learned how Solr can be used to churn out data for analytics purposes. We also looked at big data and learned how to use different faceting concepts, such as radius faceting and pivot faceting, for data analytics. We saw some code that can be used for generating graphs and discussed the different libraries available for this. We also discussed that, with SolrCloud, we can build our own data warehouse and get graphs of not only historical data but also real-time data.

In the next chapter, we will learn about the problems that we normally face during the implementation of Solr on an e-commerce platform. We will also discuss how to debug such problems along with tweaks to further optimize the instance(s). Additionally, we will learn about semantic search and its implementation in e-commerce scenarios.
