You're reading from Mastering Apache Solr 7.x

Product type: Book

Published in: Feb 2018

Reading Level: Expert

Publisher: Packt

ISBN-13: 9781788837385

Edition: 1st Edition

Languages: Java

Tools: Solr

Concepts: Enterprise Search

Authors (3):

Sandeep Nair

Chintan Mehta

Dharmesh Vasoya

Chapter 3. Designing Schemas

Now that we have seen how to install Solr on our machine, let's dive deeper and understand the nitty-gritty of Solr.

Assume that you are building a home that you have always desired. How would you start? Will you just get all the bricks, cement, windows, doors, beams, and so on and ask the builders to start building? Nope! You would want to make sure you go through various designs based on the area that you have and decide on a design that you think will not only look good but also last long. Creating any application follows the same principle and demands proper schema design.

In this chapter, we will traverse through schema design. We will understand how to design a schema using documents and fields. We will also see various field types and get an understanding of the Schema API. We will finally look at schemaless mode.

How Solr works

The easiest way to understand how Solr works is to see how a telephone directory helps you to look something up. A telephone directory, or yellow pages as it is called in some places, is a book containing lots of phone numbers spread over many pages. Finding information in it would be a huge task unless it had some sort of indexing and categorizing. For example, we can easily find all the restaurants by navigating to the restaurants category and then looking for the locality that we live in.

Similarly, Solr can be imagined as a huge directory that has been fed data as per our requirements. It can be queried for relevant data using search criteria that match what was indexed when the data was fed in. Let's have a look at the following diagram and understand how the Solr search platform works:

As you can see, the way to look at Solr is like this—it is basically fed with lots of information, which is correctly indexed. Then, in order to retrieve...

Understanding field types

As discussed earlier, we are able to tell Solr how it should interpret the incoming data in a field and how we can query a field using the information specified in field types.

Definitions and properties of field types

Before going to the definitions and properties, we will see what field analysis means.

How Solr should interpret data whenever it is indexed is important. For example, a description of a book can contain lots of words that carry little meaning: helping verbs such as is, was, and are; pronouns such as they and we; and other general words such as the, a, and this. Querying for these words would match almost every document. Similarly, what should we do with words that have capital letters?

All of these problems can be handled using field analysis, which can ignore common words or casing while indexing or querying. We will dive deep into field analysis in the next chapter.
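To make the idea concrete, here is a minimal sketch of what ignoring common words and casing means. It is plain Python, not Solr's analysis chain; the stop-word list and the analyze function are hypothetical names chosen for illustration.

```python
import re

# Hypothetical stop-word list for illustration only; Solr reads its own
# list from its configuration rather than from code like this.
STOP_WORDS = {"is", "was", "are", "they", "we", "the", "a", "this"}


def analyze(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


# The same analysis is applied to the indexed text and to the query,
# so casing and common words no longer affect matching.
indexed_terms = analyze("This is a Book about the Solr schema")
query_terms = analyze("SOLR Schema")
print(indexed_terms)
print(query_terms)
```

Because both sides pass through the same steps, the query SOLR Schema still lines up with the terms produced from the book description.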

Now, coming back to field types, all analyses on a field are done by the field type, whether...

Field management

Once the primary work of setting up field types is done, defining fields is a small task. Just as with field types, the fields element of schema.xml holds the field definitions.

Field properties

Let's first see a sample field definition:

<field name="weight" type="float" default="0.0" indexed="true" stored="true"/>

In the preceding example, we have defined a field named weight, whose field type is float with a default value of 0.0. Moreover, the indexed as well as stored properties are explicitly set to true.
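As a quick check of how such a definition breaks down into properties, the following sketch parses a field element with Python's standard XML parser. The read_field helper is a hypothetical name, and the snippet only inspects the XML; it does not talk to Solr. Note that the attribute values need plain straight quotes for the XML to parse.

```python
import xml.etree.ElementTree as ET

# The sample field definition, written with straight quotes.
FIELD_XML = '<field name="weight" type="float" default="0.0" indexed="true" stored="true"/>'


def read_field(xml_text):
    """Parse one <field> element and return its attributes as a plain dict."""
    element = ET.fromstring(xml_text)
    if element.tag != "field":
        raise ValueError(f"expected a <field> element, got <{element.tag}>")
    return dict(element.attrib)


field = read_field(FIELD_XML)
print(field["name"], field["type"], field["default"])
```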

Field definitions will have these properties:

name: The field name. This has to be alphanumeric and can include underscore characters. It cannot begin with a digit. Reserved names should start and end with underscores (for example, _root_). Every field must have a name.
type: The name of the fieldType. All the fields should have a type.
default: The default value to be used for the field.
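The naming rule for the name property can be written down as a small check. This is a sketch of the rule exactly as stated above, not Solr's own validation code; the function names and the pattern are assumptions made for illustration.

```python
import re

# Letters, digits, and underscores only; the first character must not be a digit.
FIELD_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_field_name(name):
    """True when the whole name follows the naming rule described above."""
    return FIELD_NAME_PATTERN.fullmatch(name) is not None


def is_reserved_style(name):
    """True for names that start and end with an underscore, such as _root_."""
    return len(name) > 2 and name.startswith("_") and name.endswith("_")


for candidate in ["weight", "_root_", "9lives", "price-usd"]:
    print(candidate, is_valid_field_name(candidate), is_reserved_style(candidate))
```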

Fields and field types share many of the optional properties here. If there are two different...

Mastering Schema API

Schema API is the one-stop shop for most operations on your schema. It provides a REST-like HTTP API for doing all these operations.

You can read, write, or delete dynamic fields, fields, copy field rules, and field types.

Note

Do not manually write any changes into the managed-schema file yourself. Manual edits survive only as long as you don't use the Schema API. If you use the Schema API by mistake, all your changes might be overwritten. So, it is highly recommended that you leave your managed-schema file alone.

The response of an API call is in either JSON or XML format.

Assuming that you are using the gettingstarted collection, the base address of the API will be http://localhost:8983/solr/gettingstarted.
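As a rough sketch of what calls against this base address look like, the following Python snippet builds (but does not send) one read request and one modification request. It assumes the Schema API lives under /schema, that fields are listed at /schema/fields, and that a field is added by posting an add-field command as JSON. The helper names are hypothetical, and the float field type is assumed to exist in your schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8983/solr/gettingstarted"


def read_fields_request(base_url=BASE_URL, wt="json"):
    """Build a GET request that lists the fields defined in the schema."""
    return urllib.request.Request(f"{base_url}/schema/fields?wt={wt}", method="GET")


def add_field_request(field, base_url=BASE_URL):
    """Build a POST request that sends an add-field command to the schema endpoint."""
    body = json.dumps({"add-field": field}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/schema",
        data=body,
        headers={"Content-type": "application/json"},
        method="POST",
    )


get_req = read_fields_request()
post_req = add_field_request(
    {"name": "weight", "type": "float", "default": "0.0", "indexed": True, "stored": True}
)
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data.decode("utf-8"))

# To send a request to a running Solr instance, uncomment:
# with urllib.request.urlopen(post_req) as response:
#     print(response.read().decode("utf-8"))
```

Building the requests separately from sending them keeps the example runnable without a Solr server and makes the URL and payload easy to inspect.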

Note

Always reindex once you use Schema API for modifications. Only then will the changes that you have applied to the schema be reflected for existing documents that are already indexed.

Schema API in detail

Let's see some of the important schema endpoints. We will do all the examples...

Deciphering schemaless mode

Schemaless mode is used when we want to quickly create a useful schema by indexing sample data. It does not involve any manual editing of the schema.

All of its features are managed by solrconfig.xml.

The features that we are particularly interested in are:

Managed schema: All modifications to the schema are made through the Solr API at runtime, using a schemaFactory that supports these changes.
Field value class guessing: This is a technique of using a cascading set of parsers on fields that have not been seen before. It then guesses whether the field is an Integer, Long, Float, Double, Boolean, or Date.
Automatic schema field addition: Fields that are not yet in the schema are added automatically, based on the guessed field value classes.
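The cascading-parser idea behind field value class guessing can be sketched in a few lines of Python. This is an illustration of the technique, not Solr's implementation: the parser functions, the single accepted date format, and the String fallback are all assumptions, and the sketch labels every decimal value as Double rather than distinguishing Float from Double.

```python
import re
from datetime import datetime

INTEGER_RE = re.compile(r"[+-]?\d+")
DECIMAL_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?")


def parse_boolean(value):
    if value.lower() not in ("true", "false"):
        raise ValueError(value)
    return "Boolean"


def parse_integer(value):
    # Accept only whole numbers that fit in a signed 32-bit range.
    if not INTEGER_RE.fullmatch(value) or not -2**31 <= int(value) < 2**31:
        raise ValueError(value)
    return "Integer"


def parse_long(value):
    # Accept only whole numbers that fit in a signed 64-bit range.
    if not INTEGER_RE.fullmatch(value) or not -2**63 <= int(value) < 2**63:
        raise ValueError(value)
    return "Long"


def parse_double(value):
    if not DECIMAL_RE.fullmatch(value):
        raise ValueError(value)
    return "Double"


def parse_date(value):
    # Assumed date format for this sketch; strptime raises ValueError on a mismatch.
    datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return "Date"


# Order matters: each value is handed to the parsers in turn until one accepts it.
PARSERS = [parse_boolean, parse_integer, parse_long, parse_double, parse_date]


def guess_field_class(value):
    """Return the label of the first parser that accepts the value, else String."""
    for parser in PARSERS:
        try:
            return parser(value)
        except ValueError:
            continue
    return "String"


for sample in ["true", "42", "3000000000", "0.0", "2018-02-01T00:00:00Z", "Solr"]:
    print(sample, "->", guess_field_class(sample))
```

The ordering of the list is the design choice worth noticing: narrower types are tried first, so a value only falls through to a wider type when the narrower parser rejects it.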

Creating a schemaless example

All of the preceding three features are already configured in the Solr bundle. To start using schemaless mode, run the following command:

bin/solr start -e schemaless

This will start a single Solr server with the collection gettingstarted.

In order to see the schema...

Summary

In this chapter, we got an overview of how Solr works and saw schema design. We then jumped into Solr field types and saw how to define fields, copy fields, and create dynamic fields. We moved on to the Schema API, and finally we saw what schemaless mode is all about.

In the next chapter, we will get our hands dirty and learn all about analyzers, tokenizers, and filters.

Authors (3)

Sandeep Nair

Sandeep has been working in Liferay technology for more than 8 years and has more than 10 years of overall experience in Java and Java EE technologies. He has executed projects using Liferay across various verticals such as construction, financial, and medical domains, providing solutions for collaboration, enterprise content management, and web content management systems. He has created a free and open source Google Chartlet plugin for Liferay, which has been downloaded and used by people across 90 countries according to SourceForge statistics. Besides development, consulting, and implementing solutions, he has also been involved in giving training on Liferay in other countries. Before he jumped into Liferay, he had experience in Java and Java EE technologies. He has authored "Liferay Beginner's Guide" and "Instant Liferay Portal 6 Starter" with Packt Publishing. When he is not coding, he loves to read books and travel.

Chintan Mehta

Chintan Mehta is a co-founder of KNOWARTH Technologies and heads the cloud/RIMS/DevOps team. He has rich, progressive experience in server administration of Linux, AWS Cloud, DevOps, RIMS, and open source technologies. He is also an AWS Certified Solutions Architect. Chintan has authored MySQL 8 for Big Data, Mastering Apache Solr 7.x, MySQL 8 Administrator's Guide, and Hadoop Backup and Recovery Solutions. He has also reviewed Liferay Portal Performance Best Practices and Building Serverless Web Applications.

Dharmesh Vasoya

Dharmesh Vasoya is a Liferay 6.2 certified developer. He has 5.5 years of experience in application development with technologies such as Java, Liferay, Spring, Hibernate, Portlet, and JSF. He has successfully delivered projects in various domains, such as healthcare, collaboration, communication, and enterprise CMS, using Liferay. Dharmesh has good command of the configuration setup of servers such as Solr, Tomcat, JBOSS, and Apache Web Server. He has good experience of clustering, load balancing and performance tuning. He completed his MCA at Ahmedabad University.
