Reader small image

You're reading from  Solr Cookbook - Third Edition

Product typeBook
Published inJan 2015
Reading LevelIntermediate
Publisher
ISBN-139781783553150
Edition1st Edition
Languages
Tools
Right arrow
Author (1)
Rafal Kuc
Rafal Kuc
author image
Rafal Kuc

Rafał Kuć is a software engineer, trainer, speaker and consultant. He is working as a consultant and software engineer at Sematext Group Inc. where he concentrates on open source technologies such as Apache Lucene, Solr, and Elasticsearch. He has more than 14 years of experience in various software domains—from banking software to e–commerce products. He is mainly focused on Java; however, he is open to every tool and programming language that might help him to achieve his goals easily and quickly. Rafał is also one of the founders of the solr.pl site, where he tries to share his knowledge and help people solve their Solr and Lucene problems. He is also a speaker at various conferences around the world such as Lucene Eurocon, Berlin Buzzwords, ApacheCon, Lucene/Solr Revolution, Velocity, and DevOps Days. Rafał began his journey with Lucene in 2002; however, it wasn't love at first sight. When he came back to Lucene in late 2003, he revised his thoughts about the framework and saw the potential in search technologies. Then Solr came and that was it. He started working with Elasticsearch in the middle of 2010. At present, Lucene, Solr, Elasticsearch, and information retrieval are his main areas of interest. Rafał is also the author of the Solr Cookbook series, ElasticSearch Server and its second edition, and the first and second editions of Mastering ElasticSearch, all published by Packt Publishing.
Read more about Rafal Kuc

Right arrow

Using position increment to divide sentences


Imagine that we want to search in the short notes created by our users. We want to have two possibilities—searching inside a single sentence and searching inside the whole content of the note. We also know that our users don't write notes longer than 100 sentences, and each sentence has a maximum of 100 words, giving us a maximum of 10,000 words per note. To achieve this, we will use position increments that allow us to control how data is divided in the same field.

How to do it...

The following steps will allow us to fulfill our requirements:

  1. We start with example data, which will look like this:

    <add>
     <doc>
      <field name="id">1</field>
      <field name="note_line">Support meeting at Monday.</field>
      <field name="note_line">Need to prepare presentation.</field>
     </doc>
    </add>
  2. Now, we need to create an index structure. To do this, we need to add the fields that will be used. We do this by adding...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Solr Cookbook - Third Edition
Published in: Jan 2015Publisher: ISBN-13: 9781783553150

Author (1)

author image
Rafal Kuc

Rafał Kuć is a software engineer, trainer, speaker and consultant. He is working as a consultant and software engineer at Sematext Group Inc. where he concentrates on open source technologies such as Apache Lucene, Solr, and Elasticsearch. He has more than 14 years of experience in various software domains—from banking software to e–commerce products. He is mainly focused on Java; however, he is open to every tool and programming language that might help him to achieve his goals easily and quickly. Rafał is also one of the founders of the solr.pl site, where he tries to share his knowledge and help people solve their Solr and Lucene problems. He is also a speaker at various conferences around the world such as Lucene Eurocon, Berlin Buzzwords, ApacheCon, Lucene/Solr Revolution, Velocity, and DevOps Days. Rafał began his journey with Lucene in 2002; however, it wasn't love at first sight. When he came back to Lucene in late 2003, he revised his thoughts about the framework and saw the potential in search technologies. Then Solr came and that was it. He started working with Elasticsearch in the middle of 2010. At present, Lucene, Solr, Elasticsearch, and information retrieval are his main areas of interest. Rafał is also the author of the Solr Cookbook series, ElasticSearch Server and its second edition, and the first and second editions of Mastering ElasticSearch, all published by Packt Publishing.
Read more about Rafal Kuc