Reader small image

You're reading from  Mastering Apache Solr 7.x

Product typeBook
Published inFeb 2018
Reading LevelExpert
PublisherPackt
ISBN-139781788837385
Edition1st Edition
Languages
Tools
Right arrow
Authors (3):
Sandeep Nair
Sandeep Nair
author image
Sandeep Nair

Sandeep has been working in Liferay technology for more than 8 years and has more than 10 years' of overall experience in Java and Java EE technologies. He has executed projects using Liferay across various verticals such as construction, financial, and medical domains, providing solutions for collaboration, enterprise content management, and Web content Management systems. He has created a free and open source Google Chartlet plugin for Liferay which has been downloaded and used by people across 90 countries according to sourceforge statistics. Besides development, consulting, and implementing solutions he has also been involved in giving training on Liferay in other countries. Before he jumped into Liferay he had experience in Java and Java EE Technologies. He has authored "Liferay Beginner's Guide" and "Instant Liferay Portal 6 Starter" with Packt Publishing. When he is not coding, he loves to read books and travel.
Read more about Sandeep Nair

Chintan Mehta
Chintan Mehta
author image
Chintan Mehta

Chintan Mehta is a co-founder of KNOWARTH Technologies and heads the cloud/RIMS/DevOps team. He has rich, progressive experience in server administration of Linux, AWS Cloud, DevOps, RIMS, and on open source technologies. He is also an AWS Certified Solutions Architect. Chintan has authored MySQL 8 for Big Data, Mastering Apache Solr 7.x, MySQL 8 Administrator's Guide, and Hadoop Backup and Recovery Solutions. Also, he has reviewed Liferay Portal Performance Best Practices and Building Serverless Web Applications.
Read more about Chintan Mehta

Dharmesh Vasoya
Dharmesh Vasoya
author image
Dharmesh Vasoya

Dharmesh Vasoya is a Liferay 6.2 certified developer. He has 5.5 years of experience in application development with technologies such as Java, Liferay, Spring, Hibernate, Portlet, and JSF. He has successfully delivered projects in various domains, such as healthcare, collaboration, communication, and enterprise CMS, using Liferay. Dharmesh has good command of the configuration setup of servers such as Solr, Tomcat, JBOSS, and Apache Web Server. He has good experience of clustering, load balancing and performance tuning. He completed his MCA at Ahmedabad University.
Read more about Dharmesh Vasoya

View More author details
Right arrow

Understanding tokenizers


We have previously seen that an analyzer may be a single class or a set of defined tokenizer and filter classes.

The analyzer executes the analysis process in two steps:

  • Tokenization (parsing): Using configured tokenizer classes
  • Filtering (transformation): Using configured filter classes

We can also do preprocessing on a character stream before tokenization; we can do this with the help of CharFilters (we will see this later in the chapter). An analyzer knows its configured field, but a tokenizer doesn't have any idea about the field. The job of the tokenizer is only to read from a character stream, apply a tokenization mechanism based on its behavior, and produce a new sequence of a token stream. 

What is a tokenizer?

A tokenizer is a tool provided by Solr that runs a tokenization process, breaks a stream of text into tokens at some delimiter, and generates a token stream. Tokenizers are configured by their Java implementation factory class in the managed-schema.xml file...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Mastering Apache Solr 7.x
Published in: Feb 2018Publisher: PacktISBN-13: 9781788837385

Authors (3)

author image
Sandeep Nair

Sandeep has been working in Liferay technology for more than 8 years and has more than 10 years' of overall experience in Java and Java EE technologies. He has executed projects using Liferay across various verticals such as construction, financial, and medical domains, providing solutions for collaboration, enterprise content management, and Web content Management systems. He has created a free and open source Google Chartlet plugin for Liferay which has been downloaded and used by people across 90 countries according to sourceforge statistics. Besides development, consulting, and implementing solutions he has also been involved in giving training on Liferay in other countries. Before he jumped into Liferay he had experience in Java and Java EE Technologies. He has authored "Liferay Beginner's Guide" and "Instant Liferay Portal 6 Starter" with Packt Publishing. When he is not coding, he loves to read books and travel.
Read more about Sandeep Nair

author image
Chintan Mehta

Chintan Mehta is a co-founder of KNOWARTH Technologies and heads the cloud/RIMS/DevOps team. He has rich, progressive experience in server administration of Linux, AWS Cloud, DevOps, RIMS, and on open source technologies. He is also an AWS Certified Solutions Architect. Chintan has authored MySQL 8 for Big Data, Mastering Apache Solr 7.x, MySQL 8 Administrator's Guide, and Hadoop Backup and Recovery Solutions. Also, he has reviewed Liferay Portal Performance Best Practices and Building Serverless Web Applications.
Read more about Chintan Mehta

author image
Dharmesh Vasoya

Dharmesh Vasoya is a Liferay 6.2 certified developer. He has 5.5 years of experience in application development with technologies such as Java, Liferay, Spring, Hibernate, Portlet, and JSF. He has successfully delivered projects in various domains, such as healthcare, collaboration, communication, and enterprise CMS, using Liferay. Dharmesh has good command of the configuration setup of servers such as Solr, Tomcat, JBOSS, and Apache Web Server. He has good experience of clustering, load balancing and performance tuning. He completed his MCA at Ahmedabad University.
Read more about Dharmesh Vasoya