Administrating Solr


Product type: Book
Published: Oct 2013
Publisher: Packt
ISBN-13: 9781783283255
Pages: 120
Edition: 1st Edition
Author: Surendra Mohan

Language Detection


In this section, we will learn about language detection and how to set it up and configure it.

Solr can identify the language of a document and map its content to the corresponding language-specific fields while indexing. To do so, it uses langid, which is an UpdateRequestProcessor. Language detection can be implemented in Solr using any of the following:

  • Tika language detection

  • LangDetect language detection

  • Compact Language Detector (CLD)
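To make the idea of "identify the language, then map fields" concrete, here is a minimal Python sketch of the behaviour described above. It is not Solr's code and does not use any of the three detectors listed; the stopword-based `detect_language` function, the `map_fields` helper, and the `language` field name are all hypothetical, chosen only to illustrate how a detected language code can be appended to a field name at index time.

```python
# Toy illustration of langid-style field mapping; NOT Solr's actual implementation.
# The stopword lists and function names below are assumptions made for this sketch.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to"},
    "de": {"der", "die", "und", "ist", "das"},
    "fr": {"le", "la", "et", "est", "les"},
}


def detect_language(text, fallback="en"):
    """Pick the language whose stopword list overlaps the text most (hypothetical detector)."""
    words = set(text.lower().split())
    scores = {lang: len(words & stops) for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # With no stopword hits at all, fall back to a default language.
    return best if scores[best] > 0 else fallback


def map_fields(doc, fields, lang_field="language"):
    """Return a copy of doc with the detected language recorded and each
    listed field renamed to <field>_<lang>."""
    text = " ".join(str(doc[f]) for f in fields if f in doc)
    lang = detect_language(text)
    mapped = {k: v for k, v in doc.items() if k not in fields}
    mapped[lang_field] = lang
    for f in fields:
        if f in doc:
            mapped[f"{f}_{lang}"] = doc[f]
    return mapped


if __name__ == "__main__":
    doc = {"id": "1", "title": "Der Hund ist im Garten und das Haus ist alt"}
    print(map_fields(doc, ["title"]))
```

In a real deployment the detection step would be handled by one of the implementations listed above, and the mapped field names (for example `title_de`) would need matching field definitions in the schema so that each language gets its own analysis chain.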

The following table compares the three implementations: CLD, Apache Tika, and LangDetect.

| Parameter | CLD | Apache Tika | LangDetect |
|---|---|---|---|
| Language count supported | 21 | 17 | 21 |
| Languages not supported | N/A | Bulgarian, Czech, Lithuanian, and Latvian | N/A |
| Languages detected | > 76 | 27 | 53 |
| Accuracy | Medium | Low | High |
| Confusing languages | | Danish confused with Norwegian | Danish confused with Norwegian |
| Incorrect results (probability) | Low | Medium | High |
| Performance | Fast | Slow | Slower |

In the given comparative study, we can conclude that Compact...
