Reader small image

You're reading from  Apache Solr Search Patterns

Product typeBook
Published inApr 2015
Reading LevelIntermediate
Publisher
ISBN-139781783981847
Edition1st Edition
Languages
Tools
Right arrow
Author (1)
Jayant Kumar
Jayant Kumar
author image
Jayant Kumar

Jayant Kumar is an experienced software professional with a bachelor of engineering degree in computer science and more than 14 years of experience in architecting and developing large-scale web applications. Jayant is an expert on search technologies and PHP and has been working with Lucene and Solr for more than 11 years now. He is the key person responsible for introducing Lucene as a search engine on www.naukri.com, the most successful job portal in India. Jayant is also the author of the book Apache Solr PHP Integration, Packt Publishing, which has been very successful. Jayant has played many different important roles throughout his career, including software developer, team leader, project manager, and architect, but his primary focus has been on building scalable solutions on the Web. Currently, he is associated with the digital division of HT Media as the chief architect responsible for the job site www.shine.com. Jayant is an avid blogger and his blog can be visited at http://jayant7k.blogspot.in. His LinkedIn profile is available at http://www.linkedin.com/in/jayantkumar.
Read more about Jayant Kumar

Right arrow

The information gain model


The information gain model is a type of machine learning concept that can be used in place of the inverse document frequency approach. The concept being used here is the probability of observing two terms together on the basis of their occurrence in an index. We use an index to evaluate the occurrence of two terms x and y and calculate the information gain for each term in the index:

  • P(x): Probability of a term x appearing in a listing

  • P(x|y): Probability of the term x appearing given a term y also appears

The information gain value of the term y can be computed as follows:

Information gain equation

This equation says that the more number of times term y appears with term x with respect to the total occurrence of term x, the higher is the information gain for that y.

Let us take a few examples to understand the concept.

In the earlier example, if the term unique appears with jacket a large number of times as compared to the total occurrence of the term jacket, then...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Apache Solr Search Patterns
Published in: Apr 2015Publisher: ISBN-13: 9781783981847

Author (1)

author image
Jayant Kumar

Jayant Kumar is an experienced software professional with a bachelor of engineering degree in computer science and more than 14 years of experience in architecting and developing large-scale web applications. Jayant is an expert on search technologies and PHP and has been working with Lucene and Solr for more than 11 years now. He is the key person responsible for introducing Lucene as a search engine on www.naukri.com, the most successful job portal in India. Jayant is also the author of the book Apache Solr PHP Integration, Packt Publishing, which has been very successful. Jayant has played many different important roles throughout his career, including software developer, team leader, project manager, and architect, but his primary focus has been on building scalable solutions on the Web. Currently, he is associated with the digital division of HT Media as the chief architect responsible for the job site www.shine.com. Jayant is an avid blogger and his blog can be visited at http://jayant7k.blogspot.in. His LinkedIn profile is available at http://www.linkedin.com/in/jayantkumar.
Read more about Jayant Kumar