Reader small image

You're reading from  Learning Bayesian Models with R

Product typeBook
Published inOct 2015
Reading LevelBeginner
PublisherPackt
ISBN-139781783987603
Edition1st Edition
Languages
Right arrow
Author (1)
Hari Manassery Koduvely
Hari Manassery Koduvely
author image
Hari Manassery Koduvely

Dr. Hari M. Koduvely is an experienced data scientist working at the Samsung R&D Institute in Bangalore, India. He has a PhD in statistical physics from the Tata Institute of Fundamental Research, Mumbai, India, and post-doctoral experience from the Weizmann Institute, Israel, and Georgia Tech, USA. Prior to joining Samsung, the author has worked for Amazon and Infosys Technologies, developing machine learning-based applications for their products and platforms. He also has several publications on Bayesian inference and its applications in areas such as recommendation systems and predictive health monitoring. His current interest is in developing large-scale machine learning methods, particularly for natural language understanding.
Read more about Hari Manassery Koduvely

Right arrow

Topic modeling using Bayesian inference


We have seen the supervised learning (classification) of text documents in Chapter 6, Bayesian Classification Models, using the Naïve Bayes model. Often, a large text document, such as a news article or a short story, can contain different topics as subsections. It is useful to model such intra-document statistical correlations for the purpose of classification, summarization, compression, and so on. The Gaussian mixture model learned in the previous section is more applicable for numerical data, such as images, and not for documents. This is because words in documents seldom follow normal distribution. A more appropriate choice would be multinomial distribution.

A powerful extension of mixture models to documents is the work of T. Hofmann on Probabilistic Semantic Indexing (reference 6 in the References section of this chapter) and that of David Blei, et. al. on Latent Dirichlet allocation (reference 7 in the References section of this chapter). In...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Learning Bayesian Models with R
Published in: Oct 2015Publisher: PacktISBN-13: 9781783987603

Author (1)

author image
Hari Manassery Koduvely

Dr. Hari M. Koduvely is an experienced data scientist working at the Samsung R&D Institute in Bangalore, India. He has a PhD in statistical physics from the Tata Institute of Fundamental Research, Mumbai, India, and post-doctoral experience from the Weizmann Institute, Israel, and Georgia Tech, USA. Prior to joining Samsung, the author has worked for Amazon and Infosys Technologies, developing machine learning-based applications for their products and platforms. He also has several publications on Bayesian inference and its applications in areas such as recommendation systems and predictive health monitoring. His current interest is in developing large-scale machine learning methods, particularly for natural language understanding.
Read more about Hari Manassery Koduvely