
You're reading from Mastering Clojure Data Analysis

Product type: Book
Published in: May 2014
Reading level: Beginner
ISBN-13: 9781783284139
Edition: 1st Edition
Author: Eric Richard Rochester

Eric Richard Rochester studied medieval English literature and linguistics at UGA and wrote his dissertation on lexicography. He now programs in Haskell and writes. He's also a husband and parent.


Summary


This has been an interesting dive into natural-language processing and topic modeling, and hopefully we've learned a little US history at the same time. I know I have.

However, the larger takeaway seems to be something we all know but tend to forget: freeform, unstructured text data is messy, messy, messy. In fact, the data we have been working with here is exceptionally clean, as these things go. Topics don't often stand out clearly, and the relationships between subjects (as opposed to the topics identified by LDA) are often complex and difficult to tease apart.

Still, we've also seen some interesting technologies and algorithms that help us deal with the messiness. Topic modeling doesn't (and possibly shouldn't) completely sweep the ambiguities and messiness of texts under the rug, but it does help us get a handle on what's inside large collections of documents.

In the next chapter, we'll head in a different direction and apply Bayesian classification to reports of UFO sightings...
