Search icon
Arrow left icon
All Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletters
Free Learning
Arrow right icon
Learning Hunk

You're reading from  Learning Hunk

Product type Book
Published in Dec 2015
Publisher
ISBN-13 9781782174820
Pages 156 pages
Edition 1st Edition
Languages
Authors (2):
Dmitry Anoshin Dmitry Anoshin
Profile icon Dmitry Anoshin
Sergey Sheypak Sergey Sheypak
Profile icon Sergey Sheypak
View More author details

The elegant solution


What if we could take Splunk and put it on top of all the data stored in Hadoop? This is what Splunk actually did. The following shows the names Hunk was derived from:

Let's discuss some goals that Hunk's inventors were thinking about when they were planning Hunk:

  • Splunk can take data from Hadoop via the Splunk Hadoop Connection App. However, it is a bad idea to copy massive amounts of data from Hadoop to Splunk, it is much better to process data in-place, because Hadoop provides both storage and computation; why not take advantage of both of them?

  • Splunk has the extremely powerful Splunk Processing Language (SPL) and it has a wide range of analytic functions. That's why it is a good idea to keep SPL in the new product.

  • Splunk has a true on-the-fly schema. Data that we store in Hadoop changes constantly. So, Hunk has to be able build a schema on-the-fly independently of the data format.

  • It is a very good idea to provide the ability to make a preview. As you know, when searching you can get incremental results. It can dramatically reduce outage. For example, we don't want to wait till a map reduce job finishes; we can look at the incremental result and, in the event of a wrong result, we can restart the search query.

  • Deployment of Hadoop is not easy, and Splunk tries to make the installation and configuration of Hunk easy for us.

Supporting SPL

Let's discuss more closely the reasons for supporting SPL. You are probably familiar with Splunk and SPL and know how powerful and flexible this language is. These are some of the advantages of SPL:

  • Naturally suitable for MapReduce

  • Reduces adoption time for people who are already familiar with Splunk

There are some challenges in integrating SPL and Hadoop. Hadoop is written in Java but all SPL code is in C++. Does SPL need to convert to Java or reuse what Splunk has provided? Finally, it was decided to reuse C++ code entirely.

Intermediate results

No one likes to look at a blank screen. A lot of people using other tools such as Pig or Hive have to wait until the query is finished and you have no idea what the query is actually retrieving for you. Maybe you made a mistake, but you didn't know about it; you will have to wait till the job is completed. It is a kind of frustration—running queries and waiting for hours.

That's why the Hunk team gave their users the ability to preview the result. You will be able to play with this in future chapters.

You have been reading a chapter from
Learning Hunk
Published in: Dec 2015 Publisher: ISBN-13: 9781782174820
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime}