
You're reading from Learning Hunk

Product type: Book
Published in: Dec 2015
Reading level: Intermediate
ISBN-13: 9781782174820
Edition: 1st Edition
Authors (2):

Dmitry Anoshin
Dmitry Anoshin is a data-centric technologist and a recognized expert in building and implementing big data and analytics solutions. He has a successful track record of implementing business and digital intelligence projects in numerous industries, including retail, finance, marketing, and e-commerce. Dmitry possesses in-depth knowledge of digital/business intelligence, ETL, data warehousing, and big data technologies. He has extensive experience in the data integration process and is proficient in using various data warehousing methodologies. Dmitry has constantly exceeded project expectations while working in the financial, machine tool, and retail industries. He has completed a number of multinational full BI/DI solution life cycle implementation projects. With expertise in data modeling, Dmitry also has a background and business experience in multiple relational databases, OLAP systems, and NoSQL databases. He is also an active speaker at data conferences and helps people adopt cloud analytics.

Sergey Sheypak

Sergey Sheypak started his big data practice in 2010 as a Teradata PS consultant. He led the Teradata Master Data Management deployment at Sberbank, Russia (which has around 110 million customers). Later, Sergey switched to AsterData and Hadoop practices. He joined the research and development team at MegaFon (one of the top three telecom companies in Russia, with about 70 million customers) in 2012. While leading the Hadoop team at MegaFon, Sergey built ETL processes from the existing Oracle DWH to HDFS. Automated end-to-end tests and acceptance tests were introduced as a mandatory part of the Hadoop development process. Scoring and geospatial analysis systems based on telecom-specific data were developed and launched. Sergey now works as an independent consultant in Sweden.


Chapter 4. Adding Speed to Reports

One of the attributes of big data analytics is its velocity. In the modern world of information technology, speed is one of the crucial factors of any successful organization because even delays measured in seconds can cost money. Big data must move at extremely high velocities no matter how much we scale or what workloads our store must handle. The data handling hoops of Hadoop or NoSQL solutions put a serious drag on performance. That's why Hunk has a powerful feature that can speed up analytics and help immediately derive business insight from a vast amount of data.

In this chapter, we will learn about the report acceleration technique of Hunk, create new virtual indexes, and compare the performance of the same search with and without acceleration.

Big data performance issues


Despite the fact that modern technology lets us handle almost any big data problem, we still have to spend time waiting for our questions to be answered. For example, we collect data and store it in Hadoop; then we deploy Hunk, configure a data provider, create a virtual index, and start asking business questions by creating queries and running search commands. We then have to wait until the MapReduce job is finished. The following diagram illustrates this situation:

Moreover, if we want to ask the question over and over again by modifying the initial query, we will lose a lot of time and money.
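To make this concrete, here is a minimal sketch of running such a search from Python with the Splunk SDK (splunk-sdk-python). The host, the credentials, and the virtual index name hadoop_vix are placeholders rather than values from this book; every run launches fresh MapReduce work over the virtual index, which is exactly the waiting we want to avoid.

    # A minimal sketch, assuming the splunk-sdk-python package and a virtual
    # index named "hadoop_vix"; host and credentials are placeholders.
    import splunklib.client as client
    import splunklib.results as results

    # Connect to the Hunk search head's management port.
    service = client.connect(host="hunk.example.com", port=8089,
                             username="admin", password="changeme")

    # Each run starts a new MapReduce job over the virtual index, so we block
    # until the job has finished before reading the results.
    job = service.jobs.create(
        "search index=hadoop_vix sourcetype=access_combined "
        "| stats count by status",
        exec_mode="blocking", earliest_time="-7d", latest_time="now")

    for row in results.ResultsReader(job.results()):
        print(row)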

It would be superb if we could just run the search and immediately get the answer, as in the following diagram:

Yes, this is possible with Hunk, because it allows us to accelerate the report and get an answer to our business question very quickly. Let's learn how to do it.

Hunk report acceleration


We can easily accelerate our searches, which is critical for business. The idea behind Hunk acceleration is simple: the same search on the same data always gives the same result. In other words, same search + same data = same results. With acceleration, Hunk caches the results and returns them on demand. Moreover, it lets us choose the time range that a particular data summary covers: when a fresh portion of events arrives, the accelerated report rebuilds the data summary so that it still covers that range. Technically, the output of the map phase is simply cached in HDFS; when we run the accelerated search, Hunk returns the cached results straight to us. There are four main steps in running an accelerated search (see the sketch after the tip below):

  1. A scheduled job builds the cache.

  2. The search finds cache hits.

  3. The results are streamed to a search head.

  4. The reduce phase runs on the search head.

Tip

There is more information about search heads at: http://docs.splunk.com/Splexicon:Searchhead.
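As a rough illustration of these steps, the following sketch saves the earlier search as a report and switches on acceleration through the auto_summarize settings, again using the Python SDK. The report name, search string, and summary range are illustrative assumptions, not values from this chapter; once acceleration is enabled, the scheduled job builds and maintains the summary cache that later matching searches read from.

    # A minimal sketch, assuming splunk-sdk-python; the report name, search
    # string, and summary range below are illustrative assumptions.
    import splunklib.client as client

    service = client.connect(host="hunk.example.com", port=8089,
                             username="admin", password="changeme")

    # Save the search as a report; only transforming searches (for example,
    # ones ending in stats) can be accelerated.
    report = service.saved_searches.create(
        "vix_status_report",
        "index=hadoop_vix sourcetype=access_combined | stats count by status")

    # Turn on report acceleration and keep a 7-day summary. The scheduled job
    # then builds the cache, and matching searches are served from it.
    report.update(**{
        "auto_summarize": "1",
        "auto_summarize.dispatch.earliest_time": "-7d",
    })
    report.refresh()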


Hunk acceleration limits


Hunk is superb, but it still has some drawbacks:

  • Hardware limitations related to memory consumption on the search head. These can be solved by adjusting the memory configuration.

  • Software limitations related to the cache. Sometimes we have to delete the old cache using the command rm -rf <vix.splunk.home.hdfs>/cache.

  • The human factor: this is a common issue, especially in analytics. We learnt that, for Hunk, acceleration means same search + same data = same result, but this no longer holds if we change the KV extraction rules. Be careful with this.

Summary


In this chapter, we learnt how to accelerate searches in Hunk. This feature is easy to use and maintain, and it helps to reduce resource consumption and improve the user experience. In addition, we learnt how acceleration works and created our own accelerated report. Moreover, we compared it with a normal report and saw how much faster it became. Finally, we learnt how to manage Hunk report summaries.

We also learnt a lot about the base functionality of Hunk. In the next chapter, we are going to extend Hunk's functionality via the Hunk SDK and REST API.
