You're reading from Learning Hunk

Product type: Book
Published in: Dec 2015
Reading level: Intermediate
ISBN-13: 9781782174820
Edition: 1st Edition
Authors (2):

Dmitry Anoshin

Dmitry Anoshin is a data-centric technologist and a recognized expert in building and implementing big data and analytics solutions. He has a successful track record of implementing business and digital intelligence projects in numerous industries, including retail, finance, marketing, and e-commerce. Dmitry possesses in-depth knowledge of digital/business intelligence, ETL, data warehousing, and big data technologies. He has extensive experience in the data integration process and is proficient in using various data warehousing methodologies. Dmitry has consistently exceeded project expectations while working in the financial, machine tool, and retail industries. He has completed a number of multinational full BI/DI solution life cycle implementation projects. With expertise in data modeling, Dmitry also has a background and business experience in multiple relational databases, OLAP systems, and NoSQL databases. He is also an active speaker at data conferences and helps people adopt cloud analytics.

Sergey Sheypak

Sergey Sheypak started his so-called big data practice in 2010 as a Teradata PS consultant. He led the Teradata Master Data Management deployment at Sberbank, Russia (which has 110 million customers). Later, Sergey switched to AsterData and Hadoop practices. Sergey joined the Research and Development team at MegaFon (one of the top three telecom companies in Russia, with 70 million customers) in 2012. While leading the Hadoop team at MegaFon, Sergey built ETL processes from the existing Oracle DWH to HDFS. Automated end-to-end tests and acceptance tests were introduced as a mandatory part of the Hadoop development process. Scoring and geospatial analysis systems based on specific telecom data were developed and launched. Sergey now works as an independent consultant in Sweden.

Chapter 6. Discovering Hunk Integration Apps

Hunk can be used for more than analytics on data stored in Hadoop. In this chapter, we will discover other options that use special integration applications. These come from the https://splunkbase.splunk.com/ portal, which hosts hundreds of published applications. This chapter is devoted to the integration scheme between Hunk and MongoDB, a popular document-oriented NoSQL store.

What is Mongo?


Mongo is a popular NoSQL solution, with many pros and cons to weigh. It's a great choice when you want simple and fairly fast persistent key-value storage with a convenient JavaScript interface for querying stored data. We recommend starting with Mongo if you don't really need a strict SQL schema and your data volumes are measured in terabytes. Mongo is amazingly simple compared to the whole Hadoop ecosystem, and it's probably the right place to start exploring the denormalized NoSQL world.
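To give a feel for Mongo's document-and-filter query style, here is a minimal Python sketch that mimics matching documents against a Mongo-style equality filter. The collection and field names (clicks, shop_id) are illustrative only; a real deployment would use the mongo shell or a driver rather than this stand-in:

```python
# A minimal sketch of Mongo-style document matching, using plain
# Python dicts instead of a real MongoDB server. The collection
# and field names (clicks, shop_id) are hypothetical.

clicks = [  # an in-memory stand-in for a Mongo collection
    {"shop_id": 173, "user": "alice"},
    {"shop_id": 42,  "user": "bob"},
    {"shop_id": 173, "user": "carol"},
]

def find(collection, query):
    """Return documents whose fields equal every key in `query`,
    loosely mimicking Mongo's find({field: value}) filter."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in query.items())]

matches = find(clicks, {"shop_id": 173})
print(len(matches))  # number of documents for shop 173
```

In the real mongo shell, the equivalent would be a `db.clicks.find({shop_id: 173})` call; the point is that queries are expressed as documents themselves.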

Installation

Mongo is already installed on the VM and ready to use, so its installation is not described here. We use Mongo version 3.0.

You will install the special Hunk app that integrates Mongo and Hunk.

Installing the Mongo app

Visit https://splunkbase.splunk.com/app/1810/#/documentation and download the app. You should use the VM browser to download it:

  1. Click on Splunk Apps.

  2. Click on Manage Apps.

  3. Choose Install app from file.

  4. Select the downloaded app and install it.

  5. You should see the Mongo app...

Counting by shop in a single collection


We want to see the number of clicks during the day for each shop.

Use this expression to get the result:

index=clicks_2015_02_01 | stats count by shop_id

We see that the shop with ID 173 has the most clicks. I'll let you in on a secret: this shop has many more visitors than the others.
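The effect of `stats count by shop_id` can be sketched in plain Python. The sample events below are invented; in Hunk, the real events come from the clicks_2015_02_01 virtual index backed by a Mongo collection:

```python
from collections import Counter

# Hypothetical click events; in Hunk these would come from the
# clicks_2015_02_01 virtual index backed by a Mongo collection.
events = [
    {"shop_id": 173}, {"shop_id": 7}, {"shop_id": 173},
    {"shop_id": 99},  {"shop_id": 173},
]

# Rough equivalent of: index=clicks_2015_02_01 | stats count by shop_id
counts = Counter(e["shop_id"] for e in events)
print(counts.most_common())  # shop 173 leads in this sample
```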

Counting events in all collections


We can access our daily data, stored in separate collections and virtual indexes, using a pattern. Let's count the events in each collection and sort by collection size:

Use this expression:

index=clicks_2015_* | stats count by index | sort - count

We can see the trend: users visit shops more often on working days (the 1st of February is a Sunday, the 5th a Thursday), so we get more clicks from them.
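A rough Python analogue of the wildcard query and descending sort is shown below. The per-day collection names mirror the daily pattern used in this chapter, but the event counts are invented:

```python
# Hypothetical per-day collections matching the clicks_2015_* pattern;
# each maps to a Hunk virtual index. Event counts are made up.
indexes = {
    "clicks_2015_02_01": [{}] * 120,   # Sunday: fewer clicks
    "clicks_2015_02_05": [{}] * 300,   # Thursday: more clicks
}

# Rough equivalent of: index=clicks_2015_* | stats count by index | sort - count
by_index = sorted(((name, len(evts)) for name, evts in indexes.items()),
                  key=lambda pair: pair[1], reverse=True)
print(by_index)
```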

Next is the query related to metadata. We don't query the exact index; we use a wildcard to query several indexes at once:

index=clicks_2015_*

Note

Metadata is data that describes data, and the index name is exactly that kind of description. We have virtual indexes based on Mongo collections that hold click events. Each virtual index has a name, so the virtual index name is metadata.

Counting events in shops for observed days

Let's count how many events happened during the observed days in each shop:

index=clicks_2015_* | stats count by index, shop_id | sort +index, -count

We sort by index...
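The two-level grouping and mixed-direction sort can be sketched in the same style. The (index, shop_id) pairs below are again invented sample data:

```python
from collections import Counter

# Hypothetical (index, shop_id) pairs standing in for events
# spread across the clicks_2015_* virtual indexes.
events = [
    ("clicks_2015_02_01", 173), ("clicks_2015_02_01", 173),
    ("clicks_2015_02_01", 7),   ("clicks_2015_02_05", 42),
    ("clicks_2015_02_05", 42),  ("clicks_2015_02_05", 173),
]

# Rough equivalent of:
#   index=clicks_2015_* | stats count by index, shop_id | sort +index, -count
counts = Counter(events)
rows = sorted(counts.items(), key=lambda kv: (kv[0][0], -kv[1]))
for (index, shop_id), n in rows:
    print(index, shop_id, n)
```

Within each index (ascending), the busiest shop appears first, mirroring the `+index, -count` sort.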

Summary


We learned how to connect to MongoDB and create virtual indexes based on Mongo collections. We saw examples of data partitioning on the Mongo side and ran queries touching several partitions represented as virtual indexes on the Hunk side.

