
You're reading from Advanced Splunk

Product type: Book
Published in: Jun 2016
Publisher: Packt
ISBN-13: 9781785884351
Edition: 1st
Author: Ashish Kumar Tulsiram Yadav

Ashish Kumar Tulsiram Yadav holds a BE in computers and has around four and a half years of experience in software development, data analytics, and information security, and around four years of experience in Splunk application development and administration. He has experience in creating Splunk applications and add-ons, managing Splunk deployments, machine learning using R and Python, and analytics and visualization using tools such as Tableau and QlikView. He currently works with the information security operations team, handling Splunk Enterprise security and the cyber security of the organization. He has worked as a senior software engineer at Larsen & Toubro Technology Services in the telecom, consumer electronics, and semicon unit, providing data analytics across a wide variety of domains, such as mobile devices, telecom infrastructure, embedded devices, Internet of Things (IoT), Machine to Machine (M2M), entertainment devices, and network and storage devices. He has also worked in information, network, and cyber security in his previous organization. He has experience with OMA LWM2M for device management and remote monitoring of IoT and M2M devices, and is well versed in big data and the Hadoop ecosystem. He is a passionate ethical hacker, security enthusiast, and Linux expert, with knowledge of Python, R, .NET, HTML5, CSS, and C. An avid blogger, he writes about ethical hacking and cyber security in his free time, writes reviews of the various gadgets he owns, and has participated in and won hackathons, technical paper presentations, white papers, and so on.

Chapter 10. Tweaking Splunk

We have already learned some important features of Splunk, such as creating analytics and visualizations, along with various dashboard customization techniques. Now we will learn about the various ways in which we can tweak Splunk to get the most out of it, and to do so efficiently.

In this chapter, we will cover the following topics in detail, along with examples and uses:

  • Index replication

  • Indexer auto-discovery

  • Sourcetype manager

  • Field extractor

  • Search history

  • Event pattern detection

  • Data acceleration

  • Splunk buckets

  • Search optimizations

  • Splunk health

Index replication


Splunk supports a distributed environment. But what does this actually mean, and what is the use of deploying Splunk in a distributed environment?

Splunk can be deployed in either a standalone environment or a distributed environment. Let us understand what a standalone environment, a distributed environment, and index replication are.

Standalone environment

In a standalone environment, the various components of Splunk, such as the indexer and search head, run on a single machine, which handles everything: onboarding data into Splunk, indexing the data, analytics and visualization, reporting, and so on. A standalone environment is generally used for development and testing purposes; it is not at all recommended for production deployments.

Distributed environment

In a distributed environment, various components of Splunk (the indexer, search head, and others) are deployed in clusters. Deploying in a clustered environment...

Indexer auto-discovery


Splunk 6.3 introduced a very useful and important feature for distributed environments: indexer auto-discovery. It simplifies forwarder management by automatically detecting new peer nodes in a cluster, so that load balancing is handled automatically.

Example

Let us understand the use of indexer auto-discovery with the following cluster example. The image shows forwarders sending data to peer nodes, with the peer node list and other relevant messages communicated from the cluster master to the forwarders:

The following are the uses/advantages of indexer auto-discovery:

  • There is no need to configure forwarders with the list of peer nodes in the given cluster. The master automatically informs each forwarder of the updated list of peer nodes, so when a peer node fails or new peer nodes are added to the cluster, no configuration change is required on the forwarders.

  • There is no need to know the number of peer nodes when adding or removing a forwarder...
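As a rough sketch, the forwarder side of indexer auto-discovery is configured in outputs.conf along the following lines (the master URI, port, key, and group name here are placeholders, not values from this book):

```ini
# outputs.conf on the forwarder (placeholder values)
[indexer_discovery:master1]
master_uri = https://cluster-master.example.com:8089
pass4SymmKey = changeme_shared_key

# The output group asks the master for its peer list
# instead of hard-coding server = host1:9997, host2:9997, ...
[tcpout:group1]
indexerDiscovery = master1

[tcpout]
defaultGroup = group1
```

With this in place, the forwarder polls the cluster master for the current set of peer nodes and load-balances across them automatically.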

Sourcetype manager


Sourcetype manager is another very useful feature added in Splunk 6.3, which can be used to manage the sourcetypes used for onboarding data into Splunk. It can be used to manage (create, modify, and delete) sourcetype configurations independently of getting data in, and to search within the sourcetype picker. We have already learned, in Chapter 2, Developing Application on Splunk, how to assign and configure a sourcetype while uploading data to Splunk.

Sourcetype manager lists all the sourcetypes configured in the Splunk instance, along with the inbuilt default sourcetypes. It can be accessed by navigating in the Splunk Web console to Settings | Data | Sourcetype.

Now let us learn what can be done from the sourcetype manager:

  • Create a sourcetype: In previous versions of Splunk, when sourcetype manager was not available, creating a sourcetype required first adding data to Splunk, or else manually configuring the inputs.conf file.

    Using the...
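For reference, a sourcetype created through sourcetype manager ultimately corresponds to a props.conf stanza along these lines (a minimal sketch; the sourcetype name and settings are illustrative, not from this book):

```ini
# props.conf (illustrative custom sourcetype)
[my_custom_log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 19
```

Timestamp and line-breaking settings like these are exactly what the sourcetype manager UI lets you adjust interactively before saving.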

Field extractor


In Splunk, fields play a very important role in any kind of analytics and visualization. For known and properly configured data sources, Splunk automatically tries to extract fields and make them available for use. Since data comes from a wide variety of sources, many fields may not be extracted automatically. Splunk provides the rex search command to extract fields, but using it efficiently requires a good understanding of regular expressions. So Splunk also provides an easy-to-use, interactive field extractor tool via the Splunk Web interface.
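To see what regex-based field extraction does under the hood, here is a small Python sketch. It is illustrative only: the log line is made-up sample data, and in Splunk the rex command performs the equivalent work natively using the same named-capture-group idea:

```python
import re

# A hypothetical web access log line (made-up sample data)
log_line = '10.0.0.5 - - [12/Jun/2016:10:15:32] "GET /cart.do HTTP/1.1" 404'

# Named capture groups play the same role as (?<field>...) in Splunk's rex command
pattern = r'(?P<clientip>\d+\.\d+\.\d+\.\d+).*"(?P<method>\w+) (?P<uri>\S+).*" (?P<status>\d+)'

match = re.search(pattern, log_line)
fields = match.groupdict() if match else {}
print(fields)
# {'clientip': '10.0.0.5', 'method': 'GET', 'uri': '/cart.do', 'status': '404'}
```

The interactive field extractor spares you from writing such expressions by hand: you highlight sample values in the event and Splunk generates the regular expression for you.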

Accessing field extractor

Let us learn to access the field extractor to extract fields from the data, which in turn can be used to create analytics and visualizations in Splunk.

The field extractor can be accessed via the following options:

  • Splunk Web Console | Settings | Fields | Field Extractions | Open Field...

Search history


Search history is another useful feature introduced in Splunk 6.3, which can be used to view and interact with the history of searches. This feature provides the complete list of search queries executed on Splunk over time.

The search history feature can be accessed via the Splunk Web console by clicking on the Search & Reporting app | Search. This takes the user to the search summary dashboard, with the option to run search queries.

The following image shows the search summary dashboard from where the search history can be accessed:

The Search History option enables the following information on the screen:

  • The exhaustive list of search queries run on the Splunk instance along with the time of the last run

  • The Action option to directly copy the respective search query in the Search bar so as to run the search query right away

  • The Filter option to choose the list of queries shown on the basis of time defined in the time range picker or some specific word/string which...
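Alongside the Search History panel, a comparable list can be pulled from Splunk's internal audit index with a search like the following (a hedged sketch; the field names shown are those commonly found in audit events):

```spl
index=_audit action=search info=completed
| table _time, user, search
| sort - _time
```

This variant is handy for administrators, since it covers searches run by all users rather than only your own history.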

Event pattern detection


Event pattern detection is a feature in Splunk which helps speed up analytics by automatically grouping similar events to discover meaningful insights in the given machine data. It helps users quickly discover relationships, patterns, and anomalies in the data, and build meaningful analytics on top of it.

In simpler terms, event pattern detection not only helps to find out the common patterns in the data but also highlights those events which are rare and could be anomalies. The event pattern detection feature of Splunk can be helpful in the following ways:

  • Automatically discover meaningful patterns in a given dataset

  • Explore data without needing to know in advance what to search for

  • Detect anomalies, rare events, and so on
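The pattern detection UI is interactive, but a related grouping of similar events can be sketched in SPL with the cluster command (illustrative; the index name is a placeholder):

```spl
index=main
| cluster showcount=true
| table cluster_count, _raw
| sort - cluster_count
```

Events with a high cluster_count represent common patterns, while those with a count of one or two are the rare events that may deserve attention as potential anomalies.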

The following image shows a sample of data events queried in Splunk. The sample data consists mostly of numbers, and if little domain information is available about the data, it would be difficult to gain insight from it:

Now we will see...

Data acceleration


Splunk is a big data tool, so the reports and dashboards created on Splunk typically run over large datasets/events. Data acceleration is therefore very necessary to get real-time analytics and visualizations.

Need for data acceleration

Let's understand the need for data acceleration in reports and dashboards with the help of the following image, an example screenshot of a dashboard with many panels and thus many searches. When many searches run concurrently in a report/dashboard, it takes time to show the analytics or visualizations. Thus, for real-time analytics, data acceleration is required:

Splunk is a very powerful big data tool, so why does it take time to populate results on a dashboard/report? Why some searches complete quickly while others take much longer can be explained with the help of the following facts:

  • Splunk is very fast at finding a keyword or set of keywords...
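As one example of what acceleration buys you, the tstats command reads pre-summarized index data rather than scanning raw events, which is why queries like the following return almost instantly (a sketch; the index name is a placeholder):

```spl
| tstats count where index=main by sourcetype
```

A panel built on a search like this behaves very differently from one that must scan every raw event in the time range.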

Splunk buckets


Splunk Enterprise stores its index data in buckets, organized by age. A bucket is basically a directory containing events from a specific period. Several buckets can exist at the same time, in the various stages of the bucket life cycle.

A bucket moves from one stage to another depending upon its age, size, and so on, as per the defined conditions. The Splunk bucket stages are Hot, Warm, Cold, Frozen, and Thawed. Splunk buckets play a very important role in the performance of search results and hence they should be properly configured as per the requirements.

The following image shows the life cycle of Splunk buckets:

Let us understand the Splunk bucket life cycle, taking the above image as a reference. The indexes.conf file can be modified to configure the aging and the conditions for moving from one stage to another:

  • Hot bucket: Whenever any new data gets indexed on Splunk Enterprise, it is stored in a hot bucket. There can be more than one hot bucket for each index. The data...
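The bucket life cycle is tuned per index in indexes.conf; the following is a minimal sketch with illustrative values (the index name, paths, and numbers are placeholders, not recommendations from this book):

```ini
# indexes.conf (illustrative values for one index)
[my_index]
homePath   = $SPLUNK_DB/my_index/db
coldPath   = $SPLUNK_DB/my_index/colddb
thawedPath = $SPLUNK_DB/my_index/thaweddb
maxHotBuckets = 3
maxDataSize = auto
# After ~180 days, buckets roll from cold to frozen
# (frozen data is deleted unless an archive path is configured)
frozenTimePeriodInSecs = 15552000
```

Settings like these control when a bucket rolls from hot to warm, warm to cold, and cold to frozen, which directly affects both disk usage and search performance.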

Search optimizations


We have already learned data acceleration and the bucket life cycle in the preceding section. Let us now see how we can make the best use of search queries for better and more efficient results. Splunk search queries can be optimized depending upon the requirements and conditions. Generally, the search queries which need to be optimized are those which are used most frequently. Let us learn a few tricks to optimize the search for faster results.

Time range

We have already learned about Splunk buckets, which organize events based on time. The shorter the time span of a search, the fewer buckets need to be accessed to produce the search result. Yet it is a common practice to use All time in the time range picker for every search, irrespective of whether the result is required for the entire duration or only a limited one.

So one of the best search optimization methods is to use the time range picker to specify the time domain on which the search should run to get...
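Putting this into practice, an optimized search names the index, restricts the time range, and filters as early as possible (a sketch; the index, sourcetype, and field values are placeholders):

```spl
index=web sourcetype=access_combined earliest=-24h@h latest=now status=404
| stats count by uri
```

Because the time range is limited to the last 24 hours, only the buckets covering that window are touched, and the status filter discards unneeded events before any statistics are computed.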

Splunk health


It is very important to keep track of Splunk's health status. Splunk Enterprise continuously logs various pieces of important information which can be helpful at various stages of Splunk usage. These logs, together with Splunk Enterprise itself, can be used to keep track of Splunk's health and various other important measures. The Splunk logs are useful in troubleshooting, system maintenance and tuning, and so on.

The following activities can be tracked by using Splunk's inbuilt logging mechanism:

  • Resource utilization and Splunk license usage

  • Data indexing, searching, analytics-related information, warnings, and errors

  • User activities and application usage information

  • Splunk component performance-related information

Splunk logs come from a wide variety of sources, such as the audit log, kvstore log, conf log, crash log, license log, splunkd log, and many more.

splunkd log

Of all the sources for Splunk logs, one of the most important and useful logs is splunkd.log. This log...
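Since Splunk indexes its own logs into the _internal index, splunkd.log can itself be searched, for example to surface recent errors grouped by component (a hedged sketch using commonly available splunkd log fields):

```spl
index=_internal sourcetype=splunkd log_level=ERROR earliest=-24h
| stats count by component
| sort - count
```

A search like this is a quick first stop when diagnosing indexing, forwarding, or licensing problems on an instance.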

Summary


In this chapter, we have read about various features of Splunk which can be used for better, more efficient, and faster analytics. We have learned about various tools such as sourcetype manager, field extractor, and event pattern detection. We also had a look at data acceleration, efficient search queries, and various other important tweaks of Splunk Enterprise. In the next chapter, we will learn about enterprise integration of Splunk with various other analytics and visualization tools.

