You're reading from  Advanced Splunk

Product type: Book
Published in: Jun 2016
ISBN-13: 9781785884351
Edition: 1st Edition
Author (1)
Ashish Kumar Tulsiram Yadav

Ashish Kumar Tulsiram Yadav holds a BE in computers and has around four and a half years of experience in software development, data analytics, and information security, and around four years of experience in Splunk application development and administration. He has experience in creating Splunk applications and add-ons, managing Splunk deployments, machine learning using R and Python, and analytics and visualization using various tools, such as Tableau and QlikView. He is currently working with the information security operations team, handling the Splunk Enterprise security and cyber security of the organization. He has worked as a senior software engineer at Larsen & Toubro Technology Services in the telecom consumer electronics and semicon unit, providing data analytics on a wide variety of domains, such as mobile devices, telecom infrastructure, embedded devices, Internet of Things (IoT), Machine to Machine (M2M), entertainment devices, and network and storage devices. He has also worked in the area of information, network, and cyber security in his previous organization. He has experience in OMA LWM2M for device management and remote monitoring of IoT and M2M devices and is well versed in big data and the Hadoop ecosystem. He is a passionate ethical hacker, security enthusiast, and Linux expert and has knowledge of Python, R, .NET, HTML5, CSS, and the C language. He is an avid blogger and writes about ethical hacking and cyber security on his blogs in his free time. He is a gadget enthusiast and regularly writes reviews of the gadgets he owns. He has participated in and won hackathons, technical paper presentations, white paper competitions, and so on.

Chapter 3. On-boarding Data in Splunk

This chapter covers the most important aspect of Splunk: adding data. We will go through HTTP Event Collector, a feature newly added in Splunk 6.3 that collects events (for example, from IoT devices) in JSON format over a REST API, and then cover the various interfaces and options for on-boarding data into Splunk. We will also study how to manage event segmentation and improve the data input process.

The following topics will be covered in this chapter:

  • Deep diving into various input methods and sources

  • Adding data to Splunk—new interfaces

  • Data processing

  • Managing event segmentation

  • Improving the data input process

Deep diving into various input methods and sources


Splunk supports numerous ways to ingest data. Any human-readable, machine-generated data from various sources can be uploaded using data input methods such as files, directories, TCP/UDP ports, and scripts. The data is then indexed on the Splunk Enterprise server, and analytics and insights can be derived from it.
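As a rough illustration of the TCP input method, the following Python sketch sends one newline-terminated event to a listening port. It assumes a TCP data input has already been configured on the Splunk server; the function names, host, and port are hypothetical and chosen only for this example.

```python
import socket


def encode_event(event: str) -> bytes:
    """Return the event as UTF-8 bytes ending in exactly one newline."""
    return (event.rstrip("\n") + "\n").encode("utf-8")


def send_event_over_tcp(host: str, port: int, event: str, timeout: float = 5.0) -> int:
    """Send one event to a TCP listener and return the number of bytes sent.

    Assumption: a TCP data input is listening on (host, port).
    """
    data = encode_event(event)
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(data)
    return len(data)


if __name__ == "__main__":
    # Hypothetical host and port; replace with your own TCP input.
    send_event_over_tcp("splunk.example.com", 5514, "2016-06-01 12:00:00 user=alice action=login")
```

Terminating each event with a newline keeps one event per line, which is the simplest shape for line-based event breaking.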

Data sources

Uploading data to Splunk is one of the most important parts of data analytics and visualization. If data is not properly parsed, timestamped, or broken into events, it can be difficult to analyze and to gain proper insight from it. Splunk can be used to analyze and visualize data from a wide range of domains, such as IT security, networking, mobile devices, telecom infrastructure, media and entertainment devices, storage devices, and many more. The machine-generated data from different sources can be of different formats and types, and hence, it is very important to parse data in the best format to...

Adding data to Splunk – new interfaces


Splunk Enterprise introduced new interfaces for accepting data that suit the resource-constrained, lightweight devices of the Internet of Things. Splunk Enterprise version 6.3 supports HTTP Event Collector, with REST and JSON APIs, for data collection.

HTTP Event Collector is a very useful interface for sending data from your existing application to the Splunk Enterprise server without using a forwarder. HTTP client libraries are available in .NET, Java, Python, and almost every other programming language, so forwarding data from an existing application, whatever language it is written in, is straightforward.

Let's take an example. Say you are the developer of an Android application, and you want to know which features users use and which screens are pain areas or cause problems. You also want to know the usage pattern of your application. So, in the code of your Android application, you can use REST APIs to forward...
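The idea of forwarding application events over HTTP can be sketched in Python using only the standard library. This is a minimal sketch, not a definitive client: it assumes HTTP Event Collector is enabled, that a token has been created for it, and that the collector is reachable at the `/services/collector` endpoint with the token passed in an `Authorization: Splunk <token>` header. The base URL, token, source type, and event fields shown here are placeholders invented for illustration.

```python
import json
import urllib.request


def build_hec_request(base_url, token, event, sourcetype=None, host=None):
    """Build (but do not send) a POST request carrying one JSON event."""
    payload = {"event": event}
    if sourcetype is not None:
        payload["sourcetype"] = sourcetype
    if host is not None:
        payload["host"] = host
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Splunk " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own server and token.
    request = build_hec_request(
        "https://splunk.example.com:8088",
        "00000000-0000-0000-0000-000000000000",
        {"screen": "checkout", "action": "crash"},
        sourcetype="android_app",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect or unit test before any network call is made.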

Data processing


Data processing plays a very important role in parsing and enriching data so that insights can be created faster and the data can be visualized with the required analytics. It basically covers event, timestamp, and host configuration.

Event configuration

Any data uploaded to Splunk is termed an event. An event can be anything from activity logs, error logs, and usage logs to machine-generated data from devices, servers, or any other source. Events are used to create visualizations and gain insight about the source in the Splunk environment, so they must be processed properly, depending on the data and the source. The settings and configurations used to process the events can later be stored in a source type.

Character encoding

Splunk supports many character encodings so that Splunk Enterprise can be used internationally. The default character set encoding on Splunk Enterprise is UTF-8, and it has built-in support for various other encodings used internationally. If the data is not...
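To see why the declared character set matters, consider the following Python sketch. It is independent of Splunk and only illustrates the general problem: bytes written in one encoding cannot be reliably read as another. The function name is hypothetical.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """Re-encode bytes from a known source character set into UTF-8."""
    return raw.decode(source_charset).encode("utf-8")


if __name__ == "__main__":
    raw = "café".encode("latin-1")  # one byte per character
    try:
        raw.decode("utf-8")  # wrong assumption about the encoding
    except UnicodeDecodeError as error:
        print("Not valid UTF-8:", error)
    print(to_utf8(raw, "latin-1").decode("utf-8"))
```

When the source encoding is known, converting it explicitly (or declaring it to the consumer of the data) avoids garbled characters in the indexed events.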

Managing event segmentation


Splunk breaks the uploaded data into events. Events are the key elements of Splunk search, and they are further segmented at index time and at search time. Basically, segmentation is the breaking of events into smaller units, classified as major and minor segments. Segmentation can be explained with the help of the following example.

The complete IP address is a major segment, and a major segment can be further broken down into many minor segments, as shown in the following screenshot:
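The major/minor distinction can also be sketched in a few lines of Python. This is a simplified illustration and not Splunk's actual segmenter: the breaker characters below are assumptions chosen for the example, and the sample event is invented.

```python
import re

# Assumed breaker sets for illustration only.
MAJOR_BREAKERS = r"[\s\[\]<>(){}|!;,]+"
MINOR_BREAKERS = r"[/:=@.\-$#%\\_]+"


def major_segments(event: str) -> list:
    """Split an event into major segments on the major breakers."""
    return [part for part in re.split(MAJOR_BREAKERS, event) if part]


def minor_segments(major_segment: str) -> list:
    """Split one major segment into minor segments on the minor breakers."""
    return [part for part in re.split(MINOR_BREAKERS, major_segment) if part]


if __name__ == "__main__":
    event = "192.168.1.10 GET /index.html"
    for segment in major_segments(event):
        print(segment, "->", minor_segments(segment))
```

Here the complete IP address `192.168.1.10` survives as one major segment, while its octets become the minor segments.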

It is very important to configure event segmentation: index-time segmentation affects storage size and indexing speed, while search-time segmentation affects search speed and the ability to create searches based on the results of searches in Splunk Web. Depending on the need, specific types of segmentation can be configured. Splunk even provides the facility to apply event segmentation to a specific host, source, or source type.

The following are three types of event segmentation that can be configured...

Improving the data input process


Data input is a very important step that comes before you generate insight and visualizations from data, so it is very important that the data is indexed, parsed, processed, and segmented properly. The first approach or setting that the user applies may not be the best, and trial and error may be needed to find the best settings for types of data for which Splunk provides no default settings.

It is always advisable to first upload a small amount of data to a test index on a development server of Splunk. Once the data is available in Splunk as correctly formed events, on which queries produce the required visualizations, the input can be forwarded to the correct index and source on the production server.

Often, when you upload the same file more than once while testing different event configuration settings, Splunk may not index the file, as the filename or...
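One way to picture why a repeated upload can be skipped is a checksum taken over the beginning of a file: two files that start with the same bytes look identical to such a check, unless something extra (a salt, such as the file path) is mixed in. The Python sketch below is only an analogy under that assumption. It uses `zlib.crc32` as a stand-in checksum, and the function name, the 256-byte default, and the sample paths are illustrative choices, not a description of Splunk's internal implementation.

```python
import zlib


def head_checksum(data: bytes, length: int = 256, salt: bytes = b"") -> int:
    """Checksum of an optional salt plus the first `length` bytes of the data."""
    return zlib.crc32(salt + data[:length])


if __name__ == "__main__":
    first = b"x" * 256 + b"first upload"
    second = b"x" * 256 + b"second upload"
    # Same leading bytes, so the unsalted checksums match.
    print(head_checksum(first) == head_checksum(second))
    # Mixing in a per-file salt makes them differ.
    print(head_checksum(first, salt=b"/tmp/a") == head_checksum(second, salt=b"/tmp/b"))
```

In practice, this means that changing only the event configuration settings, and not the file itself, may not be enough to make a test file index again.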

Summary


In this chapter, we walked through various data input methods along with the various data sources supported by Splunk. We also looked at HTTP Event Collector, a new feature added in Splunk 6.3 for data collection via REST to encourage the use of Splunk for IoT. We studied data processing, event segmentation, and ways to improve the data input process. In the next chapter, we will cover how to create analytics and derive meaningful insight from the data uploaded to Splunk.
