
You're reading from  Splunk Essentials - Second Edition

Product type: Book
Published in: Sep 2016
ISBN-13: 9781785889462
Edition: 2nd Edition
Authors (3):

Betsy Page Sigman

Betsy Page Sigman is a distinguished professor at the McDonough School of Business at Georgetown University in Washington, D.C. She has taught courses in statistics, project management, databases, and electronic commerce for the last 16 years, and has been recognized with awards for teaching and service. She has also worked at George Mason University in the past. Her recent publications include a Harvard Business case study and a Harvard Business Review article. Additionally, she is a frequent media commentator on technological issues and big data.

Somesh Soni

Somesh Soni is a Splunk consultant with over 11 years of IT experience. He has a bachelor's degree in Computer Science (Hons.) and has been interested in exploring and learning new technologies throughout his life. He has extensive experience in consulting, architecture, administration, and development in Splunk, and is proficient in various programming languages and tools, including C#.NET/VB.NET, SSIS, and SQL Server. Somesh is currently working as a Splunk Master with Randstad Technologies, where his activities focus on consulting, implementation, administration, architecture, and support for Splunk. He started his career with one of the top three Indian IT giants and has executed projects for major Fortune 500 companies, such as Coca-Cola, Wells Fargo, Microsoft, and Capital Group. He has served in various capacities, including technical architect, technical lead, onsite coordinator, and technology analyst. Somesh has been a great contributor to the Splunk community and has consistently been at the top of the list. He is a member of Splunk Trust 2015-16 and one of the top contributors to the Splunk Answers community. Acknowledgement: I would like to thank my family and colleagues, who have always encouraged and supported me in following my dreams, and my friends, who put up with all my crazy antics while I went on a Splunk exploratory journey and listened with patience to all the tips and tricks of Splunk that I shared with them. Last but not least, I would like to express my gratitude to the entire team at Packt Publishing Ltd for giving me this opportunity.

Erickson Delgado

Erickson Delgado is an enterprise architect who loves to mine and analyze data. He began using Splunk in version 4.0 and has pioneered the use of the application in his current work. In the earlier parts of his career, he worked with start-up companies in the Philippines to help build their open source infrastructure. He then worked in the cruise industry as a shipboard IT manager, and he loved it. From there, he was recruited to work at the company's headquarters as a software engineer.


Chapter 2.  Bringing in Data

Splunk makes the process of collecting data easy: it can get data from many types of computerized systems, which are responsible for much of the data produced today. Such data is frequently referred to as machine data. And since much of this is streaming data, Splunk is especially useful, as it can handle streaming data quickly and efficiently. Splunk can also collect data from many other sources.

In this chapter, you will learn about Splunk and its role in big data, as well as the most common methods of ingesting data into Splunk. The chapter will also introduce essential concepts such as forwarders, indexes, events, event types, fields, sources, and source types. It is paramount that you learn these early on, as they will empower you to gather more data. In this chapter, we will cover the following topics:

  • Splunk and big data

  • Splunk data sources

  • Splunk indexes

  • Inputting data into Splunk

  • Splunk events and fields

Splunk and big data


Splunk is useful for datasets of all types, and it allows you to use big data tools on datasets of all sizes. But with the recent focus on big data, its usefulness becomes even more apparent. Big data is a term used everywhere these days, but one that few people understand. In this part of the chapter, we will discuss aspects of big data and the terms that describe those aspects.

Streaming data

Much of the data that arrives in large volumes and at high speed does not need to be kept. For instance, consider a manufacturing plant: many sensors may collect data on every part of the assembly line. The significance of this data lies primarily in its ability to alert someone to a possible upcoming problem (by revealing a bad trend) or to a current problem (by drawing attention to a metric that has exceeded some designated level); much of it does not need to be kept for a long period of time. Often this type of data loses its importance once its timeliness expires and its main usefulness...

Splunk data sources


Splunk was invented as a way to keep track of and analyze machine data coming from a variety of computerized systems. It is a powerful platform for doing just that. But since its invention, it has been used for a myriad of different data types, including machine data, log data (which is a type of machine data), and social media data. The various types of data that Splunk is often used for are explained in the next few sections.

Machine data

As mentioned previously, much of Splunk's data is machine data. Machine data is data created each time a machine does something, even if it is as seemingly insignificant as a tick on a clock. Each tick has information about its exact time (down to the second) and source, and each of these becomes a field associated with the event (the tick). The term "machine data" can be used in reference to a wide variety of data coming from computerized machines, from servers to operating systems to controllers for robotic assembly arms. Almost all...

Creating indexes


Indexes are where Splunk Enterprise stores all the data it has processed. It is essentially a collection of databases that is, by default, located at $SPLUNK_HOME/var/lib/splunk. Before data can be searched, it needs to be indexed, a process we describe here.

There are two ways to create an index: through the Splunk portal or by creating an indexes.conf file. You will be shown here how to create an index using the Splunk portal, but you should realize that doing so simply generates an indexes.conf file behind the scenes.

You will be creating an index called wineventlogs to store Windows Event Logs. To do this, take the following steps:

  1. In the Splunk navigation bar, go to Settings.

  2. In the Data section, click on Indexes, which will take you to the Indexes page.

  3. Click on New Index.

  4. Now fill out the information for this new index as seen in the following screenshot, carefully going through steps 1 to 4.

  5. Be sure to Save when you are done.

You will now see the new index in the list as shown...
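When you save the index in the portal, Splunk writes a corresponding stanza to an indexes.conf file. A minimal sketch of what that stanza might look like is shown here; the index name matches the one created in the preceding steps, but the exact paths and attributes can vary by version and configuration:

 [wineventlogs]
 homePath   = $SPLUNK_DB/wineventlogs/db
 coldPath   = $SPLUNK_DB/wineventlogs/colddb
 thawedPath = $SPLUNK_DB/wineventlogs/thaweddb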

Buckets


You may have noticed that there is a certain pattern in this configuration file, in which folders are broken into three locations: coldPath, homePath, and thawedPath. This is a very important concept in Splunk. An index contains compressed raw data and associated index files that can be spread out into age-designated directories. Each piece of this index directory is called a bucket.

A bucket moves through several stages as it ages. In general, as your data gets older (think colder) in the system, it is pushed to the next bucket. And, as you can see in the following list, the thawed bucket contains data that has been resurrected from an archive. Here is a breakdown of the buckets in relation to each other:

  • hot: This is newly indexed data, open for writing (stored under homePath)

  • warm: This is data rolled from hot, with no active writing (also stored under homePath)

  • cold: This is data rolled from warm (coldPath)

  • frozen: This is data rolled from cold; it is deleted by default, but it can be archived (for example, via coldToFrozenDir)

  • thawed:...
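Bucket aging is governed by settings in indexes.conf as well. As a hedged illustration, the attribute names below are standard Splunk settings, but the values are arbitrary examples rather than recommendations:

 [wineventlogs]
 # Roll buckets from cold to frozen after roughly 90 days (in seconds)
 frozenTimePeriodInSecs = 7776000
 # Cap the total size of the index (in MB)
 maxTotalDataSizeMB = 500000
 # Archive frozen buckets to this directory instead of deleting them
 coldToFrozenDir = /opt/splunk_archive/wineventlogs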

Data inputs


As you may have noticed, any configuration you make in the Splunk portal corresponds to a *.conf file written to the disk. The same goes for the creation of data inputs; it creates a file called inputs.conf. Now that you have an index to store your machine's Windows Event Logs, let us go ahead and create a data input for it, with the following steps:

  1. Go to the Splunk home page.

  2. Click on your Destinations app. Make sure you are in the Destinations app before you execute the next steps.

  3. In the Splunk navigation bar, select Settings.

  4. Under the Data section, click on Data inputs.

  5. On the Data inputs page, click on the Local event log collection type as shown in the following screenshot:

  6. In the next page select the Application and System log types.

  7. Change the index to wineventlogs. Compare your selections with the following screenshot:

  8. Click Save.

  9. On the next screen, confirm that you have successfully created the data input, as shown in the following screenshot:
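Saving this input writes a stanza to inputs.conf. A rough sketch of what the result might look like for the Application and System logs follows; it assumes the index name created earlier in this chapter, and the exact attributes may differ on your system:

 [WinEventLog://Application]
 index = wineventlogs
 disabled = 0

 [WinEventLog://System]
 index = wineventlogs
 disabled = 0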

Before we proceed further,...

Splunk events and fields


Throughout this chapter, you have been running Splunk search queries that return data. Before we go any further, it is important to understand what events and fields are, as these concepts are essential to comprehending what happens when you run Splunk searches on your data.

In Splunk, a single piece of data is known as an event. An event is like a record, such as an entry in a log file or another type of input data. An event can have many attributes or fields, or just a few. When you run a successful search query, you will see that it brings up events from the data source. If you are looking at live streaming data, events can come into Splunk very quickly.

Every event is given a number of default fields. For a complete listing, go to http://docs.splunk.com/Documentation/Splunk/6.3.2/Data/Aboutdefaultfields. We will now go through some of these default fields.

  • Timestamp: A timestamp is applied at the exact time the event is indexed in Splunk...
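To inspect a few of these default fields on your own data, you can run a simple search such as the following, using the index created earlier in the chapter (the table command keeps only the named fields in the results):

 SPL> index=wineventlogs | table _time, host, source, sourcetype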

Extracting new fields


Most raw data that you will encounter has some form of structure. Just as with a CSV (comma-separated values) file or a web log file, each entry in the log is assumed to follow some sort of format. Splunk 6.3+ makes custom field extraction very easy, especially for delimited files. Let's look at our Eventgen data as an example. If you look closely, the _raw data is actually delimited by white spaces:

2016-01-21 21:19:20:013632 130.253.37.97 GET /home - 80 - 10.2.1.33 "Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J3 Safari/6533.18.5" 200 0 0 186 3804 

Since there is a distinct separation of fields in this data, we can use Splunk's out-of-the-box field extraction tool to automatically classify these fields. In your Destinations app Search page, run the following search command:

 SPL> index=main sourcetype=access_custom

This sourcetype access_custom...
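If you prefer to configure such an extraction by hand rather than through the field extraction tool, a search-time, delimiter-based extraction can be sketched in props.conf and transforms.conf. The sourcetype matches the search above, but the field names here are illustrative guesses at the columns, not the book's actual names:

 # props.conf
 [access_custom]
 REPORT-access = access_custom_fields

 # transforms.conf
 [access_custom_fields]
 DELIMS = " "
 FIELDS = date, time, s_ip, method, uri, query, port, user, c_ip, useragent, status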

Summary


In this chapter, we learned some terms that are essential to understanding big data, such as streaming data, data latency, and data sparseness. We also covered the types of data that can be brought into Splunk. Then we studied what an index is, created an index for our data, and brought in data from our Destinations app. We talked about what fields and events are. Finally, we saw how to extract fields from events and name them so that they can be more useful to us. In the chapters to come, we'll learn more about these important features of Splunk.
