
You're reading from  Splunk Essentials - Second Edition

Product type: Book
Published in: Sep 2016
ISBN-13: 9781785889462
Edition: 2nd Edition
Authors (3):

Betsy Page Sigman

Betsy Page Sigman is a distinguished professor at the McDonough School of Business at Georgetown University in Washington, D.C. She has taught courses in statistics, project management, databases, and electronic commerce for the last 16 years, and has been recognized with awards for teaching and service. She has also worked at George Mason University in the past. Her recent publications include a Harvard Business case study and a Harvard Business Review article. Additionally, she is a frequent media commentator on technological issues and big data.

Somesh Soni

Somesh Soni is a Splunk consultant with over 11 years of IT experience. He has a bachelor's degree in Computer Science (Hons.) and has been interested in exploring and learning new technologies throughout his life. He has extensive experience in consulting, architecture, administration, and development in Splunk, and is proficient in various programming languages and tools, including C#.NET/VB.NET, SSIS, and SQL Server. Somesh is currently working as a Splunk Master with Randstad Technologies, where his activities focus on consulting, implementation, administration, architecture, and support for Splunk. He started his career with one of the top three Indian IT giants and has executed projects for major Fortune 500 companies, such as Coca-Cola, Wells Fargo, Microsoft, and Capital Group. He has served in various capacities, including technical architect, technical lead, onsite coordinator, and technology analyst. Somesh has been a great contributor to the Splunk community and has consistently been at the top of the list. He is a member of Splunk Trust 2015-16 and one of the top contributors to the Splunk Answers community. Acknowledgement: I would like to thank my family and colleagues, who have always encouraged and supported me in following my dreams, and my friends, who put up with all my crazy antics while I went on a Splunk exploratory journey and listened with patience to all the tips and tricks of Splunk that I shared with them. Last but not least, I would like to express my gratitude to the entire team at Packt Publishing Ltd for giving me this opportunity.

Erickson Delgado

Erickson Delgado is an enterprise architect who loves to mine and analyze data. He began using Splunk in version 4.0 and has pioneered the use of the application in his current work. In the earlier parts of his career, he worked with start-up companies in the Philippines to help build their open source infrastructure. He then worked in the cruise industry as a shipboard IT manager, and he loved it. From there, he was recruited to work at the company's headquarters as a software engineer.


Chapter 2.  Bringing in Data

Splunk makes the process of collecting data easy: it can get data from many types of computerized systems, which are responsible for much of the data produced today. Such data is frequently referred to as machine data. And since much of this is streaming data, Splunk is especially useful, as it can handle streaming data quickly and efficiently. Splunk can also collect data from many other sources.

In this chapter, you will learn about Splunk and its role in big data, as well as the most common methods of ingesting data into Splunk. The chapter will also introduce essential concepts such as forwarders, indexes, events, event types, fields, sources, and source types. It is paramount that you learn these early on, as they will empower you to gather more data. In this chapter, we will cover the following topics:

  • Splunk and big data

  • Splunk data sources

  • Splunk indexes

  • Inputting data into Splunk

  • Splunk events and fields

Splunk and big data


Splunk is useful for datasets of all types, and it allows you to use big data tools on datasets of all sizes. But with the recent focus on big data, its usefulness becomes even more apparent. Big data is a term used everywhere these days, but one that few people understand. In this part of the chapter, we will discuss aspects of big data and the terms that describe those aspects.

Streaming data

Much of the data that arrives in large volumes and at high speed does not need to be kept. For instance, consider a manufacturing plant: many sensors may collect data on every part of the assembly line. The significance of this data lies primarily in its ability to alert someone to a possible upcoming problem (by revealing a bad trend) or to a current problem (by drawing attention to a metric that has exceeded some designated level); much of it does not need to be kept for a long period of time. Often this type of data loses its importance once its timeliness expires and its main usefulness...

Splunk data sources


Splunk was invented as a way to keep track of and analyze machine data coming from a variety of computerized systems. It is a powerful platform for doing just that. But since its invention, it has been used for a myriad of different data types, including machine data, log data (which is a type of machine data), and social media data. The various types of data that Splunk is often used for are explained in the next few sections.

Machine data

As mentioned previously, much of Splunk's data is machine data. Machine data is data created each time a machine does something, even if it is as seemingly insignificant as a tick on a clock. Each tick has information about its exact time (down to the second) and source, and each of these becomes a field associated with the event (the tick). The term "machine data" can be used in reference to a wide variety of data coming from computerized machines, from servers to operating systems to controllers for robotic assembly arms. Almost all...

Creating indexes


Indexes are where Splunk Enterprise stores all the data it has processed. It is essentially a collection of databases that is, by default, located at $SPLUNK_HOME/var/lib/splunk. Before data can be searched, it needs to be indexed, a process we describe here.

There are two ways to create an index: through the Splunk portal or by creating an indexes.conf file. You will be shown here how to create an index using the Splunk portal, but you should realize that doing so simply generates an indexes.conf file behind the scenes.

You will be creating an index called wineventlogs to store Windows Event Logs. To do this, take the following steps:

  1. In the Splunk navigation bar, go to Settings.

  2. In the Data section, click on Indexes, which will take you to the Indexes page.

  3. Click on New Index.

  4. Now fill out the information for this new index as seen in the following screenshot, carefully going through steps 1 to 4.

  5. Be sure to Save when you are done.

You will now see the new index in the list as shown...
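When you save the index in the portal, Splunk writes a corresponding stanza to an indexes.conf file. A minimal sketch of what that stanza might look like is shown here; the index name matches the one created in the preceding steps, but the exact paths and attributes can vary by version and configuration:

 [wineventlogs]
 homePath   = $SPLUNK_DB/wineventlogs/db
 coldPath   = $SPLUNK_DB/wineventlogs/colddb
 thawedPath = $SPLUNK_DB/wineventlogs/thaweddb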

Buckets


You may have noticed that there is a certain pattern in this configuration file, in which folders are broken into three locations: coldPath, homePath, and thawedPath. This is a very important concept in Splunk. An index contains compressed raw data and associated index files that can be spread out into age-designated directories. Each piece of this index directory is called a bucket.

A bucket moves through several stages as it ages. In general, as your data gets older (think colder) in the system, it is pushed to the next bucket. And, as you can see in the following list, the thawed bucket contains data that has been resurrected from an archive. Here is a breakdown of the buckets in relation to each other:

  • hot: This is newly indexed data, open for writing (stored under homePath)

  • warm: This is data rolled from hot, with no active writing (also stored under homePath)

  • cold: This is data rolled from warm (coldPath)

  • frozen: This is data rolled from cold; it is deleted by default, but it can be archived (for example, via coldToFrozenDir)

  • thawed:...
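Bucket aging is governed by settings in indexes.conf as well. As a hedged illustration, the attribute names below are standard Splunk settings, but the values are arbitrary examples rather than recommendations:

 [wineventlogs]
 # Roll buckets from cold to frozen after roughly 90 days (in seconds)
 frozenTimePeriodInSecs = 7776000
 # Cap the total size of the index (in MB)
 maxTotalDataSizeMB = 500000
 # Archive frozen buckets to this directory instead of deleting them
 coldToFrozenDir = /opt/splunk_archive/wineventlogs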

Data inputs


As you may have noticed, any configuration you make in the Splunk portal corresponds to a *.conf file written to the disk. The same goes for the creation of data inputs; it creates a file called inputs.conf. Now that you have an index to store your machine's Windows Event Logs, let us go ahead and create a data input for it, with the following steps:

  1. Go to the Splunk home page.

  2. Click on your Destinations app. Make sure you are in the Destinations app before you execute the next steps.

  3. In the Splunk navigation bar, select Settings.

  4. Under the Data section, click on Data inputs.

  5. On the Data inputs page, click on the Local event log collection type as shown in the following screenshot:

  6. In the next page select the Application and System log types.

  7. Change the index to wineventlogs. Compare your selections with the following screenshot:

  8. Click Save.

  9. On the next screen, confirm that you have successfully created the data input, as shown in the following screenshot:
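Saving this input writes a stanza to inputs.conf. A rough sketch of what the result might look like for the Application and System logs follows; it assumes the index name created earlier in this chapter, and the exact attributes may differ on your system:

 [WinEventLog://Application]
 index = wineventlogs
 disabled = 0

 [WinEventLog://System]
 index = wineventlogs
 disabled = 0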

Before we proceed further,...

Splunk events and fields


Throughout this chapter, you have been running Splunk search queries that return data. Before we go any further, it is important to understand what events and fields are, as these concepts are essential to comprehending what happens when you run Splunk searches on your data.

In Splunk, a single piece of data is known as an event. An event is like a record, such as an entry in a log file or another type of input data. An event can have many attributes or fields, or just a few. When you run a successful search query, you will see that it brings up events from the data source. If you are looking at live streaming data, events can come into Splunk very quickly.

Every event is given a number of default fields. For a complete listing, go to http://docs.splunk.com/Documentation/Splunk/6.3.2/Data/Aboutdefaultfields. We will now go through some of these default fields.

  • Timestamp: A timestamp is applied at the exact time the event is indexed in Splunk...
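To inspect a few of these default fields on your own data, you can run a simple search such as the following, using the index created earlier in the chapter (the table command keeps only the named fields in the results):

 SPL> index=wineventlogs | table _time, host, source, sourcetype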

Extracting new fields


Most raw data that you will encounter has some form of structure. Just as with a CSV (comma-separated values) file or a web log file, each entry in the log is assumed to follow some sort of format. Splunk 6.3+ makes custom field extraction very easy, especially for delimited files. Let's look at our Eventgen data as an example. If you look closely, the _raw data is actually delimited by white spaces:

2016-01-21 21:19:20:013632 130.253.37.97 GET /home - 80 - 10.2.1.33 "Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J3 Safari/6533.18.5" 200 0 0 186 3804 

Since there is a distinct separation of fields in this data, we can use Splunk's out-of-the-box field extraction tool to automatically classify these fields. In your Destinations app Search page, run the following search command:

 SPL> index=main sourcetype=access_custom

This sourcetype access_custom...
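If you prefer to configure such an extraction by hand rather than through the field extraction tool, a search-time, delimiter-based extraction can be sketched in props.conf and transforms.conf. The sourcetype matches the search above, but the field names here are illustrative guesses at the columns, not the book's actual names:

 # props.conf
 [access_custom]
 REPORT-access = access_custom_fields

 # transforms.conf
 [access_custom_fields]
 DELIMS = " "
 FIELDS = date, time, s_ip, method, uri, query, port, user, c_ip, useragent, status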

Summary


In this chapter, we learned some terms that are essential to understanding big data, such as streaming data, data latency, and data sparseness. We also covered the types of data that can be brought into Splunk. Then we studied what an index is, created an index for our data, and brought in data from our Destinations app. We talked about what fields and events are. Finally, we saw how to extract fields from events and name them so that they can be more useful to us. In the chapters to come, we'll learn more about these important features of Splunk.
