Introduction to Splunk and its Core Components
A few years ago, I was hired by the IT security team of a large healthcare company to work as a security engineer. At the time, the company had a homegrown Security Information and Event Management (SIEM) system and was at the initial stages of rolling out a brand new Splunk deployment. Physical servers had been ordered and scheduled for delivery, and the licensing paperwork was complete. A Splunk Education instructor conducted on-site core Splunk and Splunk Enterprise Security training, and we were ready to go. The thought of working with Splunk was exciting. At the time, we were getting ready to install Splunk 6.x with one of the earlier releases of Splunk Enterprise Security. Before my arrival, the team had worked with Splunk Professional Services to estimate storage and license requirements. The next step was to focus on the data. “How do you get value from data? How do you configure Splunk Enterprise to get value from your company’s data?” In this book, we will explore how to use Splunk Enterprise to gain insight into your data. In this chapter, we will introduce how this big data tool can be used to explore data.
In this chapter, we will cover the following topics:
- Splunking big data
- Exploring Splunk components
- Introducing the case study – splunking the BOTS Dataset v1
Splunking big data
Splunk is a big data tool. In this book, we will introduce the idea of using Splunk to solve problems that involve large amounts of data. When I worked on the IT security team, the problem was obvious – we needed to use security data to identify malicious activity. Defining the problem you are trying to solve will determine what kind of data you collect and how you analyze that data. Not every problem requires a big data solution. Sometimes, a traditional database solution might work just as well and with less cost. So, how do you know if you’re dealing with a big data problem? There are three V’s that help define big data:
- High Volume: A big data problem usually involves large volumes of data. Often, the amount of data is greater than what traditional database solutions can store.
- High Velocity: Traditional database solutions are usually not able to handle the speed at which modern data enters a system. Imagine trying to store and manage data from user clicks on a website such as amazon.com in a traditional database. Databases are not designed to support that many operations.
- High Variety: A problem requiring analysis of big data involves a variety of data sources of varying formats. An IT security SIEM may have data being logged from multiple data sources, including firewall devices, email traces, DNS logs, and access logs. Each of these logs has a different format and correlating all the logs requires a heavy-duty system.
Here are some cases that can be solved using big data:
- A retail company wants to determine how product placement in stores affects sales. For example, research may show that placing packs of Cheetos near the Point Of Sale (POS) devices increases sales for customers with small children. Target assigns a guest ID number to every customer and correlates it with the customer’s credit card number and transactions.
- A rental company wants to measure the times of year that are busiest to ensure that there is a sufficient inventory of vehicles at different locations. They may also find that a certain type of vehicle is more suitable for a particular area of town.
- A public school district wants to explore data pulled from multiple district schools to determine the effect of remote classes on certain demographics.
- An online shop wants to use customer traffic to determine the peak time for posting ads or giving discounts.
- An IT security team may use datasets containing firewall logs, DNS logs, and user access to hunt down a malicious actor on the network.
Now, let’s look at how big data is generated.
How is big data generated?
Infographics published by FinancesOnline (https://financesonline.com) indicated that humans created, captured, copied, and consumed about 74 zettabytes of data in 2021. That number is estimated to grow to 149 zettabytes in 2024.
The volume of data seen in the last few years can be attributed to increases in three types of data:
- Machine data: Data generated by machines such as operating systems and application logs
- Social data: Data generated by social media systems
- Transactional data: Data generated by e-commerce systems
We are surrounded by digital devices, and as the capacity and capabilities of these devices increase, the amount of data generated also increases. Modern devices such as phones, laptops, watches, smart speakers, cars, sensors, POS devices, and household appliances all generate large volumes of machine data in a wide variety of formats. Many times, this data stays untouched because the data owners do not have the ability, time, or money to analyze it.
The prevalence of smartphones is possibly another contributor to the exponential increase in data. IBM’s Simon Personal Communicator, widely considered the first smartphone, was introduced in 1992 and had very limited capability. It cost a whopping $899 with a service contract. Out of the box, a user could use the Simon to make calls and send and receive emails, faxes, and pages. It also included notebook, address book, calendar, world clock, and scheduler features. IBM sold approximately 50,000 units (https://time.com/3137005/first-smartphone-ibm-simon/).
Figure 1.1 – The IBM Simon Personal Communicator released in 1992
The IBM Simon Personal Communicator is archaic compared to the average cellphone today. Apple sold 230 million iPhones in 2020 (https://www.businessofapps.com/data/apple-statistics/). iPhone users generate data when they browse the web, listen to music and podcasts, stream television and movies, conduct business transactions, and post to and browse social media feeds. This is in addition to the features that were found in the IBM Simon, such as sending and receiving emails. Each of these applications generates volumes of data. Just one application such as Facebook running on an iPhone involves a variety of data – posts, photos, videos, transactions from Facebook Marketplace, and so much more. Figure 1.2 shows data from Our World in Data (https://ourworldindata.org/internet) that illustrates the rapid increase in users of social media:
Figure 1.2 – Number of people using social media platforms, 2005 to 2019
Now that we understand what big data is, its applications, and how it is generated, let’s talk about Splunk Enterprise and how Splunk can be used to manage big data. For simplicity, we will refer to Splunk Enterprise as Splunk.
Splunk was founded in 2003 by Michael Baum, Rob Das, and Erik Swan. Splunk was designed to search, monitor, and analyze machine-generated data. Splunk can handle high-volume, high-variety data that is generated at high velocity, which makes it a perfect tool for dealing with big data. Splunk works on various platforms, including Windows (32- and 64-bit), Linux (64-bit), and macOS. Splunk can be installed on physical devices, virtual machines such as VirtualBox and VMware, and virtual cloud instances such as Amazon Web Services (AWS) and Microsoft Azure. Customers can also sign up for the Splunk Cloud Platform, which supplies the user with a Splunk deployment hosted virtually. Using AWS instances and Splunk Cloud frees the user from having to deploy and maintain physical servers. There is a free 60-day trial of Splunk that allows the user to index 500 MB of data daily. Once the user has used the product for 60 days, they can switch to a perpetual free license or purchase a Splunk license. The 60-day trial is a great way to get your feet wet. Traditionally, the paid version of Splunk was billed at a volume rate – that is, the more data you index, the more you pay. However, new pricing models such as workload and ingest pricing have been introduced in recent years.
In addition to the core Splunk tool, there are various free and paid applications, such as Splunk Enterprise Security, Splunk SOAR, Splunk User Behavior Analytics (UBA), and observability solutions such as Splunk Observability Cloud.
Splunk was designed to index a variety of data. This is accomplished via pre-defined configurations that allow Splunk to recognize the format of different data sources. In addition, Splunkbase (https://splunkbase.splunk.com) is a constantly growing repository of 1,000+ apps and Technical Add-Ons (TAs) developed by Splunk, Splunk partners, and the Splunk community. One of the most important features of these TAs is their configurations for automatically extracting fields from raw data. Unlike traditional databases, Splunk can index large volumes of data. A dedicated Splunk Enterprise indexer can index over 20 MB of data per second, or roughly 1.7 TB per day. The amount of data that Splunk is capable of indexing can be increased with additional indexers. There are many use cases for which Splunk is a great solution.
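To give a sense of what such field extraction configurations can look like, here is a minimal props.conf sketch. The sourcetype name, field names, and regular expressions are hypothetical and are only meant to illustrate the idea; real TAs ship far richer configurations, typically including timestamp rules and Common Information Model mappings.

```
# props.conf -- minimal sketch of search-time field extractions
# The sourcetype name (acme:firewall) and the regexes are hypothetical
[acme:firewall]
# Capture the value after "action=" into a field named "action"
EXTRACT-action = action=(?<action>\w+)
# Capture an IPv4 address after "src=" into a field named "src_ip"
EXTRACT-src_ip = src=(?<src_ip>\d{1,3}(?:\.\d{1,3}){3})
```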
Table 1.1 highlights how Splunk improved processes at The University of Arizona, Honda, and Lenovo:
| Organization | How Splunk helped |
| --- | --- |
| The University of Arizona | Used Splunk Remote Work Insights (RWI) to help with the challenges of remote learning during the pandemic (https://www.splunk.com/en_us/customers/success-stories/university-of-arizona.html) |
| Honda | Used predictive analytics to increase efficiency and solve problems before they became machine failures or interruptions in the production line (https://tinyurl.com/5n7f7naz) |
| Lenovo | Reduced the amount of time spent on troubleshooting by 50% and maintained 100% uptime despite a 300% increase in web traffic (https://tinyurl.com/yactu398) |
Table 1.1 – Examples of success stories from Splunk customers
Exploring Splunk components
A Splunk deployment consists of three main components:
- Forwarders
- Indexers
- Search heads
Forwarders are the data consumers of Splunk. Forwarders run on the source of the data or an intermediate device. Configurations on the forwarder device collect data and pass it on to the indexers. There are two types of forwarders – universal and heavy forwarders. Universal forwarders merely pass on the data to the indexers. Heavy forwarders, however, perform additional tasks, such as parsing and field extractions.
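As a rough sketch of how this forwarding is configured, the indexer destinations are typically defined in an outputs.conf file on the forwarder. The hostnames below are hypothetical; 9997 is simply a commonly used receiving port.

```
# outputs.conf on a forwarder -- minimal sketch; hostnames are hypothetical
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
# Forward events to either of two indexers listening on port 9997
server = idx1.example.com:9997, idx2.example.com:9997
```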
The indexer is the component responsible for indexing incoming data and searching indexed data. Indexers should have a good input/output capacity as they do a lot of reading and writing from disk. Multiple indexers can be combined to form clusters to increase data availability, data fidelity, data recovery, disaster recovery, and search affinity. Users access data indexed by Splunk through search heads, by running search queries written in a language called Search Processing Language (SPL).
Search heads coordinate searches across the indexers. Like indexers, multiple search heads can be combined to form search head clusters. There are other roles that devices can play in a Splunk deployment. These include deployment servers, deployers, license masters, and cluster masters. The Splunk forwarders send data to the indexers. It’s a one-way transfer of data. The search head interacts with the indexers by sending search requests in the form of bundles. The indexers find the data that fits the search criteria and send the results back to the search heads. Figure 1.3 shows how the three main components interact in a Splunk deployment:
Figure 1.3 – The major Splunk components
A Splunk deployment can include tens of thousands of universal forwarders. As mentioned earlier, there are two kinds of forwarders – the lightweight universal forwarders and the heavy forwarders. Both universal and heavy forwarders perform the following tasks:
- Assign metadata to incoming data (source, sourcetype, and host)
- Buffer and compress data
- Run local scripted inputs
- Break the data into 64 KB blocks
The universal forwarder is a low-footprint process that is used to forward raw or unparsed data to the indexer layer. However, if you need to do any filtering of the data before it arrives at the indexer layer, it is best to use a heavy forwarder. In a single-instance Splunk deployment, the forwarder sits on the same device as the indexer and search head.
The universal forwarder can be installed on multiple platforms, including Windows (32- and 64-bit), Linux (64-bit, ARM, s390x, and PPCLE), macOS (Intel and M1), 64-bit FreeBSD, Solaris (SPARC and 64-bit), and AIX. Heavy forwarders run on the same platforms as Splunk Enterprise. You can install a universal forwarder using a dedicated universal forwarder installation package, while heavy forwarders are installed using the regular Splunk Enterprise installation package.
Both universal and heavy forwarders collect data using inputs. A Splunk administrator configures inputs using CLI commands, by editing a configuration file called inputs.conf, or by using Splunk Web (Settings | Add Data). A Splunk forwarder can be configured to accept the following types of inputs:
- Files and directories: Monitor new data coming into files and directories. Splunk also has an upload or one-shot option for uploading single files.
- Network events: Monitor TCP and UDP ports, syslog feeds, and SNMP events.
- Windows sources: Monitor Windows Event Logs, Perfmon, WMI, registries, and Active Directory.
- Other sources: Monitor First In, First Out (FIFO) queues, changes to filesystems, and receive data from APIs through scripted inputs.
The following excerpt from a sample inputs.conf file shows a few of these input types configured on a Windows forwarder:

```
###### OS Logs ######
[WinEventLog://Application]
disabled = 1

###### DHCP ######
[monitor://$WINDIR\System32\DHCP]
disabled = 1
whitelist = DhcpSrvLog*

[powershell://generate_windows_update_logs]
script = ."$SplunkHome\etc\apps\Splunk_TA_windows\bin\powershell\generate_windows_update_logs.ps1"
schedule = 0 */24 * * *

[script://.\bin\win_listening_ports.bat]
disabled = 1
## Run once per hour
interval = 3600
sourcetype = Script:ListeningPorts
```
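An administrator can achieve a similar result from the CLI. The following is a hedged example of adding a file monitor input; the monitored path, sourcetype, and index names are hypothetical:

```
# Add a file monitor input from the command line
# (the monitored path, sourcetype, and index are hypothetical)
$SPLUNK_HOME/bin/splunk add monitor /var/log/secure -sourcetype linux_secure -index os_logs
```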
Splunk forwarders forward data to Splunk indexers. Think of the indexer as the brain of the Splunk deployment. It is the heavy input/output device that not only transforms and stores data but also searches the data based on queries passed down by the search heads. Indexers transform data into Splunk events. These events are then stored in an index, a repository for Splunk data. There are two types of indexes – events and metrics.
Splunk indexes time series data either by extracting timestamps from data or assigning a current datetime. A Splunk index is a collection of directories and subdirectories on the filesystem. These subdirectories are referred to as buckets. Data that arrives at an indexer is passed through pipelines and queues. A pipeline is a thread running on the indexer, while a queue is a memory buffer that holds data between pipelines.
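For illustration, a custom index is defined in an indexes.conf file. The following minimal sketch creates an events index; the index name, paths, and size limit are hypothetical:

```
# indexes.conf -- minimal sketch of a custom events index (name and limits are hypothetical)
[security_demo]
homePath   = $SPLUNK_DB/security_demo/db
coldPath   = $SPLUNK_DB/security_demo/colddb
thawedPath = $SPLUNK_DB/security_demo/thaweddb
# Roughly cap the total size of the index at 100 GB
maxTotalDataSizeMB = 100000
```

The home, cold, and thawed paths correspond to the bucket directories mentioned above, which hold data of different ages.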
We access data indexed on the indexers using search heads. We will look at search heads next.
A search head is a Splunk instance that allows users to search events indexed on the indexers (also referred to as search peers). The average user only interacts with the search head of a Splunk deployment. The user accesses the search head using a browser interface called Splunk Web. Users access data in Splunk by running search queries in the Splunk search bar or by viewing dashboards, reports, and other visualizations.
Figure 1.4 is an example of a Splunk bar graph:
Figure 1.4 – Sample Splunk bar graph
Search heads do not index data. Rather, search heads distribute searches to the indexers. The search head parses search queries and decides what accompanying files, called knowledge objects, need to be sent to the indexers. Why is this important? Some files may exist only on the search head. By combining all these files into a knowledge bundle, the search head equips the indexer with all the information (configuration files and assets) it needs to perform the search. It’s almost like the search head offloads its work to the indexers and says, “here are the files that you need to get the work done.” Sometimes, the knowledge bundle contains almost all the search head’s apps. The indexers search their indexes for the data that match the search query and send the results back to the search heads. The search heads then merge the results and present them to the user.
Figure 1.5 – An SPL query
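To give a flavor of SPL, the following is a simple query of the kind a user might run from the search bar; the index, sourcetype, and field names are hypothetical:

```
index=web sourcetype=access_combined status=404
| stats count BY uri_path
| sort - count
| head 10
```

This query counts 404 errors by requested path and lists the ten most frequent ones.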
Introducing the case study – splunking the BOTS Dataset v1
In this section, we will introduce the case study that we will use throughout this book. We will explore the logs in BOTS Dataset v1. Boss of the SOC (BOTS) is a blue-team capture-the-flag competition held during the annual Splunk .conf conference (https://tinyurl.com/39ru8d4b). Participants are given access to realistic network security logs to investigate real-world cybersecurity attacks. The nature of the attacks and the exact attack sequence are beyond the scope of this book. However, the dataset is a collection of data that we can use to explore some of the rich features of Splunk. BOTS Dataset v1 was compiled by Ryan Kovar, David Herrald, and James Brodsky in 2016.
A fictional company, ABC Inc., has observed unusual activity on its network. They think that the problem is centered around three Windows devices (we8105desk, de9041srv, and we1149srv). The very cyber-conscious ABC Inc. also has several network security solutions installed on their network as part of their security infrastructure:
- Suricata: An open source intrusion detection system and intrusion prevention system (https://suricata.io)
- FortiGate: A next-generation firewall (https://www.fortinet.com)
- Internet Information Services (IIS): An extensible web server software created by Microsoft (https://www.iis.net/)
- Nessus: A proprietary vulnerability scanner developed by Tenable (https://www.tenable.com/products/nessus)
- Splunk Stream: A wire data capture solution built into Splunk (https://splunkbase.splunk.com/app/1809/)
Our solution is to use Splunk to investigate the logs generated in August 2016. To get the full experience of installing Splunk, we will first deploy a Splunk environment to simulate the environment that generated BOTS Dataset v1. The environment will consist of the following components:
- Three Splunk forwarders running on Windows devices (we8105desk, de9041srv, and we1149srv) deployed using AWS instances
- A dedicated indexer (Splunk Enterprise installed on an AWS instance running Red Hat Linux)
- A dedicated search head (Splunk Enterprise installed on an AWS instance running Red Hat Linux)
- A deployment server (Splunk Enterprise installed on an AWS instance running Red Hat Linux)
This will give us an environment that we can use to explore the important process of setting up and configuring Splunk in Chapter 2, Setting Up the Splunk Environment. This case study will require access to an AWS account, so you should sign up for an account using the AWS Management Console (https://aws.amazon.com/console/) if you do not have one. This case study does not require advanced knowledge of AWS, but it may be helpful to read a tutorial on AWS Cloud such as Learn the Fundamentals (https://tinyurl.com/2p8aj7b7) or watch a YouTube video (https://www.youtube.com/watch?v=r4YIdn2eTm4). You will also need a Splunk account to download the Splunk installation file and Splunk apps (https://www.splunk.com).
BOTS Dataset v1 is available for download from the Splunk Git repository (https://github.com/splunk/botsv1). We will use the dataset containing only attack logs due to the space limitations of the free license of Splunk Enterprise. The dataset comes in the form of a Splunk app, which we will install on our dedicated search head. Once we have installed and configured the Splunk deployment, we will design a series of Splunk queries, dashboards, reports, and alerts as we investigate the logs.
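As a hedged preview of that process, a downloaded Splunk app package can be installed from the command line on the search head; the package filename and credentials below are purely illustrative:

```
# Install the dataset app from a downloaded package
# (the filename and credentials are illustrative)
$SPLUNK_HOME/bin/splunk install app /tmp/botsv1-attack-only.tgz -auth admin:changeme
```

The same app can also be installed through Splunk Web by going to Apps | Manage Apps | Install app from file.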
For this case study, we are assuming that ABC Inc. has an established security infrastructure that includes firewalls and other security devices. However, monitoring those devices does not fall under the scope of the project.
Once we have deployed and configured the Splunk environment, we will install BOTS Dataset v1 as an app on the search head and continue our exploration there. The dataset consists of various machine and network logs generated by the security solutions listed earlier in this section.
Now, let’s summarize what we have learned in this chapter.
Corporations are discovering the value of analyzing big data to gain insight into user behavior. This analysis has yielded results that have proven useful in various fields, including education, medicine, and computer security. In this chapter, we explored the use of Splunk to tackle big data problems. We looked at how data generation has changed over time. We looked at how Splunk has been used in organizations to solve problems. We also reviewed the key components of Splunk – forwarders, indexers, and search heads. We learned that forwarders send data to the indexers, which index the data. Users use Splunk search heads to create search queries in SPL. The search heads create knowledge bundles that they send to the indexers. The indexers search their indexes for data that match the queries and return the results to the search heads. These components work together to give powerful results.
Finally, we introduced BOTS Dataset v1, which was generated for the Splunk BOTS competition and is a rich dataset for this exercise. We will use examples from this dataset throughout the rest of this book.
We will deploy our Splunk environment in Chapter 2, Setting Up the Splunk Environment, which will consist of a search head, an indexer, a deployment server, and three forwarders.