Let’s get started with Splunk Enterprise. By the end of this chapter, you should understand what Splunk Enterprise is and what its rich set of features offers, and you should be able to list the Splunk components that work together to derive business insights from data. You will also learn about advanced Splunk Validated Architectures (SVAs), which cover all the Splunk components, and how to install standalone Splunk Enterprise in a Windows environment. Throughout the book, we use the terms Splunk Enterprise and Splunk interchangeably; both refer to the product itself. You will rarely find references to Splunk Inc., which refers to the company that developed and offers the Splunk Enterprise product.
This chapter covers the following topics to get you started:
Passing the Splunk Enterprise Admin exam is required to attain the Splunk Enterprise Certified Admin certification. The exam contains 56 questions that you need to answer in 57 minutes, and you get an extra 3 minutes to review your answers, bringing the total duration of the exam to 60 minutes. Successful candidates are issued a digital certificate along with Splunk digital badges. To be eligible to sit the Splunk Enterprise Admin certification exam, you must already have passed the Splunk Core Certified Power User exam and obtained that certification.
The exam tests your knowledge of Splunk Enterprise system administration and Splunk data administration concepts. Splunk Education and Splunk Authorized Learning Partners (ALPs) offer paid administration courses through instructor-led training, along with course material, labs, and sample questions. Splunk recommends going through these training sessions; however, they are optional for the admin exam. This book covers both system and data administration concepts, along with self-assessment questions on each topic, to get you ready for the exam.
A Splunk Enterprise system administrator is someone who looks after the Splunk Enterprise platform on a day-to-day basis. This exam tests your knowledge of user management, installation, the configuration of Splunk Enterprise, forwarder management, license management, search head (SH) management, index creation, indexer management, and monitoring the whole Splunk platform using the Monitoring Console (MC).
Splunk Enterprise data administrator responsibilities include getting data into Splunk from various sources, such as data inputs leveraging the universal forwarder (UF), network inputs, scripted inputs, and Technology Add-ons (TAs). The data admin ensures that the data is correctly broken down into individual events, with timestamps applied and the sourcetype and other metadata fields set. In addition, they can create the knowledge objects required to support other Splunk features for data insights and data retrieval using the Splunk Search Processing Language (SPL).
The following section explains how the exam questions are weighted by topic.
A list of topics in scope and their weightage has been provided by Splunk in its test blueprint for the admin exam. The topics might be slightly updated by Splunk in the future. At the time of writing this book, these are current and valid for the Splunk Enterprise 9.x Certified Admin exam.
Refer to the latest blueprint prior to booking your exam and find out whether any new concepts have been included. You could try accessing this blueprint using this link: https://tinyurl.com/36x7apnr. Otherwise, if the web link changes, look for the blueprint PDF deep link in the Splunk Certification Exams Study Guide (https://www.splunk.com/pdfs/training/splunk-certification-exams-study-guide.pdf) on the Splunk Enterprise Certified Admin page.
Don’t be alarmed by the length of the topic list; the topics are covered in thorough detail in the rest of this book, to get you prepared with confidence.
Now that you have an idea of the topics and their weightage, let’s understand the exam’s test pattern.
The exam contains 56 questions to be answered in 57 minutes. Each question has at most five options. Some questions have more than one correct answer and are marked Select all that apply. Others are either true/false or single-answer questions.
The following are sample questions from each category, with answers.
Q. Splunk Enterprise is only able to store and retrieve text-based data.
Here, the answer is option A.
Q. A UF is sending data to index=linux_os, which does not exist on the indexer layer. What happens to the data in this scenario?
linux_os index is automatically created since it did not exist before
lostandfound index
Here, the answer is option A.
Q. A Splunk admin user has, by default, which capabilities? (Select all that apply)
Here, the answers are options B, C, and D.
Let’s get started with learning about Splunk Enterprise in the following section.
Splunk Enterprise is software that collects data from heterogeneous sources and provides interfaces to analyze machine data. Knowing Splunk Enterprise well helps you choose the right feature for the requirements that come up while you are working on real-world projects. As an administrator, you are expected to be well aware of these capabilities of Splunk. Key features of this product are explained as follows:
Let’s look at the newly introduced features in version 9.x of Splunk in the following section.
Splunk Enterprise has evolved over the years and was at version 9.0.3 at the time of writing this book. As it advances, some of its features are deprecated, while new features are added or existing ones enhanced. Older versions eventually reach end of life (EOL), which means Splunk no longer offers support or bug fixes for them and instead advises upgrading to the latest version.
This section covers the important features of Splunk version 8.x that have been carried forward to the latest 9.0 product version, along with new features introduced in version 9.x. These features are good to be aware of but are not tested in the exam, so feel free to skip this section if you want to.
A new internal index, _configtracker, has been introduced to track config files and their stanzas, including key-value pairs. This is a useful new feature that helps you troubleshoot config issues and find out who changed what, and when, from an audit perspective.
A full list of 8.2.x features is available here:
https://docs.splunk.com/Documentation/Splunk/8.2.10/ReleaseNotes/MeetSplunk
Similarly, a full list of 9.0.x features is available here:
https://docs.splunk.com/Documentation/Splunk/9.0.3/ReleaseNotes/MeetSplunk
In the next section, we will learn about Splunk Enterprise components.
Splunk Enterprise has multiple integral components that work together and are divided primarily by function. The list is extensive. A standalone Splunk deployment doesn’t require all the components; however, a distributed and highly available deployment requires almost all of them.
A detailed comparison of standalone and distributed deployments is covered in a later section of this chapter, Splunk Validated Architectures (SVAs). By the end of this section, you will be familiar with two types of Splunk components: processing components and management components.
The following are processing components:
Let’s understand the roles of these components in detail and their association with management components.
As the name suggests, a forwarder primarily forwards data from the source to the target indexer. There are two types of forwarders:
A UF is a software agent typically installed on the source system where data is generated. It uses an input configuration (that is, an inputs.conf file) listing absolute file paths along with metadata fields such as index and sourcetype. The UF is the preferred approach to monitoring and forwarding file contents to designated indexers. By default, the UF uses the fishbucket, which records how far each monitored file has been read, so that file content is forwarded for indexing only once; it avoids data duplication through cyclic redundancy checks (CRCs) and seek pointers. You will find further information about the additional supported data inputs and a detailed explanation of the fishbucket concept in Chapter 9, Configuring Splunk Data Inputs.
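To make the input configuration described above more concrete, the following Python sketch reads monitor stanzas from a small, hypothetical inputs.conf fragment and lists each absolute file path with its index and sourcetype. The file paths, the web index, and the nginx_error sourcetype are made-up examples, and Python’s configparser is only an approximation of how Splunk itself parses .conf files (it ignores details such as Splunk’s configuration precedence across directories).

```python
import configparser

# Hypothetical inputs.conf content for a UF on a web server.
# The paths, index name, and sourcetype values are illustrative only.
SAMPLE_INPUTS_CONF = """
[monitor:///var/log/nginx/access.log]
index = web
sourcetype = access_combined

[monitor:///var/log/nginx/error.log]
index = web
sourcetype = nginx_error
"""


def monitored_files(conf_text: str) -> dict[str, dict[str, str]]:
    """Return {absolute_path: {setting: value}} for each [monitor://...] stanza."""
    # Only '=' separates keys from values; disable configparser's %-interpolation.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep setting names exactly as written
    parser.read_string(conf_text)

    files = {}
    for stanza in parser.sections():
        if stanza.startswith("monitor://"):
            path = stanza[len("monitor://"):]  # "monitor:///var/..." -> "/var/..."
            files[path] = dict(parser.items(stanza))
    return files


if __name__ == "__main__":
    for path, settings in monitored_files(SAMPLE_INPUTS_CONF).items():
        print(f"{path}: index={settings['index']}, sourcetype={settings['sourcetype']}")
```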
The following diagram illustrates UF installed on a web server configured to monitor the web server logs and forward them continuously to the indexer as and when the logs get updated:
Figure 1.1: UF forwarding web server logs to indexer
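The fishbucket mechanism mentioned above can be pictured with a toy model. In the following Python sketch, which is a simplification and not Splunk’s implementation, a file is identified by a CRC of its first 256 bytes (a window size chosen for this sketch), and a seek pointer records how many bytes have already been forwarded; when the file grows, only the new bytes are returned.

```python
import zlib

HEAD_BYTES = 256  # leading bytes used to fingerprint a file in this sketch


class FishbucketSketch:
    """Toy checkpoint store mapping a file's head CRC to the offset already read.

    Simplified illustration of the CRC + seek-pointer idea only; it is not
    Splunk's actual fishbucket implementation or on-disk format.
    """

    def __init__(self) -> None:
        self.seek_pointers: dict[int, int] = {}

    def read_new_data(self, content: bytes) -> bytes:
        """Return only the bytes of `content` that have not been forwarded yet."""
        head_crc = zlib.crc32(content[:HEAD_BYTES])
        offset = self.seek_pointers.get(head_crc, 0)
        if offset > len(content):
            offset = 0  # file shrank (e.g. truncated): start again from the top
        self.seek_pointers[head_crc] = len(content)  # advance the seek pointer
        return content[offset:]
```

Because the fingerprint covers only the head of the file, a file shorter than that window changes identity as it grows in this model; the sketch does not try to handle that case.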
An HF is a Splunk Enterprise instance and doesn’t require a separate binary for installation. It provides an extended feature set compared to a UF: it not only collects and forwards data, but also includes the Splunk user interface for configuration and management. To operate an HF, a forwarder license is required. Typically, an HF is configured in forwarding mode, with local data storage disabled. Splunk add-ons available on Splunkbase can be installed on an HF to facilitate data collection from various sources. This combination of features makes the HF a versatile choice for preprocessing and forwarding data while benefiting from a user-friendly interface.
Let us now look at the SH, which is a critical user-facing processing component in a distributed deployment.
The SH component is a Splunk Enterprise instance dedicated to search management that provides several interfaces for users to interact with. The most popular of these are Splunk Web, the CLI, and the RESTful API.
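As an illustration of the RESTful API interface, the following Python sketch builds an HTTPS request to the server/info endpoint on the management port (8089 by default) using HTTP basic authentication and the output_mode=json parameter. The host name and credentials are placeholders, and the unverified TLS context is only acceptable for a local lab instance using Splunk’s default self-signed certificate.

```python
import base64
import json
import ssl
import urllib.request


def build_server_info_request(host: str, username: str, password: str,
                              port: int = 8089) -> urllib.request.Request:
    """Build a GET request for the server/info endpoint using HTTP basic auth."""
    url = f"https://{host}:{port}/services/server/info?output_mode=json"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def lab_ssl_context() -> ssl.SSLContext:
    """TLS context that skips certificate checks. Only for a local lab instance."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be disabled before relaxing verify_mode
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch_server_info(host: str, username: str, password: str) -> dict:
    """Send the request and return the decoded JSON body (needs a running instance)."""
    request = build_server_info_request(host, username, password)
    with urllib.request.urlopen(request, context=lab_ssl_context(), timeout=10) as resp:
        return json.load(resp)
```

Against a lab instance, a call such as fetch_server_info("127.0.0.1", "admin", "<your password>") should return a JSON document describing the server; the function is not called here because it needs a running Splunk instance.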
Multiple SHs can be grouped together to form a cluster called an SH cluster (SHC). Members of an SHC share the same baseline configuration, and the SH captain allocates jobs to available members.
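The idea of a captain handing out jobs can be modelled very roughly as follows. This Python sketch is hypothetical: it assigns each job to the available member running the fewest jobs, which is just one plausible policy chosen for illustration and not a description of how the SH captain actually schedules work.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """A hypothetical SHC member as seen by the captain."""
    name: str
    running_jobs: int = 0
    up: bool = True


def assign_jobs(members: list[Member], jobs: list[str]) -> dict[str, str]:
    """Assign each job to the available member with the fewest running jobs.

    Ties are broken by member name so the result is deterministic.
    """
    assignments: dict[str, str] = {}
    for job in jobs:
        available = [m for m in members if m.up]
        if not available:
            raise RuntimeError("no search head cluster members available")
        target = min(available, key=lambda m: (m.running_jobs, m.name))
        target.running_jobs += 1
        assignments[job] = target.name
    return assignments
```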
In a standalone deployment, a single Splunk Enterprise instance works as both the SH and the indexer. In a distributed deployment, the SH or SHC can submit searches to multiple indexers and consolidate the results they return. The results are stored locally in the dispatch directory, $SPLUNK_HOME/var/run/splunk/dispatch, for later retrieval and are deleted after the job expires. ($SPLUNK_HOME refers to the directory where the Splunk software is installed.) For example, ad hoc search results (that is, the search job outcome) are retained in the dispatch directory for 10 minutes; once the job expires, they are removed by a process called the dispatch reaper, which runs every 30 seconds.
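The retention arithmetic above can be checked with a small simulation. In this hypothetical Python sketch, the TTL (600 seconds) and the reaper interval (30 seconds) are the values given in the text, while the job IDs and completion times are invented. A job is removed at the first reaper run at or after its TTL has elapsed.

```python
DEFAULT_TTL = 600      # seconds: ad hoc job retention stated in the text (10 minutes)
REAPER_INTERVAL = 30   # seconds: reaper frequency stated in the text


def reap(jobs: dict[str, int], now: int, ttl: int = DEFAULT_TTL) -> list[str]:
    """Delete jobs whose TTL has elapsed at time `now`; return the removed IDs."""
    expired = [sid for sid, finished_at in jobs.items() if now - finished_at >= ttl]
    for sid in expired:
        del jobs[sid]
    return expired


def simulate(finished_at: dict[str, int], horizon: int = 900,
             ttl: int = DEFAULT_TTL, interval: int = REAPER_INTERVAL) -> dict[str, int]:
    """Run the reaper every `interval` seconds and record when each job is removed."""
    jobs = dict(finished_at)  # work on a copy so the caller's dict is untouched
    removed_at: dict[str, int] = {}
    now = 0
    while now <= horizon and jobs:
        for sid in reap(jobs, now, ttl):
            removed_at[sid] = now
        now += interval
    return removed_at
```

In this model, a job that completes 5 seconds after a reaper run is removed at 630 seconds rather than 605, so results can outlive the TTL by up to one reaper interval.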
The SH stores search-time knowledge objects, which work directly on the raw data and/or fields returned from the indexers. For example, field extractions, alerts, reports, dashboards, and macros are categorized as search-time knowledge objects in Splunk.
The following diagram illustrates a distributed deployment configuration featuring a single dedicated SH that communicates with three separate indexers when executing a search query:
Figure 1.2: SH and indexers interaction
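The interaction in Figure 1.2 follows a scatter-gather pattern: the SH sends the search to every indexer, each indexer works on its own data, and the SH merges the partial results. The following Python sketch models only that pattern; the indexer names and status codes are invented, and real distributed search in Splunk involves far more than counting.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical HTTP status codes held by three indexers (search peers).
INDEXER_DATA: dict[str, list[int]] = {
    "idx1": [200, 404, 200],
    "idx2": [200, 500],
    "idx3": [200],
}


def search_peer(peer: str, matches: Callable[[int], bool]) -> Counter:
    """Map step on one indexer: count its matching events by status code."""
    return Counter(status for status in INDEXER_DATA[peer] if matches(status))


def distributed_search(peers: list[str], matches: Callable[[int], bool]) -> Counter:
    """SH step: send the search to every peer in parallel, then merge the partials."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=len(peers)) as pool:
        for partial in pool.map(lambda peer: search_peer(peer, matches), peers):
            total.update(partial)
    return total
```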
Let us look at another critical processing component, the indexer, which is also called a search peer because it responds to queries issued by the SH.
The indexer accepts and stores incoming data as indexed data, which can be retrieved later when requested by the SH. Data can be sent by forwarder agents or arrive through inputs that don’t require a dedicated agent. Indexers can be set up either as standalone instances or in a clustered configuration for high availability (HA). Indexed data is immutable and is stored in the form of buckets. More details about buckets are provided in Chapter 5, Splunk Index Management.
Figure 1.3: Indexers receiving data from forwarders and storing it in indexes
So far, we have gone through the processing components and their roles in a Splunk Enterprise deployment. Let us go through the management components in the following section.
These are management components that support the processing components:
We’ll discuss them in the following subsections.
A standalone Splunk Enterprise instance, known as the deployment server (DS), is used to manage forwarders. Forwarders located at the data sources (typically UFs) often need new configurations to monitor new files, or changes to an existing configuration, followed by an optional restart. Making these changes manually is very time-consuming in larger infrastructures. That’s where the DS comes to the rescue, by maintaining a central repository of configurations in the form of apps. In addition to UFs, HFs can also be centrally managed using a DS.
Chapter 4, Splunk Forwarder Management, goes through more details on this topic.
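Conceptually, the DS decides which apps each forwarder should receive based on rules that match clients. In Splunk, these rules live in serverclass.conf; the following Python sketch uses a plain dictionary as a hypothetical stand-in (not that file’s syntax), matching client host names against wildcard patterns to work out the app list for each forwarder. All server class, host, and app names are made up.

```python
from fnmatch import fnmatchcase

# Hypothetical server classes: host-name patterns and the apps those hosts receive.
SERVER_CLASSES: dict[str, dict[str, list[str]]] = {
    "linux_web": {"match": ["web-*"], "apps": ["nginx_inputs", "base_outputs"]},
    "windows_dc": {"match": ["dc??"], "apps": ["windows_inputs", "base_outputs"]},
}


def apps_for_client(hostname: str) -> list[str]:
    """Return the de-duplicated apps a client should receive, in server-class order."""
    apps: list[str] = []
    for server_class in SERVER_CLASSES.values():
        if any(fnmatchcase(hostname, pattern) for pattern in server_class["match"]):
            for app in server_class["apps"]:
                if app not in apps:
                    apps.append(app)
    return apps
```

Keeping the rules in one place is what saves the manual effort described above: adding a new forwarder that matches an existing pattern is enough for it to receive the right apps.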
The SHC deployer (SHC-D) manages app configurations and deployments for an SHC in a Splunk Enterprise deployment. It distributes app bundles to the SHC members, applies configurations, and coordinates rolling restarts if needed.
The SHC-D usually stores all the apps at the following location: $SPLUNK_HOME/etc/shcluster/apps.
An indexer cluster includes a separate Splunk Enterprise instance that functions as the cluster manager (CM). The CM does not take part in typical search operations; instead, it oversees the indexer cluster, governing it in the following ways:
The Search head indexer clustering overview section of Chapter 7 explains the replication factor (RF) and search factor (SF) in detail.
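Ahead of that chapter, the basic arithmetic behind the two factors can be sketched as follows, assuming a single-site indexer cluster in which each copy of a bucket sits on a different peer. With a replication factor of RF, each bucket has RF copies, so the cluster can lose up to RF - 1 peers without losing data; with a search factor of SF, up to SF - 1 peers can fail while a searchable copy of every bucket remains available. The sketch also rejects an SF greater than the RF and an RF greater than the number of peers. This is a simplified model, not Splunk’s own validation logic.

```python
def cluster_tolerance(peers: int, rf: int, sf: int) -> dict[str, int]:
    """Simplified single-site view of what RF and SF imply for peer failures."""
    if not 1 <= sf <= rf:
        raise ValueError("search factor must be at least 1 and no greater than the replication factor")
    if rf > peers:
        raise ValueError("replication factor cannot exceed the number of peers")
    return {
        "copies_per_bucket": rf,
        "searchable_copies_per_bucket": sf,
        # Copies are assumed to sit on distinct peers, so losing rf - 1 peers
        # still leaves at least one copy of every bucket.
        "peer_failures_without_data_loss": rf - 1,
        # Likewise, losing sf - 1 peers still leaves at least one searchable copy.
        "peer_failures_with_searchable_copy": sf - 1,
    }
```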
All components in Splunk Enterprise require a license for commercial use, except for the UF, which Splunk offers for use without a separate license. An admin loads the license manager (LM) with the license file received from Splunk sales; multiple license files might exist, depending on the agreement with Splunk. The rest of the instances in the deployment, called license peers, connect to the manager node, which acts as a central license repository for configuring stacks, pools, and license volumes. The LM records usage in the license_usage.log file, which tracks the usage of all Splunk instances connected to it and is used to detect violations. Out-of-the-box license reports depend on this log. We will discuss this in detail in Chapter 2, Splunk License Management.
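The kind of aggregation that license reports perform can be sketched in Python. The records below are simplified, hypothetical key=value lines; the field names (type, pool, idx, b) and all values are assumptions for this illustration, so check the actual format of license_usage.log on your LM before adapting anything like this. The sketch totals bytes per pool from usage records and flags pools that exceed a daily quota.

```python
import re
from collections import defaultdict

# Simplified, hypothetical usage records in key=value form (b = bytes).
SAMPLE_LINES = [
    'type=Usage pool="enterprise_pool" idx="web" b=400000000',
    'type=Usage pool="enterprise_pool" idx="linux_os" b=700000000',
    'type=Usage pool="dev_pool" idx="test" b=50000000',
    'type=RolloverSummary pool="enterprise_pool" b=1100000000',  # summary line
]

# key=value where the value is either "quoted" or a run of non-space characters.
KV_PATTERN = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_line(line: str) -> dict[str, str]:
    """Split a key=value line into a dict, stripping quotes around values."""
    return {m.group(1): m.group(2) if m.group(2) is not None else m.group(3)
            for m in KV_PATTERN.finditer(line)}


def usage_by_pool(lines: list[str]) -> dict[str, int]:
    """Total the bytes of Usage records per license pool."""
    totals: dict[str, int] = defaultdict(int)
    for line in lines:
        fields = parse_line(line)
        if fields.get("type") == "Usage":  # skip summary lines to avoid double counting
            totals[fields["pool"]] += int(fields["b"])
    return dict(totals)


def pools_over_quota(totals: dict[str, int], quotas: dict[str, int]) -> list[str]:
    """Return pools whose usage exceeds their quota (unknown pools have a 0 quota)."""
    return sorted(pool for pool, used in totals.items() if used > quotas.get(pool, 0))
```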
The MC is a built-in app in Splunk that provides a centralized location for monitoring and managing Splunk deployments. It offers a GUI that allows administrators to monitor and configure various aspects of Splunk, including alerts and dashboards for monitoring indexing, license usage, search, resource usage, forwarders, health checks, and more. We will go through some of these dashboards in detail and set up alerts in later chapters.
Note
Do note that although these components have dedicated roles and activities to perform, some of them can be installed together on the same Splunk instance. A matrix of which components can be combined is provided in the docs: https://tinyurl.com/26f9n5zf.
We have come to the end of the components section. We learned that a UF is preferred for monitoring files and forwarding data to indexers. The number of components required depends on the deployment type. A standalone deployment doesn’t require many components because a single instance functions as both the SH and the indexer, whereas a distributed deployment includes a number of additional management components for deployment, cluster management, and license management. The same Splunk Enterprise binary is used for all of these components (the UF has its own package); what differs is the configuration of each instance, which determines its role, such as SH, indexer, SHC-D, DS, or LM.
As we dive into the chapters on processing and management components, we will look at these topics in more detail, and you will find them discussed throughout the book. Understanding these components and their roles in a Splunk Enterprise deployment is therefore essential for following the rest of the sections and chapters.
This section is completely optional, as this topic isn’t included in the Splunk admin exam blueprint; however, I recommend going through it to familiarize yourself with what Splunk architectures look like and how the processing and management components are positioned and interconnected.
So far, we have learned about Splunk Enterprise’s features and components and their roles in a standalone or distributed deployment. It is time to see some of the deployment architectures, called SVAs, curated by the best minds at Splunk Inc.
Just as a problem can have more than one solution, a single architecture might not fit every organization. Splunk Enterprise architects and admins evaluate many variables to come up with a suitable design, and SVAs support them with best-practice guidance and ready-made designs. A Splunk Enterprise architect’s roles and responsibilities differ from those of a typical admin. Splunk Education offers courses to prepare you to become a Splunk Enterprise Certified Architect, for which the Splunk Enterprise Admin certification is a prerequisite.
Let’s go through some of the prominent validated architectures of Splunk Enterprise on-premises. A full list of SVAs is available here: https://www.splunk.com/pdfs/technical-briefs/splunk-validated-architectures.pdf.
A single-server (also called standalone) deployment consists of a single Splunk Enterprise instance that combines both SH and indexer functionality.
The following diagram shows the deployment architecture:
Figure 1.4: Standalone deployment architecture
The diagram shows a single (standalone) Splunk instance, a collection tier forwarding events to that instance, and an optional DS to manage the collection tier (that is, the forwarders).
The only advantages of this deployment type are that it is cost-effective and easy to manage.
Let’s look at the limitations of this deployment type, as follows:
Let’s take a look at distributed non-clustered deployment, which is a more advanced setup than a single-server deployment.
A distributed non-clustered deployment handles additional search workload and indexing capacity better than a single-server deployment. However, separating the SH and indexing duties increases the total cost of ownership (TCO).
The following diagram shows the non-clustered deployment architecture with separate SH and indexing tiers:
Figure 1.5: Distributed non-clustered architecture
In the depicted architecture, a separate search tier consists of an SHC, and the indexing tier consists of multiple standalone indexers. The SHC-D is a mandatory management component that deploys configuration updates, packaged as apps, to the SHC. A DS manages the forwarders, while an LM serves as the central repository for license details, to which all other instances connect for license information. Let’s look at the advantages of this deployment over a single-server deployment:
Now, let’s look at the limitations of this deployment:
Let’s take a look at distributed clustered deployment, which is a more advanced setup than a distributed non-clustered deployment.
A distributed deployment with a clustered SH tier and a clustered indexer tier at a single site is a highly available, resilient architecture. A site typically corresponds to a data center in a particular region or geography.
The following diagram shows the clustered deployment architecture with a separate SHC and clustered indexing tier running on a single site:
Figure 1.6: Distributed clustered deployment and SHC – single-site
Figure 1.6 depicts an architecture similar to that in Figure 1.5. However, it introduces an additional management component, the CM, which oversees and manages the indexer cluster, coordinating and controlling its operation. The CM acts as the central point for configuring and monitoring the indexers in the cluster, keeping them functioning correctly and in sync. Let’s understand the advantages of this architecture over the two we looked at previously:
Now, let’s look at its limitations compared to the previous architectures:
Let’s take a look at multi-site distributed clustered deployment, which is a more advanced setup than the single-site distributed clustered deployment.
This is by far the most complex architecture, suited to organizations that have strict HA and disaster recovery (DR) requirements. It has the same advantages as the single-site architecture (as seen in the previous section), and in addition, the failure of one site doesn’t impact the entire deployment.
The following diagram shows a clustered deployment architecture with an SHC and clustered indexing tiers deployed in more than one site:
Figure 1.7: Distributed clustered deployment and SHC – multi-site
As in Figure 1.6, the components remain the same in each site. However, the collection tier is common across both sites. Each site has a dedicated SHC.
Let’s understand the limitations:
We’ve looked at a very basic single-server architecture (best suited for testing or development) and an advanced multi-site clustered deployment architecture. Each has its advantages, limitations, and cost implications. By now, you should be familiar with Splunk components and architectures. In the next section, we are going to install the standalone/single-server deployment that we discussed at the beginning of this section.
As discussed in the preceding section, a single-server deployment consists of a single Splunk instance that combines both SH and indexer functionality. Installation isn’t part of the admin exam blueprint; however, it is very helpful to get your hands dirty and experience Splunk yourself through Splunk Web, configuration files (.conf), and the CLI, which we are going to discuss in upcoming chapters. This section provides instructions for installing Splunk Enterprise 9.0.3 on the Windows operating system. Let’s get into it.
Let’s look at the system requirements of the computing environment. Splunk Enterprise supports multiple operating system environments. A full list of the supported options is available here: https://tinyurl.com/2tuudjwr. Splunk has the following hardware requirements:
The specifications of the system on which I am going to install Splunk version 9.0.3 are as follows:
You might have noticed that my PC has fewer physical CPU cores than recommended. This is absolutely fine, as we are not going to run production workloads on this Splunk instance. Let’s get into the installation steps.
As a prerequisite, you need a high-speed internet connection to download the Splunk Enterprise free software package from here: https://www.splunk.com/en_us/download.html. If you do not have a Splunk account, then sign up and log in to continue. Choose the installation package by operating system and download the latest version, which is 9.0.3 at the time of writing.
Let’s begin the installation:
Locate the downloaded .msi file, which appears as splunk-9.0.3-dd0128b1f8cd-x64-release.msi. Double-click on it to start the installation. You will be prompted to accept the license agreement, with the default installation options selected. Refer to Figure 1.8 and click the Next button:
Figure 1.8: Installation – license agreement
Figure 1.9: Installation – creating administrator account credentials
Figure 1.10: Installation – click Install to begin
Figure 1.11: Installation successful
8000 is the default Splunk Web port, and 127.0.0.1 is the loopback address. Enter the admin credentials created in step 2; you will then be taken to the Splunk Enterprise home page at http://127.0.0.1:8000/en-GB/app/launcher/home:
Figure 1.12: Splunk Enterprise – first-time sign-in page
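Once the installation is complete, you can confirm that Splunk Web is listening before opening the browser. This small Python sketch only checks that something accepts TCP connections on 127.0.0.1:8000; it does not prove that the listener is Splunk, and the 2-second timeout is an arbitrary choice.

```python
import socket


def splunk_web_reachable(host: str = "127.0.0.1", port: int = 8000,
                         timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, unreachable, etc.
        return False


if __name__ == "__main__":
    print("Splunk Web port open:", splunk_web_reachable())
```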
The installation is successfully completed. Now, let’s summarize what we learned in this chapter in the next section.
We have come to the end of the first chapter. There has definitely been a lot to digest. Let’s briefly summarize what we have learned so far.
In this chapter, we began by looking at the Splunk Enterprise Certified Admin certification prerequisites, the exam topics, and their weightage. In line with the exam topics, this book is organized into two parts: Splunk Enterprise system administration and data administration. We also discussed the exam pattern, which includes single-answer, multiple-answer, and true/false questions.
We looked at the fundamentals of what Splunk Enterprise does and its key highlights as a data analysis product. We then progressed to look at the Splunk Enterprise 9.x product family features, followed by components and their role in deployment.
We also looked at prominent SVAs. We covered single-server, distributed non-clustered, distributed clustered single-site, and distributed clustered multi-site architectures. We discussed their advantages and limitations, showcasing processing and management components. Finally, we successfully installed a Splunk Enterprise single instance on a Windows system.
This chapter is the foundation for the rest of the book. The Splunk components that we looked at will be covered in detail in later chapters, and it is important to know the context in which they are used and how they fit into the overall Splunk deployment architecture. Though SVAs are not part of the exam guide, they are included in this book to give you a better understanding of the upcoming chapters.
In the next chapter, we are going to deep-dive into license management. License management includes types of licenses, how they work, and license configuration.
In the next section, you are going to practice exam-style questions covering the topics that we have learned so far.
This self-assessment section will help you identify which topics covered in this chapter you understand well and which need improvement. I suggest reading the questions and answers carefully and taking your time to go back through the sections that you think need more attention. Alternatively, you could refer to the Splunk documentation. Good luck!
You will be given 10 questions with answer options to choose from. The question patterns are the same as those discussed in the Introducing the exam’s test pattern section. The answers are provided at the end of this section. Let’s get started:
1. Which configuration (.conf) file contains file monitoring details on a forwarder?
A. outputs.conf
B. inputs.conf
C. server.conf
D. source.conf
I hope you were able to recollect the topics that we went through with these questions. Let’s review the answers.
1. The answer is option B, inputs.conf. outputs.conf and server.conf are Splunk configuration files used for different purposes. There is no source.conf file available in Splunk.