My history with Splunk goes back about 4 years to when I was working for a company that was building a browser plugin. All of the logging for all the users was going to be built around Splunk. I am not sure whether they knew the implications, but it was not until some 2 years later that I saw the full benefit of making this decision. I had been convinced of the power of Splunk. I saw it as a great platform to build and develop applications and reports with ease, and it should be looked at in exactly the same way as LAMP or other development stacks. I also saw the opportunity to write a book about the Splunk Web Framework as a great way to show other people what I have learned without them having to waste the time of trial and error that I had to.
If you have not yet installed Splunk on a virtual machine, server, or your own PC or laptop, it is probably best to get this done now before moving further. Towards the end of this chapter, we will introduce the data and example projects that we will be working on throughout this book. The example work that we will be performing throughout this book will be on a Linux or Mac platform. You should be able to follow along if you are using a different platform. If you have not installed Splunk before, you will be able to get all the details you need for your installation at the following link: http://docs.splunk.com/Documentation/Splunk/6.3.3/Installation/Chooseyourplatform .
So you've installed Splunk, got things running, and now what? Hopefully, that is where this book will come in and help you get the ball rolling, making fresh, interactive, useful, and dynamic applications using the Splunk Web Framework. We are hoping that we can actually get you creating some interesting applications without the usual log, index, search, graph, and report documentation that seems to be out in abundance.
Welcome to the Splunk Web Framework, which has been set up as an essential support structure for Splunk users to build custom reports, dashboards, and apps on Splunk and with Splunk. This means that there is a supporting environment that can be used to develop end-to-end applications with no need to install anything other than Splunk. The Splunk Web Framework allows the user to start from the basics using a drag-and-drop interface, and then to get under the hood to interact with and customize the code directly. Further still, developers don't even need to use Splunk as their platform of choice to display their data. They are free to simply interface with Splunk through API calls, search for data, and then display the returned data directly on their own websites and applications.
As of Splunk version 6, there was a major overhaul to the Splunk Web Framework. The framework is now integrated directly into Splunk Enterprise 6, so now you don't need to install anything else to start using the web framework. Previously, in Splunk 5, you needed to use a standalone version of the web framework. So unless you're using an old version of Splunk, you will be able to get going and working with the framework straight away. All your apps from previous versions of Splunk should work on Splunk 6, including apps created in Advanced XML, so it is well worth the upgrade to get an improved interface and functionality that it brings.
Let's get this out of the way early. You may have heard about Advanced XML, or you may have even seen some dashboards or views created in your environment that have been set up using Advanced XML. As of Splunk Enterprise 6.3, the Advanced XML feature has been deprecated. Although apps and dashboards using Advanced XML will continue to work and Splunk will continue to support and fix bugs, there will no longer be any feature enhancements to the Advanced XML feature of the Splunk Web Framework.
A date has not yet been set for the removal of Advanced XML from Splunk Enterprise. All future development should be done using other features of the Splunk Web Framework, and all existing apps or dashboards that use Advanced XML should be migrated away from Advanced XML and onto one of the other options available in the Splunk Web Framework.
All the examples and work in this book will be using Splunk version 6.4, so we will not be performing any of the example exercises in Advanced XML. When we start to develop with Splunk's XML code, the only approach we will take towards Advanced XML will be to show you how to recognize applications made with Advanced XML.
The Splunk Web Framework is now built directly on the core Splunk daemon, splunkd. Originally, splunkd only handled indexing, searching, and forwarding, but as of version 6.2, it also operates the Splunk Web Interface. Making this change was practical because it gave the framework the tools you need to build web applications directly on Splunk, as well as use the data that Splunk provides to display on your own website.
Within the framework, an app contains a number of dashboards, and each dashboard is in turn made up of a number of panel and visualization elements:
The preceding diagram provides a clear breakdown of the architecture, and its three distinct layers. It shows splunkd, which is built on C/C++ for speed and stability, as a server that provides the indexing and searching capabilities to the SplunkJS stack, which delivers the display and interface supporting the SimpleXML, HTML, and external web displays. Each layer builds on the others, providing further enhanced functionality.
By now, I am sure you at least know that Splunk has a web interface. If you are competent with using Splunk, you would already be familiar with using the web interface for searching, configuring, and administration of Splunk. As part of the Splunk Web Framework, the web interface also provides an easy-to-use graphical user interface, which allows you to drag and drop tools and functionality with no prior programming knowledge or experience. It provides rapid development on the framework and allows you to visualize dashboard panels with ease.
The dashboard editor is the main interface and is part of the SimpleXML layer of the Splunk Web Framework; it allows you to build dashboards within Splunk Web. Here you can visualize your events and statistical information as dashboard panels and views and provide charting functionality. It even allows you to start providing form-based controls and an interface with the user.
Simple XML expands the functionality of the framework further and allows the user to fine-tune the dashboard panels with more layout and display options. Splunk's Extensible Markup Language (XML) is the underlying code that is developed when using the web interface and dashboard editor. Simple XML code can be edited and manipulated directly from Splunk's built-in editor, or you can use your own code editor to configure the easy-to-learn syntax. The directory structure within Splunk is also straightforward and easy to learn, and it helps you manipulate the environment in ways that you can't actually do within the web interface.
From the preceding example, you can see that the syntax of the Simple XML code is straightforward and relatively easy to learn. The code provides a multitude of options to tweak and fine-tune all aspects of the display of the different types of panels provided. It is definitely worth learning to use this function of the Splunk Web Framework. Although the drag-and-drop interface allows you to develop rich and interesting dashboard panels, sooner or later you will start to want to configure the display in a way that you can only do in SimpleXML.
Each visualization type has a long list of properties that can be managed and changed through SimpleXML code. Although simple, you still need to adhere to the white space and open and close tags within the code. If not, you could end up with no display provided.
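Because a missing or mismatched closing tag can leave you with no display at all, it can help to check that a dashboard file is well-formed before loading it into Splunk. The following is a minimal sketch using Python's standard library. The dashboard source, the `DASHBOARD_XML` name, and the `check_well_formed` helper are illustrative assumptions, not code from Splunk, and the check only confirms that the tags are balanced; it does not validate the content against Splunk's Simple XML reference.

```python
import xml.etree.ElementTree as ET

# Illustrative Simple XML dashboard (an assumption for this sketch).
DASHBOARD_XML = """
<dashboard>
  <label>Example dashboard</label>
  <row>
    <panel>
      <chart>
        <search>
          <query>index=_internal | timechart count</query>
          <earliest>-24h</earliest>
          <latest>now</latest>
        </search>
        <option name="charting.chart">line</option>
      </chart>
    </panel>
  </row>
</dashboard>
"""


def check_well_formed(xml_text):
    """Return (True, root tag) if the text parses as XML, else (False, error)."""
    try:
        root = ET.fromstring(xml_text.strip())
    except ET.ParseError as error:
        # Unclosed or mismatched tags end up here.
        return False, str(error)
    return True, root.tag


if __name__ == "__main__":
    print(check_well_formed(DASHBOARD_XML))
    # The <row> element is never closed, so parsing fails.
    print(check_well_formed("<dashboard><row></dashboard>"))
```

Running a check like this from your editor or a pre-commit hook is one way to catch a stray tag before it reaches a shared environment.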
SplunkJS provides a framework of tools and libraries that allows developers to build and manage dashboards and organize dependencies, as well as integrate Splunk components into their own web applications. The libraries allow you to manage views and search managers to allow you to work with searches and interact with Splunk data.
SplunkJS removes the developer from the Splunk Web Interface but gives the ability to both build Splunk Apps for Splunk and build web applications using Splunk data.
This is the main system process that Splunk uses to handle all of the indexing, searching, forwarding, and web interface that you work with in Splunk Enterprise. Although we will need to restart Splunk and the splunkd process occasionally, this book will not be focusing on splunkd, as this would be more of a server administration focus.
For the next few pages, we are going to take a little break from Splunk and specifically look at the development process and using Git as part of this process. The topics covered are more suited to new developers or developers who are not familiar with working as part of a team or with applications such as Git. If you are familiar with these subjects, feel free to jump to the end of this chapter, where we introduce the sample data and example applications we will be working on through this book.
When you work in a team developing applications, there is most likely a process in place that you need to follow: develop, deploy to the development environment, test, deploy to the test environment, test again, and deploy to production. It sounds like a lot of work, but the last thing you want to do is deploy a new application into a production environment and realize that you have misspelled the company name, or worse still, that you are getting the following dreaded no-show screen from Splunk.
This book is not designed to educate you on the software development process, and there are many books, videos, and courses dedicated to the subject, but we will go through a brief run through of the types of things you should be thinking about and the types of good habits you should be getting into.
So, even if you are just developing at home on your own projects, it is good practice to get into the habit of setting up and following a development process, including using a specific development host that mirrors the setup of the production server and some form of version control software:
Develop your application in your development environment. Even if your development environment is on your laptop or PC, you need to make sure that you are developing on an identical environment to what your application will be eventually deployed on. You won't be able to have everything 100%, but you need to make sure you are using the same language versions and libraries and on the same operating system.
Test your application in your development environment. Within Agile development methodologies, we can perform test-driven development, where the writing of tests should be performed at the start of the development process. As each iteration of your application is completed, you then need to implement these tests to verify the operation of your application and lodge any bugs or defects that may be found after the development process.
In the preceding diagram, we showed that we are packaging our application. For now, we will be using Git as part of our development process instead of packaging our application before release. In later chapters, we will also take a look at packaging our Splunk app to deploy and allow others to use our application.
Deploy your application in a test environment. This is only after your application has successfully passed testing. This should be a standalone environment, isolated from development and once again set up to mimic your production setup. A test environment should go further than your development environment to mimic how the application will be run in production. It should even be on the same hardware as well as operating systems and have the same accompanying applications.
Test your application in a test environment. Upon successful deployment in your test environment, you can test the application further. It is not a matter of simply performing the same tests that you did in development. This is your chance to perform security tests, make sure that the performance of the application and surrounding applications that are on the same environment is also fine, and simulate production loads to ensure that your application operates under heavy usage.
User acceptance testing. If you are working for a specific client, you may be asking them to access the application deployed in the test environment and make sure that it operates to their agreed-upon standard. This may mean that the client has requested specific features be added and bugs be removed. If user acceptance testing is in place for your development process, this will usually be the final approval before it is deployed to production.
Deploy to production. It's time to push the button and deploy your changes into production. If everything has worked as it should, you shouldn't have any surprises, but it is still important to test your application as you would to make sure that the functionality of your application still works the way it should.
Monitor a new application in production. We're working with Splunk aren't we? Well, this is where we can set up monitoring for our application to make sure we are not seeing an increase in errors, a decline in usage, weird things happening with our hardware and unauthorized users accessing our application.
In the early stages of development of an application, the development process can be stripped down a little. Your production environment may be running on your laptop, but still keep the aforementioned processes in mind so that when you move on to developing within more complex environments and architectures, you will have the basics covered and extending them will not be too difficult.
You should have a development environment set up to mirror what you are deploying in production as closely as possible. If you need to set up VirtualBox, VMware, or another virtualization environment, it is worth doing so to make sure you are running the same operating system as you have in production. At the very least, your version of Splunk should be exactly the same version as the one you will be deploying in production.
Nowadays, products such as Amazon Web Services, Google Cloud, and SoftLayer from IBM offer us a much easier way to create development, test, and production environments that all mirror each other without the need to interact with hardware. Automation can also be put in place to create the environment, deploy code, and then test against that environment. In later chapters of this book, we will touch on automated testing, packaging, and deployment of our code, but for now, we will use collaboration tools such as source code management software to allow us to deploy our code in development and in turn revert changes when needed.
It may not be possible to have the data indexed in exactly the same way as you would be able to in production, but ensure that you have a sample to demonstrate that visualizations and reports are operating correctly and will provide the insight that you need. Try to have as much data as you can, as with reporting tools such as Splunk, your development process may need to incorporate speeding up and optimization of your searches.
When discussing the development process, it's probably the best time to introduce collaboration tools such as Git to help you manage your code and track changes. Git is a free and open source tool that offers source code management and collaboration features that should hopefully improve the way we code and interact with our code. As a developer working on smaller projects and development environments, you may be tempted to simply make the changes locally and upload your work to a web server when you're done, but by using source code management software such as Git, you are able to do the following:
Track and monitor changes to your code. Even if you are working alone on a project, Git will allow you to keep a historical log of all the changes made to your code. You may find non-developers accessing code on production environments and making changes to it. Git allows you to verify that the code has not been altered from the original source code. Disk space is not over-utilized in the process, as Git stores its history compactly rather than keeping a full, uncompressed copy of the software each time changes are made.
Create specific versions of projects. This allows you to demonstrate changes over time to keep track of feature enhancements to your code and bug fixes, and allows you to easily establish when bugs may have entered your code.
Revert to old versions of code. As you have been creating versions of your software and tracking your changes, it then becomes a lot easier to back out of changes or revert to old versions of code if something goes wrong. As long as your servers have Git installed and can access your repository, changes can be deployed or reverted with ease and pushed onto each of your development and production environments.
It allows you to collaborate with other developers. Features and projects can be branched off, so development can be performed on the same code by numerous developers and then merged back once the development is complete. Git also allows these projects to be updated from the central code base on a periodic basis to ensure that these projects keep up to date with the other features being developed around them.
Store your code in a centrally hosted location. In this book, we will be using GitHub, which is a free hosted service that allows all our code to be hosted in a central location to make sure that we do not need to be working on a specific laptop or have access to a specific server to be able to work on our code. If security is an issue, you can use a licensed version of GitHub to ensure that your code is private, or you can host a Git environment on your own servers to increase security even further.
Allow your code to be reviewed by other developers. GitHub allows you to create requests to have your code reviewed by other developers and allow them to vote or approve the code changes made.
If you want to collaborate with a few other developers, you will either need to have a Git server running or be using a Git hosting repository service. As we have mentioned earlier, we will be using GitHub as it is one of the most popular online repositories available to use and is free if you don't mind not being able to create private repositories.
You can install Git directly on your PC or laptop and use it as a standalone application without any problems. As for our projects and examples, as well as having Git installed, we will set up an account with GitHub and create a new repository for storing all our apps that we develop for Splunk.
A lot of the work we will be doing with Git will be performed on the command line, but there is a little work to be done on the GitHub web interface. Git will also work with different Integrated Development Environments (IDEs).
Let's start by creating an account on GitHub. Go to the following URL and create your own account: https://github.com/.
Take a little time to set up your account and add all your specific details and passwords. Make sure that you also set up SSH keys on GitHub, as this will allow you to pull and push changes to and from the GitHub servers. Without them, you will still be able to create repositories and track and add changes locally, but you will not be able to push over SSH to make any of these changes available to other developers; they will only exist on the local PC or laptop you are developing on.
In the following example, we will work through setting up a repository to store an app.
Make sure you are happy with the free account as the repositories will be public. Within your account, you will have a Repositories tab; click on that and click on the New button. You will be presented with the following screen to give your repository a name and description and display it as Public or Private. When you are happy with the name, click on Create Repository:
We are using the free version of GitHub. Please make sure you are happy with this before you start creating repositories that need to be kept private or have sensitive information. You may need to look at a different solution or pay for a Private GitHub repository.
For now, this means we have somewhere to store our repository, but we still need to initialize our repository where we will be developing it. We will create a simple README.md file in a development environment and initialize it:
Access your development environment and make sure it is set up to run Git.
Go to the directory that you want to be developing on.
Run the following command to create the README.md file and populate it with its first line:
echo "# SplunkAppDev" >> README.md
Then run the following Git command to initialize the repository:
git init
We have now initialized our repository, which tells our Git installation that we are setting up a repository and everything inside this directory will be included. This now includes the new README.md file. We will be able to see that Git recognizes that we have initialized a repository, but does not know where to put the information. We will now see what Git is thinking about our code, add our README.md file, and then commit our changes to our repository in GitHub.
To see if there have been any changes made in your repository, run the status command:
git status
On branch master
Initial commit
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        README.md
nothing added to commit but untracked files present (use "git add" to track)
We then use the add command to allow Git to track our new file:
git add .
When we are happy with all our additions, then we commit the changes that have been added:
git commit -m "Our first commit"
All this is still in our local Git repository, so we need to tell Git where the remote repository on GitHub lives. Get the URL for the repository you have created and run the following command:
git remote add origin git@github.com:<username>/<repository>.git
Finally, push your changes back to the remote repository on GitHub:
git push -u origin master
If you access the GitHub web interface again, you will be able to see the new files added to your repository.
When we want to start working on development projects, creating features and bug fixes for application and code, the best thing we could do is create a branch from our master code. In our previous example, we simply added files and committed changes to our master branch. But what if we wanted to develop on one specific feature while someone else works on a bug in the code? This is where we can create a branch from our master branch of code and work on it in isolation, while our fellow developer creates a separate branch and works on their bug fix.
The best thing about branching is that it lets us follow the development process that we outlined earlier in the chapter. We can create and develop on our branch, test these changes, and merge the code back into the master branch before we deploy our changes to our test environment and then to production.
The following diagram gives you a clear example of how the development branch is taken from our master code branch. Code is changed and commits are made to the code in which the new features are created. The changes are tested and once complete, a pull request is made, allowing other developers and our peers to view the changes and make sure there is nothing that we have missed or could have done in a more efficient way. Once the pull request is approved, we can merge our code branch into the master and deploy our changes into our production environment.
In the following example, we will create a branch from our master repository, make changes, and then merge the changes back into the master branch:
First we want to make sure that the master branch in the environment we are developing in is as up to date as possible, so we pull the latest changes to be in sync with what is currently on GitHub:
git pull origin master
Then we use the checkout option to create a branch of our master code:
git checkout -b branchname master
We then simply go about our work as we normally would, adding and committing changes as we did in our previous example and making sure we regularly push our changes back up to GitHub. Sometimes our development may run on for days and we should be merging changes from master back into our branch.
Move back to the master branch:
git checkout master
Grab any changes that have been made back onto our system:
git pull origin master
Change back to our development branch:
git checkout branchname
Then merge any changes from the master back into our branch to make sure we are developing on the latest version of the code:
git merge master
So far, as part of our development process, we have been making changes to our code in a development branch, but at some point in time, we will want to be able to merge our branched code back into our master branch. Of course, this will only happen once we have successfully tested our changes in our development and test environment.
In these situations, it is simple to merge the branched code back into the master, but as we are working in a development team, we create a pull request and ask other developers to review our changes. Once the changes are approved by our peers, they can be merged back into the master branch.
To create a pull request, we need to go back to our GitHub repository and click on the New Pull Request button at the top left of the screen. We will then be presented with a similar screen to the following one:
In the example screenshot, we can see that we are using the master branch as our base and comparing it with our branch (which in this case is called branchname). This feature of GitHub also shows us the differences between the two branches, where additions are in green, and if we removed code as part of our branch, we would see it highlighted in red. Once you then click on Create Pull Request, you are given the option to provide some more information about your changes, so your reviewers will then have some idea of what the code is doing. This is displayed in the following screenshot:
Once you create your pull request, you can then send the request out to other developers to allow them to view, comment on, and vote on your changes.
Once everyone is happy with the changes, click on the Merge Pull Request button at the bottom of the screen where your branch will be merged back into master, hopefully ready for your changes to be then deployed to your production environment.
There may be some situations when a change has been implemented into production and testing within the development and test environments has missed some specific edge cases that are being hit when the code is released into production. This does happen occasionally, but when we are using Git, we have a way to quickly go back to our old release.
Within GitHub, you will be able to view a history of the commits that have been made over the course of your development. Each commit is identified by a commit hash, which is a 40-character hexadecimal value. The git revert command takes the hash of the commit you want to undo and creates a new commit that reverses its changes, without rewriting the existing history. The following command uses an example commit hash, but you can locate the commit hashes for your own code on GitHub. To revert a change, you can use the following command from the command line in your development environment:
git revert e088c3a4b62aec6729021945d6d2b0adc9734c72
The preceding command does not need to have the entire Git hash specified; you can provide just the first few characters, as long as they are enough to identify the specific commit. The best thing about Git is that if a file system is ever corrupted, tampered with, or destroyed, we still have the code stored on GitHub, ready to be cloned back to our environment. In case of emergencies, the easiest thing to do may be to remove the directory in which your application is located and then create a fresh clone of the data, as follows:
git clone git@github.com:username/repositoryname.git
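The idea of identifying a commit by only the start of its hash can be sketched in a few lines of Python. This is a toy model of the lookup, not how Git implements it: the `resolve_abbreviated_hash` function is a hypothetical helper, and the second hash in the list is made up for illustration.

```python
def resolve_abbreviated_hash(prefix, known_hashes):
    """Return the single full hash that starts with prefix, or raise LookupError."""
    prefix = prefix.lower()
    matches = [full for full in known_hashes if full.startswith(prefix)]
    if not matches:
        raise LookupError("no commit starts with " + prefix)
    if len(matches) > 1:
        # A prefix is only usable while it is unique in the repository.
        raise LookupError("ambiguous prefix " + prefix)
    return matches[0]


if __name__ == "__main__":
    commits = [
        "e088c3a4b62aec6729021945d6d2b0adc9734c72",  # example hash from the text
        "e08f1d2c3b4a59687766554433221100ffeeddcc",  # hypothetical second commit
    ]
    print(resolve_abbreviated_hash("e088c", commits))
    try:
        resolve_abbreviated_hash("e08", commits)
    except LookupError as error:
        print(error)
```

The shorter the prefix, the more likely it is to match more than one commit as a repository grows, which is why a few extra characters are a safer habit.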
This is just a simple introduction to Git and there are many books and websites that can give you a much more in-depth overview of using the application. It is definitely worth getting comfortable with applications such as Git if you are planning to continue working and developing in the technology sector.
This is a good time to introduce the example projects that we are going to work on in the book. The three examples are varied in the type of data they are presenting, in the hope that the examples will present the user with different ways of visualizing and working with different data. It may be worth getting the data indexed so that you can start to get an idea of what we will be working with.
Although the data is a little old, I think it can give an interesting insight into the web traffic for the NASA website. The data is from 1995 and consists of two traces that together cover two months of all HTTP requests to the web server at the Kennedy Space Center in Florida. The log files are Squid proxy logs and provide details on the host making the request, timestamp, request being made, HTTP reply code, and bytes in the reply.
A download of the data can be found at the following location: http://ita.ee.lbl.gov/html/contrib/NASA-HTTP.html.
The example projects will help analyze the web traffic hitting the NASA website and provide visualization and insights into the site's usage. The data will allow us to start with basic visualizations within the Splunk Web Framework.
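To get a feel for the five fields described above before indexing the data, a single line can be split apart with a short Python sketch. It assumes each line has the layout host, two placeholder fields, a bracketed timestamp, a quoted request, the reply code, and the byte count; the sample line, `LOG_PATTERN`, and `parse_line` are illustrative, so adjust the pattern if your copy of the data differs.

```python
import re
from datetime import datetime

# Assumed layout: host - - [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>.*)" (?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_line(line):
    """Return a dict of the five fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["timestamp"] = datetime.strptime(
        fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    fields["status"] = int(fields["status"])
    # A "-" byte count means no body was returned; treat it as zero.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields


if __name__ == "__main__":
    sample = (
        '199.72.81.55 - - [01/Jul/1995:00:00:01 -0400] '
        '"GET /history/apollo/ HTTP/1.0" 200 6245'
    )
    print(parse_line(sample))
```

Splunk will extract these fields for us once the data is indexed, so this sketch is only a quick way to sanity-check the file you downloaded.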
If you have been working in development, even for a short period of time, I am sure you will have heard of Conway's Game of Life. Even though it's called a game, it's more of a simulation of biological cells, where we can watch the cells evolve to either live or fail. The cells are governed by a set of rules that determines if they live or die through each generation or step in the simulation:
Any live cell with fewer than two live neighbors will die, as if caused by under-population.
Any live cell with two or three live neighbors lives on to the next generation.
Any live cell with more than three live neighbors dies, as if caused by overpopulation.
Any dead cell with exactly three live neighbors becomes a live cell, as if caused by reproduction.
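The four rules above can be sketched as a single Python function. This is a minimal illustration rather than the script used to generate the sample logs: it assumes an unbounded grid and represents the live cells as a set of `(x, y)` tuples, and the `next_generation` name is our own.

```python
from collections import Counter


def next_generation(live_cells):
    """Apply the four rules once and return the new set of live cells."""
    # Count, for every cell next to a live cell, how many live neighbors it has.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Exactly three neighbors: birth or survival.
    # Exactly two neighbors: survival only if the cell is already alive.
    # Anything else dies from under-population or overpopulation.
    return {
        cell
        for cell, count in neighbor_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }


if __name__ == "__main__":
    blinker = {(1, 0), (1, 1), (1, 2)}
    print(sorted(next_generation(blinker)))
```

Three cells in a line, known as a blinker, flip between a vertical and a horizontal bar on every step, which makes them a handy check that the rules are applied correctly.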
The logs presented here are random, but will consist of the grid where the cells will live, a timestamp, and the cells that are present through each generation of the life cycle (https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life).
I have created a GitHub repository with a basic example of Conway's Game Of Life, but I have also produced logs for the script for 2 hours to give you some sample data that can be worked with through the examples. The sample Python script and log file can be found by going to the following link: https://github.com/vincesesto/game_of_life_splunk.
From here, you can index the file called game_of_life.log. If you are using at least version 6 of Splunk, the logs will be indexed correctly with the events separated correctly for each date and timestamp. The sample log file will look similar to the following image:
The example data that we have will allow us to analyze the simulation of cells, and although the data is not very complex, we should hopefully provide some interesting visualizations and take our skills with the Splunk Web Framework further.
Yahoo! Finance provides an API that allows people to download historical stock market data directly to their environment. In our example, we will take a few different companies and download their historical data for the year 2015, displaying the date stamp, opening value for the day, highest value of the day, lowest value for the day, closing value, volume traded for the day, and adjusted close value of the stock. The sample data will be in CSV form and the API call will be similar to the following URL: http://ichart.finance.yahoo.com/table.csv?s=YHOO&d=0&e=28&f=2016&g=d&a=3&b=12&c=2015&ignore=.csv.
The API call is pretty straightforward and the commands are listed here:
s: Company symbol (Yahoo!)
d: To month
e: To day
f: To year
g: Interval of the data (d for daily, m for monthly, y for yearly)
a: From month -1
b: From day (two digits)
c: From year
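As a sketch of how these parameters fit together, the following Python function rebuilds the sample URL from a start and end date. It only constructs the URL and makes no request, since the endpoint may no longer respond. The `build_history_url` name is our own, and the function assumes that the "to month" parameter `d` is zero-based in the same way as `a`, which is inferred from the sample URL, where `d=0` accompanies a January 2016 end date.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "http://ichart.finance.yahoo.com/table.csv"


def build_history_url(symbol, start, end, interval="d"):
    """Build the historical data URL for symbol between start and end dates."""
    params = [
        ("s", symbol),
        ("d", end.month - 1),    # to month, assumed zero-based like "a"
        ("e", end.day),          # to day
        ("f", end.year),         # to year
        ("g", interval),         # d, m, or y
        ("a", start.month - 1),  # from month - 1
        ("b", start.day),        # from day
        ("c", start.year),       # from year
        ("ignore", ".csv"),
    ]
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_history_url("YHOO", date(2015, 4, 12), date(2016, 1, 28)))
```

Keeping the parameters in a list rather than a dictionary preserves the order used in the sample URL, which makes the output easy to compare by eye.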
For more details on different company symbols and more explanations of the data that the API can provide, go to the Yahoo! Finance site at https://finance.yahoo.com/.
The data presented is an interesting and varied sample, allowing for interesting trend analysis. This is where we will take our skills further and start to use more of the advanced features of the Splunk Web Framework.
In this chapter, we covered the fundamentals of the Splunk Web Framework, including the architecture of the environment and an explanation of all the different components. We have walked through the development process and discussed having a good procedure in place before you start to develop. We took a look at Git, the application and hosting code repositories on GitHub, and finally the example data we are going to be working on through the rest of the book.
We also outlined some of the reasons behind the book and the hope that we will be able to bring you on an interesting and motivating journey into the Splunk Web Framework.
It feels like we have been doing a lot of reading and not a lot of work, but hold on! The next chapter is going to take us into the world of Splunk App creation using the Splunk Web Framework. We will get our feet wet with our first Splunk App using the Web Interface; we will create dashboards and basic dashboard elements for our App. We will also gain an understanding of the structure of Splunk Apps and their file structure and discuss why it is important to understand our audience.