You're reading from The DevOps 2.5 Toolkit

Product type Book

Published in Nov 2019

Publisher Packt

ISBN-13 9781838647513

Pages 322 pages

Edition 1st Edition

Concepts

DevOps

Author (1):

Viktor Farcic

Visualizing Metrics and Alerts

It is curious how often you humans manage to obtain that which you do not want.

- Spock

Dashboards are useless! They are a waste of time. Get Netflix if you want to watch something. It's cheaper than any other option.

I repeated those words on many public occasions. I think that companies exaggerate the need for dashboards. They spend a lot of effort creating a bunch of graphs and put a lot of people in charge of staring at them. As if that's going to help anyone. The main advantage of dashboards is that they are colorful and full of lines, boxes, and labels. Those properties are always an easy sell to decision makers like CTOs and heads of departments. When a software vendor comes to a meeting with decision makers with authority to write checks, he knows that there is no sale without "pretty colors". It does not matter what that...

Creating a cluster

The vfarcic/k8s-specs (https://github.com/vfarcic/k8s-specs) repository will continue to serve as our source of Kubernetes definitions. We'll make sure that it is up to date by pulling the latest version.

All the commands from this chapter are available in the 06-grafana.sh (https://gist.github.com/vfarcic/b94b3b220aab815946d34af1655733cb) Gist.

cd k8s-specs

git pull

The requirements are the same as those we had in the previous chapter. For your convenience, the Gists are available here as well. Feel free to use them to create a new cluster, or to validate that the one you're planning to use meets the requirements.

gke-instrument.sh: GKE with 3 n1-standard-1 worker nodes, nginx Ingress, tiller, Prometheus Chart, and environment variables LB_IP, PROM_ADDR, and AM_ADDR (https://gist.github.com/vfarcic/675f4b3ee2c55ee718cf132e71e04c6e).
eks...
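
Whichever Gist you choose, it might be worth confirming that the three variables are actually set before moving on. What follows is a minimal sketch, not something taken from the Gists. The check_cluster_env function is a hypothetical helper; only the variable names LB_IP, PROM_ADDR, and AM_ADDR come from the requirements above.

```shell
# Hypothetical helper (not part of the Gists): reports which of the
# expected variables are empty or unset.
# Returns 0 when all of them are set, and 1 otherwise.
check_cluster_env() {
  missing=0
  for name in LB_IP PROM_ADDR AM_ADDR; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_cluster_env && echo "The variables are set"
```

If the function prints anything, the cluster was probably created in a different terminal session, and the variables need to be exported again.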

Which tools should we use for dashboards?

It doesn't take more than a few minutes with Prometheus to discover that it is not designed to serve as a dashboard. Sure, you can create graphs in Prometheus, but they are not permanent, nor do they offer much in terms of presenting data. Prometheus' graphs are designed to be used as a way to visualize ad-hoc queries. And that's what we need most of the time. When we receive a notification from an alert that there is a problem, we usually start our search for the culprit by executing the query of the alert and, from there on, we go deeper into data depending on the results. That is, if the alert does not reveal the problem immediately. If it does, there was no need to receive a notification in the first place, since those types of apparent issues can usually be fixed automatically.
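
As an illustration of such an ad-hoc query, the sketch below sends an expression to Prometheus' HTTP API (the /api/v1/query endpoint) instead of typing it into the graph screen. It assumes that Prometheus is reachable over plain HTTP at the address stored in PROM_ADDR. The prom_query function and the CURL override are hypothetical names introduced only for this example.

```shell
# Hypothetical helper: runs an instant query against Prometheus.
# Assumes PROM_ADDR holds the Prometheus address and that plain HTTP works.
# CURL can be set to use a different client binary (an assumption of this
# sketch, not a curl or Prometheus convention).
prom_query() {
  # --get turns the url-encoded data into a query string of a GET request.
  "${CURL:-curl}" --silent --get \
    --data-urlencode "query=$1" \
    "http://${PROM_ADDR}/api/v1/query"
}

# Usage: prom_query 'up == 0'
```

The response is JSON, so it is convenient for scripts, but it is no replacement for a graph when we are trying to spot a trend.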

But, as I already mentioned, Prometheus does not have...

Installing and setting up Grafana

You probably know what's coming next. We Google "Grafana Helm" and hope that the community already created a Chart we can use. I'll save you from the search by revealing that there is Grafana in Helm's stable channel. All we have to do is inspect the values and choose which ones we'll use.

helm inspect values stable/grafana

I won't go through all the values we could use. I assume that, by now, you are a Helm ninja and that you can explore them yourself. Instead, we'll use the values I already defined.

cat mon/grafana-values-bare.yml

The output is as follows.

ingress:
  enabled: true
persistence:
  enabled: true
  accessModes:
  - ReadWriteOnce
  size: 1Gi
resources:
  limits:
    cpu: 20m
    memory: 50Mi
  requests:
    cpu: 5m
    memory: 25Mi

There's nothing special about those values. We...
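
To give an idea of how such a values file is typically consumed, here is a sketch of an installation with Helm 2 (the version that relies on tiller). The release name grafana, the metrics Namespace, and the install_grafana function with its HELM and GRAFANA_NS overrides are assumptions made for illustration; they are not necessarily the command used in this chapter.

```shell
# Hypothetical wrapper around a Helm 2 installation of the stable/grafana Chart.
# Assumptions: release name "grafana", Namespace "metrics" (override with
# GRAFANA_NS), and the values file path passed as the first argument.
install_grafana() {
  "${HELM:-helm}" install stable/grafana \
    --name grafana \
    --namespace "${GRAFANA_NS:-metrics}" \
    --values "${1:-mon/grafana-values-bare.yml}"
}

# Usage: install_grafana mon/grafana-values-bare.yml
```

Keeping the values in a file, instead of a long list of --set arguments, means that the same definition can be reviewed and stored in Git together with the rest of the specs.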

Importing and customizing pre-made dashboards

Data sources are useless by themselves. We need to visualize them somehow. We could do that by creating our own dashboard, but that might not be the best (and easiest) introduction to Grafana. Instead, we'll import one of the existing community-maintained dashboards. We just need to choose one that suits our needs.

open "https://grafana.com/dashboards"

Feel free to spend a bit of time exploring the available dashboards.

I think that the Kubernetes cluster monitoring (https://grafana.com/dashboards/3119) dashboard is a good starting point. Let's import it.

Please click the + icon in the left-hand menu, followed by the Import link, and you'll be presented with a screen that allows us to import one of the Grafana.com dashboards, or to paste the JSON that defines it.

We'll go with the former option.

Figure...

Creating custom dashboards

It would be great if all our needs could be covered by existing dashboards. But, that is probably not the case. Each organization is "special", and our needs have to be reflected in our dashboards. Sometimes we can get away with dashboards made by others, and sometimes we need to change them. In other cases, we need to create our own dashboards. That's what we'll explore next.

Please click the + icon in the left-hand menu and choose to Create Dashboard. You'll be presented with the choice of a few types of panels. Select Graph.

Before we define our first graph, we'll change a few dashboard settings. Please click the Settings icon in the top-right part of the screen.

Inside the General section, type the Name of the dashboard. If you are not inspired today, you can call it My Dashboard. Set the Tags to Prometheus and Kubernetes...

Creating semaphore dashboards

If I'm claiming that the value dashboards bring to the table is lower than we think, you might be asking yourself the same question from the beginning of this chapter. Why are we talking about dashboards? Well, I already changed my statement from "dashboards are useless" to "there is some value in dashboards". They can serve as a registry for queries. With dashboards, we do not need to memorize the expressions that we would otherwise need to write in Prometheus. They might be a good starting point in our search for the cause of an issue, before we jump into Prometheus for some deeper digging into metrics. But, there is another reason I am including dashboards in the solution.

I love big displays. It's very satisfying to enter a room with large screens showing stuff that seems to be important. There is usually a room where operators...

A better dashboard for big screens

We explored how to create a dashboard with a graph and a single stat (semaphore). Both are based on similar queries, and the significant difference is in the way they display the results. We'll assume that the primary purpose of the dashboard we started building is to be available on a big screen, visible to many, and not as something we keep open on our laptops. At least, not continuously.

What should be the primary purpose of such a dashboard? Before I answer that question, we'll import a dashboard I created for this chapter.

Please click the + button in the left-hand menu and select Import. Type 9132 as the Grafana.com Dashboard and press the Load button. Select a Prometheus data source. Feel free to change any of the values to suit your needs. Nevertheless, you might want to postpone that until you get more familiar with the...

Prometheus alerts vs. Grafana notifications vs. semaphores vs. graph alerts

The title might be confusing by itself, so let us briefly describe each of the elements mentioned in it.

Prometheus alerts and Grafana notifications serve the same purpose, even though we did not explore the latter. I'll let you learn how Grafana notifications work on your own. Who knows? After the discussion that follows you might not even want to spend time with them.

Grafana notifications can be forwarded to different recipients in much the same way that Prometheus alerts are forwarded through Alertmanager. However, there are a few things that make Grafana notifications less appealing.

If we can accomplish the same result with Prometheus alerts as with Grafana alerts, there is a clear advantage with the former. If an alert is fired from Prometheus, that means that the rules that caused the...

What now?

Grafana is relatively simple to use and intuitive. If you know how to write queries for the data source hooked to Grafana (for example, Prometheus), you already learned the most challenging part. The rest is mostly about checking boxes, choosing panel types, and arranging things on the screen. The main difficulty is to avoid being carried away by creating a bunch of flashy dashboards that do not provide much value. A common mistake is to create a graph for everything we can imagine. That only reduces the value of those that are truly important. Less is often more.

That's it. Destroy the cluster if it's dedicated to this book, or keep it if it's not or if you're planning to jump to the next chapter right away. If you're keeping it, please delete the grafana Chart by executing the command that follows. If we need it in one of the next chapters, I'...