Chapter 8. Monitoring and Logging
This chapter will cover the use and customization of both built-in and third-party monitoring tools on our Kubernetes cluster. We will cover how to use the tools to monitor the health and performance of our cluster. In addition, we will look at built-in logging, the Google Cloud Logging service, and Sysdig.
The following topics will be covered in this chapter:
- How Kuberentes uses cAdvisor, Heapster, InfluxDB, and Grafana
- Customizing the default Grafana dashboard
- Using Fluentd and Grafana
- Installing and using logging tools
- Working with popular third-party tools, such as Stackdriver and Sysdig, to extend our monitoring capabilities
You'll need to have your Google Cloud Platform account enabled and logged in to it, or you can use a local Minikube instance of Kubernetes. You can also use Play with Kubernetes over the web: https://labs.play-with-k8s.com/.
Real-world monitoring goes far beyond checking whether a system is up and running. Although health checks like those you learned in Chapter 2, Building a Foundation with Core Kubernetes Constructs, in the Health checks section can help us isolate problem applications, operations teams can best serve the business when they can anticipate the issues and mitigate them before a system goes offline.
The best practices in monitoring are to measure the performance and usage of core resources and watch for trends that stray from the normal baseline. Containers are not different here, and a key component to managing our Kubernetes cluster is having a clear view of the performance and availability of the OS, network, system (CPU and memory), and storage resources across all nodes.
In this chapter, we will examine several options to monitor and measure the performance and availability of all our cluster resources. In addition, we will look at a few options for alerting and notifications...
If you recall from Chapter 1, Introduction to Kubernetes, we noted that our nodes were already running a number of monitoring services. We can see these once again by running the get pods
command with the kube-system
namespace specified as follows:
$ kubectl get pods --namespace=kube-system
The following screenshot is the result of the preceding command:
System pod listing
Again, we see a variety of services, but how does this all fit together? If you recall, the node (formerly minions) section from Chapter 2, Building a Foundation with Core Kubernetes Constructs, each node is running a kubelet
. The kubelet
is the main interface for nodes to interact with and update the API server. One such update is the metrics of the node resources. The actual reporting of the resource usage is performed by a program named cAdvisor.
The cAdvisor program is another open source project from Google, which provides various metrics on container resource use. Metrics include CPU, memory, and network...
FluentD and Google Cloud Logging
Looking back at the System pod listing screenshot at the beginning of the chapter, you may have noted a number of pods starting with the words fluentd-cloud-logging-kubernetes
. These pods appear when using the GCE provider for your K8s cluster.
A pod like this exists on every node in our cluster, and its sole purpose is to handle the processing of Kubernetes logs. If we log in to our Google Cloud Platform account, we can see some of the logs processed there. Simply use the left side, and under Stackdriver
, select Logging
. This will take us to a log listing page with a number of drop-down menus on the top. If this is your first time visiting the page, the first drop-down will likely be set to Cloud HTTP Load Balancer
.
In this drop-down menu, we'll see a number of GCE types of entries. Select GCE VM instances and then the Kubernetes master or one of the nodes. In the second drop-down, we can choose various log groups, including kubelet.
We can also filter by...
Maturing our monitoring operations
While Grafana gives us a great start to monitoring our container operations, it is still a work in progress. In the real world of operations, having a complete dashboard view is great once we know there is a problem. However, in everyday scenarios, we'd prefer to be proactive and actually receive notifications when issues arise. This kind of alerting capability is a must to keep the operations team ahead of the curve and out of reactive mode.
There are many solutions available in this space, and we will take a look at two in particular: GCE monitoring (Stackdriver) and Sysdig.
Stackdriver is a great place to start for infrastructure in the public cloud. It is actually owned by Google, so it's integrated as the Google Cloud Platform monitoring service. Before your lock-in alarm bells start ringing, Stackdriver also has solid integration with AWS. In addition, Stackdriver has alerting capability with support for notification to a variety of...
We took a quick look at monitoring and logging with Kubernetes. You should now be familiar with how Kubernetes uses cAdvisor and Heapster to collect metrics on all the resources in a given cluster. Furthermore, we saw how Kubernetes saves us time by providing InfluxDB and Grafana set up and configured out of the box. Dashboards are easily customizable for our everyday operational needs.
In addition, we looked at the built-in logging capabilities with FluentD and the Google Cloud Logging service. Also, Kubernetes gives us great time savings by setting up the basics for us.
Finally, you learned about the various third-party options available to monitor our containers and clusters. Using these tools will allow us to gain even more insight into the health and status of our applications. All these tools combine to give us a solid toolset to manage day-to-day operations. Lastly, we explored different methods of installing Prometheus, with an eye on building more robust production systems...