Collecting and Querying Logs

In critical moments, men sometimes see exactly what they wish to see.

- Spock

So far, our primary focus has been on metrics. We used them in different forms and for different purposes. In some cases, we used metrics to scale Pods and nodes. In others, we used them to create alerts that notify us when there is an issue that cannot be fixed automatically. We also created a few dashboards.

However, metrics are often not enough. That is especially true when dealing with issues that require manual intervention. When metrics alone are insufficient, we usually need to consult logs, hoping that they will reveal the cause of the problem.

Logging is often misunderstood or, to be more precise, confused with metrics. For many, the line between logs and metrics is blurred. Some extract metrics from logs. Others treat metrics and logs as the same...

Creating a cluster

You know the drill. We'll move into the directory with the vfarcic/k8s-specs (https://github.com/vfarcic/k8s-specs) repository, pull the latest version of the code just in case I pushed something recently, and create a new cluster unless you already have one at hand.

All the commands from this chapter are available in the 07-logging.sh (https://gist.github.com/vfarcic/74774240545e638b6cf0e01460894f34) Gist.
cd k8s-specs

git pull

This time, the requirements for the cluster have changed. We need much more memory than before. The main culprit is Elasticsearch, which is very resource-hungry.

If you're using Docker for Desktop or minikube, you'll need to increase the memory dedicated to the cluster to 10 GB. If that's too much for your laptop, you might choose to read the Exploring Centralized Logging Through Elasticsearch...
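If you're going the minikube route, the adjustment might look like the sketch below. The 10 GB of memory comes from the requirement above, while the CPU count is an assumption on my part; Docker for Desktop users would change the equivalent values through its Preferences instead.

# Sketch only: --memory is expressed in MB, and 4 CPUs is an assumption, not a
# value prescribed by this chapter. The flags take effect when the cluster is
# created, hence the delete first.
minikube delete

minikube start --memory 10240 --cpus 4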

Exploring logs through kubectl

The first contact most people have with logs in Kubernetes is through kubectl. Using it is almost unavoidable.

As we're learning how to tame the Kubernetes beast, we are bound to check logs when we get stuck. In Kubernetes, the term "logs" is reserved for the output produced by our own and third-party applications running inside a cluster. However, that excludes the events generated by different Kubernetes resources. Even though many would call them logs as well, Kubernetes separates them from logs and calls them events. I'm sure that you already know how to retrieve logs from applications and how to see Kubernetes events. Nevertheless, we'll explore them briefly here as well since that will add relevance to the discussion we'll have later on. I promise to keep it short, and you are free to skip this section...
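As a quick reminder, and only as a sketch (the Namespace, Pod, and container names below are placeholders rather than resources created in this chapter), retrieving logs and events typically boils down to commands like these.

# Placeholders only: replace go-demo-5, db-0, and db with your own resources.
kubectl -n go-demo-5 logs db-0 --follow

# Logs of a specific container in a multi-container Pod
kubectl -n go-demo-5 logs db-0 -c db

# Kubernetes events are retrieved separately from logs
kubectl -n go-demo-5 get events

kubectl -n go-demo-5 describe pod db-0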

Choosing a centralized logging solution

The first thing we need to do is to find a place to store logs. Given that we want the ability to filter log entries, storing them in plain files should be ruled out from the start. What we need is a database of sorts. Speed matters more than transactional guarantees, so we are most likely looking at an in-memory database. But, before we take a look at the choices, we should discuss the location of our database. Should we run it inside our cluster, or should we use a service? Instead of making that decision right away, we'll explore both options before making a choice.

Logging-as-a-service solutions fall into two major groups. If we are running our cluster with one of the Cloud providers, an obvious choice might be to use the logging solution they provide. EKS has AWS CloudWatch, GKE has GCP...

Exploring logs collection and shipping

For a long time now, there have been two major contenders for the "logs collection and shipping" throne: Logstash (https://www.elastic.co/products/logstash) and Fluentd (https://www.fluentd.org/). Both are open source, and both are widely accepted and actively maintained. While each has its pros and cons, Fluentd turned out to have an edge with cloud-native distributed systems. It consumes fewer resources and, more importantly, it is not tied to a single destination (Elasticsearch). While Logstash can push logs to many different targets, it is primarily designed to work with Elasticsearch. For that reason, other logging solutions adopted Fluentd.

As of today, no matter which logging product you embrace, the chances are that it will support Fluentd. The culmination of that adoption can be seen in Fluentd's entry into...

Exploring centralized logging through Papertrail

The first centralized logging solution we'll explore is Papertrail (https://papertrailapp.com/). We'll use it as a representative of a logging-as-a-service solution that can save us from installing and, more importantly, maintaining a self-hosted alternative.

Papertrail features live trailing, filtering by timestamps, powerful search queries, pretty colors, and quite a few other things that might (or might not) be essential when skimming through logs produced inside our clusters.

The first thing we need to do is to register or, if this is not the first time you've tried Papertrail, to log in.

open "https://papertrailapp.com/"

Please follow the instructions to register, or to log in if you already have an account in their system.

You will be glad to find out that Papertrail provides a free plan that allows storage...

Combining GCP Stackdriver with a GKE cluster

If you're using a GKE cluster, logging is already set up, even though you might not know about it. Every GKE cluster comes by default with a Fluentd DaemonSet that is configured to forward logs to GCP Stackdriver. It runs in the kube-system Namespace.

Let's describe GKE's Fluentd DaemonSet and see whether there is any useful information we might find.

kubectl -n kube-system \
  describe ds -l k8s-app=fluentd-gcp

The output, limited to the relevant parts, is as follows.

...
Pod Template:
  Labels:     k8s-app=fluentd-gcp
              kubernetes.io/cluster-service=true
              version=v3.1.0
...
  Containers:
   fluentd-gcp:
    Image: gcr.io/stackdriver-agents/stackdriver-logging-agent:0.3-1.5.34-1-k8s-1
    ...

We can see that, among other things, the DaemonSet's Pod Template has the label...
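If you'd like to confirm that the DaemonSet indeed runs a Fluentd Pod on every node, a label-based query like the one below should do. Treat it as a sketch rather than a step from this chapter.

# Sketch: list the Fluentd Pods created by the DaemonSet, one per node
kubectl -n kube-system get pods \
  -l k8s-app=fluentd-gcp \
  -o wide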

Combining AWS CloudWatch with an EKS cluster

Unlike GKE, which has a logging solution baked into the cluster, EKS requires us to set one up ourselves. It does provide the CloudWatch service, but we need to ensure that logs are shipped there from our cluster.

Just as before, we'll use Fluentd to collect logs and ship them to CloudWatch. Or, to be more precise, we'll use a Fluentd tag built specifically for CloudWatch. As you probably already know, we'll also need an IAM policy that will allow Fluentd to communicate with CloudWatch.

All in all, the setup we are about to make will be very similar to the one we did with Papertrail, except that we'll store the logs in CloudWatch, and that we'll have to put some effort into creating AWS permissions.

Before we proceed, I'll assume that you still have the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY...
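To give a feel for the kind of permissions involved, the sketch below creates a policy that lets the nodes push log events and attaches it to the worker nodes' role. The policy document, the role name, and the policy name are all hypothetical, not the exact ones used later in the chapter.

# Hypothetical sketch: the role name, the policy name, and the policy contents
# are assumptions, not the chapter's actual steps.
echo '{
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents",
            "logs:DescribeLogStreams"
        ],
        "Resource": "*"
    }]
}' | tee logs-policy.json

aws iam put-role-policy \
  --role-name devops25-worker-nodes \
  --policy-name logs \
  --policy-document file://logs-policy.json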

Combining Azure Log Analytics with an AKS cluster

Just like GKE (and unlike EKS), AKS comes with an integrated logging solution. All we have to do is enable one of the AKS addons. To be more precise, we'll enable the monitoring addon. As the name indicates, the addon does not only collect logs; it also handles metrics. However, we are interested only in logs. I believe that nothing beats Prometheus for metrics, especially since it integrates with the HorizontalPodAutoscaler. Still, you should explore AKS metrics as well and reach your own conclusion. For now, we'll explore only the logging part of the addon.

az aks enable-addons \
  -a monitoring \
  -n devops25-cluster \
  -g devops25-group

The output is a rather big JSON with all the information about the newly enabled monitoring addon. There's nothing exciting in it.
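If you'd like to confirm that the addon is enabled before moving on, a query along the following lines should do. The addonProfiles.omsagent path is my assumption about how AKS names the monitoring addon profile, so treat the command as a sketch.

# Sketch: the addonProfiles.omsagent key is an assumption about AKS naming
az aks show \
  -n devops25-cluster \
  -g devops25-group \
  --query addonProfiles.omsagent.enabled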

It's...

Exploring centralized logging through Elasticsearch, Fluentd, and Kibana

Elasticsearch is probably the most commonly used in-memory database. At least, if we narrow the scope to self-hosted databases. It is designed for many other scenarios, and it can be used to store (almost) any type of data. As such, it is almost perfect for storing logs, which can come in many different formats. Given its flexibility, some use it for metrics as well, and in that role Elasticsearch competes with Prometheus. We'll leave metrics aside for now and focus only on logs.

The EFK (Elasticsearch, Fluentd, and Kibana) stack consists of three components. Data is stored in Elasticsearch; logs are collected, transformed, and pushed to the database by Fluentd; and Kibana serves as the UI through which we explore the data stored in Elasticsearch. If you are used to ELK (Logstash instead of Fluentd), the setup...
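If you're wondering how such a stack is typically brought up, a Helm-based sketch (Helm 2 syntax) might look as follows. The chart names come from the now-archived stable repository and are assumptions on my part, not the installation steps this chapter actually uses.

# Sketch only: the chart names and the logging Namespace are assumptions,
# not the chapter's actual installation steps.
helm install stable/elasticsearch \
  --name elasticsearch \
  --namespace logging

helm install stable/fluentd-elasticsearch \
  --name fluentd \
  --namespace logging

# Kibana still needs to be pointed at the Elasticsearch Service through the
# chart's values; consult the chart's documentation for the exact key.
helm install stable/kibana \
  --name kibana \
  --namespace logging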

Switching to Elasticsearch for storing metrics

Now that we have Elasticsearch running in our cluster and know that it can handle almost any data type, a logical question is whether we can use it to store our metrics as well as our logs. If you explore elastic.co (https://www.elastic.co/), you'll see that metrics are indeed something they advertise. If it could replace Prometheus, it would undoubtedly be beneficial to have a single tool that handles not only logs but also metrics. On top of that, we could ditch Grafana and keep Kibana as a single UI for both data types.

Nevertheless, I would strongly advise against using Elasticsearch for metrics. It is a general-purpose free-text NoSQL database. That means it can handle almost any data but, at the same time, it does not excel at any specific format. Prometheus, on the other hand, is designed to store time-series...

What should we expect from centralized logging?

We explored several products that can be used to centralize logging. As you saw, all are very similar, and we can assume that most of the other solutions follow the same principles. We need to collect logs across the cluster. We used Fluentd for that; it is the most widely accepted solution, and you will likely use it no matter which database receives those logs (Azure being an exception).

Log entries collected with Fluentd are shipped to a database which, in our case, is Papertrail, Elasticsearch, or one of the solutions provided by hosting vendors. Finally, all solutions offer a UI that allows us to explore the logs.

I usually provide a single solution for a problem but, in this case, there are quite a few candidates for your centralized logging needs. Which one should you choose? Will it be Papertrail, Elasticsearch-Fluentd...

What now?

You know what to do. Destroy the cluster if you created it specifically for this chapter.
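How exactly depends on how the cluster was created. The commands below are only a sketch; the cluster names and the GKE zone are assumptions, so use whichever command matches your setup.

# Sketch only: the names and the zone are assumptions; pick the command that
# matches the way you created the cluster.
minikube delete

eksctl delete cluster -n devops25

az group delete -n devops25-group --yes

gcloud container clusters delete devops25 --zone us-east1-b --quiet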

Before you leave, you might want to go over the main points of this chapter.

  • For any but the smallest systems, going from one resource to another and from one node to another to find the cause of an issue is anything but practical, reliable, or fast.
  • More often than not, the kubectl logs command does not provide us with enough options to perform anything but the simplest retrieval of logs.
  • Elasticsearch is excellent, but it does too much. Its lack of focus makes it inferior to Prometheus for storing and querying metrics, as well as sending alerts based on such data.
  • Logs themselves are too expensive to parse, and most of the time they do not provide enough data to act as metrics.
  • We need logs centralized in a single location so that we can explore logs from any part of the system.
  • We...