You're reading from The DevOps 2.5 Toolkit

Product type: Book
Published in: Nov 2019
Publisher: Packt
ISBN-13: 9781838647513
Edition: 1st
Author: Viktor Farcic

Viktor Farcic is a senior consultant at CloudBees, a member of the Docker Captains group, and an author. He codes using a plethora of languages starting with Pascal (yes, he is old), Basic (before it got the Visual prefix), ASP (before it got the .NET suffix), C, C++, Perl, Python, ASP.NET, Visual Basic, C#, JavaScript, Java, Scala, and so on. He never worked with Fortran. His current favorite is Go. Viktor's big passions are Microservices, Continuous Deployment, and Test-Driven Development (TDD). He often speaks at community gatherings and conferences. Viktor wrote Test-Driven Java Development by Packt Publishing, and The DevOps 2.0 Toolkit. His random thoughts and tutorials can be found in his blog—Technology Conversations

Debugging Issues Discovered Through Metrics and Alerts

When you eliminate the impossible, whatever remains, however improbable, must be the truth.

- Spock

So far, we explored how to gather metrics and how to create alerts that will notify us when there is an issue. We also learned how to query metrics and dig for information we might need when trying to find the cause of a problem. We'll expand on that and try to debug a simulated issue.

Saying that an application does not work correctly should not be enough by itself. We should be much more precise. Our goal is to be able to pinpoint not only which application is malfunctioning, but also which part of it is the culprit. We should be able to blame a specific function, a method, a request path, and so on. The more precise we are in detecting which part of an application is causing a problem, the faster we will find the cause...

Creating a cluster

The vfarcic/k8s-specs (https://github.com/vfarcic/k8s-specs) repository will continue to be the source of the Kubernetes definitions we'll use in our examples. We'll make sure that it is up-to-date by pulling the latest version.

All the commands from this chapter are available in the 04-instrument.sh (https://gist.github.com/vfarcic/851b37be06bb7652e55529fcb28d2c16) Gist. Just as in the previous chapter, it contains not only the commands but also Prometheus' expressions. They are all commented (with #). If you're planning to copy and paste the expressions from the Gist, please exclude the comments. Each expression has a # Prometheus expression comment on top to help you identify it.
cd k8s-specs

git pull
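If you'd rather strip the comments out programmatically before pasting, a simple grep does the job. The sample file below is made up purely for illustration; it only mimics the comment-plus-expression format described above, not the Gist's actual contents:

```shell
# Made-up sample mimicking the Gist's comment-plus-expression format
cat > /tmp/gist-sample.txt <<'EOF'
# Prometheus expression
sum(rate(http_requests_total[5m]))
# Shell command
git pull
EOF

# Drop the commented lines so only the expressions and commands remain
grep -v '^#' /tmp/gist-sample.txt
```

Piping the result through a clipboard utility (for example, pbcopy on macOS) makes copying the cleaned expressions even faster.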

Given that we learned how to install a fully operational Prometheus and the rest of the tools from its chart, and that we'll...

Facing a disaster

Let's explore one disaster scenario. Frankly, it's not going to be a real disaster, but it will require us to find a solution to an issue.

We'll start by installing the already familiar go-demo-5 application.

GD5_ADDR=go-demo-5.$LB_IP.nip.io

helm install \
    https://github.com/vfarcic/go-demo-5/releases/download/0.0.1/go-demo-5-0.0.1.tgz \
    --name go-demo-5 \
    --namespace go-demo-5 \
    --set ingress.host=$GD5_ADDR

kubectl -n go-demo-5 \
    rollout status \
    deployment go-demo-5

We declared GD5_ADDR with the address through which we'll be able to access the application. We used it as the ingress.host variable when we installed the go-demo-5 Chart. To be on the safe side, we waited until the app rolled out, and all that's left, from the deployment perspective, is to confirm that it...
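As an aside, nip.io resolves any hostname of the form something.<IP>.nip.io back to that IP, which is why declaring the address this way works without creating DNS entries. A quick sketch of the composition (1.2.3.4 is a stand-in for whatever your LB_IP happens to be):

```shell
# 1.2.3.4 is a stand-in for your cluster's actual load balancer IP ($LB_IP)
LB_IP=1.2.3.4

# nip.io will resolve this hostname back to 1.2.3.4
GD5_ADDR=go-demo-5.$LB_IP.nip.io

echo "$GD5_ADDR"
```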

Using instrumentation to provide more detailed metrics

We shouldn't just say that the go-demo-5 application is slow. That would not provide enough information for us to quickly inspect the code in search of the exact cause of that slowness. We should be able to do better and deduce which part of the application is misbehaving. Can we pinpoint a specific path that produces slow responses? Are all methods equally slow, or is the issue limited to only one? Do we know which function produces slowness? There are many similar questions we should be able to answer in situations like that. But we can't, with the current metrics. They are too generic, and they can usually only tell us that a specific Kubernetes resource is misbehaving. The metrics we're collecting are too broad to answer application-specific questions.
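To make that concrete, instrumented applications typically expose Prometheus histograms labeled by method and path, and those labels are exactly what let us answer such questions. The excerpt below is a hypothetical /metrics sample (the metric name and labels are illustrative assumptions, not go-demo-5's actual output), followed by a one-liner deriving how many requests took longer than 0.5 seconds:

```shell
# Hypothetical excerpt of an instrumented app's /metrics output
# (metric name and labels are illustrative, not go-demo-5's actual ones)
cat > /tmp/metrics-sample.txt <<'EOF'
resp_time_bucket{method="GET",path="/demo/hello",le="0.1"} 12
resp_time_bucket{method="GET",path="/demo/hello",le="0.5"} 17
resp_time_bucket{method="GET",path="/demo/hello",le="+Inf"} 20
resp_time_sum{method="GET",path="/demo/hello"} 38.7
resp_time_count{method="GET",path="/demo/hello"} 20
EOF

# Requests slower than 0.5s = total count minus those within the 0.5s bucket
awk '/le="0.5"/ {fast=$2} /_count/ {total=$2} END {print total - fast}' \
    /tmp/metrics-sample.txt
```

Since Prometheus histogram buckets are cumulative, subtracting the le="0.5" bucket from the total count yields the number of slower requests for that specific method and path.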

The metrics we explored so far are a combination of...

Using internal metrics to debug potential issues

We'll send requests with slow responses again so that we get back to the point where we started this chapter.

for i in {1..20}; do
    DELAY=$[ $RANDOM % 10000 ]
    curl "http://$GD5_ADDR/demo/hello?delay=$DELAY"
done

open "http://$PROM_ADDR/alerts"

We sent twenty requests that will result in responses with random duration (up to ten seconds). Further on, we opened Prometheus' alerts screen.
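As a side note, $[ ... ] is an older Bash arithmetic syntax equivalent to the modern $(( ... )). Since $RANDOM yields an integer between 0 and 32767, the modulo keeps each delay below 10000 milliseconds, which is where the "up to ten seconds" comes from:

```shell
# $RANDOM is an integer between 0 and 32767; the modulo caps the
# delay below 10000 ms (ten seconds)
DELAY=$(( RANDOM % 10000 ))
echo "$DELAY"

# Sanity check of the range
test "$DELAY" -ge 0 && test "$DELAY" -lt 10000 && echo "within range"
```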

A while later, the AppTooSlow alert should fire (remember to refresh your screen), and we have a (simulated) problem that needs to be solved. Before we start panicking and doing something hasty, we'll try to find the cause of the issue.

Please click the expression of the AppTooSlow alert.

We are redirected to the graph screen with the pre-populated expression from the alert. Feel free...
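For reference, alerts like AppTooSlow are commonly built on the ratio of fast requests derived from histogram buckets. A generic sketch of such an expression follows; the metric name, label, and thresholds are illustrative, not necessarily the exact ones this alert uses:

```
sum(rate(resp_time_bucket{le="0.1", kubernetes_name="go-demo-5"}[5m])) /
sum(rate(resp_time_count{kubernetes_name="go-demo-5"}[5m])) < 0.95
```

In words: fire when fewer than 95% of the application's requests over the last five minutes completed within 0.1 seconds.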

What now?

I don't believe that we need many other examples of instrumented metrics. They are no different from those we collect through exporters. I'll leave it up to you to start instrumenting your applications. Start small, see what works well, improve and extend.

Yet another chapter is finished. Destroy your cluster and start the next one fresh, or keep it. If you choose the latter, please execute the commands that follow to remove the go-demo-5 application.

helm delete go-demo-5 --purge

kubectl delete ns go-demo-5

Before you leave, remember the point that follows. It summarizes instrumentation.

  • Instrumented metrics are baked into applications. They are an integral part of the code of our apps, and they are usually exposed through the /metrics endpoint.