
You're reading from The KCNA Book

Product type: Book
Published in: Jun 2023
Publisher: Packt
ISBN-13: 9781835080399
Edition: 1st Edition
Author: Nigel Poulton

Nigel Poulton is a cloud-native subject matter expert who spends his life creating books and training videos on the latest cloud technologies. He is the author of best-selling books on Docker and Kubernetes and the most popular online training videos on the same topic. He is a Docker Captain. Prior to this, Nigel has held various infrastructure roles for large enterprises. When he is not playing with technology, he is dreaming about it. When he is not dreaming about it, he is reading and watching sci-fi. He wishes he lived in the future so he could explore spacetime, the universe, and tons of other mind-blowing stuff. He likes cars, football (soccer), and food. He has a fabulous wife and three children.

6: Cloud native observability

This chapter covers the topics required to pass the cloud native observability section of the exam, which accounts for 8% of your exam grade.

The chapter is divided as follows.

  • Primer
  • Telemetry and observability
  • Prometheus
  • Cost management
  • Chapter summary
  • Exam essentials
  • Recap questions

Primer

Cloud native applications are deployed as lots of small moving parts that connect over the network and are frequently updated and replaced. This makes it more important than ever to be able to see how applications are connected and have the ability to track requests as they traverse the various distributed microservices of an application.

To assist with this, every microservice needs to output high-quality telemetry in the form of metrics and log data. You then need the right tools to collect the telemetry, aggregate it, and analyse it for troubleshooting and making adjustments. The process is a feedback loop that looks like Figure 6.1 – observe > analyse > adjust.

Figure 6.1
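
The observe > analyse > adjust loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the latency samples, the 150 ms target, and the replica limits are all hypothetical, and a real platform would take these decisions from its monitoring and orchestration tools rather than from hand-written functions.

```python
def observe(samples_ms):
    """Observe: reduce raw latency samples to a 95th-percentile figure."""
    ordered = sorted(samples_ms)
    # Nearest-rank percentile using integer ceiling division.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[max(0, rank - 1)]


def analyse(p95_ms, target_ms):
    """Analyse: express the observed latency as a ratio of the target."""
    return p95_ms / target_ms


def adjust(replicas, ratio, max_replicas=10):
    """Adjust: scale the replica count by the ratio, within fixed bounds."""
    desired = -(-replicas * ratio // 1)  # ceiling of replicas * ratio
    return int(max(1, min(max_replicas, desired)))


# Hypothetical latency samples in milliseconds from one microservice.
samples = [80, 90, 100, 110, 120, 130, 140, 150, 160, 300]
p95 = observe(samples)
ratio = analyse(p95, target_ms=150)
new_replicas = adjust(replicas=3, ratio=ratio)
```

Each pass through the loop feeds the next one: after the adjustment, fresh telemetry shows whether the change helped.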

Even though we’ve been doing things like this for years with traditional monolithic apps, all of the following are either new or far more important with cloud native apps.

  • Tracing requests
  • Application-specific metrics
  • Network latency
  • Scaling decisions
  • The...

Telemetry and observability

Telemetry is jargon for logs, metrics, and traces. This is observed by remote monitoring platforms that use it to learn how systems work, diagnose problems, and measure and optimise performance. A highly observable system is one that generates high-quality telemetry.

Consider the following quick example. You have an in-house cloud native application comprising lots of small microservices. You also have a centralised monitoring platform. Each microservice outputs telemetry that is collected, stored, and analysed by the monitoring platform.

As mentioned already, cloud native telemetry comes in three major flavours that we sometimes call signal types, verticals, or classes.

  • Logs
  • Metrics
  • Traces

Each serves a different purpose, but collectively they provide detailed insights that can be vital in understanding, troubleshooting, and optimising microservices applications.
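
To make the three signal types concrete, here is a minimal sketch that uses only the Python standard library. The service name, field names, and the `handle_request` function are hypothetical and chosen for illustration; real applications would normally rely on an instrumentation library instead of building these records by hand.

```python
import json
import time
import uuid


def make_log(severity, message, trace_id):
    """A log: a timestamped record of a single event, formatted as JSON."""
    return json.dumps({
        "ts": time.time(),
        "severity": severity,
        "message": message,
        "trace_id": trace_id,
    })


class Counter:
    """A metric: a number tracked over time, here a simple request counter."""

    def __init__(self, name):
        self.name = name
        self.value = 0

    def inc(self):
        self.value += 1


def make_span(trace_id, service, operation, start, end):
    """A trace span: one timed hop of a request through one service."""
    return {
        "trace_id": trace_id,
        "service": service,
        "operation": operation,
        "duration_ms": round((end - start) * 1000, 3),
    }


def handle_request(counter, service="checkout"):
    """Handle one hypothetical request and emit all three signal types."""
    trace_id = uuid.uuid4().hex  # shared ID ties the log and the span together
    start = time.perf_counter()
    counter.inc()
    log_line = make_log("info", "order accepted", trace_id)
    end = time.perf_counter()
    span = make_span(trace_id, service, "POST /orders", start, end)
    return log_line, span
```

The shared `trace_id` is the design choice worth noticing: it lets a monitoring platform line up the log entry with the span that describes the same request.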

The OpenTelemetry project is an incubating CNCF project that aims to standardise...

Prometheus

Prometheus is considered the industry standard monitoring tool for Kubernetes and cloud native environments. It was created at SoundCloud in 2012 when they realised existing monitoring tools weren’t good enough for what they needed. It was donated to the CNCF as an open-source project in 2016 and became the second project to graduate from the CNCF in August 2018.

It’s a common pattern to run cloud native applications on Kubernetes, use Prometheus for monitoring and alerting, and use Grafana for dashboards and visualisation.

How Prometheus works

At a high level, every microservice should generate high-quality telemetry in the form of logs and metrics. Prometheus collects the metrics and stores them as time series data, where they can be queried and analysed to help with troubleshooting and optimisation. It can also feed into Grafana for high-quality dashboards and visualisations.

Digging a little deeper…

Prometheus expects applications to expose metrics via the /metrics HTTP endpoint...
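
As a rough illustration of what a service might return from /metrics, the sketch below renders a counter in the Prometheus text exposition format and serves it with the Python standard library. The metric name, label values, counts, and port are hypothetical. In practice, a Prometheus client library usually produces this output for you, so treat this as a picture of the format and not as a recommended implementation.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counts; a real service would update these as it works.
REQUESTS = {("GET", "200"): 12, ("POST", "500"): 1}


def render_metrics(requests):
    """Render request counts in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for (method, code), value in sorted(requests.items()):
        lines.append(
            f'http_requests_total{{method="{method}",code="{code}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer GET /metrics with the rendered text; 404 for anything else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling `serve()` and then requesting http://127.0.0.1:8000/metrics returns one line per label combination, which is the plain-text shape Prometheus reads when it collects metrics.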

Cost management

Effective cost management is a key part of managing cloud native apps and infrastructure, and you should always consider the following three things.

  1. Choosing the right infrastructure
  2. Rightsizing
  3. Turning unused infrastructure off

Choosing the right infrastructure

Choosing the right cloud platform is a vital decision. However, even after you’ve chosen your cloud, there are still a lot of decisions that have a high impact on costs and cost management.

For example, most clouds offer the following instance categories – instance is the technical term for a virtual server on a cloud.

  • On-demand
  • Reserved
  • Spot

On-demand instances are the most common and most flexible, but they’re also the most expensive. You can create and delete them “on demand” and they’re yours for as long as you need them.

Reserved instances come at a discounted price in exchange for a long-term commitment. For example, if you commit to a use...
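
A quick calculation shows how much the instance category can matter. The hourly prices below are hypothetical and only reflect the typical ordering (on-demand highest, spot lowest); real prices depend on the cloud, region, and instance size.

```python
HOURS_PER_MONTH = 730  # approximate average number of hours in a month

# Hypothetical hourly prices for one instance size; not real cloud pricing.
HOURLY_PRICE = {"on_demand": 0.10, "reserved": 0.06, "spot": 0.03}


def monthly_cost(category, instances, hours=HOURS_PER_MONTH):
    """Monthly cost of running `instances` instances for `hours` hours."""
    return round(HOURLY_PRICE[category] * instances * hours, 2)


def savings_vs_on_demand(category, instances, hours=HOURS_PER_MONTH):
    """Monthly saving from choosing `category` over on-demand."""
    base = monthly_cost("on_demand", instances, hours)
    return round(base - monthly_cost(category, instances, hours), 2)


# Ten always-on instances under each category.
for category in HOURLY_PRICE:
    print(category, monthly_cost(category, 10))
```

The `hours` parameter also covers the third item in the list above: passing a smaller value, such as 264 for instances that only run during working hours, shows what turning unused infrastructure off is worth.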

Chapter summary

In this chapter, you learned that the three main types of telemetry are logs, metrics, and traces. Logs are application events and are a great tool for determining why an application or microservice isn’t working. Metrics provide historical trends and help identify changes over time. Traces let you track requests as they traverse today’s complex microservices apps. You also learned that the OpenTelemetry project offers specifications and instrumentation to standardise and simplify the way we generate and store telemetry.

You learned that Prometheus is the most popular monitoring platform used with Kubernetes and is often coupled with Grafana for dashboards and visualisations. It expects services to expose metrics on the /metrics endpoint and operates a pull model where it “scrapes” the data at periodic intervals. It has a Pushgateway for getting metrics from short-lived processes, and an Alertmanager for sending out alerts.

You finished...

Exam essentials

Telemetry
Telemetry is jargon for the monitoring-related outputs generated by a system. They include logs, metrics, and traces. High-quality telemetry is vital for troubleshooting, tweaking, and understanding the behaviour of cloud native apps.
OpenTelemetry
The OpenTelemetry project is an incubating CNCF project that defines specifications and instrumentation aimed at making high-quality telemetry part of all cloud native apps.
Logs
Logs are a form of telemetry relating to application events and are one of the best places to look when things break. For example, application restarts, failed logins, dropped connections, and more will all be logged as events. Log data is often structured and formatted as JSON to make it easy to read and search. It’s also commonly categorised according to severity, with common severity levels including info, warning, error, and critical.
Metrics
Metrics are usually performance-related data captured over long...

Recap questions

See Appendix A for answers.

1. Which of the following best describes telemetry?

    1. Remote shell access to a system
    2. A map diagram of a microservices app
    3. Pay-as-you-use cloud infrastructure
    4. Health and performance related system outputs

2. Which of the following are the three main types of telemetry data? Choose three.

    1. Logs
    2. Traces
    3. Metrics
    4. Audits

3. What format does Prometheus require metrics in?

    1. Hexadecimal
    2. YAML
    3. SOAP
    4. Timeseries

4. Which of the following describes the main aim of telemetry and observability?

    1. Troubleshooting and auditing systems
    2. Troubleshooting and improving systems
    3. Auditing and improving systems
    4. Troubleshooting...
