You're reading from The KCNA Book

Product type: Book

Published in: Jun 2023

Publisher: Packt

ISBN-13: 9781835080399

Edition: 1st Edition

Tools: Kubernetes

Concepts: Containerization

Author (1)

Nigel Poulton

6: Cloud native observability

This chapter covers the topics required to pass the cloud native observability section of the exam, which accounts for 8% of your exam grade.

The chapter is divided as follows.

Primer
Telemetry and observability
Prometheus
Cost management
Chapter summary
Exam essentials
Recap questions

Primer

Cloud native applications are deployed as lots of small moving parts that connect over the network and are frequently updated and replaced. This makes it more important than ever to be able to see how applications are connected and have the ability to track requests as they traverse the various distributed microservices of an application.

To assist with this, every microservice needs to output high-quality telemetry in the form of metrics and log data. You then need the right tools to collect the telemetry, aggregate it, and analyse it for troubleshooting and making adjustments. The process is a feedback loop that looks like Figure 6.1 – observe > analyse > adjust.
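As a rough illustration of the observe > analyse > adjust loop, the following Python sketch walks through one pass of it. Everything here is hypothetical: the latency samples, the 200 ms target, and the one-replica-at-a-time scaling rule are invented for the example and don't come from any particular tool.

```python
from statistics import mean


def observe(samples_ms):
    """Observe: summarise raw latency samples (milliseconds) into one number."""
    return mean(samples_ms)


def analyse(avg_latency_ms, target_ms):
    """Analyse: compare the observation against a target latency."""
    if avg_latency_ms > target_ms:
        return "scale_up"
    if avg_latency_ms < target_ms / 2:
        return "scale_down"
    return "hold"


def adjust(replicas, decision):
    """Adjust: act on the decision, never dropping below one replica."""
    if decision == "scale_up":
        return replicas + 1
    if decision == "scale_down":
        return max(1, replicas - 1)
    return replicas


# One pass around the loop with made-up numbers.
replicas = 3
decision = analyse(observe([180, 220, 260]), target_ms=200)
replicas = adjust(replicas, decision)
print(decision, replicas)
```

In a real system the loop runs continuously, and the "adjust" step is usually handled by the platform rather than by hand-written code like this.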

Even though we’ve been doing things like this for years with traditional monolithic apps, all of the following are either new or far more important with cloud native apps.

Tracing requests
Application-specific metrics
Network latency
Scaling decisions
The...

Telemetry and observability

Telemetry is jargon for logs, metrics, and traces. It is collected by remote monitoring platforms that use it to learn how systems work, diagnose problems, and measure and optimise performance. A highly observable system is one that generates high-quality telemetry.

Consider the following quick example. You have an in-house cloud native application comprising lots of small microservices. You also have a centralised monitoring platform. Each microservice outputs telemetry that is collected, stored, and analysed by the monitoring platform.

As mentioned already, cloud native telemetry comes in three major flavours that we sometimes call signal types, verticals, or classes.

Logs
Metrics
Traces

Each serves a different purpose, but collectively they provide detailed insights that can be vital in understanding, troubleshooting, and optimising microservices applications.
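To make the three signal types a little more concrete, here's a minimal Python sketch that emits all three for a fake request passing through three services. It uses only the standard library, and the service names, field names, and in-memory structures are assumptions made for illustration. Real applications would normally use an instrumentation library rather than hand-rolled code like this.

```python
import json
import time
import uuid
from collections import Counter

request_count = Counter()  # metric: number of requests per (service, status)
spans = []                 # trace: one timed span per unit of work


def log_event(severity, message, **fields):
    """Log: write one structured JSON line per event and return it."""
    record = {"severity": severity, "message": message, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


def handle_request(service, trace_id):
    """Handle one fake request and emit a log, a metric, and a span."""
    start = time.perf_counter()
    log_event("info", "request received", service=service, trace_id=trace_id)
    request_count[(service, "ok")] += 1
    spans.append({
        "trace_id": trace_id,
        "service": service,
        "duration_s": time.perf_counter() - start,
    })


# The same trace ID follows the request through every service it touches.
trace_id = uuid.uuid4().hex
for service in ("frontend", "cart", "payments"):
    handle_request(service, trace_id)
```

The shared trace ID is what lets a monitoring platform stitch the three spans back together into a single request path.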

The OpenTelemetry project is an incubating CNCF project that aims to standardise...

Prometheus

Prometheus is considered the industry standard monitoring tool for Kubernetes and cloud native environments. It was created by SoundCloud in 2012 when they realised existing monitoring tools weren’t good enough for what they needed. It joined the CNCF in 2016 and became the second project to graduate from the CNCF in August 2018.

It’s a common pattern to run cloud native applications on Kubernetes, use Prometheus for monitoring and alerting, and use Grafana for dashboards and visualisation.

How Prometheus works

At a high level, every microservice should generate high-quality telemetry in the form of logs and metrics. Prometheus collects the metrics (it doesn’t handle logs) and stores them where they can be queried and analysed to help with troubleshooting and optimisation. It can also feed into Grafana for high-quality dashboards and visualisations.
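The idea of storing metrics as timestamped samples and then querying them can be sketched in a few lines of Python. This is a toy model only and not how Prometheus's storage engine or query language actually work. The metric name, label, timestamps, and values are all made up.

```python
from collections import defaultdict

# series key -> list of (timestamp_seconds, value) samples, oldest first
store = defaultdict(list)


def record(name, labels, timestamp, value):
    """Append one sample to the series identified by name plus labels."""
    key = (name, tuple(sorted(labels.items())))
    store[key].append((timestamp, value))
    return key


def per_second_rate(key):
    """Average per-second increase between the first and last sample."""
    samples = store[key]
    if len(samples) < 2 or samples[-1][0] == samples[0][0]:
        return 0.0
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


# Three samples of a counter, collected 15 seconds apart.
key = None
for ts, total in [(0, 100), (15, 130), (30, 160)]:
    key = record("http_requests_total", {"service": "cart"}, ts, total)

print(per_second_rate(key))
```

The counter grows by 60 over 30 seconds, so the sketch reports an average of two requests per second.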

Digging a little deeper…

Prometheus expects applications to expose metrics via the /metrics HTTP endpoint...
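As a rough sketch of what exposing metrics on /metrics can look like, the following Python example renders a counter in the Prometheus text exposition format and serves it over HTTP using only the standard library. The metric name, the counter values, and port 8000 are assumptions for the example. Production services would normally use an official Prometheus client library instead of building the text by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counter values, keyed by HTTP status code.
REQUESTS = {"200": 1027, "500": 3}


def render_metrics(requests):
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total HTTP requests handled.",
        "# TYPE http_requests_total counter",
    ]
    for status, value in sorted(requests.items()):
        lines.append(f'http_requests_total{{status="{status}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; anything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the demo server. Call this manually to try it out."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and then requesting http://localhost:8000/metrics returns the plain-text metrics that a scraper would read.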

Cost management

Effective cost management is a key part of managing cloud native apps and infrastructure, and you should always consider the following three things.

Choosing the right infrastructure
Rightsizing
Turning unused infrastructure off

Choosing the right infrastructure

Choosing the right cloud platform is a vital decision. However, even after you’ve chosen your cloud, there are still a lot of decisions that have a high impact on costs and cost management.

For example, most clouds offer the following instance categories (an instance is a virtual server on a cloud).

On-demand
Reserved
Spot

On-demand instances are the most common and most flexible. However, they’re also the most expensive. You can create and delete them “on demand” and they’re yours for as long as you need them.

Reserved instances come at a discounted price in exchange for a long-term commitment. For example, if you commit to a use...
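To see how the choice of instance category can affect a monthly bill, here's a small Python sketch that compares the three categories. The hourly prices are invented purely for illustration and don't reflect any real cloud's pricing. Placing spot as the cheapest option is also an assumption of the example.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 hours / 12)

# Hypothetical hourly prices in dollars, for illustration only.
HOURLY_PRICE = {"on_demand": 0.10, "reserved": 0.06, "spot": 0.03}


def monthly_cost(category, instances, hours=HOURS_PER_MONTH):
    """Monthly cost of running `instances` servers for `hours` hours each."""
    return round(HOURLY_PRICE[category] * instances * hours, 2)


# Compare four always-on instances in each category.
for category in HOURLY_PRICE:
    print(category, monthly_cost(category, instances=4))
```

Even with made-up prices, the sketch shows why it's worth matching each workload to the cheapest category that still meets its needs.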

Chapter summary

In this chapter, you learned the three main types of telemetry are logs, metrics, and traces. Logs are application events and are a great tool for determining why an application or microservice isn’t working. Metrics provide historical trends and help identify changes over time. Traces let you track requests as they traverse today’s complex microservices apps. You also learned that the OpenTelemetry project offers specifications and instrumentation to standardise and simplify the way we generate, collect, and export telemetry.

You learned that Prometheus is the most popular monitoring platform used with Kubernetes and is often coupled with Grafana for dashboards and visualisations. It expects services to expose metrics on the /metrics endpoint and operates a pull model where it “scrapes” the data at periodic intervals. It has a Pushgateway for getting metrics from short-lived processes, and an Alertmanager for sending out alerts.

You finished...

Exam essentials

Telemetry: Telemetry is jargon for the monitoring-related outputs generated by a system. They include logs, metrics, and traces. High-quality telemetry is vital for troubleshooting, tweaking, and understanding the behaviour of cloud native apps.
OpenTelemetry: The OpenTelemetry project is an incubating CNCF project that defines specifications and instrumentation aimed at making high-quality telemetry part of all cloud native apps.
Logs: Logs are a form of telemetry relating to application events and are one of the best places to look when things break. For example, application restarts, failed logins, dropped connections, and more will all be logged as events. Log data is often structured and formatted as JSON to make it easy to read and search. It’s also commonly categorised according to severity, with common severity levels including info, warning, error, and critical.
Metrics: Metrics are usually performance related data captured over long...

Recap questions

See Appendix A for answers.

1. Which of the following best describes telemetry?

A. Remote shell access to a system
B. A map diagram of a microservices app
C. Pay-as-you-use cloud infrastructure
D. Health and performance related system outputs

2. Which of the following are the three main types of telemetry data? Choose three.

A. Logs
B. Traces
C. Metrics
D. Audits

3. What format does Prometheus require metrics in?

A. Hexadecimal
B. YAML
C. SOAP
D. Time series

4. Which of the following describes the main aim of telemetry and observability?

A. Troubleshooting and auditing systems
B. Troubleshooting and improving systems
C. Auditing and improving systems
D. Troubleshooting...


Author (1)

Nigel Poulton

Nigel Poulton is a cloud-native subject matter expert who spends his life creating books and training videos on the latest cloud technologies. He is the author of best-selling books on Docker and Kubernetes and the most popular online training videos on the same topic. He is a Docker Captain. Prior to this, Nigel has held various infrastructure roles for large enterprises. When he is not playing with technology, he is dreaming about it. When he is not dreaming about it, he is reading and watching sci-fi. He wishes he lived in the future so he could explore spacetime, the universe, and tons of other mind-blowing stuff. He likes cars, football (soccer), and food. He has a fabulous wife and three children.