You're reading from The Linux DevOps Handbook

Product type: Book
Published in: Nov 2023
Publisher: Packt
ISBN-13: 9781803245669
Edition: 1st Edition
Authors (2):

Damian Wojsław

Damian Wojsław has been working in the IT industry since 2001. He specializes in the administration and troubleshooting of Linux servers. As a system operator and support engineer, he has found the DevOps philosophy to be a natural evolution of the way sysops work with developers and other members of the software team.

Grzegorz Adamowicz

Grzegorz Adamowicz has been working in the IT industry since 2006 in a number of positions, including Systems Administrator, Backend Developer (PHP, Python), Systems Architect, and Site Reliability Engineer. Professionally, he has focused on building tools and automation for the projects he is involved in. He also engages with the professional community by organizing events such as conferences and workshops. Grzegorz has worked in many industries, including Oil & Gas, Hotel, Fintech, DeFi, Automotive, Space, and many more.

Monitoring, Tracing, and Distributed Logging

Applications developed nowadays tend to run inside Docker containers or as a serverless application stack. Traditionally, applications were built as a monolithic entity: one process running on a server. All logs were stored on a disk, so to diagnose a problem with your application, you logged in to the server and searched through logs or stack traces until you got to the bottom of it. That made it easy to get to the right information quickly. But when you run your application inside a Kubernetes cluster, in multiple containers that are executed on different servers, things get complicated.

This also makes it very difficult to store logs, let alone view them. In fact, when running an application inside a container, it’s not advisable to save any files inside it. Oftentimes, we run those containers with a read-only filesystem. This is understandable, as you should treat a running container as an ephemeral entity that can be...
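
A common way to avoid writing files inside a container is to send log events to standard output and let the container runtime or a log shipper collect them. The following is a minimal sketch of that idea using only the Python standard library; it is not a listing from the book, and the `JsonFormatter` class, the `build_logger` helper, and the `checkout-service` name are hypothetical names chosen for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes JSON lines to a stream (stdout by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep records out of the root logger
    handler = logging.StreamHandler(sys.stdout if stream is None else stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger("checkout-service")  # hypothetical service name
    log.info("order accepted")
```

Because nothing is written to the container's filesystem, a sketch like this keeps working when the container runs with a read-only filesystem, and one JSON object per line is straightforward for a log collector to parse.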

Differences between monitoring, tracing, and logging

You will hear these terms being used interchangeably depending on the context and person you’re talking to, but there’s a subtle and very important difference between them.

Monitoring refers to instrumenting your servers and applications and gathering data about them for processing, identifying problems, and, in the end, bringing results in front of interested parties. This also includes alerting.

Tracing, on the other hand, is more specific, as we already mentioned. Trace data can tell you a lot about how your system is performing. With tracing, you can observe statistics that are very useful to developers (such as how long a function ran and whether an SQL query is fast or a bottleneck), DevOps engineers (how long we waited for a database or the network), or even the business (what was the user’s experience with our application?). So, you can see that when it’s used right, it can be a very powerful...
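
To make the idea of measuring how long a function or a query took more concrete, here is a toy tracer that records nested, timed spans in memory. It is only a sketch of the concept and not the API of any real tracing library; the `Span` and `Tracer` names and the span names in the demo are assumptions made for this example.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    """One timed operation, possibly containing child operations."""
    name: str
    duration_ms: float = 0.0
    children: list = field(default_factory=list)


class Tracer:
    """Toy tracer: records nested, timed spans in memory."""

    def __init__(self):
        self.roots = []    # top-level spans
        self._stack = []   # spans that are currently open

    @contextmanager
    def span(self, name):
        current = Span(name)
        parent = self._stack[-1] if self._stack else None
        (parent.children if parent else self.roots).append(current)
        self._stack.append(current)
        start = time.perf_counter()
        try:
            yield current
        finally:
            # Record the elapsed time even if the wrapped code raised.
            current.duration_ms = (time.perf_counter() - start) * 1000
            self._stack.pop()


if __name__ == "__main__":
    tracer = Tracer()
    with tracer.span("handle_request"):
        with tracer.span("sql_query"):
            time.sleep(0.02)   # stand-in for a database call
        with tracer.span("render"):
            time.sleep(0.005)  # stand-in for template rendering
    for root in tracer.roots:
        print(root.name, f"{root.duration_ms:.1f} ms")
        for child in root.children:
            print("  ", child.name, f"{child.duration_ms:.1f} ms")
```

Reading the output top to bottom shows where the time of a request went, which is the kind of question (is the SQL query the bottleneck?) that real tracing tools answer across many services.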

Cloud solutions

Every major cloud provider is fully aware of the need for proper monitoring and distributed logging, so each has built its own native solution. Sometimes it’s worth using native solutions, but not always. Let’s take a look at the major cloud providers and what they have to offer.

One of the first services available in AWS was CloudWatch. At first, it just collected all kinds of metrics and allowed you to create dashboards to better understand system performance and easily spot issues, or even a denial-of-service attack, so that you could react to them quickly.

Another function of CloudWatch is alerting, which delivers notifications such as emails through another Amazon service, Simple Notification Service (SNS). Alarms and metrics can also trigger other actions inside your AWS account, such as scaling the number of running instances up or down.
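
As a rough illustration of what such an alert looks like in practice, the sketch below builds the parameters for a CloudWatch alarm on EC2 CPU utilization. The keys are parameters of the CloudWatch `PutMetricAlarm` operation as exposed by boto3's `put_metric_alarm`; the `cpu_alarm_params` helper, the instance ID, the SNS topic ARN, and the threshold values are hypothetical. The code itself does not contact AWS, so it runs without boto3 or credentials.

```python
def cpu_alarm_params(instance_id: str, topic_arn: str, threshold: float = 80.0) -> dict:
    """Build keyword arguments for CloudWatch's PutMetricAlarm call."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,              # seconds per datapoint
        "EvaluationPeriods": 2,     # consecutive periods that must breach
        "Threshold": threshold,     # percent CPU
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # where the notification is delivered
    }


if __name__ == "__main__":
    params = cpu_alarm_params(
        "i-0123456789abcdef0",                            # hypothetical instance
        "arn:aws:sns:eu-west-1:123456789012:ops-alerts",  # hypothetical topic
    )
    print(params["AlarmName"])
    # With boto3 installed and AWS credentials configured, the alarm
    # could then be created with:
    #   import boto3
    #   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Keeping the parameters in a plain dictionary makes them easy to review or test before anything is created in a real account. An Auto Scaling policy ARN can be listed in `AlarmActions` instead of, or alongside, a notification topic.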

As of the time of writing this book, CloudWatch can do so much more than monitoring...

Open source solutions for self-hosting

One of the most popular projects built around monitoring that is also adopted by commercial solutions is OpenTelemetry. It’s an open source project for application monitoring and observability. It provides a set of APIs, libraries, agents, and integrations for collecting, processing, and exporting telemetry data such as traces, metrics, and logs from different sources in distributed systems. OpenTelemetry is designed to be vendor-agnostic and cloud-native, meaning it can work with various cloud providers, programming languages, frameworks, and architectures.

The main goal of OpenTelemetry is to provide developers and operators with a unified and standardized way to instrument, collect, and analyze telemetry data across the entire stack of their applications and services, regardless of the underlying infrastructure. OpenTelemetry supports different data formats, protocols, and export destinations, including popular observability platforms...

SaaS solutions

SaaS monitoring solutions are the easiest (and most expensive) to use. In most cases, all you need to do is install and configure a small daemon (agent) on your servers or inside a cluster. And there you go, all your monitoring data is visible within minutes. SaaS is great if your team doesn’t have the capacity to implement other solutions but your budget allows you to use one. Here are some of the more popular applications for handling your monitoring, tracing, and logging needs.

Datadog

Datadog is a monitoring and analytics platform that provides visibility into the performance and health of applications, infrastructure, and networks. It was founded in 2010 by Olivier Pomel and Alexis Lê-Quôc and is headquartered in New York City, with offices around the world. According to Datadog’s financial report for the fiscal year 2021 (ending December 31, 2021), their total revenue was $2.065 billion, which represents a 60% increase from the...

Log and metrics retention

Data retention is the practice of keeping data stored for a defined period of time. This can involve storing data on servers, hard drives, or other storage devices. The purpose of data retention is to ensure that data is available for future use or analysis.

Data retention policies are often developed by organizations to determine how long specific types of data should be retained. These policies may be driven by regulatory requirements, legal obligations, or business needs. For example, some regulations may require financial institutions to retain transaction data for a certain number of years, while businesses may choose to retain customer data for marketing or analytics purposes.

Data retention policies typically include guidelines for how data should be stored, how long it should be retained, and when it should be deleted. Effective data retention policies can help organizations to manage their data more efficiently, reduce...
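
The "how long it should be retained, and when it should be deleted" part of a policy often ends up as a small scheduled job. The following is a minimal sketch of such a job for log files on disk, assuming that a file's modification time is an acceptable measure of its age; the function names and the 30-day period are hypothetical, and a real policy would take its numbers from the regulatory or business requirements described above.

```python
import os
import tempfile
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def expired_files(directory, max_age_days, now=None):
    """Return files in `directory` last modified before the retention cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def enforce_retention(directory, max_age_days, dry_run=True, now=None):
    """Delete expired files; with dry_run=True only report them."""
    victims = expired_files(directory, max_age_days, now)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims


if __name__ == "__main__":
    # Demo in a throwaway directory: one fresh file, one aged 40 days.
    with tempfile.TemporaryDirectory() as tmp:
        fresh = Path(tmp) / "app.log"
        stale = Path(tmp) / "app.log.1"
        fresh.write_text("recent entries\n")
        stale.write_text("old entries\n")
        forty_days_ago = time.time() - 40 * SECONDS_PER_DAY
        os.utime(stale, (forty_days_ago, forty_days_ago))

        removed = enforce_retention(tmp, max_age_days=30, dry_run=False)
        print([p.name for p in removed])
```

The `dry_run` default is a deliberate choice: a retention job deletes data irreversibly, so it is safer for it to report what it would remove unless deletion is explicitly requested.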

Summary

In this chapter, we covered the differences between monitoring, tracing, and logging. Monitoring is the process of observing and collecting data on a system to ensure it’s running correctly. Tracing is the process of tracking requests as they flow through a system to identify performance issues. Logging is the process of recording events and errors in a system for later analysis.

We also discussed cloud solutions for monitoring, logging, and tracing in Azure, GCP, and AWS. For Azure, we mentioned Azure Monitor for monitoring and Azure Application Insights for tracing. For AWS, we mentioned CloudWatch for monitoring and logging, and X-Ray for tracing.

We then went on to explain and provide an example of configuring the AWS CloudWatch agent on an EC2 instance. We also introduced AWS X-Ray with a code example to show how it can be used to trace requests in a distributed system.

Finally, we named some open source and SaaS solutions for monitoring, logging, and tracing...
