You're reading from Mastering Prometheus

Product type: Book
Published in: Apr 2024
Publisher: Packt
ISBN-13: 9781805125662
Edition: 1st Edition
Concepts: DevOps
Author: William Hegedus

Enabling Systems Monitoring with the Node Exporter

Now that we’re all experts on running Prometheus itself, it’s time to get into what truly sets Prometheus apart: the ecosystem around it. Prometheus has a vibrant community with a multitude of open source projects that extend it and expose data to it. Many Prometheus-related projects are exporters.

The term “exporter” in Prometheus refers to any application that runs independently to expose metrics from some other data source that is not exposing Prometheus metrics natively. There are exporters for almost anything you can think of, from MySQL to Minecraft, and a non-exhaustive list can be found on Prometheus’s official docs site at https://prometheus.io/docs/instrumenting/exporters/. However, in this chapter, we’re going to focus on the most popular and common exporter: the Node Exporter.

We’re going to cover the following main topics:

Node Exporter overview
Default...

Technical requirements

For this chapter, you can connect to the Prometheus cluster we created in Chapter 2 to follow along as we explore metrics. The only software requirement is optional. You need it only if you want to experiment with the basics of writing a Prometheus exporter:

Go: https://go.dev/dl/

Code used in this chapter is available at https://github.com/PacktPublishing/Mastering-Prometheus.

Note

This chapter focuses only on the Node Exporter, which is useful only on systems with *NIX kernels (e.g., Linux, FreeBSD, and macOS). For Windows systems, a similar but separate exporter called the Windows Exporter exists (https://github.com/prometheus-community/windows_exporter).

Node Exporter overview

The Node Exporter is one of the select few exporters maintained by the official Prometheus project, alongside others such as the Blackbox Exporter and the SNMP Exporter. Its purpose is to expose a variety of machine-level metrics pertaining to resources such as CPU, disk, memory, networking, and more.

One of the things I often say to people asking whether we have some system-level metric in Prometheus is, “If it’s in /proc, the Node Exporter can get it.”

What’s /proc?

In Linux systems, a /proc directory exists that contains a plethora of information about the state of the machine. The Linux kernel documentation describes it thusly:

The proc file system acts as an interface to internal data structures in the kernel. It can be used to obtain information about the system and to change certain kernel parameters at runtime (sysctl).

The Node Exporter primarily retrieves data through the /proc pseudo-filesystem. There are...
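As an illustration of what "getting it from /proc" involves, the following sketch parses /proc/loadavg. The first three fields of that file are the 1-, 5-, and 15-minute load averages. The sketch prints them under the `node_load1`, `node_load5`, and `node_load15` names that the Node Exporter uses. It is a simplified stand-in written for this example, not the Node Exporter's actual code. On a system without /proc, it falls back to a sample line.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseLoadavg extracts the 1-, 5-, and 15-minute load averages from the
// contents of /proc/loadavg, e.g. "0.20 0.18 0.12 1/80 11206".
func parseLoadavg(data string) ([3]float64, error) {
	var loads [3]float64
	fields := strings.Fields(data)
	if len(fields) < 3 {
		return loads, fmt.Errorf("unexpected /proc/loadavg format: %q", data)
	}
	for i := 0; i < 3; i++ {
		v, err := strconv.ParseFloat(fields[i], 64)
		if err != nil {
			return loads, fmt.Errorf("parsing field %d: %w", i, err)
		}
		loads[i] = v
	}
	return loads, nil
}

// formatLoadMetrics renders the values under the Node Exporter's metric names.
func formatLoadMetrics(loads [3]float64) string {
	names := [3]string{"node_load1", "node_load5", "node_load15"}
	var b strings.Builder
	for i, name := range names {
		fmt.Fprintf(&b, "%s %g\n", name, loads[i])
	}
	return b.String()
}

func main() {
	data, err := os.ReadFile("/proc/loadavg")
	if err != nil {
		// Not on Linux, or /proc is unavailable: use a sample line instead.
		data = []byte("0.20 0.18 0.12 1/80 11206\n")
	}
	loads, err := parseLoadavg(string(data))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(formatLoadMetrics(loads))
}
```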

Default collectors

At the time of writing, there are a whopping 49 different collectors that are enabled by default in the Node Exporter. Many of them are either niche (such as dmi) or dependent on your infrastructure (such as filesystem collectors for xfs and zfs). Rather than go through all of them, we’ll take a look at some of the most useful ones to see what info they provide and why you would care about it.

conntrack

The conntrack collector exposes metrics related to the Linux kernel’s netfilter connection tracking subsystem. This subsystem keeps track of the network connections passing through your server, and it can cause problems, such as dropped packets, when its tracking table becomes full.

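Two commonly used metrics from this collector are as follows:

| Metric | Description |
|---|---|
| `node_nf_conntrack_entries` | Current number of entries in the connection tracking table |

...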

The textfile collector

The textfile collector is a hidden gem in the Node Exporter. This single collector adds tremendous versatility to what you can accomplish with a Prometheus monitoring stack.

Using the textfile collector, you can read Prometheus-formatted metrics from files on the server and include them in the output of the Node Exporter’s /metrics scrape endpoint.

Being able to read metrics from a file opens up a whole new world of monitoring short-lived processes such as batch or cron jobs, where it isn’t possible or doesn’t make sense to expose metrics on an HTTP endpoint for Prometheus to scrape. For example, at my company, we leverage the textfile collector to expose metrics related to the last time a server executed its scheduled synchronization with our configuration management system.

The textfile collector is enabled by default but requires additional configuration to actually work. For the collector to work, it must know where it should...

Troubleshooting the Node Exporter

I would be remiss if I gave you the impression that the Node Exporter just magically works 100% of the time. Sooner or later, you’ll likely see Node Exporter scrapes run into problems such as slow scrapes or even timeouts. Thankfully, the Node Exporter provides per-collector metrics that help pinpoint where the issue lies.

The node_scrape_collector_success metric reports whether an individual collector ran successfully. But wait: before you add alerts that fire whenever any node_scrape_collector_success time series returns 0, remember that not all of the collectors enabled by default are expected to apply to your system. For example, I seriously doubt your server has both InfiniBand and Fibre Channel connections (most likely it has neither), so some collector is always going to be marked as failing.

Instead, the metric I tend to look at the most for Node Exporter troubleshooting...

Summary

In this chapter, we learned all about Prometheus’s most popular exporter, the Node Exporter. We went over the basics of what an exporter is and what is involved in creating one. Then, we dove headfirst into the dozens of collectors that the Node Exporter enables by default. Finally, we looked at how to use the textfile collector and how to troubleshoot issues with the Node Exporter.

In our next chapter, we’re going to step out of the realm of “vanilla” Prometheus and begin looking at how we can extend and augment Prometheus through other open source projects. To begin, we’ll see how projects such as VictoriaMetrics and Grafana Mimir can function as remote storage for Prometheus metrics.

Further reading

The /proc filesystem: https://www.kernel.org/doc/html/latest/filesystems/proc.html
Understanding and Building Exporters: https://training.promlabs.com/training/understanding-and-building-exporters
PSI - Pressure Stall Information: https://www.kernel.org/doc/html/latest/accounting/psi.html
Awesome Prometheus alerts for Node Exporter: https://samber.github.io/awesome-prometheus-alerts/rules#host-and-hardware

About the author

William Hegedus has worked in tech for over a decade in a variety of roles, culminating in site reliability engineering. He developed a keen interest in Prometheus and observability technologies during his time managing a 24/7 NOC environment and eventually became the first SRE at Linode, one of the foremost independent cloud providers. Linode was acquired by Akamai Technologies in 2022, and now Will manages a team of SREs focused on building the internal observability platform for Akamai's Connected Cloud. His team is responsible for a global fleet of Prometheus servers spanning over two dozen data centers and ingesting millions of data points every second, in addition to operating a suite of other observability tools. Will is an open source advocate and contributor who has contributed code to Prometheus, Thanos, and many other CNCF projects related to Kubernetes and observability. He lives in central Virginia with his wonderful wife, four kids, three cats, two dogs, and a bearded dragon.


