Reader small image

You're reading from  Mastering Prometheus

Product typeBook
Published inApr 2024
PublisherPackt
ISBN-139781805125662
Edition1st Edition
Concepts
Right arrow
Author (1)
William Hegedus
William Hegedus
author image
William Hegedus

William Hegedus has worked in tech for over a decade in a variety of roles, culminating in site reliability engineering. He developed a keen interest in Prometheus and observability technologies during his time managing a 24/7 NOC environment and eventually became the first SRE at Linode, one of the foremost independent cloud providers. Linode was acquired by Akamai Technologies in 2022, and now Will manages a team of SREs focused on building the internal observability platform for Akamai's Connected Cloud. His team is responsible for a global fleet of Prometheus servers spanning over two dozen data centers and ingesting millions of data points every second, in addition to operating a suite of other observability tools. Will is an open source advocate and contributor who has contributed code to Prometheus, Thanos, and many other CNCF projects related to Kubernetes and observability. He lives in central Virginia with his wonderful wife, four kids, three cats, two dogs, and a bearded dragon.
Read more about William Hegedus

Right arrow

Troubleshooting the Node Exporter

I would be remiss if I gave you the impression that the Node Exporter just magically works 100% of the time. Undoubtedly, you’ll experience issues where Node Exporter scrapes begin experiencing issues such as slow scrapes or even timeouts. Thankfully, the Node Exporter provides us with some per-collector metrics to help pinpoint where the issue lies.

The node_scrape_collector_success metric returns whether or not running an individual collector was successful. But wait – before you go putting alerts in for any time any node_scrape_collector_success time series returns a 0, remember that not all of the collectors that are enabled by default are expected to apply to your system. For example, I seriously doubt your server has both InfiniBand and Fibre Channel connections (most likely you have neither), so something’s always going to be marked as failing.

Instead, the metric I tend to look at the most for Node Exporter troubleshooting...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Mastering Prometheus
Published in: Apr 2024Publisher: PacktISBN-13: 9781805125662

Author (1)

author image
William Hegedus

William Hegedus has worked in tech for over a decade in a variety of roles, culminating in site reliability engineering. He developed a keen interest in Prometheus and observability technologies during his time managing a 24/7 NOC environment and eventually became the first SRE at Linode, one of the foremost independent cloud providers. Linode was acquired by Akamai Technologies in 2022, and now Will manages a team of SREs focused on building the internal observability platform for Akamai's Connected Cloud. His team is responsible for a global fleet of Prometheus servers spanning over two dozen data centers and ingesting millions of data points every second, in addition to operating a suite of other observability tools. Will is an open source advocate and contributor who has contributed code to Prometheus, Thanos, and many other CNCF projects related to Kubernetes and observability. He lives in central Virginia with his wonderful wife, four kids, three cats, two dogs, and a bearded dragon.
Read more about William Hegedus