
You're reading from Practical Site Reliability Engineering

Product type: Book
Published in: Nov 2018
Publisher: Packt
ISBN-13: 9781788839563
Edition: 1st Edition
Authors (3):
Pethuru Raj Chelliah

Pethuru Raj Chelliah (PhD) works as the chief architect at the Site Reliability Engineering Center of Excellence, Reliance Jio Infocomm Ltd. (RJIL), Bangalore. Previously, he worked as a cloud infrastructure architect at the IBM Global Cloud Center of Excellence, IBM India, Bangalore, for four years. He also had an extended stint as a TOGAF-certified enterprise architecture consultant in the Wipro Consulting Services division and as a lead architect in the corporate research division of Robert Bosch, Bangalore. He has more than 17 years of IT industry experience.

Shreyash Naithani

Shreyash Naithani is currently a site reliability engineer at Microsoft R&D. Before joining Microsoft, he worked with both start-ups and mid-sized companies. He completed a PG Diploma at the Centre for Development of Advanced Computing, Bengaluru, India, and is a computer science graduate of Punjab Technical University, India. In a short span of time, he has had the opportunity to work as a DevOps engineer with Python/C#, and as a tools developer, site/service reliability engineer, and Unix system administrator. In his leisure time, he loves to travel and binge-watch series.

Shailender Singh

Shailender Singh is a principal site reliability engineer and a solution architect with around 11 years' IT experience who holds two master's degrees, in IT and in computer applications. He has worked as a C developer on the Linux platform and has had exposure to almost all infrastructure technologies, from hybrid to cloud-hosted environments. In the past, he has worked with companies including McKinsey, HP, HCL, Revionics, and Avalara, and these days he tends to use AWS, K8s, Terraform, Packer, Jenkins, Ansible, and OpenShift.


Chapter 10. Containers, Kubernetes, and Istio Monitoring

In the cloud world, we need monitoring to observe the progress and quality of our services and applications over time. Monitoring keeps our applications under systematic review: if something breaks, we want to know what broke and what caused it to malfunction. Monitoring helps us investigate the failure points in our services, and with anomaly detection we can catch failing services early. White-box monitoring, which relies on metrics exposed by the internals of a system, can help us work out which services are failing, why they are failing, and how to debug them. It can also reveal trends, which helps us anticipate potential failures before they happen. Here, we will focus only on tools that enable us to monitor either our application or our infrastructure:

  • Monitoring the application: It is very important to monitor the features and services being developed, with proper time-series graphs and a dashboard.
  • Monitoring the...
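The anomaly detection mentioned above can take many forms. As a minimal sketch, assuming a simple z-score rule over a sliding window (one possible technique, not necessarily the one the authors use), we can flag any sample that deviates sharply from its recent history. The helper name `detect_anomalies`, the window size, the threshold, and the latency data are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples whose z-score against the previous
    `window` samples exceeds `threshold` (hypothetical helper)."""
    history = deque(maxlen=window)  # sliding window of recent values
    anomalies = []
    for i, value in enumerate(samples):
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma == 0:
                # A perfectly flat window: any change at all is suspicious.
                if value != mu:
                    anomalies.append(i)
            elif abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Hypothetical request latencies in milliseconds, with a spike at index 7.
latencies = [102, 98, 101, 99, 100, 103, 97, 450, 101, 99]
print(detect_anomalies(latencies))  # → [7]
```

A fixed window keeps the check cheap and adaptive to slow drift; real systems usually add seasonality handling and alert only when an anomaly persists.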

Prometheus


Prometheus is an open source monitoring tool originally built at SoundCloud in 2012 and inspired by Google's Borgmon. It is written in Go. According to The New Stack's 2017 survey, Prometheus is one of the most widely used tools for monitoring Kubernetes clusters. What sets Prometheus apart from other open source monitoring systems is its simple, text-based exposition format, which makes it easy to collect metrics from other systems. It also has a multidimensional data model, in which each time series is identified by a metric name and a set of key/value labels, and a rich, concise query language. Using Prometheus, we can monitor every level of the stack: nodes, container-scheduling systems, and even routers and switches. Large applications running on fast-moving infrastructure change rapidly; the jobs we run may be redeployed around 100 times a day. In this situation, Prometheus is very useful because it can discover services automatically. If we have a dynamic infrastructure, we can use Prometheus to detect early failures...
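To make the text-based format and the multidimensional data model concrete: Prometheus scrapes plain-text lines such as `http_requests_total{method="GET",code="200"} 1027`, where each distinct label combination is its own time series. The sketch below renders a counter in that exposition format using only the standard library. The `render_counter` helper, the metric name, and the label values are illustrative assumptions; in practice you would normally use an official client library such as `prometheus_client`.

```python
def escape_label_value(value):
    # Label values must escape backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter metric family in the Prometheus text format.

    `samples` maps a tuple of (label, value) pairs to the counter value.
    HELP text is assumed to contain no backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in labels
        )
        # Omit the braces entirely when a series has no labels.
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical request counts for one service.
requests = {
    (("method", "GET"), ("code", "200")): 1027,
    (("method", "POST"), ("code", "500")): 3,
}
print(render_counter("http_requests_total", "Total HTTP requests.", requests), end="")
```

Serving this text from an HTTP endpoint (conventionally `/metrics`) is all a target needs to do for Prometheus to scrape it; the labels are what let queries slice the same metric by method or status code.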

Grafana


Grafana is a widely used open source tool for monitoring services and applications by visualizing time-series data. It shows us how our services and servers are doing by displaying production and business metrics. It supports both infrastructure monitoring and application monitoring. The official definition of Grafana is as follows:

"It is the analytics platform for all your metrics. Grafana allows you to query, visualize, alert on, and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team, and foster a data-driven culture."

One of the main reasons to use Grafana rather than Prometheus's own interface is its richer visualization and dashboard editing. With Grafana, it is very easy to create and customize a dashboard. With Prometheus alone, on the other hand, we would need to build dashboards with console templates, which are harder to work with. Other features of Grafana include the following:

  • Advanced graphing
  • Powerful...
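Because Grafana stores dashboards as JSON, they can also be created programmatically. The sketch below builds, but does not send, a request to Grafana's dashboard HTTP API (`POST /api/dashboards/db`) for a dashboard with one time-series panel driven by a PromQL query. The Grafana URL, API token, titles, query, and the `build_dashboard_request` helper are hypothetical, and the panel model is deliberately minimal; check the dashboard JSON model for your Grafana version before relying on it.

```python
import json
import urllib.request


def build_dashboard_request(grafana_url, api_token, title, promql):
    """Build (but do not send) a request that creates a one-panel dashboard."""
    payload = {
        "dashboard": {
            "id": None,  # JSON null asks Grafana to create a new dashboard
            "uid": None,
            "title": title,
            "panels": [
                {
                    "type": "timeseries",
                    "title": title,
                    "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
                    # With a Prometheus data source, each target holds a PromQL expression.
                    "targets": [{"refId": "A", "expr": promql}],
                }
            ],
        },
        "overwrite": False,
    }
    return urllib.request.Request(
        url=f"{grafana_url.rstrip('/')}/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",
        },
        method="POST",
    )


# Hypothetical endpoint, token, and query.
req = build_dashboard_request(
    "http://localhost:3000",
    "MY_API_TOKEN",
    "Service overview",
    "sum(rate(http_requests_total[5m])) by (code)",
)
print(req.full_url, req.get_method())
```

Passing `req` to `urllib.request.urlopen()` would submit it. Keeping dashboards as code like this makes them reviewable and reproducible across environments.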

Summary


Monitoring is not a one-time task. We should regularly measure what's going on with our Kubernetes pods and our microservices. Monitoring plays a crucial role in a microservice system, because we need to monitor all of its endpoints. To deliver a higher-quality product, we should detect failures before our customers do. We should enable anomaly detection and notify our operations team so that they can troubleshoot the problem. We have to set up the necessary monitoring and alerts on both the infrastructure side and the application side. In this chapter, we saw how to use Prometheus metrics and Grafana to create powerful dashboards and alerts.

In the next chapter, we will talk about post-production activities and best practices for ensuring and enhancing IT reliability.
