
You're reading from Machine Learning Engineering with MLflow

Product type: Book
Published in: Aug 2021
Publisher: Packt
ISBN-13: 9781800560796
Edition: 1st Edition
Author: Natu Lauchande

Natu Lauchande is a principal data engineer in the fintech space currently tackling problems at the intersection of machine learning, data engineering, and distributed systems. He has worked in diverse industries, including biomedical/pharma research, cloud, fintech, and e-commerce/mobile. Along the way, he had the opportunity to be granted a patent (as co-inventor) in distributed systems, publish in a top academic journal, and contribute to open source software. He has also been very active as a speaker at machine learning/tech conferences and meetups.

Infrastructure monitoring and alerting

The main dimensions of monitoring in ML systems from an infrastructure perspective do not differ from those in traditional software systems.

To illustrate this point, we will use the monitoring and alerting tools available in AWS CloudWatch and SageMaker to walk through an example of setting up monitoring and alerting infrastructure. The same mechanism can be built with tools such as Grafana and Prometheus, for on-premises and cloud deployments alike. These tools serve similar goals and offer comparable features, so choose the one that best fits your environment and cloud provider.
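As a minimal sketch of the kind of alarm such a setup might define, the following Python function builds the keyword arguments for boto3's CloudWatch `put_metric_alarm` call. The alarm fires when the average `ModelLatency` of a SageMaker endpoint stays above a threshold. The helper name `build_latency_alarm`, the `AllTraffic` variant name, the 500 ms default threshold, the evaluation window, and the optional SNS topic ARN are all illustrative assumptions, not values taken from this chapter. The function only assembles the request, so it can be inspected without AWS credentials.

```python
from typing import Any, Dict, Optional


def build_latency_alarm(
    endpoint_name: str,
    variant_name: str = "AllTraffic",  # assumption: default SageMaker variant name
    threshold_ms: float = 500.0,  # hypothetical latency budget
    sns_topic_arn: Optional[str] = None,  # hypothetical notification target
) -> Dict[str, Any]:
    """Return keyword arguments for CloudWatch's put_metric_alarm call."""
    params: Dict[str, Any] = {
        "AlarmName": f"{endpoint_name}-high-model-latency",
        "AlarmDescription": "Average ModelLatency above threshold",
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        "Period": 60,  # one-minute aggregation windows
        "EvaluationPeriods": 5,  # look at the last five windows...
        "DatapointsToAlarm": 3,  # ...and alarm if three of them breach
        # SageMaker reports ModelLatency in microseconds, so convert from ms.
        "Threshold": threshold_ms * 1000.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }
    if sns_topic_arn:
        # Only attach an alarm action when a notification target is given.
        params["AlarmActions"] = [sns_topic_arn]
    return params
```

With credentials configured, the result would be passed to a CloudWatch client, for example `boto3.client("cloudwatch").put_metric_alarm(**build_latency_alarm("my-endpoint"))`, where `my-endpoint` is a placeholder. Keeping the alarm definition in a plain dictionary makes it easy to review and test before anything is created in AWS.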

AWS CloudWatch provides a monitoring and observability solution. It allows you to monitor your applications, respond to system-wide performance changes, optimize resource use, and get a unified view of operational health.
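Beyond the metrics that AWS services emit on their own, an application can publish custom metrics to CloudWatch. The following is a minimal sketch that assembles the payload for boto3's `put_metric_data` call. The helper `build_custom_metric`, the `MLApp/Inference` namespace, and the example metric and dimension names are hypothetical choices for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_custom_metric(
    metric_name: str,
    value: float,
    unit: str = "Count",
    namespace: str = "MLApp/Inference",  # hypothetical custom namespace
    dimensions: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for CloudWatch's put_metric_data call."""
    datum: Dict[str, Any] = {
        "MetricName": metric_name,
        "Timestamp": datetime.now(timezone.utc),  # timezone-aware UTC timestamp
        "Value": float(value),
        "Unit": unit,
    }
    if dimensions:
        # CloudWatch expects dimensions as a list of Name/Value pairs.
        datum["Dimensions"] = [
            {"Name": name, "Value": val} for name, val in sorted(dimensions.items())
        ]
    return {"Namespace": namespace, "MetricData": [datum]}
```

The payload would then be sent with `boto3.client("cloudwatch").put_metric_data(**payload)`. Namespaces beginning with `AWS/` are reserved for AWS services, which is why the sketch uses its own.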

At a higher level, we can split the infrastructure monitoring and alerting components...
