Reader small image

You're reading from  Distributed Data Systems with Azure Databricks

Product typeBook
Published inMay 2021
Reading LevelBeginner
PublisherPackt
ISBN-139781838647216
Edition1st Edition
Languages
Concepts
Right arrow
Author (1)
Alan Bernardo Palacio
Alan Bernardo Palacio
author image
Alan Bernardo Palacio

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder in startups, and later on earned a Master's degree from the faculty of Mathematics in the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.
Read more about Alan Bernardo Palacio

Right arrow

Model Registry example

In this section, we will go through an example in which we will develop a machine learning model and use the MLflow Model Registry to save it, manage the stages in which it belongs, and use it to make predictions. The model will be a Keras neural network, and we will use the Windfarm US dataset to predict the power output of wind farms based on parameters from weather conditions such as wind direction, speed, and air temperature. We will make use of MLflow to keep track of the stage of the model and be able to register and load it back again to make predictions:

  1. First, we will retrieve the dataset that will be used to train the model. We will use the pandas read_csv() function to load directly from the Uniform Resource Identifier (URI) of the file in GitHub, as follows:
    import pandas as pd
    wind_farm_data = pd.read_csv("https://github.com/dbczumar/model-registry-demo-notebook/raw/master/dataset/windfarm_data.csv", index_col=0)

    The dataset...

lock icon
The rest of the page is locked
Previous PageNext Page
You have been reading a chapter from
Distributed Data Systems with Azure Databricks
Published in: May 2021Publisher: PacktISBN-13: 9781838647216

Author (1)

author image
Alan Bernardo Palacio

Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder in startups, and later on earned a Master's degree from the faculty of Mathematics in the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.
Read more about Alan Bernardo Palacio