Reader small image

You're reading from  Machine Learning Engineering with MLflow

Product typeBook
Published inAug 2021
PublisherPackt
ISBN-139781800560796
Edition1st Edition
Tools
Right arrow
Author (1)
Natu Lauchande
Natu Lauchande
author image
Natu Lauchande

Natu Lauchande is a principal data engineer in the fintech space currently tackling problems at the intersection of machine learning, data engineering, and distributed systems. He has worked in diverse industries, including biomedical/pharma research, cloud, fintech, and e-commerce/mobile. Along the way, he had the opportunity to be granted a patent (as co-inventor) in distributed systems, publish in a top academic journal, and contribute to open source software. He has also been very active as a speaker at machine learning/tech conferences and meetups.
Read more about Natu Lauchande

Right arrow

Chapter 3: Your Data Science Workbench

In this chapter, you will learn about MLflow in the context of creating a local environment so that you can develop your machine learning project locally with the different features provided by MLflow. This chapter is focused on machine learning engineering, and one of the most important roles of a machine learning engineer is to build up an environment where model developers and practitioners can be efficient. We will also demonstrate a hands-on example of how we can use workbenches to accomplish specific tasks.

Specifically, we will look at the following topics in this chapter: 

  • Understanding the value of a data science workbench
  • Creating your own data science workbench
  • Using the workbench for stock prediction

Technical requirements 

For this chapter, you will need the following prerequisites: 

  • The latest version of Docker installed on your machine. If you don’t already have it installed, please follow the instructions at https://docs.docker.com/get-docker/.

    The latest version of Docker Compose installed. If you don’t already have it installed, please follow the instructions at https://docs.docker.com/compose/install/.

  • Access to Git in the command line, and installed as described in this Uniform Resource Locator (URL): https://git-scm.com/book/en/v2/Getting-Started-Installing-Git.
  • Access to a bash terminal (Linux or Windows). 
  • Access to a browser.
  • Python 3.5+ installed.
  • MLflow installed locally, as described in Chapter 1, Introducing MLflow.

Understanding the value of a data science workbench

A data science workbench is an environment to standardize the machine learning tools and practices of an organization, allowing for rapid onboarding and development of models and analytics. One critical machine learning engineering function is to support data science practitioners with tools that empower and accelerate their day-to-day activities.

In a data science team, the ability to rapidly test multiple approaches and techniques is paramount. Every day, new libraries and open source tools are created. It is common for a project to need more than a dozen libraries in order to test a new type of model. These multitudes of libraries, if not collated correctly, might cause bugs or incompatibilities in the model.

Data is at the center of a data science workflow. Having clean datasets available for developing and evaluating models is critical. With an abundance of huge datasets, specialized big data tooling is necessary to process...

Creating your own data science workbench

In order to address common frictions for developing models in data science, as described in the previous section, we need to provide data scientists and practitioners with a standardized environment in which they can develop and manage their work. A data science workbench should allow you to quick-start a project, and the availability of an environment with a set of starting tools and frameworks allows data scientists to rapidly jump-start a project.

The data scientist and machine learning practitioner are at the center of the workbench: they should have a reliable platform that allows them to develop and add value to the organization, with their models at their fingertips.

The following diagram depicts the core features of a data science workbench:

Figure 3.1 – Core features of a data science workbench

In order to think about the design of our data science workbench and based on the diagram in Figure...

Using the workbench for stock prediction

In this section, we will use the workbench step by step to set up a new project. Follow the instructions step by step to start up your environment and use the workbench for the stock-prediction project.

Important note

It is critical that all packages/libraries listed in the Technical requirements section are correctly installed on your local machine to enable you to follow along.

Starting up your environment

We will move on next to exploring your own development environment, based on the development environment shown in this section. Please execute the following steps:

  1. Copy the contents of the project available in https://github.com/PacktPublishing/Machine-Learning-Engineering-with-MLflow/tree/master/Chapter03/gradflow.
  2. Start your local environment by running the following command:
    make
  3. Inspect the created environments, like this:
    $ docker ps 

    The following screenshot presents three Docker images: the first for Jupyter...

Summary

In this chapter, we introduced the concept of a data science workbench and explored some of the motivation behind adopting this tool as a way to accelerate our machine learning engineering practice.

We designed a data science workbench, using MLflow and adjacent technologies based on our requirements. We detailed the steps to set up your development environment with MLflow and illustrated how to use it with existing code. In later sections, we explored the workbench and added to it our stock-trading algorithm developed in the last chapter.

In the next chapter, we will focus on experimentation to improve our models with MLflow, using the workbench developed in this chapter.

Further reading

In order to further your knowledge, you can consult the documentation in the following links: 

lock icon
The rest of the chapter is locked
You have been reading a chapter from
Machine Learning Engineering with MLflow
Published in: Aug 2021Publisher: PacktISBN-13: 9781800560796
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Natu Lauchande

Natu Lauchande is a principal data engineer in the fintech space currently tackling problems at the intersection of machine learning, data engineering, and distributed systems. He has worked in diverse industries, including biomedical/pharma research, cloud, fintech, and e-commerce/mobile. Along the way, he had the opportunity to be granted a patent (as co-inventor) in distributed systems, publish in a top academic journal, and contribute to open source software. He has also been very active as a speaker at machine learning/tech conferences and meetups.
Read more about Natu Lauchande