You're reading from  Machine Learning Engineering with MLflow

Product type: Book
Published in: Aug 2021
Publisher: Packt
ISBN-13: 9781800560796
Edition: 1st Edition
Author: Natu Lauchande

Natu Lauchande is a principal data engineer in the fintech space currently tackling problems at the intersection of machine learning, data engineering, and distributed systems. He has worked in diverse industries, including biomedical/pharma research, cloud, fintech, and e-commerce/mobile. Along the way, he had the opportunity to be granted a patent (as co-inventor) in distributed systems, publish in a top academic journal, and contribute to open source software. He has also been very active as a speaker at machine learning/tech conferences and meetups.

Chapter 2: Your Machine Learning Project

This book iterates through a practical business project, stock market prediction, and uses it across the chapters to explore the different features of MLflow. We will use a structured approach to frame a machine learning problem and project. A sample pipeline will be created and then evolved throughout the remainder of the book.

Using a structured framework to describe a machine learning problem helps the practitioner to reason more efficiently about the different requirements of the machine learning pipeline. We will present a practical pipeline using the requirements elicited during framing.

Specifically, we will cover the following sections in this chapter: 

  • Exploring the machine learning process
  • Framing the machine learning problem
  • Introducing the stock market prediction problem
  • Developing your machine learning baseline...

Technical requirements 

For this chapter, you will need the following prerequisites: 

  • The latest version of Docker installed on your machine. If you don't already have it installed, please follow the instructions at https://docs.docker.com/get-docker/.
  • Access to a Bash terminal (Linux or Windows). 
  • Access to a browser.
  • Python 3.5+ installed.
  • MLflow installed locally as described in Chapter 1, Introducing MLflow.

Exploring the machine learning process

In this chapter, we will begin by describing the problem that we will solve throughout the book. We aim to focus on machine learning in the context of stock trading.

Machine learning can be defined as the process of training a software artifact (in this case, a model) to make relevant predictions for a given problem. Predictions are used to drive business decisions, for instance, which stock should be bought or sold, or whether a picture contains a cat.

A standard approach to a machine learning project is critical to its success. A typical iteration of the machine learning life cycle is depicted in Figure 2.1:

Figure 2.1 – Excerpt of the acquired data with the prediction column

Let's examine each stage in detail:

  • Ideation: This phase involves identifying a business opportunity to use machine learning and formulating the problem.
  • Prototyping: This involves verifying...

Framing the machine learning problem

Machine learning problem framing, as defined in this section, is a technique and methodology to help specify and contextualize a machine learning problem in such a way that an engineering solution can be implemented. Without a solid approach to tackling machine learning problems, it can become very hard to extract the real value of the undertaking.

We will draw inspiration from the approaches of companies such as Amazon and Google, which have been successfully applying the technique of machine learning problem framing.

The machine learning development process draws heavily on the scientific method. We go through the stages of stating a goal, collecting data, testing hypotheses, and drawing conclusions. We expect to cycle through these stages until either a good model is identified or it becomes apparent that one cannot be developed.

The following subsections depict the framework that...

Introducing the stock market prediction problem

The scenario that we will cover in the remaining chapters of the book is that of the hypothetical company PsyStock LLC, which offers amateur traders a platform with APIs and UIs for different prediction problems in the stock market domain.

As machine learning practitioners and developers, we should be able to build a platform that will allow a team of data scientists to quickly develop, test, and bring into production machine learning projects.

We will start by framing the problems so that we can build our platform on the basis of their definitions. Note that the problem framing will evolve as we learn more about the problem; the initial framing gives us guidance on the problem spaces that we will be tackling.

The following are the core projects that we will use as references in the rest of the book for machine learning development in MLflow.

Stock movement predictor

This...

Sentiment analysis of market influencers

The sentiment machine learning pipeline will predict whether social media sentiment about a stock ticker is positive or negative, and will expose that prediction as an API to the users of the machine learning platform that we are developing in this book.

Problem statement

Predict whether, on the current day, a given stock ticker has positive sentiment among the relevant market influencers on Twitter selected by PsyStock LLC.

Success and failure definition

Success, in this case, is a bit harder to define, as a positive sentiment can't be tied directly to a market metric. For this particular prediction problem, success should instead be measured by a proxy: how often users come back to use the API again.
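One way to make the repeat-usage proxy concrete is sketched below. This is only an illustration under stated assumptions: the function name `repeat_user_rate`, the `(user_id, day)` log format, and the rule that a "repeat user" is someone who called the API on more than one distinct day are all hypothetical and are not defined in this chapter.

```python
def repeat_user_rate(api_calls):
    """Return the fraction of distinct users seen on more than one day.

    api_calls is an iterable of (user_id, day) pairs. Both the log format
    and the "more than one distinct day" rule are assumptions made for
    this sketch.
    """
    days_per_user = {}
    for user_id, day in api_calls:
        # Collect the distinct days on which each user called the API.
        days_per_user.setdefault(user_id, set()).add(day)

    if not days_per_user:
        return 0.0  # No traffic yet: report zero rather than divide by zero.

    repeat_users = sum(1 for days in days_per_user.values() if len(days) > 1)
    return repeat_users / len(days_per_user)


# Hypothetical call log: alice returns on a second day, bob does not.
calls = [
    ("alice", "2021-08-02"),
    ("alice", "2021-08-03"),
    ("bob", "2021-08-02"),
    ("bob", "2021-08-02"),
]
print(repeat_user_rate(calls))  # 0.5
```

Counting distinct days rather than raw calls keeps a single busy session from being mistaken for a returning user; a real definition might use a different time window.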

Model output

The model output is a number that encodes the polarity of a tweet about a ticker: positive, negative, or neutral sentiment.
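As a rough illustration of such a numeric output, the sketch below encodes the three polarities as integers. The specific values (-1, 0, 1), the `Polarity` enum, and the `aggregate_polarity` helper that combines several tweets into one value per ticker are assumptions made for this example, not the encoding used by the book's pipeline.

```python
from enum import IntEnum


class Polarity(IntEnum):
    """Hypothetical numeric encoding of sentiment polarity."""
    NEGATIVE = -1
    NEUTRAL = 0
    POSITIVE = 1


def aggregate_polarity(tweet_polarities):
    """Combine per-tweet polarities into one polarity for a ticker.

    Assumed rule for this sketch: take the sign of the summed polarities,
    so ties and empty input are reported as NEUTRAL.
    """
    total = sum(int(p) for p in tweet_polarities)
    if total > 0:
        return Polarity.POSITIVE
    if total < 0:
        return Polarity.NEGATIVE
    return Polarity.NEUTRAL


# Two positive tweets outweigh one negative tweet.
tweets = [Polarity.POSITIVE, Polarity.POSITIVE, Polarity.NEGATIVE]
result = aggregate_polarity(tweets)
print(result.name, int(result))  # POSITIVE 1
```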

Output usage

The output of this system will be used...

Developing your machine learning baseline pipeline

For our machine learning platform, we will start with a very simple, heuristic-based pipeline in order to get the infrastructure of the end-to-end system working correctly and to provide an environment in which the machine learning models can be iterated on.

Important note

It is critical that the technical requirements are correctly installed on your local machine to follow along. This section assumes that you have MLflow and Docker installed as per the Technical requirements section.

By the end of this section, you will be able to create our baseline pipeline. The value of the baseline pipeline is that it enables rapid iteration by the model developers: an end-to-end infrastructure with placeholders for training and model serving will be made available to the development team. Since it's all implemented in MLflow, it becomes easy to allow specialization and focus for the different types of teams involved in a machine...
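To give a feel for what a heuristic baseline with training and serving placeholders might look like, here is a minimal sketch in plain Python. It deliberately leaves MLflow out, and everything in it is an assumption for illustration: the class name `LastMoveHeuristic`, the rule "predict up (1) if the latest price is above the previous one, else 0", and the `serve` helper are hypothetical and are not the pipeline built in this book.

```python
class LastMoveHeuristic:
    """Placeholder model: predicts that the last price move repeats."""

    def fit(self, price_windows, labels=None):
        # Placeholder training step: a heuristic has nothing to learn,
        # but keeping fit() lets a trained model be swapped in later.
        return self

    def predict(self, price_windows):
        predictions = []
        for window in price_windows:
            if len(window) < 2:
                raise ValueError("each window needs at least two prices")
            # 1 means "up", 0 means "down or flat" (assumed encoding).
            predictions.append(1 if window[-1] > window[-2] else 0)
        return predictions


def serve(model, request_windows):
    """Placeholder serving step: wrap predictions in a response dict."""
    return {"predictions": model.predict(request_windows)}


# Hypothetical request with two price windows.
model = LastMoveHeuristic().fit([])
response = serve(model, [[10.0, 11.0], [11.0, 10.5]])
print(response)  # {'predictions': [1, 0]}
```

Because the heuristic exposes the same `fit` and `predict` shape that a learned model would, the surrounding infrastructure can be exercised end to end before any real model exists.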

Summary

In this chapter, we introduced the machine learning problem framing approach and explored some of the motivation behind adopting this framework.

We introduced the stock market prediction machine learning platform and our initial set of prediction problems using the ML problem framing methodology.

We also briefly introduced a basic stock market prediction pipeline that will be used in the rest of the book.

In the next chapter, we will focus on creating a data science development environment with MLflow using the definitions of the problem made in this chapter.
