The past few years have seen great success in Deep Learning (DL) for solving practical business, industrial, and scientific problems, particularly for tasks such as Natural Language Processing (NLP), image and video recognition, speech recognition, and conversational understanding. While research in these areas has made giant leaps, bringing DL models from offline experimentation into production and continuously improving them to deliver sustainable value remains a challenge. For example, a recent VentureBeat article (https://venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production/) found that 87% of data science projects never make it to production. While there might be business reasons for such a low production rate, a major contributing factor is the difficulty caused by the lack of experiment management and a mature model production and feedback platform.
This chapter will help us to understand the challenges and bridge these gaps by learning about the concepts, steps, and components that are commonly used in the full life cycle of DL model development. Additionally, we will learn about the challenges of an emerging field known as Machine Learning Operations (MLOps), which aims to standardize and automate the development, deployment, and operation of the ML life cycle. Having a solid understanding of these challenges will motivate us to learn the skills presented in the rest of this book using MLflow, an open source platform for the full ML life cycle. The business value of adopting MLOps best practices is manifold: it includes faster time-to-market for model-derived product features, lower operating costs, agile A/B testing, and strategic decision making to ultimately improve customer experience. By the end of this chapter, we will have learned about the critical role that MLflow plays in the four pillars of MLOps (that is, data, model, code, and explainability), implemented our first working DL model, and grasped a clear picture of the challenges with data, models, code, and explainability in DL.
In this chapter, we're going to cover the following main topics:

- Understanding the DL life cycle and MLOps challenges
- Understanding DL data challenges
- Understanding DL model challenges
- Understanding DL code challenges
- Understanding DL explainability challenges
All of the code examples for this book can be found at the following GitHub URL: https://github.com/PacktPublishing/Practical-Deep-Learning-at-Scale-with-MLFlow.
You need to have Miniconda (https://docs.conda.io/en/latest/miniconda.html) installed on your development environment. In this chapter, we will walk through the process of installing the PyTorch lightning-flash library (https://github.com/PyTorchLightning/lightning-flash), which we will use to build our first DL model in the Implementing a basic DL sentiment classifier section. Alternatively, you can sign up for a free Databricks Community Edition account at https://community.cloud.databricks.com/login.html and use a GPU cluster and a notebook to carry out the model development described in this book.
In addition to this, if you are a Microsoft Windows user, we recommend that you install WSL2 (https://www.windowscentral.com/how-install-wsl2-windows-10) so that you have a Linux environment to run the command lines that are present in this book.
Nowadays, the most successful DL models that are deployed in production primarily follow these two steps:

1. Pretraining: a large model (often called a foundation model) is first trained on massive amounts of raw, unlabeled data.
2. Fine-tuning: the pretrained model is then adapted to a specific downstream task using a much smaller, labeled, task-specific dataset.
One groundbreaking and successful example of a DL model in production is the Buyer Sentiment Analysis model, built on top of BERT to classify sales engagement email messages. It provides critical fine-grained insights into buyer emotions and signals beyond simple activity metrics such as reply, click, and open rates (https://www.prnewswire.com/news-releases/outreach-unveils-groundbreaking-ai-powered-buyer-sentiment-analysis-transforming-sales-engagement-301188622.html). There are different variants of how this works, but in this book, we will primarily focus on the Transfer Learning paradigm of developing and deploying DL models, as it exemplifies a practical DL life cycle.
Let's walk through an example to understand a typical core DL development paradigm. For example, the popular BERT model released in late 2018 (a basic version of the BERT model can be found at https://huggingface.co/bert-base-uncased) was initially pretrained on raw texts (without human labeling) from over 11,000 books from BookCorpus and the entire English Wikipedia. This pretrained language model was then fine-tuned to many downstream NLP tasks, such as text classification and sentiment analysis, in different application domains such as movie review classifications by using labeled movie review data (https://huggingface.co/datasets/imdb). Note that sometimes, it might be necessary to further pretrain a foundation model (for example, BERT) within the application domain by using unlabeled data before fine-tuning to boost the final model performance in terms of accuracy. This core DL development paradigm is illustrated in Figure 1.1:
Note that while Figure 1.1 represents a common development paradigm, not all of these steps are necessary for a specific application scenario. For example, you might only need to do fine-tuning using a publicly available pretrained DL model with your labeled application-specific data. Therefore, you don't need to do your own pretraining or carry out further pretraining using unlabeled data since other people or organizations have already done the pretraining step for you.
DL over Classical ML
Unlike classical ML model development, where a feature engineering step is usually required to extract and transform raw data into features before training a model such as a decision tree or logistic regression, DL learns features automatically. This is especially attractive for modeling unstructured data such as text, images, videos, audio, and speech, and it is why DL is also called representation learning. In addition to this, DL is usually data- and compute-intensive, requiring Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), or other types of hardware accelerators for at-scale training and inference. Explainability for DL models is also harder to implement than for traditional ML models, although recent progress has made it possible.
To set up the development of a basic DL sentiment classifier, you need to create a virtual environment on your local machine. Let's assume that you have Miniconda installed. You can run the following in your command-line prompt to create a new virtual environment called dl_model and install the PyTorch lightning-flash package so that the model can be built:
conda create -n dl_model python==3.8.10
conda activate dl_model
pip install lightning-flash[all]==0.5.0
pip install pytorch-lightning==1.4.4 datasets==1.9.0
Depending on your local machine and network speed, the preceding commands might take about 10 minutes to finish. You can verify the success of your installation by running the following command:
conda list | grep lightning
If you see output similar to the following, your installation was successful:
lightning-bolts     0.3.4     pypi_0    pypi
lightning-flash     0.5.0     pypi_0    pypi
pytorch-lightning   1.4.4     pypi_0    pypi
Now you are ready to build your first DL model!
To begin building a DL model, complete the following steps:
1. Import the necessary torch and flash libraries, and import download_data, TextClassificationData, and TextClassifier from the flash subpackages:

import torch
import flash
from flash.core.data.utils import download_data
from flash.text import TextClassificationData, TextClassifier
2. Use download_data to download the imdb.zip file, which contains the public domain binary sentiment classification (positive/negative) dataset from the Internet Movie Database (IMDb), to a local data folder. The IMDb ZIP file contains three CSV files: train.csv, valid.csv, and test.csv. Each file contains two columns: review and sentiment. We then use TextClassificationData.from_csv to declare a datamodule variable that assigns the review column to input_fields and the sentiment column to target_fields. Additionally, it assigns the train.csv file to train_file, the valid.csv file to val_file, and the test.csv file to the test_file property of datamodule, respectively:

download_data("https://pl-flash-data.s3.amazonaws.com/imdb.zip", "./data/")
datamodule = TextClassificationData.from_csv(
    input_fields="review",
    target_fields="sentiment",
    train_file="data/imdb/train.csv",
    val_file="data/imdb/valid.csv",
    test_file="data/imdb/test.csv"
)
3. Once we have the datamodule, we can declare a classifier_model by calling TextClassifier with a backbone assigned to prajjwal1/bert-tiny (a much smaller BERT-like pretrained model located in the Hugging Face model repository: https://huggingface.co/prajjwal1/bert-tiny). This means our model will be based on the bert-tiny model. torch.cuda.device_count() will return either 0 (no GPU) or 1 to N, where N is the number of GPUs available in your running environment. Now we are ready to call trainer.finetune to train a binary sentiment classifier on the IMDb dataset:

classifier_model = TextClassifier(backbone="prajjwal1/bert-tiny", num_classes=datamodule.num_classes)
trainer = flash.Trainer(max_epochs=3, gpus=torch.cuda.device_count())
trainer.finetune(classifier_model, datamodule=datamodule, strategy="freeze")
DL Fine-Tuning Time
Depending on your running environment, the fine-tuning step might take a couple of minutes on a GPU or around 10 minutes on a CPU. You can set max_epochs=1 if you simply want to get a basic version of the sentiment classifier quickly.
4. Once the fine-tuning step is complete, test the accuracy of the model by running trainer.test():

trainer.test()
The output of the test should look similar to the following screenshot, which indicates that the final model accuracy is about 52%:
The test result shown in the preceding screenshot indicates that we have a basic version of the model, as we only fine-tuned the foundation model for three epochs and haven't used any advanced techniques such as hyperparameter tuning or better fine-tuning strategies. However, this is a great accomplishment since you now have a working knowledge of how the core DL model paradigm works! We will explore more advanced model training techniques in later chapters of this book.
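If you would like to try out the freshly fine-tuned model, the following is a minimal sketch of how you might run a quick prediction and persist the checkpoint, assuming the Flash 0.5 Task.predict API and PyTorch Lightning's save_checkpoint method (the example sentences and the checkpoint filename are just illustrations):

predictions = classifier_model.predict([
    "Best movie I have seen in years!",    # we expect a positive prediction here
    "Turgid dialogue and a feeble plot.",  # we expect a negative prediction here
])
print(predictions)

# Persist the fine-tuned weights so that the model can be reloaded later
trainer.save_checkpoint("sentiment_classifier.pt")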
By now, you should have your first DL model ready and should feel proud of it. Now, let's explore the full DL life cycle together to fully understand its concepts, components, and challenges.
You might have gathered that the core DL development paradigm revolves around three key artifacts: Data, Model, and Code. In addition to this, Explainability is another major artifact that is required in many mission-critical application scenarios such as medical diagnosis, the financial industry, and decision making in criminal justice. As DL is usually considered a black box, providing explainability for DL is increasingly becoming a key requirement before and after shipping to production.
Note that Figure 1.1 is still considered offline experimentation if we are still trying to figure out which model works using a dataset in a lab-like environment. Even in such an offline experimentation environment, things will quickly become complicated. We would like to know and track which experiments we have or have not performed, which parameters and datasets we used, and which metrics a specific model produced, so that we don't waste time repeating the same experiments. Once we have a model that's good enough for the use cases and customer scenarios, the complexity increases, as we need a way to continuously deploy and update the model in production, monitor the model and data drift, and then retrain the model when necessary. This complexity increases further when at-scale training, deployment, monitoring, and explainability are needed.
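Experiment tracking, the first pain point just described, is exactly what a tool such as MLflow addresses. As a taste of what is to come in later chapters, the following is a minimal sketch of how a run could be tracked with MLflow's tracking API; the experiment name, parameters, and metric value here are purely illustrative:

import mlflow

# Group runs under a named experiment so that past trials are easy to find
mlflow.set_experiment("dl_sentiment_classifier")

with mlflow.start_run(run_name="bert-tiny-3-epochs"):
    # Record exactly which backbone, hyperparameters, and data this run used
    mlflow.log_param("backbone", "prajjwal1/bert-tiny")
    mlflow.log_param("max_epochs", 3)
    mlflow.log_param("train_file", "data/imdb/train.csv")
    # ... fine-tune the model here ...
    # Record the resulting metric so that runs can be compared later
    mlflow.log_metric("test_accuracy", 0.52)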
Let's examine what a DL life cycle looks like (see Figure 1.3). There are five stages:

1. Data collection, cleaning, and annotation
2. Model development, which is also known as offline experimentation
3. Model deployment and serving in production
4. Model validation and online experimentation (for example, A/B testing)
5. Monitoring and feedback collection, including model and data drift detection, followed by retraining when necessary
Figure 1.3 provides a diagram to show that it is a continuous development cycle for a DL model:
In addition to this, we want to point out that the backbone of these five stages, as shown in Figure 1.3, essentially revolves around the four artifacts: data, model, code, and explainability. We will examine the challenges related to these four artifacts in the life cycle in the following sections. However, first, let's explore and understand MLOps, which is an evolving platform concept and framework that supports the full life cycle of ML. This will help us understand these challenges in a big-picture context.
MLOps has some connections to DevOps, where a set of technology stacks and standard operational procedures are used for software development and deployment combined with IT operations. Unlike traditional software development, ML and especially DL represent a new era of software development paradigms called Software 2.0 (https://karpathy.medium.com/software-2-0-a64152b37c35). The key differentiator of Software 2.0 is that the behavior of the software does not depend solely on well-understood programming language code (the characteristic of Software 1.0) but also on the learned weights of a neural network, which are difficult to write as code. In other words, there is an inseparable integration of code, data, and model that must be managed together. Therefore, MLOps is being developed, and is still evolving, to accommodate this new Software 2.0 paradigm. In this book, MLOps is defined as an operational automation platform that consists of three foundation layers and four pillars. They are listed as follows:

- Three foundation layers: infrastructure management and automation, application life cycle management and continuous integration and continuous delivery (CI/CD), and service system observability
- Four pillars: data observability and management, model observability and life cycle management, explainability and Artificial Intelligence (AI) observability, and code reproducibility and observability
Additionally, throughout this book, we will explain MLflow's role in these MLOps layers and pillars so that we have a clear picture of what MLflow can do to build up the MLOps layers in their entirety.
In summary, MLflow plays a critical and foundational role in MLOps. It fills in the gaps that DevOps traditionally does not cover and, thus, is the focus of this book. The following diagram (Figure 1.4) shows the central roles of MLflow in the still-evolving MLOps world:
While the bottom two layers and the topmost layer are common within many software development and deployment processes, the middle four pillars are either entirely unique to ML/DL applications or partially unique to ML/DL applications. MLflow plays a critical role in all four of these pillars in MLOps. This book will help you to confidently apply MLflow to solve the issues of these four pillars while also equipping you to further integrate with other tools in the MLOps layers depicted in Figure 1.4 for full automation depending on your scenario requirements.
In this section, we will discuss the data challenges at each stage of the DL life cycle, as illustrated in Figure 1.3. Essentially, DL is data-centric AI, unlike symbolic AI, where human knowledge can be used without lots of data. The challenges for data in DL are pervasive in all stages of the full life cycle. Consider, for example, a naive approach to versioning an annotated dataset by encoding the version number in the file name:
MyCoolAnnotatedData-v1.0.csv
MyCoolAnnotatedData-v2.0.csv
MyCoolAnnotatedData-v3.0.csv
MyCoolAnnotatedData-v4.0.csv
This seems to work until some changes are required in any one of the vX.0 datasets due to the need to correct annotation errors or remove email messages because of customer churn. Also, what happens if we need to combine several datasets together or perform some data cleaning and transformation to train a new DL model? What if we need to implement data augmentation to artificially generate some datasets? Evidently, simply changing the names of the files is neither scalable nor sustainable.
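One way out of the file-renaming trap is to record a content fingerprint of the exact data a model was trained on. The following is a minimal sketch using a SHA-256 hash logged as an MLflow tag; the file path matches the IMDb example earlier, and a dedicated data versioning tool would go much further than this:

import hashlib

import mlflow

def file_fingerprint(path):
    # A content hash uniquely identifies this exact version of the dataset
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

with mlflow.start_run():
    # Tie the run to the exact data it was trained on, instead of
    # encoding the version in the file name
    mlflow.set_tag("train_data_sha256", file_fingerprint("data/imdb/train.csv"))
    # Optionally snapshot the file itself as a run artifact
    mlflow.log_artifact("data/imdb/train.csv")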
In summary, data tracking and observability are major challenges in all stages of the DL life cycle.
In this section, we will discuss DL model challenges. Let's look at the challenges at each stage of the DL life cycle, as depicted in Figure 1.3:
pickle or its variant cloudpickle (https://github.com/cloudpipe/cloudpickle) is usually used to serialize models written in Python. However, TorchScript (https://pytorch.org/docs/stable/jit.html) is now highly performant for PyTorch models. In addition, Open Neural Network Exchange, or ONNX (https://onnx.ai/), tries to provide more framework-agnostic DL serialization. Finally, we need to log the serialized model and register it so that we can track model versioning. MLflow is one of the first open source tools to overcome these challenges.

In summary, the DL model challenges in the full life cycle are unique. It is also worth pointing out that a common framework that supports moving a model back and forth between development and online production is of great importance, as we don't want to use different tools just because the execution environment is different. MLflow provides this unified framework to bridge such gaps.
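To make these serialization options concrete, the following is a minimal sketch showing TorchScript export and MLflow model logging side by side; a toy linear module stands in for a real model, and the file and artifact names are just illustrations:

import torch
import mlflow.pytorch

toy_model = torch.nn.Linear(4, 2)

# Option 1: TorchScript compiles the model into a self-contained,
# Python-independent format that can be loaded from C++ or Python
scripted = torch.jit.script(toy_model)
scripted.save("toy_model.torchscript")

# Option 2: log the model with MLflow so that it is versioned and
# tracked alongside the run that produced it
with mlflow.start_run():
    mlflow.pytorch.log_model(toy_model, artifact_path="model")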
In this section, we will discuss DL code challenges and look at how they manifest in each of the stages described in Figure 1.3. Within the context of DL development, code refers to the source code written in certain programming languages, such as Python, for data processing and model implementation, while a model refers to the model logic and architecture in a serialized format (for example, pickle, TorchScript, or ONNX).
In summary, DL code challenges are especially unique because DL frameworks are still evolving (for example, TensorFlow, PyTorch, Keras, Hugging Face, and SparkNLP). MLflow provides a lightweight framework to overcome many common challenges and can interface with many DL frameworks seamlessly.
In this section, we will discuss DL explainability challenges at each of the stages described in Figure 1.3. It is increasingly important to view explainability as an integral and necessary mechanism to define, test, debug, validate, and monitor models across the entire model life cycle. Embedding explainability early will make subsequent model validation and operations easier. Also, to maintain ongoing trust in ML/DL models, it is critical to be able to explain and debug ML/DL models after they go live in production:
A subpopulation could have lots of missing fields and then be dropped during data cleaning, resulting in underrepresentation after the cleaning process. Human annotation could also favor a privileged group or reflect other unconscious biases. A metric called Disparate Impact can reveal hidden biases in the data; it compares the proportion of individuals that receive a positive outcome in two groups: an unprivileged group and a privileged group. If the unprivileged group (for example, persons aged over 60) receives a positive outcome (for example, loan approval) at less than 80% of the rate of the privileged group (persons aged under 60), this is a disparate impact violation based on the current common industry standard (the four-fifths rule). Tools such as Dataiku can help to automate disparate impact and subpopulation analysis to find groups of people who may be treated unfairly or differently because of the data used for model training.
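To see how the four-fifths rule works in practice, the following is a minimal sketch that computes disparate impact on a tiny, made-up loan-decision table; both the data and the age grouping are hypothetical:

import pandas as pd

# Hypothetical loan decisions: 'approved' is 1 for a positive outcome
df = pd.DataFrame({
    "age_group": ["over_60", "over_60", "under_60", "under_60", "under_60"],
    "approved":  [0, 1, 1, 1, 0],
})

# Proportion of positive outcomes in each group
unprivileged_rate = df.loc[df["age_group"] == "over_60", "approved"].mean()
privileged_rate = df.loc[df["age_group"] == "under_60", "approved"].mean()

# Disparate impact = P(positive | unprivileged) / P(positive | privileged);
# values below 0.8 violate the four-fifths rule
disparate_impact = unprivileged_rate / privileged_rate
print(f"Disparate impact: {disparate_impact:.2f}")  # 0.50 / 0.67 = 0.75 here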
Figure 1.5 shows that the number one feature behind this NLP model's prediction is the word impressive, followed by rent. Essentially, this breaks open the black box of the DL model, giving us much more confidence in using DL models in production.
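A word-level attribution like the one in Figure 1.5 can be produced with SHAP directly. The following is a minimal sketch using an off-the-shelf Hugging Face sentiment pipeline rather than the classifier we trained earlier, and it assumes that the shap and transformers packages are installed:

import shap
from transformers import pipeline

# Wrap an off-the-shelf sentiment model and let SHAP attribute the
# prediction to individual words
classifier = pipeline("sentiment-analysis", return_all_scores=True)
explainer = shap.Explainer(classifier)
shap_values = explainer(["The special effects were impressive."])

# Visualize the per-word contributions for the first (and only) example
shap.plots.text(shap_values[0])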
In summary, DL explainability is a major challenge for which ongoing research is still needed. However, MLflow's integration with SHAP now provides a ready-to-use tool for practical DL applications, which we will cover in an advanced chapter later in this book.
In this opening chapter, we implemented our first DL model by following the pretraining plus fine-tuning core DL development paradigm, using PyTorch lightning-flash to build a text sentiment classifier. We learned about the five stages of the full life cycle of DL. We also defined the concept of MLOps, along with its three foundation layers and four pillars, where MLflow plays a critical role in all four pillars (data, model, code, and explainability). Finally, we described the challenges in DL data, models, code, and explainability.
With the knowledge and first DL model experience gained in this chapter, we are now ready to learn about and implement MLflow in our DL model in the following chapters. In the next chapter, we will start with the implementation of a DL model with MLflow autologging enabled.
To further your knowledge, please consult the following resources and documentation: