Reader small image

You're reading from  Practical Deep Learning at Scale with MLflow

Product typeBook
Published inJul 2022
PublisherPackt
ISBN-139781803241333
Edition1st Edition
Right arrow
Author (1)
Yong Liu
Yong Liu
author image
Yong Liu

Yong Liu has been working in big data science, machine learning, and optimization since his doctoral student years at the University of Illinois at Urbana-Champaign (UIUC) and later as a senior research scientist and principal investigator at the National Center for Supercomputing Applications (NCSA), where he led data science R&D projects funded by the National Science Foundation and Microsoft Research. He then joined Microsoft and AI/ML start-ups in the industry. He has shipped ML and DL models to production and has been a speaker at the Spark/Data+AI summit and NLP summit. He has recently published peer-reviewed papers on deep learning, linked data, and knowledge-infused learning at various ACM/IEEE conferences and journals.
Read more about Yong Liu

Right arrow

Understanding DL code challenges

In this section, we will discuss DL code challenges. Let's look at how these code challenges are manifested in each of the stages described in Figure 1.3. In this section, and within the context of DL development, code refers to the source code that's written in certain programming languages such as Python for data processing and implementation, while a model refers to the model logic and architecture in its serialized format (for example, pickle, TorchScript, or ONNX):

  • Data collection/cleaning/annotation: While data is the central piece in this stage, the code that does the query, extraction/transformation/loading (ETL), and data cleaning and augmentation is of critical importance. We cannot decouple the development of the model from the data pipelines that provide the data feeds to the model. Therefore, data pipelines that implement ETL need to be treated as one of the integrated steps in both offline experimentation and online production. A common mistake is that we use different data ETL and cleaning pipelines in offline experimentation, and then implement different data ETL/cleaning pipelines in online production, which could cause different model behaviors. We need to version and serialize the data pipeline as part of the entire model pipeline. MLflow provides several ways to allow us to implement such multistep pipelines.
  • Model development: During offline experiments, in addition to different versions of data pipeline code, we might also have different versions of notebooks or use different versions of DL library code. The usage of notebooks is particularly unique in ML/DL life cycles. Tracking which model results are produced by which notebook/model pipeline/data pipeline needs to be done for each run. MLflow does that with automatic code version tracking and dependencies. In addition, code reproducibility in different running environments is unique to DL models, as DL models usually require hardware accelerators such as GPUs or TPUs. The flexibility of running either locally, or remotely, on a CPU or GPU environment is of great importance. MLflow provides a lightweight approach in which to organize the ML projects so that code can be written once and run everywhere.
  • Model deployment and serving in production: While the model is serving production traffic, any bugs will need to be traced back to both the model and code. Thus, tracking code provenance is critical. It is also critical to track all the dependency code library versions for a particular version of the model.
  • Model validation and A/B testing: Online experiments could use multiple versions of models using different data feeds. Debugging any experimentation will require not only knowing which model is used but also which code is used to produce that model.
  • Monitoring and feedback loops: This stage is similar to the previous stage in terms of code challenges, where we need to know whether model degradation is due to code bugs or model and data drifting. The monitoring pipeline needs to collect all the metrics for both data and model performance.

In summary, DL code challenges are especially unique because DL frameworks are still evolving (for example, TensorFlow, PyTorch, Keras, Hugging Face, and SparkNLP). MLflow provides a lightweight framework to overcome many common challenges and can interface with many DL frameworks seamlessly.

Previous PageNext Page
You have been reading a chapter from
Practical Deep Learning at Scale with MLflow
Published in: Jul 2022Publisher: PacktISBN-13: 9781803241333
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Yong Liu

Yong Liu has been working in big data science, machine learning, and optimization since his doctoral student years at the University of Illinois at Urbana-Champaign (UIUC) and later as a senior research scientist and principal investigator at the National Center for Supercomputing Applications (NCSA), where he led data science R&D projects funded by the National Science Foundation and Microsoft Research. He then joined Microsoft and AI/ML start-ups in the industry. He has shipped ML and DL models to production and has been a speaker at the Spark/Data+AI summit and NLP summit. He has recently published peer-reviewed papers on deep learning, linked data, and knowledge-infused learning at various ACM/IEEE conferences and journals.
Read more about Yong Liu