You're reading from Machine Learning Infrastructure and Best Practices for Software Engineers

Product type: Book
Published: Jan 2024
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781837634064
Edition: 1st
Author: Miroslaw Staron

Miroslaw Staron is a professor of Applied IT at the University of Gothenburg in Sweden with a focus on empirical software engineering, measurement, and machine learning. He is currently editor-in-chief of Information and Software Technology and co-editor of the regular Practitioner's Digest column of IEEE Software. He has authored books on automotive software architectures, software measurement, and action research. He also leads several projects in AI for software engineering and leads an AI and digitalization theme at Software Center. He has written over 200 journal and conference articles.

Designing and Implementing Large-Scale, Robust ML Software

So far, we have learned how to develop ML models, how to work with data, and how to create and test the entire ML pipeline. What remains is to learn how to integrate these elements into a user interface (UI) and how to deploy the result so that it can be used without any programming. To do so, we'll learn how to deploy the model together with a UI and the data storage it relies on.

In this chapter, we'll learn how to integrate an ML model with a graphical UI built in Gradio and with storage in a database. We'll use two example ML pipelines – the defect-prediction model from our previous chapters and a generative AI model that creates pictures from a natural language prompt.

In this chapter, we’re going to cover the following main topics:

  • ML is not alone – elements of a deployed ML-based system
  • The UI of an ML model
  • Data storage
  • Deploying...

ML is not alone

Chapter 2 introduced several elements of an ML system – storage, data collection, monitoring, and infrastructure, to name a few. We need all of them to deploy a model for the users, but not all of them matter to the users directly. The users are interested in the results, while we need to pay attention to every detail involved in developing such systems. These activities are often called AI engineering.

The UI is important because it is how users access our models. Depending on how our software is used, the interface can differ. So far, we've focused on the models themselves and on the data used to train them. We have not focused on the usability of the models or on how to integrate them into tools.

Alongside the UI, we also need to talk about storing data in ML systems. We can use comma-separated values (CSV) files, but they quickly become difficult to handle. They are either...

The UI of an ML model

A UI serves as the bridge between the complexities of ML algorithms and the end users who interact with the system. It is the interactive canvas that allows users to input data, visualize results, control parameters, and gain insights from the ML model's outputs. A well-designed UI empowers users, regardless of their technical expertise, to harness the potential of ML for solving real-world problems.

Effective UIs for ML applications prioritize clarity, accessibility, and interactivity. Whether the application is aimed at business analysts, healthcare professionals, or researchers, the interface should be adaptable to the user’s domain knowledge and objectives. Clear communication of the model’s capabilities and limitations is vital, fostering trust in the technology and enabling users to make informed decisions based on its outputs. Hence my next best practice.

Best practice #66

Focus on the user task when designing the...

Data storage

So far, we've used CSV files and Excel files to store our data. They are an easy way to work with ML, but a local one. When we want to scale our application and use it beyond our own machine, it is often much more convenient to use a real database engine. The database plays a crucial role in an ML pipeline by providing a structured and organized repository for storing, managing, and retrieving data. As ML applications increasingly rely on large volumes of data, integrating a database into the pipeline becomes essential for a few reasons.

Databases offer a systematic way to store vast amounts of data, making it easily accessible and retrievable. Raw data, cleaned datasets, feature vectors, and other relevant information can be efficiently stored in the database, enabling seamless access by various components of the ML pipeline.
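As a minimal sketch of this idea, the following uses Python's built-in sqlite3 module as the database engine; the table schema, module names, and column names are illustrative assumptions, not the book's actual setup:

```python
import sqlite3

# In-memory database for the sketch; a file path or a server-backed
# engine would be used in a deployed pipeline.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE predictions (
           module TEXT,
           cbo REAL,
           predicted_defects INTEGER
       )"""
)

# Store model outputs alongside the features they were computed from.
rows = [("core/parser.py", 14.0, 1), ("ui/forms.py", 3.0, 0)]
conn.executemany("INSERT INTO predictions VALUES (?, ?, ?)", rows)
conn.commit()

# Any component of the pipeline can now query the shared store.
risky = conn.execute(
    "SELECT module FROM predictions WHERE predicted_defects = 1"
).fetchall()
print(risky)  # [('core/parser.py',)]
```

The same pattern extends to raw data, cleaned datasets, and feature vectors – each stored in its own table and retrievable by any pipeline stage.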

In many ML projects, data preprocessing is a critical step that involves cleaning, transforming, and aggregating...

Deploying an ML model for numerical data

Before we create the UI, we need to define a function that will take care of making predictions using a model that we trained in the previous chapter. This function takes the parameters as a user would see them and then makes a prediction. The following code fragment contains this function:

import gradio as gr
import pandas as pd
import joblib
def predict_defects(cbo,
                    dcc,
                    exportCoupling,
                    importCoupling,
                    nom,
          ...
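Since the listing above is truncated, here is one hedged way such a function could be completed. The feature names, the model filename, and the return strings are assumptions for illustration, not the book's exact implementation:

```python
import pandas as pd
import joblib

# Assumed feature set for the defect-prediction model; the book's
# trained model may use a different or longer list of metrics.
FEATURES = ["CBO", "DCC", "ExportCoupling", "ImportCoupling", "NOM"]

def predict_defects(cbo, dcc, export_coupling, import_coupling, nom,
                    model_path="defect_model.joblib"):
    """Wrap the trained classifier so the UI can call it with plain
    numbers and receive a human-readable verdict."""
    model = joblib.load(model_path)
    # Build a single-row frame with the column names the model expects.
    row = pd.DataFrame(
        [[cbo, dcc, export_coupling, import_coupling, nom]],
        columns=FEATURES,
    )
    prediction = model.predict(row)[0]
    return "Defect-prone" if prediction == 1 else "Not defect-prone"
```

Returning a plain string (rather than a class index) keeps the function directly usable as the `fn` argument of a Gradio interface.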

Deploying a generative ML model for images

The Gradio framework is very flexible and allows us to quickly deploy models such as generative AI Stable Diffusion models – image generators that work similarly to DALL-E. Deploying such a model is very similar to deploying the numerical model we covered previously.

First, we need to create a function that will generate images based on one of the models from Hugging Face. The following code fragment shows this function:

import gradio as gr
import pandas as pd
from diffusers import StableDiffusionPipeline
import torch
def generate_images(prompt):
    '''
    This function uses the prompt to generate an image
    using the anything 4.0 model from Hugging Face
    '''
    # importing the model from Hugging Face
    model_id = "xyn-ai/anything-v4.0"...

Deploying a code completion model as an extension

So far, we've learned how to deploy models online and on the Hugging Face hub. These are good methods that let us create a UI for our models. However, they are standalone tools that require manual input and produce output that we must handle manually – for example, by pasting it into another tool or saving it to disk.

In software engineering, many tasks are automated and many modern tools provide an ecosystem of extensions and add-ins. GitHub Copilot is such an add-in to Visual Studio 2022 and an extension to Visual Studio Code – among other tools. ChatGPT is both a standalone web tool and an add-in to Microsoft’s Bing search engine.

Therefore, in the last part of this chapter, we'll package our models as an extension to a programming environment. In this section, we'll learn how to create an extension that completes code, just like GitHub Copilot. Naturally, we won't use the CodeX...

Summary

This chapter concludes the third part of this book. It also concludes the most technical part of our journey through the best practices. We've learned how to develop ML systems and how to deploy them. These activities are often called AI engineering, a term that places the focus on the development of software systems rather than on the models themselves. It also indicates that testing, deploying, and using ML involves much more than training, validating, and testing models.

Naturally, there is even more to this. Just developing and deploying AI software is not enough. We, as software engineers or AI engineers, need to consider the implications of our actions. Therefore, in the next part of this book, we’ll explore the concepts of bias, ethics, and the sustainable use of the fruits of our work – AI software systems.
