Machine Learning Infrastructure and Best Practices for Software Engineers

Product type: Book
Published in: Jan 2024
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781837634064
Edition: 1st Edition
Author: Miroslaw Staron

Miroslaw Staron is a professor of Applied IT at the University of Gothenburg in Sweden with a focus on empirical software engineering, measurement, and machine learning. He is currently editor-in-chief of Information and Software Technology and co-editor of the regular Practitioner's Digest column of IEEE Software. He has authored books on automotive software architectures, software measurement, and action research. He also leads several projects in AI for software engineering and leads an AI and digitalization theme at Software Center. He has written over 200 journal and conference articles.

What this book covers

Chapter 1, Machine Learning Compared to Traditional Software, explores where each of these two types of software systems is most appropriate. We learn about the software development processes that programmers use to create both types of software, as well as the four classical types of machine learning software – rule-based, supervised, unsupervised, and reinforcement learning. Finally, we learn about the different roles that data plays in traditional and machine learning software.

Chapter 2, Elements of a Machine Learning System, reviews each element of a professional machine learning system. We start by understanding which elements are important and why. Then, we explore how to create these elements and how to assemble them into a single machine learning system – the so-called machine learning pipeline.

Chapter 3, Data in Software Systems – Text, Images, Code, and Features, introduces three data types – images, text, and formatted text (program source code). We explore how each of these types of data can be used in machine learning, how it should be annotated, and for what purpose. Working with these three data types also lets us explore different ways of annotating such data sources.

Chapter 4, Data Acquisition, Data Quality, and Noise, dives deeper into topics related to data quality. We go through a theoretical model for assessing data quality and provide methods and tools to operationalize it. We also look into the concept of noise in machine learning and how to reduce it using different tokenization methods.

Chapter 5, Quantifying and Improving Data Properties, dives deeper into the properties of data and how to improve them. In contrast to the previous chapter, we work on feature vectors rather than raw data. Since feature vectors are already a transformation of the data, we can change properties such as noise, or even change how the data is perceived. We focus on text processing, which is at the heart of many machine learning algorithms today. We start by transforming text into feature vectors using simple algorithms, such as bag of words, so that we can work on feature vectors, as sketched below.
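
To make the bag-of-words idea concrete, here is a minimal, illustrative sketch using scikit-learn's CountVectorizer; the chapter's own examples may use different tooling and data.

```python
# Minimal bag-of-words sketch (illustrative only; the chapter's own
# examples may use different tooling and data).
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the model predicts defects",
    "the model learns from data",
]

vectorizer = CountVectorizer()             # tokenizes text and builds a vocabulary
features = vectorizer.fit_transform(docs)  # sparse document-term matrix

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(features.toarray())                  # each row is a document's feature vector
```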

Chapter 6, Processing Data in Machine Learning Systems, dives deeper into the ways in which data and algorithms are entangled. Having talked about data in generic terms so far, in this chapter we explain what kind of data is needed in machine learning systems. We explain that all kinds of data are ultimately used in numerical form – either as feature vectors or as more complex feature matrices. Then, we explain the need to transform unstructured data (e.g., text) into structured data. This chapter lays the foundations for going deeper into each type of data, which is the content of the next few chapters.

Chapter 7, Feature Engineering for Numerical and Image Data, focuses on the feature engineering process for numerical and image data. We start by going through typical methods such as Principal Component Analysis (PCA), which we used previously for visualization. We then move on to more advanced methods such as t-distributed Stochastic Neighbor Embedding (t-SNE) and Independent Component Analysis (ICA). We end with autoencoders as a dimensionality reduction technique for both numerical and image data. A sketch of the basic dimensionality reduction methods follows.
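
As a rough illustration of these methods, the following sketch reduces random placeholder data to two dimensions with PCA and t-SNE via scikit-learn; the book's examples use real datasets and tuned settings.

```python
# Hedged sketch: dimensionality reduction with PCA and t-SNE
# (placeholder data; the book uses real datasets and tuned settings).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.rand(100, 50)     # 100 samples, 50 features

X_pca = PCA(n_components=2).fit_transform(X)    # linear projection
X_tsne = TSNE(n_components=2, perplexity=30,
              random_state=0).fit_transform(X)  # nonlinear, preserves local structure

print(X_pca.shape, X_tsne.shape)  # both (100, 2)
```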

Chapter 8, Feature Engineering for Natural Language Data, explores the first steps that made transformer technologies (such as GPT) so powerful – feature extraction from natural language data. Natural language is a special kind of data source in software engineering. With the introduction of GitHub Copilot and ChatGPT, it became evident that machine learning and artificial intelligence tools for software engineering tasks are no longer science fiction.

Chapter 9, Types of Machine Learning Systems – Feature-Based and Raw Data-Based (Deep Learning), explores different types of machine learning systems. We start from classical machine learning models, such as random forests, and move on to convolutional and GPT models, which are called deep learning models. Their name comes from the fact that they take raw data as input, with the first layers of the model performing feature extraction, and that they are designed to progressively learn more abstract features as the input data moves through the model. This chapter demonstrates each of these types of models, progressing from classical machine learning to generative AI models; the contrast between the two paradigms is sketched below.
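
The contrast can be sketched as follows: a feature-based model consumes hand-engineered feature vectors, while a deep model's early layers extract features from the raw input themselves. This is an illustrative sketch using scikit-learn and PyTorch, not the chapter's exact models.

```python
# Hedged sketch: feature-based vs. raw-data-based (deep) models
# (illustrative; not the chapter's exact architectures).
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

# Feature-based: trained on hand-engineered feature vectors.
clf = RandomForestClassifier(n_estimators=100)

# Raw-data-based: early convolutional layers learn features from raw pixels.
cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3), nn.ReLU(),  # feature extraction layers
    nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(),
    nn.Flatten(),
    nn.LazyLinear(10),                           # classification head
)

out = cnn(torch.rand(1, 1, 28, 28))  # forward pass on a raw 28x28 image
print(out.shape)                     # torch.Size([1, 10])
```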

Chapter 10, Training and Evaluation of Classical ML Systems and Neural Networks, goes deeper into the process of training and evaluation. We start with the basic theory behind different algorithms and then show how they are trained. We begin with classical machine learning models, exemplified by decision trees, and gradually move toward deep learning, where we explore both dense neural networks and some more advanced types of networks.

Chapter 11, Training and Evaluation of Advanced ML Algorithms – GPT and Autoencoders, explores how generative AI models based on GPT and Bidirectional Encoder Representations from Transformers (BERT) work. These models are designed to generate new data based on the patterns they were trained on. We also look at the concept of autoencoders, training an autoencoder to generate new images based on the data it was trained on; a minimal autoencoder is sketched below.
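
A minimal autoencoder, assuming flattened 28x28 images, might be sketched in PyTorch as follows; the chapter's own architecture and training loop are more elaborate.

```python
# Hedged sketch: a minimal autoencoder in PyTorch (assumes flattened
# 28x28 images; the chapter's architecture and training are more elaborate).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(latent_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)     # compress input to a latent code
        return self.decoder(z)  # reconstruct the input from the code

model = Autoencoder()
x = torch.rand(8, 784)                      # a batch of flattened images
loss = nn.functional.mse_loss(model(x), x)  # reconstruction loss
```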

Chapter 12, Designing Machine Learning Pipelines and their Testing, describes how the main goal of MLOps is to bridge the gap between data science and operations teams, fostering collaboration and ensuring that machine learning projects can be effectively and reliably deployed at scale. MLOps helps automate and optimize the entire machine learning life cycle, from model development to deployment and maintenance, thus improving the efficiency and effectiveness of ML systems in production. In this chapter, we learn how machine learning systems are designed and operated in practice. The chapter shows how pipelines are turned into a software system, with a focus on testing ML pipelines and deploying them on Hugging Face.

Chapter 13, Designing and Implementation of Large-Scale, Robust ML Software, explains how to integrate a machine learning model with a graphical user interface built in Gradio and with storage in a database. We use two examples of machine learning pipelines – the defect prediction model from our previous chapters and a generative AI model that creates pictures from a natural language prompt. A minimal Gradio sketch follows.
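
To illustrate the Gradio side, a model can be wrapped in a web UI in a few lines; the predict function below is a hypothetical placeholder, not the book's actual defect-prediction pipeline.

```python
# Hedged sketch: wrapping a model in a Gradio UI (the predict function
# is a hypothetical placeholder, not the book's actual pipeline).
import gradio as gr

def predict(code_snippet: str) -> str:
    # Placeholder for a real model call, e.g. a defect-prediction pipeline.
    return "defect-prone" if "fixme" in code_snippet.lower() else "clean"

demo = gr.Interface(fn=predict, inputs="text", outputs="text")
demo.launch()  # serves a simple web UI for the model
```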

Chapter 14, Ethics in Data Acquisition and Management, starts by exploring a few examples of unethical systems that exhibit bias, such as credit scoring systems that penalize certain minorities. We also explain the problems with using open source data and revealing the identities of subjects. The core of the chapter, however, is the explanation and discussion of ethical frameworks for data management and software systems, including the IEEE and ACM codes of conduct.

Chapter 15, Ethics in Machine Learning Systems, focuses on bias in machine learning systems. We start by exploring and briefly discussing the sources of bias. We then explore ways to spot biases, how to minimize them, and finally, how to communicate potential biases to the users of our system.

Chapter 16, Integration of ML Systems in Ecosystems, explains how packaging ML systems into web services allows us to integrate them into workflows in a very flexible way. Instead of compiling or using dynamically linked libraries, we can deploy machine learning components that communicate over HTTP using JSON payloads. In fact, we have already seen this approach in action when using the GPT-3 model hosted by OpenAI. In this chapter, we explore creating our own Docker container with a pre-trained machine learning model, deploying it, and integrating it with other components. A sketch of such a JSON-over-HTTP service follows.
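
As an illustration of the JSON-over-HTTP idea, a pre-trained model could be exposed as a small web service; the sketch below uses FastAPI and a placeholder scoring function, which is an assumption rather than the book's exact stack.

```python
# Hedged sketch: a JSON-over-HTTP prediction service with FastAPI
# (placeholder scoring; the chapter packages a real model in Docker).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictionRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(req: PredictionRequest) -> dict:
    # Placeholder for a real model call on the incoming feature vector.
    score = sum(req.features) / max(len(req.features), 1)
    return {"prediction": score}

# Run with, e.g.: uvicorn service:app --port 8000  (assumes this file is service.py)
```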

Chapter 17, Summary and Where to Go Next, revisits all the best practices and summarizes them per chapter. In addition, we also look into what the future of machine learning and AI may bring to software engineering.
