You're reading from Machine Learning Infrastructure and Best Practices for Software Engineers

Product type Book

Published in Jan 2024

Publisher Packt

ISBN-13 9781837634064

Pages 346 pages

Edition 1st Edition

Languages

Python

Concepts

Machine Learning

Author (1):

Miroslaw Staron

Table of Contents (24) Chapters

Preface

1. Part 1: Machine Learning Landscape in Software Engineering

2. Machine Learning Compared to Traditional Software

3. Elements of a Machine Learning System

4. Data in Software Systems – Text, Images, Code, and Their Annotations

5. Data Acquisition, Data Quality, and Noise

6. Quantifying and Improving Data Properties

7. Part 2: Data Acquisition and Management

8. Processing Data in Machine Learning Systems

9. Feature Engineering for Numerical and Image Data

10. Feature Engineering for Natural Language Data

11. Part 3: Design and Development of ML Systems

12. Types of Machine Learning Systems – Feature-Based and Raw Data-Based (Deep Learning)

13. Training and Evaluating Classical Machine Learning Systems and Neural Networks

14. Training and Evaluation of Advanced ML Algorithms – GPT and Autoencoders

15. Designing Machine Learning Pipelines (MLOps) and Their Testing

16. Designing and Implementing Large-Scale, Robust ML Software

17. Part 4: Ethical Aspects of Data Management and ML System Development

18. Ethics in Data Acquisition and Management

19. Ethics in Machine Learning Systems

20. Integrating ML Systems in Ecosystems

21. Summary and Where to Go Next

22. Index

23. Other Books You May Enjoy

Every data has its purpose – annotations and tasks

Data in raw format is important, but it is only the first step in the development and operation of ML software. The most important part, and the costliest one, is the annotation of the data. To train an ML model and then use it to make inferences, we need to define a task. Defining a task is both conceptual and operational. The conceptual definition states what we want the software to do, while the operational definition states how we want to achieve that goal. The operational definition boils down to a definition of what we see in the data and what we want the ML model to identify or replicate.
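One way to make the two sides of a task definition concrete is to write them down next to each other. The following is a minimal sketch, not the book's own code: the names `TaskDefinition`, `AnnotatedExample`, and `invalid_annotations`, as well as the example goal, labels, and review comments, are hypothetical and chosen only for illustration. The conceptual definition is kept as a plain-text goal, and the operational definition is reduced to the set of labels that annotators may assign to a data point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskDefinition:
    """Hypothetical container pairing the two sides of a task definition."""
    goal: str                # conceptual: what we want the software to do
    labels: tuple[str, ...]  # operational: what the model should identify in the data


@dataclass(frozen=True)
class AnnotatedExample:
    """One raw data point together with its annotation."""
    raw: str
    label: str


def invalid_annotations(task, examples):
    """Return the examples whose label falls outside the task's label set."""
    allowed = set(task.labels)
    return [ex for ex in examples if ex.label not in allowed]


# Illustrative task and data; none of this comes from a real dataset.
task = TaskDefinition(
    goal="Flag code review comments that report a defect",
    labels=("defect", "no_defect"),
)

examples = [
    AnnotatedExample(raw="This loop never terminates when n == 0.", label="defect"),
    AnnotatedExample(raw="Looks good to me.", label="no_defect"),
    AnnotatedExample(raw="Please rename this variable.", label="style"),  # not in the label set
]

rejected = invalid_annotations(task, examples)
print([ex.label for ex in rejected])
```

Checking annotations against the label set before training is a cheap way to notice when the annotators and the task definition have drifted apart, for example when someone introduces a label that the task never asked for.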

Annotations are the mechanism by which we direct ML algorithms. Every piece of data that we use requires a label that denotes what it is. For data in its raw format, this annotation can be a label describing what the data point contains. For example, such a label can state that the image contains the number 1 (from the MNIST dataset...