Types of Machine Learning Systems – Feature-Based and Raw Data-Based (Deep Learning)

In the previous chapters, we learned about data, noise, features, and visualization. Now, it's time to move on to machine learning models. There is no single type of model; there are plenty of them, ranging from classical models such as random forest, through deep learning models for vision systems, to generative AI models such as GPT.

Convolutional and GPT models are called deep learning models. Their name comes from the fact that they take raw data as input and their first layers perform feature extraction. They are also designed to learn progressively more abstract features as the input data moves through the model's layers.

This chapter demonstrates each of these types of models and progresses from classical machine learning to generative AI models.

In this chapter, we’ll cover the following topics:

  • Why do we need different types...

Why do we need different types of models?

So far, we have invested a significant amount of effort in data processing while focusing on tasks such as noise reduction and annotation. However, we have yet to delve into the models that are employed to work with this processed data. While we briefly mentioned different types of models based on data annotation – supervised, unsupervised, and reinforcement learning – we have not thoroughly explored the user's perspective on utilizing these models.

It is important to consider the perspective of the user when employing machine learning models for working with data. The user’s needs, preferences, and specific requirements play a crucial role in selecting and utilizing the appropriate models.

From the user’s standpoint, it becomes essential to assess factors such as model interpretability, ease of integration, computational efficiency, and scalability. Depending on the application and use case, the...

Classical machine learning models

Classical machine learning models, such as random forest, linear regression, and support vector machines, require pre-processed data in the form of tables and matrices – a clear set of predictors and classes in which to find patterns. Due to this, our pre-processing pipelines need to be manually designed for the task at hand.
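As a minimal sketch of this workflow (using scikit-learn, with synthetic feature vectors and labels standing in for a real, manually designed pipeline), a random forest is trained on a table of predictors and classes:

# a minimal sketch: a random forest trained on tabular, pre-processed data
# (the features and labels below are synthetic and purely illustrative)
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# each row is a manually engineered feature vector; each label is a class
X = np.random.rand(200, 4)                # e.g., four code metrics per module
y = (X[:, 0] + X[:, 1] > 1).astype(int)   # e.g., defect-prone or not

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))

Note how the model never sees raw data – everything it consumes has already been shaped into a fixed-width table of predictors by the pre-processing pipeline.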

From the user’s perspective, these systems are designed in a very classical way – there is a user interface, an engine for data processing (our classical machine learning model), and an output. This is depicted in Figure 9.1:

Figure 9.1 – Elements of a machine learning system

Figure 9.1 shows that there are three elements – the input prompt, the model, and the output. For most such systems, the input prompt is a set of properties that is provided to the model. The user fills in some sort of form and the system provides an answer. It...

Convolutional neural networks and image processing

Classical machine learning models are quite powerful, but they are limited in their input: we need to pre-process it into a set of feature vectors. They are also limited in their ability to learn – they are one-shot learners. We can train them only once and cannot add more training later; if more training is required, we need to train these models again from the very beginning.

Classical machine learning models are also considered rather limited in their ability to handle complex structures, such as images. Images, as we learned before, have at least two spatial dimensions and can have three channels of information – red, green, and blue. In more complex applications, images can contain data from LiDAR or geospatial data that provides meta-information about the images.
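To make this dimensionality concrete, the following small sketch (assuming Pillow and NumPy are installed; the file name image.jpg is illustrative) loads an image and inspects its two spatial dimensions and three color channels – a structure that does not fit into a flat feature table:

# a small sketch showing the dimensionality of image data
# (the image path is illustrative)
import numpy as np
from PIL import Image

image = np.array(Image.open("image.jpg").convert("RGB"))
height, width, channels = image.shape
print(height, width, channels)   # e.g., 480 640 3 – red, green, and blue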

So, to handle images, more complex models are needed. One of these models is the YOLO model. It’s considered...
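As a hedged usage sketch (assuming the ultralytics package and its publicly available pre-trained YOLOv8 weights, neither of which is covered in this excerpt), running object detection on an image takes only a few lines:

# a minimal object-detection sketch with a pre-trained YOLO model
# (assumes: pip install ultralytics; the image path is illustrative)
from ultralytics import YOLO

model = YOLO("yolov8n.pt")      # small pre-trained YOLOv8 model
results = model("image.jpg")    # raw image in, detections out
for box in results[0].boxes:
    # print the predicted class label and the detection confidence
    print(model.names[int(box.cls)], float(box.conf))

Notice that the model accepts the raw image directly – the feature extraction that we had to design by hand for classical models is performed by the network's own convolutional layers.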

BERT and GPT models

BERT and GPT models use raw data as input and their main output is one predicted word. This word can be predicted in the middle of a sentence (as BERT does) or at the end of it (as GPT does). This means that products designed around these models need to process data differently than products based on the other models.
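The difference between these two prediction modes can be sketched with the standard Hugging Face pipelines (the bert-base-uncased and gpt2 checkpoints below are common public models, used here purely for illustration):

# BERT-style: predict a masked word in the middle of a sentence
# GPT-style: predict the next word at the end of a sentence
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Machine learning models need [MASK] data.")[0]["token_str"])

generate = pipeline("text-generation", model="gpt2")
print(generate("Machine learning models need", max_new_tokens=1)[0]["generated_text"])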

Figure 9.3 provides an overview of this kind of processing, with a focus on prompt engineering at the beginning and output processing at the end. The figure shows a machine learning model based on the BERT or GPT architecture in the center. The model is an important aspect, but it is only a very small element of the entire system (or tool).

The tool’s workflow starts on the left-hand side with input processing. For the user, it is a prompt that asks the model to do something, such as "Write a function that reverses a string in C". The tool turns that prompt into a useful input for the model – it can find a similar C program as input for...

Using language models in software systems

Using products such as ChatGPT is great, but they are limited to the purpose for which they were designed. However, we can use models like these from scratch through the Hugging Face interface. In the following code example, we can see how to use a model dedicated to a specific task – recognizing design patterns – to complete text, that is, to write the signature of a Singleton design pattern. This illustrates how language models (including GPT-3/4) work with text under the hood.

In the following code fragment, we're importing the model from the Hugging Face library and instantiating it. The model has been pre-trained on a set of dedicated Singleton programs, constructed synthetically by adding random code from the Linux kernel source code to the code of a Singleton class in C++ (in the snippet below, the checkpoint name is a placeholder for this pre-trained model):

# import the model via the huggingface library
from transformers import AutoTokenizer, AutoModelForMaskedLM
# load the tokenizer and the model (the checkpoint name is a placeholder)
tokenizer = AutoTokenizer.from_pretrained("model-checkpoint")
model = AutoModelForMaskedLM.from_pretrained("model-checkpoint")
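Continuing the sketch (and still assuming the placeholder checkpoint above), a fill-mask pipeline can then ask the model to predict the masked token in a Singleton class signature:

from transformers import pipeline

# build a fill-mask pipeline from the tokenizer and model loaded above
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# ask the model to complete a Singleton signature in C++;
# the mask token stands for the word the model should predict
code = f"class Singleton {{ static Singleton* {tokenizer.mask_token}; }}"
for prediction in fill_mask(code):
    print(prediction["token_str"], round(prediction["score"], 3))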

Summary

In this chapter, we got a glimpse of what machine learning models look like from the inside, at least from the perspective of a programmer. This illustrated the major differences in how we construct machine learning-based software.

In classical models, we need to create a lot of pre-processing pipelines so that the model gets the right input. This means that we need to make sure that the data has the right properties and is in the right format, and we also need to work with the output to turn the predictions into something more useful.

In deep learning models, the data is pre-processed in a more streamlined way – the models themselves can prepare the images and the text. Therefore, the software engineer's task is to focus on the product and its use case rather than on concept drift monitoring, data preparation, and post-processing.

In the next chapter, we’ll continue looking at examples of training machine learning models – both the classical ones and, most importantly...
