You're reading from The Deep Learning Architect's Handbook

Product type Book

Published in Dec 2023

Publisher Packt

ISBN-13 9781803243795

Pages 516 pages

Edition 1st Edition

Languages

Concepts

Deep Learning

Author (1):

Ee Kin Chin

Table of Contents (25) Chapters

Preface

1. Part 1 – Foundational Methods

2. Chapter 1: Deep Learning Life Cycle

3. Chapter 2: Designing Deep Learning Architectures

4. Chapter 3: Understanding Convolutional Neural Networks

5. Chapter 4: Understanding Recurrent Neural Networks

6. Chapter 5: Understanding Autoencoders

7. Chapter 6: Understanding Neural Network Transformers

8. Chapter 7: Deep Neural Architecture Search

9. Chapter 8: Exploring Supervised Deep Learning

10. Chapter 9: Exploring Unsupervised Deep Learning

11. Part 2 – Multimodal Model Insights

12. Chapter 10: Exploring Model Evaluation Methods

13. Chapter 11: Explaining Neural Network Predictions

14. Chapter 12: Interpreting Neural Networks

15. Chapter 13: Exploring Bias and Fairness

16. Chapter 14: Analyzing Adversarial Performance

17. Part 3 – DLOps

18. Chapter 15: Deploying Deep Learning Models to Production

19. Chapter 16: Governing Deep Learning Models

20. Chapter 17: Managing Drift Effectively in a Dynamic Environment

21. Chapter 18: Exploring the DataRobot AI Platform

22. Chapter 19: Architecting LLM Solutions

23. Index

Why subscribe?

24. Other Books You May Enjoy

Strategizing the construction of a deep learning system

A deep learning model can only realize real-world value by being part of a system that performs some sort of operation. Bringing deep learning models from research papers to actual real-world usage is not an easy task. Thus, performing proper planning before conducting any project is a more reliable and structured way to achieve the desired goals. This section will discuss some considerations and strategies that will be beneficial when you start to plan your deep learning project toward success.

Starting the journey

Today, deep learning practitioners tend to focus a lot on the algorithmic model-building part of the process. It takes a considerable amount of mental strength to not get hooked on the hype of state-of-the-art (SOTA) research-focused techniques. With crazy techniques such as pixtopix, which is capable of generating high-resolution realistic color images from just sketches or image masks, and natural language processing (NLP) techniques such as GPT-3, a 175-billion parameters text generation model from OpenAI, and GPT-4, a multimodal text generation model that is a successor to GPT-3 and its sub-models, that are capable of generating practically anything you ask it to in a text format that ranges from text summarization to generating code, why wouldn’t they?!

Jokes aside, to become a true deep learning architect, we need to come to a consensus that any successful machine learning or deep learning project starts with the business problem and not from the shiny new research paper you just read online complete with a public GitHub repository. The planning stage often involves many business executives who are not savvy about the details of machine learning algorithms and often, the same set of people wouldn’t care about it at all. These algorithms are daunting for business-focused stakeholders to understand and, when added on top of the tough mental barriers of the adoption of artificial intelligence technologies itself, it doesn’t make the project any more likely to be adopted.

Evaluating deep learning’s worthiness

Deep learning shines the most in handling unstructured data. This includes image data, text data, audio data, and video data. This is largely due to the model’s ability to automatically learn and extract complex, high-level features from the raw data. In the case of images and videos, deep learning models can capture spatial and temporal patterns, recognizing objects, scenes, and activities. With audio data, deep learning can understand the nuances of speech, noise, and various sound elements, making it possible to build applications such as speech recognition, voice assistants, and audio classification systems. For text data, deep learning models can capture the context, semantics, and syntax, enabling NLP tasks such as sentiment analysis, machine translation, and text summarization.

This means that if this data exists and is utilized by your company in its business processes, there may be an opportunity to solve a problem with the help of deep learning. However, never overcomplicate problems just so you can solve them with deep learning. Equating this to something more relatable, you wouldn’t use a huge sledgehammer to get a nail into wood. It could work and you might get away with it, but you’d risk bending the nail or injuring yourself while using it.

Once a problem has been identified, evaluate the business value of solving it. Not all problems are born the same and they can be ranked based on their business impact, value, complexity, risks, costs, and suitability for deep learning. Generally, you’d be looking for high impact, high value, low complexity, low risks, low cost, and high suitability to deep learning. Trade-offs between these metrics are expected but simply put, make sure the problem you’ve discovered is worth solving at all with deep learning. A general rule of thumb is to always resort to a simpler solution for a problem, even if it ends up abandoning the usage of deep learning technologies. Simple approaches tend to be more reliable, less costly, less prone to risks, and faster to fruition.

Consider a problem where a solution is needed to remove background scenes in a video feed and leave only humans or necessary objects untouched so that a more suitable background scene can be overlaid as a background instead. This is a common problem in the professional filmmaking industry in all film genres today.

Semantic segmentation, which is the task of assigning a label to every pixel of an image in the width and height dimensions, is a method that is needed to solve such a problem. In this case, the task needs to assign labels that can help identify which pixels need to be removed. With the advent of many publicly available semantic segmentation datasets, deep learning has been able to advance considerably in the semantic segmentation field, allowing itself to achieve a very satisfactory fine-grained understanding of the world, enough so that it can be applied in the industry of autonomous driving and robot navigation most prominently. However, deep learning is not known to be 100% error-free and almost always has some error, even in the controlled evaluation dataset. In the case of human segmentation, for example, the model would likely result in the most errors in the fine hair areas. Most filmmakers aim for perfect depictions of their films and require that every single pixel gets removed appropriately without fail since a lot of money is spent on the time of the actors hired for the film. Additionally, a lot of time and money would be wasted in manually removing objects that could be otherwise simply removed if the scene had been shot with a green screen. This is an example of a case where we should not overcomplicate the problem. A green screen is all you need to solve the problem described: specifically, the rare chromakey green color. When green screens are prepped properly in the areas where the desired imagery will be overlaid digitally, image processing techniques alone can remove the pixels that are considered to be in the small light intensity range centered on the chromakey green color and achieve semantic segmentation effectively with a rule-based solution. The green screen is a simpler solution that is cost-effective, foolproof, and fast to set up.

That was a mouthful! Now, let’s go through a simpler problem. Consider a problem where we want to automatically and digitally identify when it rains. In this use case, it is important to understand the actual requirements and goals of identifying the rain: is it sufficient to detect rain exactly when it happens? Or do we need to identify whether rain will happen in the near future? What will we use the information of rain events for? These questions will guide whether deep learning is required or not. We, as humans, know that rain can be predicted by visual input by either looking at the presence of raindrops falling or looking at cloud conditions. However, if the use case is sufficient to detect rain when it happens, and the goal of detecting rain is to determine when to water the plants, a simpler approach would be to use an electronic sensor to detect the presence of water or humidity. Only when you want to estimate whether it will rain in the future, let’s say in 15 minutes, does deep learning make more sense to be applied as there are a lot of interactions between meteorological factors that can affect rainfall. Only by brainstorming each use case and analyzing all potential solutions, even outside of deep learning, can you make sure deep learning brings tangible business value compared to other solutions. Do not just apply deep learning because you want to.

At times, when value isn’t clear when you’re directly considering a use case, or when value is clear but you have no idea how to execute it, consider finding reference projects from companies in the same industry. Companies in the same industry have a high chance of wanting to optimize the same processes or solve the same pain points. Similar reference projects can serve as a guide to designing a deep learning system and can serve as proof that the use case being considered is worthy of the involvement of deep learning technologies. Of course, not everybody has access to details like this, but you’d be surprised what Google can tell you these days. Even if there isn’t a similar project being carried out for direct reference, you would likely be able to pivot upon the other machine learning project references that already have a track record of bringing value to the same industry.

Admittedly, rejecting deep learning at times would be a hard pill to swallow considering that most practitioners get paid to implement deep learning solutions. However, dismissing it earlier will allow you to focus your time on more valuable problems that would be more useful to solve with deep learning and prevent the risk of undermining the potential of deep learning in cases where simpler solutions can outperform deep learning. Criteria for deep learning worthiness should be evaluated on a case-by-case basis and as a practitioner, the best advice to follow is to simply practice common sense. Spend a good amount of time going through the problem exploration and the worthiness evaluation process. The last thing you want is to spend a painstaking amount of time preparing data, building a deep learning model, and delivering very convincing model insights only to find out that the label you are trying to predict does not provide enough value for the business to invest further.

Defining success

Ever heard sentences like “My deep learning model just got 99% accuracy on my validation dataset!”? Data scientists often make the mistake of determining the success of a machine learning project just by using validation metrics they use to evaluate their machine learning models during the model development process. Model-building metrics such as accuracy, precision, or recall are important metrics to consider in a machine learning project but unless they add business values and connect to the business objectives in some way, they rarely mean anything. A project can achieve a good accuracy score but still fail to achieve the desired business goals. This can happen in cases when no proper success metrics have been defined early and subsequently cause a wrong label to be used in the data preparation and model development stages. Furthermore, even when the model metric positively impacts business processes directly, there is a chance that the achievement won’t be communicated effectively to business stakeholders and the worst case not considered to be successful when reported as-is.

Success metrics, when defined early, act as the machine learning project’s guardrails and ensure that the project goals are aligned with the business goals. One of the guardrails is that a success metric can help guide the choice of a proper label that can at inference time, tangibly improve the business processes or otherwise create value in the business. First, let’s make sure we are aligned with what a label means, which is a value that you want the machine learning model to predict. The purpose of a machine learning model is to assign these labels automatically given some form of input data, and thus during the data preparation and model development stages, a label needs to be chosen to serve that purpose. Choosing the wrong label can be catastrophic to a deep learning project as sometimes, when data is not readily available, it means the project has to start all over again from the data preparation stage. Labels should always be indirectly or directly attributed to the success metric.

Success metrics, as the name suggests, can be plural, and range from time-based success definitions or milestones to the overall project success, and from intangible to tangible. It’s good practice to generally brainstorm and document all the possible success criteria from a low level to a high level. Another best practice is to make sure to always define tangible success metrics alongside intangible metrics. Intangible metrics generate awareness, but tangible metrics make sure things are measurable and thus make them that much more attainable. A few examples of intangible and hard-to-measure metrics are as follows:

Increasing customer satisfaction
Increasing employee performance
Improving shareholder outlook

Metrics are ways to measure something and are tied to goals to seal the deal. Goals themselves can be intangible, similar to the few examples listed previously, but so long as it is tied to tangible metrics, the project is off to a good start. When you have a clear goal, ask yourself in what way the goal can be proven to be achieved, demonstrated, or measured. A few examples of tangible success metrics for machine learning projects that could align with business goals are as follows:

Increase the time customers spend, which can be a proxy for customer delight
Increase company revenue, which can be a proxy for employee performance
Increase the click-through rate (CTR), which can be a proxy for the effectiveness of targeted marketing campaigns
Increase the customer lifetime value (CLTV), which can be a proxy for long-term customer satisfaction and loyalty
Increase conversion rate, which can be a proxy for the success of promotional campaigns and website user experience

This concept is not new nor limited to just machine learning projects – just about any single project carried out for a company as every single real-world project needs to be aligned with the business goal. Many foundational project management techniques can be applied similarly to machine learning projects, and spending time gaining some project management skills out of the machine learning field would be beneficial and transferable to machine learning projects. Additionally, as machine learning is considered to be a software-based technology, software project management methodologies also apply.

A final concluding thought to take away is that machine learning systems are not about how advanced your machine learning models are, but instead about how humans and machine intelligence can work together to achieve a greater good and create value.

Planning resources

Deep learning often involves neural network architectures with a large set of parameters, otherwise called weights. These architecture’s sizes can go from holding a few parameters up to holding hundreds of billions of parameters. For example, an OpenAI GPT-3 text generation model holds 175 billion neural network parameters, which amounts to around 350 GB in computer storage size. This means that to run GPT-3, you need a machine with a random access memory (RAM) size of at least 350 GB!

Deep learning model frameworks such as PyTorch and TensorFlow have been built to work with devices called graphics processing units (GPUs), which offer tremendous neural network model training and inference speedups. Off-the-shelf GPU devices commonly have a GPU RAM of 12 GB and are nowhere near the requirements needed to load a GPT-3 model in GPU mode. However, there are still methods to partition big models into multiple GPUs and run the model on GPUs. Additionally, some methods can allow for distributed GPU model training and inference to support larger data batch sizes at any one usage point. GPUs are not considered cheap devices and can cost anywhere from a few hundred bucks to hundreds of thousands from the most widely used GPU brand, Nvidia. With the rise of cryptocurrency technologies, the availability of GPUs is also reduced significantly due to people buying them immediately when they are in stock. All these emphasize the need to plan computing resources for training and inferencing deep learning models beforehand.

It is important to align your model development and deployment needs to your computing resource allocation early in the project. Start by gauging the range of sizes of deep learning architectures that are suitable for the task at hand either by browsing research papers or websites that provide a good summary of techniques, and setting aside computing resources for the model development process.

Tip

paperswithcode.com provides summaries of a wide variety of techniques grouped by a wide variety of tasks!

When computing resources are not readily available, make sure you always make purchase plans early, especially if it involves GPUs. But what if a physical machine is not desired? An alternative to using computing resources is to use paid cloud computing resource providers you can access online easily from anywhere in the world. During the model development stage, one of the benefits of having more GPUs with more RAM allocated is that it can allow you to train models faster by either using a larger data batch size during training or allowing the capability to train multiple models at any one time. It is generally fine to also use CPU-only deep learning model training, but the model training time would just inevitably be much longer.

The GPU and CPU-based computing resources that are required during training are often considered overkill to be used during inference time when they are deployed. Different applications have different deployment computing requirements and the decision on what resource specification to allocate can be gauged by asking yourself the following three questions:

How often are the inference requests made?
- Many inference requests in a short period might signal the need to have more than one inference service up in multiple computing devices in parallel
What is the average amount of samples that are requested for a prediction at any one time?
- Device RAM requirements should match batch size expectations
How fast do you need a reply?
- GPUs are needed if it’s seconds or a faster response time requirement
- CPUs can do the job if you don’t care about the response time

Resource planning is not restricted to just computing resource planning – it also expands to human resource planning. Assumptions for the number of deep learning engineers and data scientists working together in a team would ultimately affect the choices of software libraries and tools used in the model development process. The analogy of choosing these tools will be introduced in future sections.

The next step is to prepare your data.

You're reading from The Deep Learning Architect's Handbook

Table of Contents (25) Chapters

Strategizing the construction of a deep learning system

Starting the journey

Evaluating deep learning’s worthiness

Defining success

Planning resources

Authors (1)

Personalised recommendations for you