Building Networks with Deep Learning

In the previous chapter, we explored machine learning (ML) concepts, including common strengths, weaknesses, pitfalls, and various popular ML algorithms.

In this chapter, we will explore artificial intelligence (AI) as we dive into deep learning (DL) concepts. We will review important neural network (NN) fundamentals, components, tasks, and DL architectures that are most common in data science interviews. In doing so, we will unravel the mysteries of weights, biases, activation functions, and loss functions while mastering the art of gradient descent and backpropagation.

Along the way, we’ll fine-tune our networks, delve into the magic of embeddings and autoencoders (AEs), and harness the transformative power of transformers. Plus, we’ll unlock the secrets of transfer learning (TL), understand why NNs are often referred to as “black boxes,” and explore common network architectures that have revolutionized industries...

Introducing neural networks and deep learning

At its core, a neural network (also known as a neural net) is a computational model inspired by the structure and function of the human brain. It’s designed to process information and make decisions in a manner akin to how our neurons work.

An NN consists of interconnected nodes, or artificial neurons, organized into layers. These layers typically include an input layer, one or more hidden layers, and an output layer, which you can see in Figure 11.1. Each connection between neurons is associated with a weight, which determines the strength of the connection, and each neuron applies an activation function, which defines its output:

Figure 11.1: Basic NN diagram

Data passes from the input layer through the hidden layers until it reaches the final layer as an output. The preceding diagram shows two output nodes, but an NN can consist of one or even hundreds of output nodes. The number of output nodes is an...
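To make this concrete, here is a minimal sketch of how a network shaped like the one in Figure 11.1 might be defined, assuming TensorFlow/Keras is available; the layer sizes and activations are illustrative choices, not values taken from the figure.

    # A minimal feedforward network: input layer, two hidden layers, two output nodes
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4,)),               # input layer with 4 features
        tf.keras.layers.Dense(8, activation="relu"),     # first hidden layer
        tf.keras.layers.Dense(8, activation="relu"),     # second hidden layer
        tf.keras.layers.Dense(2, activation="softmax"),  # two output nodes, as in the diagram
    ])
    model.summary()  # prints each layer and its number of trainable parameters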

Weighing in on weights and biases

Weights and biases are some of the most important components of NNs. Within NN nodes, they complement each other, much as coefficients and intercepts do in a linear regression model. Understanding weights and biases will help you see how they transform an NN from a static structure into a dynamic learning system. Proficiency in initializing, updating, and optimizing these components is essential for training NNs effectively.

Introduction to weights

Weights are numerical values that are assigned to the connections between neurons. Each connection possesses a corresponding weight value, which dictates the strength of the influence one neuron has on another. During training, these weights are adjusted, enabling the network to capture patterns and relationships within the data it processes.
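As a toy illustration, the following NumPy snippet assigns one weight to each connection between a three-node input layer and a two-node hidden layer; the numbers are arbitrary stand-ins, not learned values.

    import numpy as np

    inputs = np.array([0.5, -1.2, 3.0])          # activations of a 3-node input layer

    # One weight per connection: rows = hidden neurons, columns = input neurons
    weights = np.array([[0.2, -0.4,  0.1],
                        [0.7,  0.3, -0.5]])
    biases = np.array([0.1, -0.2])               # one bias per hidden neuron

    # Each hidden neuron computes a weighted sum of its inputs plus its bias
    pre_activation = weights @ inputs + biases
    print(pre_activation)                        # [ 0.98 -1.71]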

Initially set to random values, these weights are fine-tuned through techniques such as backpropagation and gradient descent...

Activating neurons with activation functions

We reviewed how weights and biases contribute to a model’s predictions in the previous section. However, the fourth step in Figure 11.2 involves something called an activation function. What is an activation function anyway?

In the intricate architecture of NNs, activation functions are the gears that infuse life and non-linearity into the system. They are mathematical functions applied to the output of each neuron, introducing non-linearity into the network. This is a key distinction from the way weights and biases are applied in linear regression. Let’s explore the role and types of activation functions that breathe vitality into NNs.
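As a preview, the following NumPy sketch shows three widely used activation functions; it is only a minimal illustration of how each one reshapes a neuron’s output value.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

    def tanh(z):
        return np.tanh(z)                 # squashes values into (-1, 1)

    def relu(z):
        return np.maximum(0.0, z)         # zeroes out negative values

    z = np.array([-2.0, 0.0, 2.0])
    print(sigmoid(z))   # [0.119 0.5   0.881] (rounded)
    print(tanh(z))      # [-0.964  0.     0.964] (rounded)
    print(relu(z))      # [0. 0. 2.]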

At its core, non-linearity allows NNs to capture complex patterns in data that a linear approach would miss. Imagine trying to fit a straight line to data that twists and turns in various directions. A linear model would fail to capture the intricacies, but with...

Unraveling backpropagation

At this point, you may be wondering why weights, biases, and activation functions are so special. After all, they probably seem not much different from the parameters and hyperparameters in traditional ML models. However, understanding backpropagation will solidify your appreciation of how weights and biases work. This journey begins with a brief discussion of gradient descent.

Gradient descent

In short, gradient descent is a powerful optimization algorithm that’s widely used in ML and DL to minimize a cost or loss function. It is the name that’s given to the process of training a model on a task by first making a prediction with the model, measuring how good that prediction is, and then adjusting its weights slightly so that it will perform better next time. This process allows the model to gradually make better predictions over many iterations of training. It is used to train not only NNs but also other ML models, such as...
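As a bare-bones illustration, the following sketch runs gradient descent on a single weight with the made-up loss (w - 3)^2, whose minimum lies at w = 3; the learning rate and starting point are arbitrary choices.

    # Minimize loss(w) = (w - 3)^2 by repeatedly stepping against the gradient
    w = 0.0               # initial guess for the weight
    learning_rate = 0.1

    for step in range(50):
        gradient = 2 * (w - 3)               # derivative of (w - 3)^2 with respect to w
        w = w - learning_rate * gradient     # adjust the weight slightly

    print(w)  # approaches 3.0 after repeated updates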

Using optimizers

At the heart of DL lies the optimization problem: finding the best set of model parameters (weights and biases) that minimize a chosen loss function. Optimization algorithms play a pivotal role in this journey by iteratively adjusting these parameters to reduce errors between predictions and actual target values.

Optimization is a fundamental concept in mathematics that refers to the process of finding the best or most favorable solution among a set of possible solutions. In the context of ML and DL, optimization is used to adjust model parameters to minimize a cost, objective, or loss function (all used interchangeably), leading to improved model performance. We have already covered that the gradient descent algorithm is used for optimization. However, there are different versions of the algorithm, and when constructing your NN, you can choose which of them to use.
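As a hedged sketch of what that choice looks like in practice, the following Keras snippet compiles a tiny illustrative model with Adam rather than plain stochastic gradient descent (SGD); the model, learning rates, and loss here are assumptions made for the example.

    import tensorflow as tf

    # A tiny model so the snippet is self-contained
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(4,)),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])

    # Two versions of gradient descent to choose between
    sgd = tf.keras.optimizers.SGD(learning_rate=0.01)     # plain stochastic gradient descent
    adam = tf.keras.optimizers.Adam(learning_rate=0.001)  # adaptive variant, a common default

    model.compile(optimizer=adam,  # swap in `sgd` to compare how training behaves
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])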

Let’s consider some key aspects of optimization:

  • Objective function: Optimization...

Understanding embeddings

At its core, an embedding is a mapping from a high-dimensional space to a lower-dimensional space that captures essential characteristics or features of data in a more compact form. This transformation not only reduces the dimensionality of the data but also helps NNs process and understand it more effectively.

These compact, meaningful representations of data play a pivotal role in various applications, from natural language processing (NLP) to recommendation systems. In this section, we’ll explore the concept of embeddings, their significance, and how they are employed to enhance the capabilities of NNs.

Word embeddings

Word embeddings are among the most renowned and widely used types of embeddings. They represent words as vectors in a continuous space, where each dimension of the vector corresponds to a semantic or syntactic feature of the word. This representation enables NNs to grasp meanings and relationships between words more intuitively.
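As a toy illustration of this idea, the following NumPy snippet compares made-up four-dimensional word vectors using cosine similarity; real embeddings are learned from data and typically have hundreds of dimensions.

    import numpy as np

    # Hypothetical word vectors invented for the example, not learned embeddings
    embeddings = {
        "king":  np.array([0.80, 0.65, 0.10, 0.05]),
        "queen": np.array([0.78, 0.60, 0.12, 0.90]),
        "apple": np.array([0.05, 0.10, 0.90, 0.40]),
    }

    def cosine_similarity(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Words with related meanings should end up with more similar vectors
    print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # higher (~0.77)
    print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # lower (~0.21)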

Word embedding models...

Listing common network architectures

In the ever-evolving world of DL, network architectures serve as the blueprints for intelligence. Each architecture is a unique design, meticulously crafted to tackle specific challenges and excel in particular domains.

In this section, we’ll embark on a journey through the diverse terrain of NN architectures, from convolutional neural networks (CNNs), which conquer image analysis, to recurrent neural networks (RNNs), which master sequential data, and from the creative minds behind generative adversarial networks (GANs) to the memory-enhancing capabilities of long short-term memory (LSTM) networks. Here, we’ll list some common architectures and their applications.

Common networks

While a full treatment of every network architecture is beyond the scope of this book, it is important to understand the basic differences between the most common networks. Here are some to keep in mind:

  • ANNs: Artificial neural networks (ANNs) consist of interconnected nodes (neurons) organized in layers ...

Introducing GenAI and LLMs

In the dynamic field of AI, language models stand as titans of natural language understanding (NLU) and generation. These models have not only revolutionized the way we interact with machines but have also sparked a renaissance in generative AI (GenAI).

In this section, we’ll delve into the world of large language models (LLMs), which are generative language models trained on massive text corpora (think in terms of most of the public data available on the internet) and can contain billions of parameters. We will focus on exploring LLMs: their architecture, training, and the transformative impact they have had on various applications, from text generation to chatbots, language translation, and even creative storytelling.
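As a brief example of what interacting with such a model can look like, the following snippet assumes the Hugging Face transformers library is installed; GPT-2 is used here only because it is a small, freely available generative model, not because it is the focus of this chapter.

    from transformers import pipeline

    # Load a small pretrained generative language model
    generator = pipeline("text-generation", model="gpt2")

    # Ask the model to continue a prompt
    result = generator("Deep learning models are powerful because",
                       max_length=40, num_return_sequences=1)
    print(result[0]["generated_text"])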

Unveiling language models

At their core, language models are GenAI models: AI models that generate text, images, or other forms of media.

Specifically, language models are probabilistic models that learn the patterns, structure, and semantics of natural language through NLP tasks. These...

Summary

In this comprehensive exploration of DL, we embarked on a journey through the intricate landscapes of NNs, optimization algorithms, and fundamental concepts that underpin this transformative field. We began our voyage by deciphering NN fundamentals, understanding the building blocks of DL, and uncovering the power of activation functions, weight initialization, and embeddings. As we delved deeper, we navigated the seas of optimization, unraveling the intricacies of gradient descent, learning rates, and various optimization algorithms that guide the training of NNs. We also shed light on the vanishing and exploding gradient problems, which are crucial challenges to overcome in the pursuit of effective training.

Our odyssey continued with a tour of common network architectures, from CNNs mastering image analysis to RNNs and LSTMs excelling in sequential data tasks. We encountered the creative minds behind GANs, explored the power of transformers in NLU, and marveled at the...
