Hands-On Python Deep Learning for the Web

Demystifying Artificial Intelligence and Fundamentals of Machine Learning

"Just as electricity transformed almost everything 100 years ago, today I actually have a hard time thinking of an industry that I don't think AI will transform in the next several years."
- Andrew Ng

This quote may sound familiar, and it resonates strongly with the technological disruption we are witnessing today. In recent years, Artificial Intelligence (AI) has become an area of great interest to almost every industry. Be it an educational company, a telecommunications firm, or an organization working in healthcare, all of them have incorporated AI to enhance their businesses. This integration of AI with other industries promises to get better with time and to solve critical real-world problems in intelligent ways. Today, our phones can make clinical appointments for us upon our instructions, our phone cameras can tell us several human-perceived attributes of the images they capture, and our car alarm systems can detect our driving gestures and save us from possible accidents. Such examples will only get better and more intelligent with advancements in research, technology, and the democratization of computing power.

As we step into the era of Software 2.0, it is important to understand why a technology that has existed since the 1950s is making most of the headlines in recent times. Yes! Artificial intelligence was born in the 1950s, when a handful of computer scientists and mathematicians such as Alan Turing started to think about whether machines could think and whether they could be empowered with intelligence so that they could answer questions on their own without being explicitly programmed.

Soon after this inception, the term artificial intelligence was coined by John McCarthy in 1956 at an academic conference. From the question "Can machines think?" (proposed by Turing in his paper, entitled Computing Machinery and Intelligence) around 1950 to the current day in the 21st century, the world of AI has produced results that we could never have imagined.

Today, it is almost impossible to think of a day without using the web. It has easily become one of our fundamental necessities. Our favorite search engines can directly answer our questions rather than give us a list of relevant links. They can analyze online text, detect its intent, and summarize its content. All of this is possible because of AI.

This book aims to be a hands-on guide for readers on how to use AI techniques such as deep learning to make intelligent web applications based on computer vision, natural language processing, security, and lots more. This chapter provides a quick refresher on AI and its different types and on the basic concepts of ML, and introduces some of the biggest names in the industry and what they are doing by fusing AI and web technologies. We will be covering the following aspects:

Introduction to AI and its different types
Machine Learning (ML): The most popular AI
A brief introduction to Deep Learning (DL)
The relationship between AI, ML, and DL
Fundamentals of ML
The web before and after AI
The biggest web-AI players and what they are doing

Introduction to artificial intelligence and its types

In simple terms, artificial intelligence is all about giving machines the ability to behave intelligently. For example, many of us can play chess. Essentially, we do this by first learning the fundamentals of the game and then actually playing it with others. But can machines do this? Can machines learn on their own and play the game of chess with us?

AI attempts to make this possible by giving us the power to synthesize what we call intelligence in terms of some rules and instill it into machines. A machine, as mentioned here, can be anything that can compute. For example, it could be a piece of software or a robot.

There are actually several types of AI. The popular ones are the following:

Fuzzy systems
Expert systems
ML systems

The last type probably sounds the most familiar. We will get to it in the next section. But before we proceed, it is a good time to take a look at some of the factors that enable the AI advancements we are witnessing today.

Factors responsible for AI propulsion

The major factors driving the advancement of AI are the following:

Data
Algorithmic advancements
Computer hardware advancements
The democratization of high-performance computing

Data

The amount of data we have today is enormous—as Hal Varian, Chief Economist at Google, put it in 2016:

"Between the dawn of civilization and 2003, we only created five exabytes; now we're creating that amount every two days. By 2020, that figure is predicted to sit at 53 zettabytes (53 trillion gigabytes)—an increase of 50 times."

That's a lot of data. As the number of digital devices grows, this volume of data will only continue to grow exponentially. Gone are the times when a running car only displayed the speed on the speedometer. We're in an age where every part of the car can be made to produce logs at every split second, enabling us to entirely reconstruct any moment of the car's life.

The more a person gets to learn from life, the wiser the person becomes, and the better they can predict outcomes of events in the future. Analogously, with machines, the greater the amount of (quality) data that a piece of software gets to train upon, the better it gets at making predictions on unseen data.

In the last few years, the availability of data has grown manifold due to various factors:

Cheaper storage
Higher data transmission rates
Availability of cloud-based storage solutions
Advanced sensors
The Internet of Things
An increase in the various forms of digital electronic devices
Increased usage of websites and native apps

There are more digital devices now than ever. They are all equipped with systems that can generate logs at all times and transmit them over the internet to the companies that manufacture them or to any other vendor that buys that data. Also, a lot of logs are created by the websites and apps people use. All of these are easily stored in cloud-based storage solutions or in high-capacity physical storage, both of which are now cheaper than before.

If you look around, you will probably see a laptop on which you regularly use several pieces of software and websites, all of which may be collecting data on every action you perform on them. Similarly, your phone acts as a data-generating device. If you have a television with several channels provided by your television service provider, both the service provider and the channel provider are collecting data about you to serve you better and to improve their products. You can only imagine the massive amount of data a single person generates on a daily basis, and there are billions of us on this planet!

Advancements in algorithms

An algorithm is an unambiguous sequence of steps that leads to the solution of a given problem. Over time, with the expansion of science and of human understanding of the laws of nature with the aid of mathematics, algorithms have improved. More often than not, nature has inspired solutions to complex problems. A neural network is probably the most talked-about nature-inspired algorithm in the present day.

When computer logic began with multiple if-else ladders, no one would ever have thought that one day we'd have computer programs that would learn to produce results similar to the if-else ladder without the need to write conditions manually. What's more, we have computer programs today that generate other programs that can simulate AI!

With each passing day, algorithms developed by humans, and now by machines too, are getting smarter and more powerful at performing their tasks. This has directly impacted the rise of neural networks, which, in their rudimentary form, seem to be a time-consuming deep nesting of loops that solve matrix and vector arithmetic problems.

Advancements in hardware

When Intel revealed its first Dynamic RAM module in 1970, it was capable of holding 1 KB of data. Approximately 50 years later, 128 GB RAM modules are available in the market. That's roughly 1.28 x 10⁸ times as much memory space.

A similar trend has been exhibited by hard disks. The first hard disk for personal computers could store a precious 5 megabytes, while 2016 saw Seagate announce a 60-terabyte solid-state drive. That's a 1.2 x 10⁷ fold increase.

So far, we have only compared individual devices, without considering the wider effect of technological growth since the first computers were introduced. Today, with the advent of cloud computing, it has become common to hear someone talking about unlimited cloud storage.
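The two growth factors quoted above are easy to verify with a few lines of Python. This is only a quick arithmetic check; it assumes decimal (SI) units, which is what the quoted figures imply, and the variable names are our own.

```python
# Decimal (SI) unit sizes in bytes, matching the figures quoted in the text.
KB = 10**3
MB = 10**6
GB = 10**9
TB = 10**12

# 1 KB of DRAM in 1970 versus a 128 GB RAM module today.
ram_growth = (128 * GB) / (1 * KB)

# A 5 MB early hard disk versus a 60 TB solid-state drive.
disk_growth = (60 * TB) / (5 * MB)

print(f"RAM growth:  {ram_growth:.2e}x")
print(f"Disk growth: {disk_growth:.2e}x")
```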

AI has greatly benefited from this exponential increase in computing speed and data storage.

The democratization of high-performance computing

With the falling cost of commodity hardware and its increasing performance, high-performance computing is no longer exclusive to tech giants. Today, any individual can set up a network of computing devices for personal use to facilitate high-performance computing, if they are not already satisfied with the exceptional performance that single devices can deliver. However, investing in hardware is not the only way of accessing high-performance computing. The emergence of cloud-based computing solutions has made very high-speed computing infrastructure available through click-to-deploy methods. Users can, at any moment, launch a cloud-based instance over the network and run their performance-intensive software on it at minimal charges.

With high-performance computing becoming readily available to individual developers, the development of AI solutions has come into the hands of a wide community of developers. This has led to a boom in the number of creative and research-based applications of AI.

Let's now unravel the most popular form of AI as of the time of writing and discuss some important concepts regarding it.

ML – the most popular form of AI

Without using mathematical notation or too many theoretical details, let's try to approach the term Machine Learning (ML) from an intuitive perspective. To do this, we will have to take a look at how we actually learn. Do you recollect, at school, when we were taught to identify the parts of speech in a sentence? We were presented with a set of rules for identifying them. We were given many examples, and our teachers would first identify the parts of speech in sentences for us so that we could use this learning experience to identify the parts of speech in sentences that we had not been taught. Moreover, this learning process is fundamentally applicable to anything that we learn.

What if we could similarly train machines? What if we could program them in such a way that they could learn from experience and start answering questions based on this knowledge? Well, this has already been done, and, knowingly or unknowingly, we are all reaping the benefits. This is exactly what ML is when discussed intuitively. For a more formal, standard understanding, let's take a look at the following definition by Tom Mitchell in his book, Machine Learning:

"A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."

The preceding definition is a more precise version of what we just discussed about ML from an intuitive perspective. It is important to note here that most AI wizardry that we see today is possible due to this form of AI.
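To make E, T, and P concrete, here is a minimal sketch under assumptions of our own. The task T is to decide whether a number is at least 50 (a hidden rule the program is never shown), the experience E is a growing list of labeled examples, and the performance measure P is accuracy on the integers 0 to 99. The learner, the hidden rule, and the example values are all hypothetical and chosen only for illustration.

```python
def true_label(x):
    """The hidden rule that defines the task T (unknown to the learner)."""
    return 1 if x >= 50 else 0

def learn_threshold(examples):
    """Learn a cut-off halfway between the largest 0-example and smallest 1-example."""
    negatives = [x for x, y in examples if y == 0]
    positives = [x for x, y in examples if y == 1]
    return (max(negatives) + min(positives)) / 2

def accuracy(threshold, inputs):
    """Performance measure P: fraction of inputs classified correctly."""
    correct = sum((1 if x >= threshold else 0) == true_label(x) for x in inputs)
    return correct / len(inputs)

# Experience E: labeled examples, revealed to the learner a few at a time.
training_inputs = [10, 99, 20, 85, 35, 70, 44, 58, 49, 51]
experience = [(x, true_label(x)) for x in training_inputs]
test_inputs = list(range(100))

accuracies = []
for n in range(2, len(experience) + 1, 2):
    threshold = learn_threshold(experience[:n])
    accuracies.append(accuracy(threshold, test_inputs))
    print(f"examples seen: {n:2d}  threshold: {threshold:5.1f}  accuracy: {accuracies[-1]:.2f}")
```

As the learner sees more examples, its learned threshold moves closer to the hidden rule and its accuracy does not get worse, which is the sense in which the program "learns" in Mitchell's definition.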

We now have a fair idea of what ML is. Now, we will move to the next section, which discusses the most powerful subfield of ML—DL. We will not go into the bone-breaking mathematical details. Instead, we will break it down intuitively, as in this section.

What is DL?

Now comes the most exciting part and probably the hottest technical term of this century. We now understand the learning part to some extent, so let's get to the first part of the term deep learning: deep.

DL is a type of machine learning that is based purely on neural networks. We will take a look at neural networks too, but in the next chapter. The basic objective of any machine learning system is to learn useful representations of the data given to it. But what makes DL different? It turns out that DL systems treat data as layers of representations. For example, an image can be treated as layers of varying properties such as edges, contours, orientation, texture, and gradients. The following diagram from the book, Deep Learning with Python, by François Chollet captures this idea nicely:

In the preceding diagram, a DL system is being employed to classify an image of a handwritten digit. The system takes the image of the handwritten digit as its input and tries to learn its underlying representations. In the first layer, the system learns generic features such as strokes and lines. As the layers increase, it learns features that are more specific to the given image. The more layers there are, the deeper the system gets. Let's take a look at the following definition, which is given by François Chollet in his book, Deep Learning with Python:

"The deep in deep learning isn't a reference to any kind of deeper understanding achieved by the approach; rather, it stands for this idea of successive layers of representations. How many layers contribute to a model of the data is called the depth of the model. [...] In deep learning, these layered representations are (almost always) learned via models called neural networks, structured in literal layers stacked on top of each other."

The definition quite aptly captures all of the necessary ingredients of DL and beautifully introduces the concept of treating data as a layered representation. So, a DL system, in a broad sense, breaks down the data into simple representations in a layered fashion, and to learn these representations, it often makes use of many layers (which is referred to as deep). We will now take a look at the big picture, which tells us how AI, ML, and DL are related to each other.
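The idea of successive layers of representations can be sketched in a few lines of plain Python. This is not how a real DL framework is written, and the weights below are hand-picked, hypothetical values rather than learned ones; the sketch only shows that each layer transforms the previous representation into a new one and that depth is simply the number of stacked layers.

```python
def dense_relu(inputs, weights, biases):
    """One layer: weighted sums followed by ReLU (negative values become 0)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(max(0.0, total))
    return outputs

# Hypothetical, hand-picked weights (a trained network would learn these).
layers = [
    {"weights": [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]], "biases": [0.0, 0.0]},  # 3 inputs -> 2 outputs
    {"weights": [[1.0, 1.0]], "biases": [-0.5]},                              # 2 inputs -> 1 output
]

def forward(x, layers):
    """Pass x through every layer, keeping each intermediate representation."""
    representations = [x]
    for layer in layers:
        x = dense_relu(x, layer["weights"], layer["biases"])
        representations.append(x)
    return representations

reps = forward([3.0, 1.0, 2.0], layers)
depth = len(layers)

for i, rep in enumerate(reps):
    print(f"representation after {i} layer(s): {rep}")
print("depth of the model:", depth)
```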

Revisiting the fundamentals of ML

We have already seen what is meant by ML. In this section, we will focus on terms such as supervised learning and unsupervised learning, and we will take a look at the steps involved in a standard ML workflow. But you may ask: why ML? We are supposed to learn about the applications of DL in this book. As we just learned, DL is itself a type of ML. Therefore, a quick overview of the basic ML-related concepts will certainly help. Let's start with the types of ML and how they differ from each other.

Types of ML

ML encompasses a multitude of algorithms and topics. While every algorithm that makes up an ML model is nothing but a mathematical computation on given data, the form of the data provided and the nature of the task to be performed on it can vary hugely. Sometimes, you might want your ML model to predict future house prices based on previous house prices and details of the house, such as the number of rooms and the number of stories it has. At other times, you might want your ML model to learn how to play computer games against you. You can easily expect the input data for the first task to be in tabular format, but for the second, you might not be able to come up with the same. Hence, ML algorithms branch into three major categories, plus another form that derives from them, based on the input data they receive and the kind of output they are supposed to produce, namely, the following:

Supervised learning
Unsupervised learning
Reinforcement learning
Semi-supervised learning

The following diagram captures the three major types of ML, along with the hybrid form as a fourth type, and a very brief summary of each type:

You may have heard of the fourth form of ML, semi-supervised learning, which combines elements of both supervised and unsupervised learning.

Let's now understand these types of ML in greater depth, according to how they function and the types of problems they can be used to solve.

Supervised learning

In this form of ML, the algorithm is presented with a huge number of training samples, which contain information about all of the parameters, or features, that would be used to determine an output feature. This output feature could be a continuous range of values or a discrete collection of labels. Based on this, supervised ML algorithms are divided into two types:

Classification: Algorithms that produce discrete labels in the output feature, such as normal and not normal or a set of news categories
Regression: Algorithms that produce real values in the output feature, for example, the number of votes a political party might receive in an election, or the temperature at which a material is predicted to reach its melting point

Most ML enthusiasts, when they begin their study of machine learning, tend to familiarize themselves with supervised learning first due to its intuitive simplicity. It has some of the simplest algorithms, which are easy to understand without a deep knowledge of mathematics and are even derived from what mathematics students learn in their final years at school. Some of the most well-known supervised learning algorithms are linear regression, logistic regression, support vector machines, and k-nearest neighbors.
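The following sketch illustrates both flavors of supervised learning using only the Python standard library: simple linear regression (a real-valued output) and a k-nearest neighbors classifier (a discrete label). The datasets are tiny, made-up examples, and the function names are our own; in practice, you would normally reach for an ML library rather than hand-written code like this.

```python
from collections import Counter

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def knn_predict(train, query, k=3):
    """Label the query by majority vote among its k closest training samples."""
    by_distance = sorted(
        train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query))
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Regression: hypothetical house prices (in thousands) against number of rooms.
rooms = [1, 2, 3, 4, 5]
prices = [110, 160, 210, 260, 310]
slope, intercept = fit_line(rooms, prices)
predicted_price = slope * 6 + intercept
print("predicted price for 6 rooms:", predicted_price)

# Classification: hypothetical two-feature samples with discrete labels.
samples = [
    ((1.0, 1.0), "normal"), ((1.2, 0.8), "normal"), ((0.9, 1.1), "normal"),
    ((5.0, 5.0), "not normal"), ((5.2, 4.9), "not normal"), ((4.8, 5.1), "not normal"),
]
label_a = knn_predict(samples, (1.1, 1.0))
label_b = knn_predict(samples, (5.1, 5.0))
print(label_a, "|", label_b)
```

In both cases, the training samples carry the output feature (the price or the label), which is what makes the learning supervised.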

Unsupervised learning

Unsupervised learning presents itself in scenarios where the training samples do not carry output feature(s) with them. You might wonder, then, what we are supposed to learn or predict in such situations. The answer is similarity. In more elaborate terms, when we have a dataset for unsupervised learning, we're usually trying to learn the similarity between the training samples and then assign classes or labels to them.

Consider a crowd of people standing in a large field. All of them have features such as age, gender, marital status, salary range, and education level. Now, we wish to group them based on their similarities. We decide to form three groups and see that they arrange themselves by gender: a group of females, a group of males, and a group of people who identify with other genders. We then ask them to form subgroups within those groups and see that people group themselves by age range: children, teenagers, adults, and senior citizens. This gives us a total of 12 such subgroups. We could make even smaller subgroups based on the similarity any two individuals exhibit. Also, the manner of grouping discussed in this example is just one among several ways of forming groups. Now, say we have 10 new members joining the crowd. Since we already have our groups defined, we can easily sort these new members into those groups. Hence, we can successfully apply group labels to them.

The preceding example demonstrates just one form of unsupervised learning, which can be divided into two types:

Clustering: This is to form groups of training samples based on the similarity of their features.
Association: This is to find abstract associations or rules exhibited between features or training samples. For example, on analyzing a shop's sales logs, it was found that customers buy beer mostly after 7 p.m.

K-means clustering, DBSCAN, and the Apriori algorithm are some of the best-known algorithms used for unsupervised learning.
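As a rough illustration of clustering, here is a bare-bones k-means sketch in plain Python. The points (age, years of education) are made up, and the initial centroids are fixed to two of the points so that the run is reproducible; real implementations usually choose the starting centroids more carefully and stop once the assignments no longer change.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids

# Hypothetical (age, years of education) pairs with no labels attached.
people = [(8, 3), (10, 5), (9, 4), (40, 16), (42, 18), (41, 17)]
centroids = k_means(people, centroids=[people[0], people[3]])
print("centroids:", centroids)

# New members are sorted into the existing groups by their nearest centroid.
new_members = [(12, 6), (38, 15)]
assigned = [nearest(p, centroids) for p in new_members]
print("group of each new member:", assigned)
```

The last step mirrors the crowd example: once the groups exist, new members can be given a group label without any labeled training data.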

Reinforcement learning

Reinforcement learning (RL) is a form of ML wherein a virtual agent tries to learn how to interact with its surroundings in such a way that it can achieve the maximum reward from them for a certain set of actions.

Let's try to understand this with a small example. Say you build a robot that plays darts. The robot will get the maximum reward only when it hits the center of the dartboard. It begins with a random throw of the dart, which lands on the outermost ring. It gets a certain number of points, say x1. It now knows that throwing near that area will yield an expected value of x1. So, in the next throw, it makes a very slight change of angle and luckily lands in the second outermost ring, fetching it x2 points. Since x2 is greater than x1, the robot has achieved a better result and it will learn to throw near this area in the future. If the dart had landed even further out than the outermost ring, the robot would keep throwing near the first throw that it made until it got a better result.

Over several such trials, the robot keeps learning the better places to throw at and makes small detours from those positions until it finds the next better place. Eventually, it finds the bull's eye and scores the highest points every time.

In the preceding example, your robot is the agent that is trying to throw a dart at the dartboard, which is the environment. Throwing the dart is the action the agent performs on the environment. The points the agent gets act as the reward. The agent, over multiple trials, tries to maximize the reward that it gets by performing the actions.

Some well-known RL algorithms are Monte Carlo methods, Q-learning, and SARSA.
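To see the agent, environment, action, and reward vocabulary in code, here is a small sketch that applies the Q-learning update rule to a toy environment of our own: a corridor of five positions in which the agent is rewarded only when it steps onto the last one. To keep the run deterministic, the sketch sweeps over every state and action instead of exploring by trial and error as a real agent would; the learning rate and discount factor are arbitrary choices.

```python
N_STATES = 5          # positions 0..4 in a corridor (the environment)
GOAL = 4              # reaching this position ends the episode
ACTIONS = (-1, 1)     # step left or step right
ALPHA = 0.5           # learning rate (assumed value)
GAMMA = 0.9           # discount factor (assumed value)

def step(state, action):
    """Environment dynamics: move, stay inside the corridor, reward 1 at the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

# Q-table: estimated long-term reward for each (state, action) pair.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(200):
    for s in range(GOAL):                 # non-terminal states only
        for a in ACTIONS:
            next_state, reward = step(s, a)
            best_next = 0.0 if next_state == GOAL else max(q[(next_state, b)] for b in ACTIONS)
            # Q-learning update: nudge the estimate toward reward + discounted best future value.
            q[(s, a)] += ALPHA * (reward + GAMMA * best_next - q[(s, a)])

# The greedy policy picks the action with the highest estimated value in each state.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print("greedy action per state:", policy)
```

After enough updates, the value estimates favor moving toward the goal from every position, much as the dart-throwing robot gradually favors throws closer to the bull's eye.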

Semi-supervised learning

While we have discussed the three major types of ML, there exists yet another type, which is semi-supervised learning. From the name, you could guess that it has something to do with a mix of labeled and unlabeled training samples. In most cases, the number of unlabeled training samples exceeds the number of labeled samples.

Semi-supervised learning has been used successfully to produce better results when some labeled samples are added to a problem that otherwise belongs entirely to unsupervised learning. Also, since only a few samples need to be labeled, much of the labeling effort of supervised learning is avoided. With this approach, we can produce better results than we would get from a purely unsupervised learning system and incur a lower cost than we would with a purely supervised learning system.

Necessary terminologies

We have made ourselves familiar with different types of ML systems. Now, we will learn about some extremely important terminologies related to ML that will help us in the later chapters of this book.

Train, test, and validation sets

Any ML system needs to be given data. Without data, it is practically impossible to design an ML system. We are not concerned about the quantity of the data as of now, but it is important to keep in mind that we need data to devise an ML system. Once we have that data, we use it for training our ML systems so that they can be used to predict something on new data (something is a broad term here and it varies from problem to problem). The data that is used for training is known as a train set and the data on which the systems are tested is known as a test set. Also, before actually employing the model on the test data, we tend to validate its performance on another set of data, which is called a validation set. Sometimes, we don't get the data in these nice partitions; we just get it in a raw, unorganized format, which we then process and partition accordingly.

Technically, the instances in these three sets should not overlap, while the distribution of the data in each set should be the same. Many researchers have found critical issues regarding these assumptions and have come up with something called adversarial training, which is out of the scope of this book.
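Partitioning raw data into the three sets can be sketched as follows. The 70/15/15 proportions and the fixed seed are assumptions made for this illustration, not a rule; shuffling before slicing is one simple way to help each set follow roughly the same distribution.

```python
import random

def split_dataset(samples, train_percent=70, validation_percent=15, seed=42):
    """Shuffle the samples and cut them into train, validation, and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_train = len(shuffled) * train_percent // 100
    n_validation = len(shuffled) * validation_percent // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test

data = list(range(100))                      # stand-in for 100 raw samples
train, validation, test = split_dataset(data)
print(len(train), len(validation), len(test))
```

Because the three slices are taken from one shuffled list, no instance can appear in more than one set.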

Bias and variance

Bias and variance are intrinsic to any ML model. Having a good understanding of them really helps in the further assessment of models. Practitioners use the trade-off between the two to assess the performance of machine learning systems.

You are encouraged to see this lecture by Andrew Ng to learn more about this trade-off, at https://www.youtube.com/watch?v=fDQkUN9yw44&t=293s.

Bias is the set of assumptions that an ML algorithm makes to learn the representations underlying the given data. When the bias is high, the corresponding algorithm is making more assumptions about the data; when the bias is low, the algorithm makes as few assumptions as possible. An ML model is said to have a low bias when it performs well on the train set. Some examples of low-bias ML algorithms are k-nearest neighbors and support vector machines, while algorithms such as logistic regression and naive Bayes are generally high-bias algorithms.

Variance in an ML context concerns how sensitive a model is to the particular data it was trained on. A high-variance model captures the information present in its training data in great detail, including noise, so what it learns can change considerably if the training data changes. Low variance conveys just the opposite. Algorithms such as support vector machines are generally high on variance and algorithms such as naive Bayes are low on variance.

Overfitting and underfitting

When an ML model performs very well on the training data but poorly on the data from either the test set or validation set, the phenomenon is referred to as overfitting. There can be several reasons for this; the following are the most common ones:

The model is very complex with respect to the data. A very deep decision tree and a neural network with many layers are good examples of model complexity in this case.
The data has lots of features but very few instances of the population.

In ML literature, the problem of overfitting is also treated as a problem of high variance. Regularization is one of the most widely used approaches to prevent overfitting.

We have already discussed the concept of bias. A model has a low bias if it performs well on the training data, that is, the model is not making too many assumptions on the data to infer its representation. If the model fails miserably on the training data, it is said that the model has a high bias and the model is underfitting. There can be many reasons for underfitting as well. The following are the most common ones in this case:

The model is too simple to learn the underlying representation of the data given to it.
The features of the data have not been engineered well before feeding them to the ML model. The engineering part is very popularly known as feature engineering.

Based on this discussion, we can draw a very useful conclusion: an ML model that is overfitting might be suffering from the issue of high variance whereas an underfitting model might be suffering from the issue of high bias.

The discussion of overfitting and underfitting remains incomplete without the following diagram (shown by Andrew Ng during his flagship course, Machine Learning):

The preceding diagram illustrates underfitting and overfitting in terms of curve fitting through the data points. It also gives us an idea of a model that generalizes well, that is, one that performs well on both the train and test sets. In the underfitting case, the model prediction line in blue is way off the samples. In the overfitting case, the model captures all of the points in the training data but would not perform well on data outside the training data.

Often, the idea of learning representations of the data is treated as a problem of approximating a function that best describes the data. And a function can easily be plotted graphically like the previous one, hence, the idea of curve fitting. The sweet spot between underfitting and overfitting where a model generalizes well is called a good fit.

Training error and generalization error

The mistakes that a model makes while predicting during its training phase are collectively referred to as its training error. The mistakes that a model makes when tested on either the validation set or the test set are referred to as its generalization error.

If we were to draw a relationship between these two types of error and bias and variance (and eventually overfitting and underfitting), this would look something like the following (although the relationship may not be linear every time as depicted in the diagrams):

If an ML model is underfitting (high bias), then its training error has to be high. On the other hand, if the model is overfitting (high variance), then its training error is low but its generalization error is high.
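The relationship between the two kinds of error can be observed on a small, made-up dataset. In the sketch below, the data follows y = 2x plus hand-picked, fixed noise (fixed so that the run is reproducible), and three deliberately chosen models are compared: a constant that always predicts the mean (too simple), a least-squares line (a reasonable fit), and a 1-nearest-neighbor model that memorizes the training points (too flexible). The models and the data are our own choices for illustration.

```python
def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_mean(xs, ys):
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y                      # ignores x entirely

def fit_line(xs, ys):
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: mean_y + slope * (x - mean_x)

def fit_nearest(xs, ys):
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]   # memorizes training data

# y = 2x plus fixed, hand-picked noise; the test noise differs from the train noise.
train_x = list(range(10))
train_y = [2 * x + n for x, n in zip(train_x, [3, -3] * 5)]
test_x = [x + 0.25 for x in range(10)]
test_y = [2 * x + n for x, n in zip(test_x, [-3, 3] * 5)]

errors = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("1-NN", fit_nearest)]:
    model = fit(train_x, train_y)
    errors[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:>5}: training error = {errors[name][0]:6.2f}, "
          f"generalization error = {errors[name][1]:6.2f}")
```

The constant model has a high error on both sets (underfitting), the memorizing model has zero training error but a much higher error on unseen data (overfitting), and the line sits in between with similar errors on both sets.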

We will look at a standard ML workflow in the following section.

A standard ML workflow

Any project starts with a problem in mind and ML projects are no exception. Before starting an ML project, it is very important to have a clear understanding of the problem that you are trying to solve using ML. Therefore, problem formulation and mapping with respect to the standard ML workflow serve as good starting points in an ML project. But what is meant by an ML workflow? This section is all about that.

Designing ML systems and employing them to solve complex problems requires more than just ML skills. ML draws, in varying proportions, on statistics, domain knowledge, software engineering, feature engineering, and basic high-school mathematics. Certain steps are fundamental to almost any ML workflow, and each of these steps requires a certain skill set. In this section, we are going to take a look at these steps and discuss them briefly.

This workflow is inspired by CRISP-DM, which stands for Cross Industry Standard Process for Data Mining and is widely used across industries for data mining and analytics.

Data retrieval

As mentioned earlier in this chapter, ML systems need data to function. Data is not always available; in fact, most of the time, it is not available in a format with which we can start training ML models. But what if there is no standard dataset for a particular problem that we are trying to solve using ML? Welcome to reality! This happens for most real-life ML projects. For example, let's say we are trying to analyze the sentiments of tweets regarding the New Year resolutions of 2018 and trying to estimate the most meaningful ones. This is a problem for which there is no standard dataset available. We will have to scrape it from Twitter using its APIs. Another great example is business logs. Business logs are treasures of knowledge. If effectively mined and modeled, they can help in many decision-making processes. But often, logs are not available directly to the ML engineer. So, the ML engineer needs to spend a considerable amount of time figuring out the structure of the logs, and they might write a script so that the logs are captured as required. All of these processes are collectively called data retrieval or data collection.

Data preparation

After the data collection phase, we tend to prepare the data to feed it to the ML systems and this is known as data preparation. It is worth mentioning that this is the most time-consuming part of an ML workflow/pipeline. Data preparation includes a series of steps and they are as follows:

Exploratory data analysis
Data processing and wrangling
Feature engineering and extraction
Feature scaling and selection

When we take a broader look at the process, we find that data identification and collection are also sometimes really important aspects, since the data, as mentioned previously, might not always be available in the correct format.

Exploratory Data Analysis (EDA)

After the data is collected, the first step in the data preparation stage is Exploratory Data Analysis, which is very popularly known as EDA. EDA techniques allow us to know the data in a detailed manner for better understanding. This is an extremely vital step in the overall ML pipeline because without good knowledge about the data itself, if we blindly fit an ML model to the data, it most likely will not produce good results. EDA gives us a direction in which to proceed and helps us to decide further steps in the pipeline. EDA involves many things such as calculating useful statistics about the data and determining whether the data suffers from any outliers. It also comprises effective data visualization, which helps us to interpret the data graphically and therefore helps us to communicate vital facts about the data in a meaningful way.

In short, EDA is all about getting to know about the data better.
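As a rough illustration of the two EDA activities mentioned above (calculating useful statistics and checking for outliers), the following minimal sketch uses only Python's standard library. The `order_values` list and the helper names `summarize` and `iqr_outliers` are hypothetical, and the interquartile-range rule is just one common convention for flagging outliers, not the only one:

```python
import statistics

# Hypothetical sample: daily order values with one suspicious entry.
order_values = [12.0, 15.5, 14.2, 13.8, 16.1, 15.0, 14.7, 98.0, 13.3, 15.9]

def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

summary = summarize(order_values)
outliers = iqr_outliers(order_values)
print(summary)
print("Outliers:", outliers)
```

In this toy sample, the mean is pulled well above the median by a single extreme value, which is exactly the kind of fact EDA is meant to surface before any model is fitted.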

Data processing and wrangling

We have performed some statistical analyses on the data. Now what? Most of the time, the data that is collected from several data sources is present in its raw form, which cannot be fed to an ML model, hence the need for further data processing.

But you might ask, why not collect the data in a way so that it gets retrieved with all of the necessary processing done? This is typically not a good practice as it breaks the modularity of the workflow.

This is why, to make the data consumable in the later steps of the workflow, we need to clean, transform, and persist it. This includes several things such as data normalization, data standardization, missing value imputation, encoding from one value to another, and outlier treatment. All of these are collectively named data wrangling.
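To make a few of these wrangling steps concrete, here is a minimal sketch of mean imputation, standardization, and one-hot encoding in plain Python. The `raw_rows` records and all function names are made up for illustration; in practice, libraries such as pandas and scikit-learn offer more robust equivalents:

```python
import statistics

# Hypothetical raw records: a missing age and a categorical "plan" column.
raw_rows = [
    {"age": 25, "plan": "basic"},
    {"age": None, "plan": "premium"},
    {"age": 35, "plan": "basic"},
    {"age": 30, "plan": "trial"},
]

def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]

def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def one_hot(values):
    """Encode each category as a 0/1 vector, with columns in sorted order."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]

ages = standardize(impute_mean([r["age"] for r in raw_rows]))
categories, plan_vectors = one_hot([r["plan"] for r in raw_rows])

# Each row becomes a purely numeric vector: [scaled_age, *one_hot_plan].
features = [[age] + vec for age, vec in zip(ages, plan_vectors)]
for row in features:
    print(row)
```

Keeping each step as a separate function mirrors the modularity argument made above: retrieval stays untouched, and each transformation can be tested or swapped independently.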

Feature engineering and extraction/selection

Consider a situation where an employee from an analytics firm is given the company's billing data and is asked by their manager to build a machine learning system with it so the company's overall financial budget could be optimized. Now, this data is not in a format that can be given directly to an ML model since ML models expect data in the form of numeric vectors.

Although the data might be in good shape, the employee will still have to do something to convert that data into a favorable form. Given that the data is already wrangled, they still need to decide which features they are going to include in the final dataset. Practically, anything measurable can be a feature here. This is where good domain knowledge comes in. This knowledge can help the employee to choose the features that have high predictive power. It may sound lightweight, but it requires a lot of skill and is definitely a challenging task. This is a classic example of feature engineering.

Sometimes, we employ several techniques that help us in the automatic extraction of the most meaningful features from a given dataset. This is particularly useful when the data is very high dimensional and the features are hard to interpret. This is known as feature selection. Feature selection not only helps to develop an ML model with the data that has the most relevant features but it also helps to enhance the model's predictive performance and to reduce its computation time.

Apart from feature selection, we might want to reduce the dimensionality of the data to better visualize it. Besides, dimensionality reduction is also employed to capture a representative set of features from the complete set of data features. Principal Component Analysis (PCA) is one such very popular dimensionality reduction technique.

It is important to keep in mind that feature selection and dimensionality reduction are not the same.
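The difference can be sketched in a few lines. In the hypothetical example below, feature selection (here, a simple variance threshold) keeps a subset of the original columns unchanged, whereas dimensionality reduction (here, a projection onto the first principal axis of two features, computed in closed form for the 2x2 case) produces a new feature that is a combination of the originals. The data and function names are assumptions made for illustration only:

```python
import math
import statistics

# Hypothetical dataset: three features per row; the third is nearly constant.
rows = [
    [2.0, 1.9, 5.0],
    [4.0, 4.2, 5.0],
    [6.0, 5.8, 5.1],
    [8.0, 8.1, 5.0],
]
columns = list(zip(*rows))

def select_by_variance(cols, threshold):
    """Feature selection: keep indices of columns whose variance exceeds threshold."""
    return [i for i, col in enumerate(cols) if statistics.pvariance(col) > threshold]

def first_principal_component_2d(xs, ys):
    """Dimensionality reduction: project two features onto their first principal axis."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    var_x = sum(d * d for d in dx) / len(xs)
    var_y = sum(d * d for d in dy) / len(ys)
    cov_xy = sum(a * b for a, b in zip(dx, dy)) / len(xs)
    # Closed-form direction of the leading eigenvector of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    ux, uy = math.cos(theta), math.sin(theta)
    return [a * ux + b * uy for a, b in zip(dx, dy)]

kept = select_by_variance(columns, threshold=0.01)
projected = first_principal_component_2d(columns[0], columns[1])

print("Selected original columns:", kept)
print("New combined feature:", projected)
```

The selected columns remain directly interpretable, while the projected feature generally is not; this trade-off is often what decides between the two approaches.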

Modeling

We have finally come to the step that appears to be the most exciting one—the ML modeling part. But it is worth noting here that a good ML project is not just about this part. All of the previously mentioned parts contribute equally to the standard of the project. In fact, it matters a lot how the data is being collected for the project, and for this, we are helped by powerful data engineers. For now, let's leave that part aside.

We already have the data in pretty good shape by now. In the process of modeling the data, we feed the training data to ML models for training them, we monitor their training progress and tune different hyperparameters so their performance is optimized, and we evaluate the model on the test set. Model comparison is also a part of this phase. It is indeed an iterative process and involves trial and error to some extent.

The main objective here is to come up with an ML model that best represents the data, that is, one that generalizes well. Computation time is another factor we must consider here because we want a model that performs well within a feasible time frame, thereby optimizing a certain business outcome.

Following are the parts that constitute the core of modeling:

Model training
Model evaluation
Model tuning

Model training

This is the fundamental part of modeling as we introduce the data to different ML models and train the model so that it can learn the representations of the data holistically. We can see how a model is making progress during its training using training error. We often bring validation error (which means we validate the model training simultaneously) into this picture as well, which is a standard practice. Most of the modern libraries today allow us to do this and we will see it in the upcoming chapters of this book. We will now discuss some of the most commonly used error metrics.
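To show what monitoring training error and validation error side by side can look like, here is a minimal sketch that fits a straight line with batch gradient descent in plain Python. The data points, the learning rate, and the names `mse` and `fit` are assumptions chosen for illustration; the libraries used later in this book report these quantities for you:

```python
# Hypothetical data following roughly y = 2x + 1, split into train and validation sets.
train = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]
valid = [(0.5, 2.0), (2.5, 6.1), (3.5, 7.9)]

def mse(data, w, b):
    """Mean squared error of the line y = w*x + b on the given (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def fit(train, valid, lr=0.05, epochs=200):
    """Fit y = w*x + b by batch gradient descent, recording both errors per epoch."""
    w, b = 0.0, 0.0
    history = []
    n = len(train)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        w -= lr * grad_w
        b -= lr * grad_b
        history.append((mse(train, w, b), mse(valid, w, b)))
    return w, b, history

w, b, history = fit(train, valid)
for epoch in (0, 49, 199):
    train_err, valid_err = history[epoch]
    print(f"epoch {epoch + 1:3d}  train={train_err:.4f}  valid={valid_err:.4f}")
```

If the validation error starts rising while the training error keeps falling, that is usually a sign of overfitting; in this tiny linear example both are expected to fall together.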

Model evaluation

We have trained an ML model but how well will the model perform on the data it has never seen before? We answer this question using model evaluation.

Different machine learning algorithms call for different evaluation metrics.

For supervised learning methods, we usually use the following:

The confusion matrix, which is a matrix consisting of four values: True Positive, False Positive, True Negative, and False Negative
Accuracy, precision, recall, and F1-score (these are all byproducts of the confusion matrix)
The Receiver Operating Characteristic (ROC) curve and the Area Under Curve (AUC) metric
R-square (coefficient of determination), Root Mean Square Error (RMSE), F-statistic, Akaike Information Criterion (AIC), and p-values specifically for regression models

Throughout this book, we will be incorporating these metrics to evaluate our models. Although these are the most common evaluation metrics, be it for ML or DL, there are more specific evaluation metrics that correspond to different domains. We will get to that as well as we go along.

It is worth mentioning here that we often fall into the trap of the accuracy paradox in classification problems where the data is imbalanced. In these cases, classification accuracy tells only one part of the story, that is, it gives the percentage of correct predictions out of the total number of predictions made. This metric fails miserably on imbalanced datasets because it does not capture how well a model predicts the instances of the uncommon class(es), which is usually the actual problem we care about.
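The accuracy paradox is easy to reproduce with a small sketch. The labels below are hypothetical (5 rare positives among 100 instances, with the rare class encoded as 1), and the helper names are illustrative. A model that always predicts the common class reaches high accuracy while never finding a single rare instance:

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Count the four confusion-matrix cells for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(1 for t, p in pairs if t == positive and p == positive),
        "fp": sum(1 for t, p in pairs if t != positive and p == positive),
        "tn": sum(1 for t, p in pairs if t != positive and p != positive),
        "fn": sum(1 for t, p in pairs if t == positive and p != positive),
    }

def scores(cm):
    """Derive accuracy, precision, recall, and F1 from a confusion matrix."""
    total = cm["tp"] + cm["fp"] + cm["tn"] + cm["fn"]
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    precision = cm["tp"] / predicted_pos if predicted_pos else 0.0
    recall = cm["tp"] / actual_pos if actual_pos else 0.0
    denom = precision + recall
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }

# Hypothetical imbalanced labels: 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100                      # never predicts the rare class
careful = [1, 1, 1, 1, 0] + [1] * 5 + [0] * 90   # finds 4 of 5, with 5 false alarms

naive_scores = scores(confusion_matrix(y_true, always_negative))
careful_scores = scores(confusion_matrix(y_true, careful))
print("always negative:", naive_scores)
print("careful model:  ", careful_scores)
```

Here the trivial model has the higher accuracy even though its recall is zero, which is why precision, recall, and F1-score are usually reported alongside accuracy for imbalanced data.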

Following are the most commonly used metrics for evaluating unsupervised methods such as clustering:

Silhouette coefficients
Sum of squared errors
Homogeneity, completeness, and the V-measure
The Calinski-Harabasz index
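Two of these clustering metrics, the sum of squared errors and the silhouette coefficient, can be sketched for one-dimensional points as follows. The cluster assignments are hypothetical, the function names are illustrative, and the sketch assumes at least two clusters; singleton clusters are given a silhouette of 0 by convention:

```python
def sse(clusters):
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        center = sum(points) / len(points)
        total += sum((p - center) ** 2 for p in points)
    return total

def mean_silhouette(clusters):
    """Average silhouette coefficient over all points (needs two or more clusters)."""
    scores = []
    for i, cluster in enumerate(clusters):
        for j, point in enumerate(cluster):
            same = [abs(point - q) for k, q in enumerate(cluster) if k != j]
            if not same:
                scores.append(0.0)  # convention for a cluster with a single point
                continue
            a = sum(same) / len(same)  # mean distance within the point's own cluster
            b = min(                   # mean distance to the nearest other cluster
                sum(abs(point - q) for q in other) / len(other)
                for m, other in enumerate(clusters)
                if m != i
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical 1-D points grouped in a sensible way and in a poor way.
good = [[1.0, 1.2, 0.8], [5.0, 5.2, 4.8]]
bad = [[1.0, 1.2, 5.0], [0.8, 5.2, 4.8]]

print("SSE good/bad:", sse(good), sse(bad))
print("Silhouette good/bad:", mean_silhouette(good), mean_silhouette(bad))
```

A lower SSE and a silhouette value closer to 1 both indicate tighter, better-separated clusters, so the first grouping should score better on both counts.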

The evaluation metrics/error metrics remain the same for a train set, a test set, or a validation set. However, we cannot jump to a conclusion merely by looking at the performance of a model on the train set.

Model tuning

By this phase, we should have a baseline model with which we can go further for tuning the model to make it perform even better. Model tuning corresponds to hyperparameter tuning/optimization.

ML models come with different hyperparameters that cannot be learned from model training. Their values are set by the practitioners. You can compare the hyperparameter values to the knobs of an audio equalizer where we manually adjust the knobs to have the perfect aural experience. We will see how hyperparameter tuning can drastically enhance the performance of a model in later chapters.

There are several techniques for tuning hyperparameters and the most popularly incorporated are the following:

Grid searching
Random searching
Bayesian optimization
Gradient-based optimization
Evolutionary optimization
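The first two techniques in this list can be sketched in a few lines. In the example below, `validation_error` is a made-up stand-in for "train a model with these hyperparameters and return its validation error", and the hyperparameter names `learning_rate` and `depth` are assumptions for illustration. Grid search tries every combination from a fixed grid, while random search samples a fixed number of combinations from given ranges:

```python
import itertools
import random

def validation_error(learning_rate, depth):
    """Hypothetical objective with its minimum at learning_rate=0.1, depth=4."""
    return (learning_rate - 0.1) ** 2 * 100 + (depth - 4) ** 2 * 0.05

def grid_search(grid):
    """Evaluate every combination in the grid and return (best_score, best_params)."""
    names = list(grid)
    best = None
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = validation_error(**params)
        if best is None or score < best[0]:
            best = (score, params)
    return best

def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and return (best_score, best_params)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform in [0.001, 1]
            "depth": rng.randint(1, 8),
        }
        score = validation_error(**params)
        if best is None or score < best[0]:
            best = (score, params)
    return best

grid = {"learning_rate": [0.001, 0.01, 0.1, 1.0], "depth": [2, 4, 6, 8]}
grid_score, grid_params = grid_search(grid)
rand_score, rand_params = random_search(n_trials=20)

print("Grid search:  ", grid_params, grid_score)
print("Random search:", rand_params, rand_score)
```

Grid search becomes expensive as the number of hyperparameters grows, since the number of combinations multiplies; random search keeps the budget fixed, which is one reason it is often preferred for larger search spaces.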

Model comparison and selection

After we are done with the model tuning part, we would definitely want to repeat the whole modeling part for models other than the current one in the hope that we might get better results. As ML practitioners, it is our job to ensure that the model we have finally come up with is better than the other ones (obviously in various aspects). Naturally, comparing different ML models is a time-consuming task and we may not always be able to afford it when we need to meet short deadlines. In cases like this, we consider the following aspects of an ML model:

Explainability, which answers the question of how interpretable the model is and how easily it can be explained and communicated
In-memory versus out-of-memory modeling
The number of features and instances in the dataset
Categorical versus numerical features
The nonlinearity of the data
Training speed
Prediction speed

These criteria are the most popular ones, but the choice depends hugely on the problem at hand. When these criteria do not apply, a good rule of thumb is to see how a model is performing on the validation set.

Deployment and monitoring

After a machine learning model is built, it is merged with the other components of an application and is taken into production. This phase is referred to as model deployment. The true performance of the developed ML model is evaluated after it is deployed into real systems. This phase also involves thorough monitoring of the model to figure out the areas where the model is not performing well and which aspects of the model can be improved further. Monitoring is extremely crucial as it provides the means to enhance the model's performance and thereby enhance the performance of the overall application.

So, that was a primer on the most important terminologies/concepts required for an ML project.

For a more rigorous study of the basics of ML, you are encouraged to go through these resources: Machine Learning Crash Course by Google (https://developers.google.com/machine-learning/crash-course/) and Python Machine Learning by Sebastian Raschka (https://india.packtpub.com/in/big-data-and-business-intelligence/python-machine-learning).

For easy reference, you may refer to the following diagram from the book Hands-On Transfer Learning with Python (by Dipanjan Sarkar et al.), which depicts all of the preceding steps pictorially:

Practically, ML has brought about a lot of enhancements across a wide range of sectors and almost none are left to be impacted by it. This book is focused on building intelligent web applications. Therefore, we will start the next section by discussing the web in general and how it has changed since the advent of AI from a before-and-after perspective. Eventually, we will study some big names and how they are facilitating AI for building world-class web applications that are not only intelligent but also solve some real problems.

The web before and after AI

If you have been a regular user of the World Wide Web since 2014, you'd agree to a visible rapid flurry of changes in websites. From solving ReCaptcha challenges of increasingly illegible writing to being automatically marked as human in the background, web development has been one of the forerunners in the display of the wealth of artificial intelligence that has been created over the last two decades.

Sir Tim Berners-Lee, credited as the inventor of the World Wide Web, has put forward his views on a Semantic Web:

"I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A "Semantic Web", which makes this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy, and our daily lives will be handled by machines talking to machines. The "intelligent agents" people have touted for ages will finally materialize."

From serving static pages with tons of information visible in them and links that permanently take you to related resources, the web is now an ever-changing portal of information generated dynamically. You might never see the same view of a web page again if you refresh it.

Let's understand some of the most important shifts in web development that have come about due to the rise of AI.

Chatbots

If you have ever wondered how some web pages provide 24/7 live help through chat on their websites, the answer is almost always that a chatbot is answering your queries from the other end. When Joseph Weizenbaum's ELIZA chatbot created waves across the world in 1966 by convincing some users that they were talking to a human, few could have imagined the impact chatbots would have on the World Wide Web (one reason for this, though, could be that ARPANET itself was only created in 1969).

Today, chatbots are everywhere. Many Fortune 500 companies are pursuing research in the domain and have come out with implementations of chatbots for their products and services. In a recent survey done by Oracle, featuring responses from 800 top executives of several companies and startups, it was found that nearly 80% of them said they had already used or were planning to use a chatbot in their customer-facing products by 2020.

Before AI began powering chatbots, as in the case with ELIZA (and its successor ALICE), chatbots were mostly about a fixed set of responses mapped to several input patterns. Coming across the word mother or father in a sentence entered by the user would almost certainly produce a response asking about the family of the user or their well-being. This clearly wasn't the response desired if the user wrote something like "I do not want to talk about XYZ's family".
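A rule-based chatbot of this kind can be sketched in a few lines of Python. The patterns and canned responses below are made up for illustration and are not ELIZA's actual script; the point is only to show how fixed pattern-to-response rules behave, including on the problematic sentence mentioned above:

```python
import re

# Hypothetical rules: each pattern is mapped to a fixed response template.
RULES = [
    (re.compile(r"\b(mother|father|family)\b", re.IGNORECASE),
     "Tell me more about your family."),
    (re.compile(r"\bI feel ([^.!?]+)", re.IGNORECASE),
     "Why do you feel {0}?"),
    (re.compile(r"\b(hello|hi)\b", re.IGNORECASE),
     "Hello. How can I help you today?"),
]
FALLBACK = "Sorry, I did not get that."

def respond(message):
    """Return the response of the first matching rule, or the fallback."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            return template.format(*match.groups())
    return FALLBACK

for text in [
    "I feel tired.",
    "I do not want to talk about XYZ's family",
    "What is the weather like?",
]:
    print(f"> {text}\n{respond(text)}")
```

Because the rules only look for keywords, the second message still triggers the family question even though the user explicitly declined the topic, and anything outside the rule set falls through to a generic fallback.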

And then, there is the famous "sorry, I did not get that" response of such rule-based chatbots, which made them appear quite stupid at times. The advent of neural-network-based algorithms saw chatbots being able to understand and customize responses based on user emotion and the context of the user input. Also, some chatbots scrape online data in case of encountering any new query and build up answers in real time about the topics mentioned in the new, unknown queries. Apart from that, chatbots have been used to provide alternative interfaces to business portals. It is now possible to book hotels or flights over a chatbot platform provided by WhatsApp.

Facebook Messenger's bot platform saw over 100,000 bots created in the first 17 months of its being opened to the public. Hundreds of pages on the social networking giant today have automated responses for users who send messages to their pages. Several bots are running on Twitter that can create content, closely mimicking a human user, and can respond to messages or comments made on their posts.

You can chat with an online version of ELIZA at eliza.botlibre.com.

Web analytics

In the early years of the internet, many websites carried odometer-style counters embedded in them. These were simple counts of the number of hits the website or a particular page had received. Then, they grew in their available formats—plain counters, counters per day/week/month, and even geolocation-based counters.

The collection of data, which is essentially the logs of the interactions of users and how they interact with a web-based application, processing this data to produce performance indicators, and then finally to identify measures that can be taken by a company to improve their web application is collectively known as web analytics.

Web applications today generate a huge amount of logs every moment. Even leaving your mouse pointer idle on a web page might get reported to a Google Analytics dashboard, from where the webmaster can see which pages are being viewed by users and how much time they are spending on them. The flow users take between pages is also a very interesting metric.

While the earliest web analytics tools would merely measure page hits, being able to create a map of how many times a given page was visited and how many times it was a unique user, they could hardly provide anything about the visiting patterns of users, unless they were specifically hardcoded, which would be presented in very generalized manners and were never website specific. The same form of analytics was being provided to a company doing e-commerce as was being provided to a personal website.
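The basic indicators discussed in this section (page hits, unique visitors, and the flow between pages) can be computed from interaction logs with a short sketch like the following. The log format, a list of `(visitor_id, page)` tuples in chronological order, is an assumption made for illustration; real analytics tools collect far richer events:

```python
from collections import Counter, defaultdict

# Hypothetical access-log records in chronological order: (visitor_id, page).
log = [
    ("u1", "/home"), ("u2", "/home"), ("u1", "/home"),
    ("u1", "/pricing"), ("u3", "/home"), ("u2", "/pricing"),
    ("u1", "/home"),
]

def page_hits(records):
    """Total number of hits per page."""
    return Counter(page for _, page in records)

def unique_visitors(records):
    """Number of distinct visitors per page."""
    visitors = defaultdict(set)
    for visitor, page in records:
        visitors[page].add(visitor)
    return {page: len(ids) for page, ids in visitors.items()}

def page_flows(records):
    """Count transitions between consecutive, different pages for each visitor."""
    last_page = {}
    flows = Counter()
    for visitor, page in records:
        if visitor in last_page and last_page[visitor] != page:
            flows[(last_page[visitor], page)] += 1
        last_page[visitor] = page
    return flows

hits = page_hits(log)
uniques = unique_visitors(log)
flows = page_flows(log)
print("Hits:", dict(hits))
print("Unique visitors:", uniques)
print("Flows:", dict(flows))
```

Counts like these are the raw material; the AI-driven tools described next build predictive models on top of them.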

With the revolution that AI brought around in the web analytics domain, tools today that deploy the power of artificial intelligence can come up with future predictions of website performance and even suggest removing or adding specific content on a web page to improve user engagement with that page.

Spam filtering

When half the emails being sent across the world are marked spam, it's an issue. While at first thought, we associate fraudulent and unnecessary emails promoting businesses and products as spam, that's only a part of the definition. It is important to realize that even good, quality content when posted on the same document several times over is spam. Furthermore, the web has evolved since the term spam was first used in Usenet groups. What was initially an activity performed with the intention of annoying people, or driving in messages forcefully to certain target users, spam today is much more evolved and potentially a lot more dangerous—from being able to track your browser activity to identity theft, there is a lot of malicious spam on the internet today that compromises user security and privacy.

Today, we have spam of various kinds—instant messenger spam, website spam, advertisement spam, SMS spam, social media spam, and many other forms.

Apart from a few, most types of spam are exhibited on the internet. It is hence critical to be able to filter spam and take protective measures against it. While the earliest spam-fighting efforts began in the 1990s with identifying the IP addresses that were sending out spam emails, this was soon found to be a highly inefficient method, as the blacklist grew large and its distribution and maintenance became a pain.

In 2002, Paul Graham published an essay titled A Plan for Spam, which popularized the use of an ML technique, Bayesian filtering, to fight spam. Soon, several spam-fighting tools were spun from the essay and proved to be efficient.
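The idea behind Bayesian filtering can be sketched with a generic multinomial naive Bayes classifier. Note that this is a textbook formulation with add-one smoothing and not the exact scoring scheme from A Plan for Spam; the training messages and the class name `NaiveBayesSpamFilter` are made up for illustration:

```python
import math
from collections import Counter

def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

class NaiveBayesSpamFilter:
    """Generic multinomial naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def _log_score(self, tokens, label):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Start from the log prior, then add one log likelihood per token.
        score = math.log(self.doc_counts[label] / total_docs)
        for token in tokens:
            count = self.word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        return score

    def spam_probability(self, text):
        tokens = tokenize(text)
        log_spam = self._log_score(tokens, "spam")
        log_ham = self._log_score(tokens, "ham")
        # Normalize in a numerically stable way to get a posterior probability.
        top = max(log_spam, log_ham)
        e_spam = math.exp(log_spam - top)
        e_ham = math.exp(log_ham - top)
        return e_spam / (e_spam + e_ham)

spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win cash prize now", "spam")
spam_filter.train("cheap pills win big", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday with the team", "ham")

p_spam = spam_filter.spam_probability("win a cash prize")
p_ham = spam_filter.spam_probability("monday team meeting")
print(f"{p_spam:.3f} {p_ham:.3f}")
```

Because the filter only counts words, a spammer who pads a message with innocent-looking vocabulary can shift the probabilities, which hints at how such filters were later evaded.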

Such was the impact of the Bayesian filtering method against spam that, at the World Economic Forum in 2004, Microsoft co-founder Bill Gates went on to say the following:

"Two years from now, spam will be solved."

Bill Gates, however, as we know today, could not have been more wrong in this one prediction. Spam evolved, with spammers studying Bayesian filtering and finding ways to avoid being marked as spam in the detection phase. Today, neural networks are deployed on a large scale, continuously scanning new emails and making decisions about spam or non-spam content that could not have been logically reached by a human merely studying logs of email spam.

Search

One of the most strongly impacted domains by the rise of AI has been web search. From its humble beginnings of having to know the exact wording of the particular web page's title that you wished to visit, to search engines being able to identify songs that are audible in your environment, the domain has been entirely transformed due to AI.

When in 1991, Tim Berners-Lee set up the World Wide Web Virtual Library, it looked something like this:

It was a collection of manually listed web pages, filterable by the search box that appeared at the top right. Clearly, instead of the system trying to predict what the user intended to find, users themselves had to decide the category to which their search term belonged.

The current face of web search engines was introduced by Jonathon Fletcher in December 1993, when he created JumpStation, the first search engine to use the modern-day concepts of crawling, indexing, and searching. JumpStation's appearance resembled what we see from leading search providers such as Google and Bing today, and it earned Jonathon the title "Father of the search engine".
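The indexing and searching steps can be illustrated with a tiny inverted index. The `pages` dictionary stands in for the output of a crawler, and its URLs and texts are hypothetical; this is a toy sketch of the general idea and makes no claim about how JumpStation itself was implemented:

```python
from collections import defaultdict

# Hypothetical "crawled" pages: URL -> page text.
pages = {
    "example.org/a": "deep learning for the web",
    "example.org/b": "learning python on the web",
    "example.org/c": "cooking recipes",
}

def build_index(pages):
    """Indexing: map every word to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Searching: return the URLs containing every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(pages)
print(search(index, "learning web"))
print(search(index, "python"))
```

Building the index once and answering queries with set lookups is what makes searching fast compared to scanning every page for every query.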

Two years later, in December 1995, when AltaVista was launched, it brought a radical shift in search technology—unlimited bandwidth, search tips, and even allowing natural language queries—a feature brought in more strongly by Ask Jeeves in 1997.

Google came around in 1998 and brought with it the PageRank technology. However, several contenders were present in the market, and Google didn't dominate the search engine game right then. Five years later, when Google filed its patent for using neural networks to customize search results based on users' previous search history and record of visited websites, the game shifted very quickly toward Google becoming the strongest provider in the search domain.
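For readers curious about the idea behind PageRank, the following is a minimal sketch of the commonly described power-iteration formulation, in which a page's rank is shared among the pages it links to. The link graph, the damping factor of 0.85, and the iteration count are illustrative assumptions; this is not Google's production algorithm:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute ranks by power iteration; links maps a page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    ranks = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_ranks = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * ranks[page] / len(targets)
                for target in targets:
                    new_ranks[target] += share
            else:
                # Dangling page with no outgoing links: spread its rank evenly.
                share = damping * ranks[page] / n
                for other in pages:
                    new_ranks[other] += share
        ranks = new_ranks
    return ranks

# Hypothetical link graph: page C is linked to by A, B, and D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {rank:.4f}")
```

Pages with many incoming links from well-ranked pages accumulate a higher score, so page C comes out on top here, while page D, which nobody links to, keeps only the baseline share.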

Today, a huge code base, deploying several deep neural networks working in coherence, powers Google Search. Natural language processing, which majorly deploys neural networks, has allowed Google to determine the content relevancy of web pages, and machine vision, thanks to Convolutional Neural Networks (CNNs), has been able to produce the accurate results visible to us in Google Image Search. It should not come as a surprise that John Giannandrea, who led Google Search and introduced the Knowledge Graph (the direct answers Google sometimes displays for queries phrased as questions), is one of the most sought-after specialists in AI; he has since been recruited by Apple to improve Siri, which is again a neural network product.

Biggest web-AI players and what they are doing with AI

The growth spurt of AI saw several contenders running to make the most out of it. Over the last two decades, several individuals, start-ups, and even huge industrialists have sought to reap the benefits offered by the applications of AI. There are products in the market to whom artificial intelligence serves as the very heart of their business.

"War is 90% information."
- Napoleon Bonaparte, 18th Century A.D.

In the Second World War, the Allied forces deployed bomber aircraft. These were key to the strategies employed by the Allied forces. But somehow, these bombers failed to deliver due to them being shot down in large numbers when in enemy territory. It was clear that the bombers needed more armor. But due to the weight of armor, it was not possible to entirely cover the aircraft. Hence, it was decided that the most critical areas of the aircraft should be covered up with extra armor. Abraham Wald, a Jewish mathematician, was asked to come up with a way to determine which areas of the aircraft had to be armor-plated. He studied the aircraft that had come back from battle and made note of which areas carried the most bullet marks.

It was found that the wings, the nose, and tail were the parts that carried the highest number of bullet marks, and it was concluded that these were the parts that needed more armor, while the cockpit and the engines displayed the least bullet holes:

But surprisingly, going against the regular method of thought, Wald suggested that it was the cockpit and the engines that needed armor because it was those bombers that were not returning. Bullets in the tail, wings, and nose could not deal fatal damage to the aircraft and hence they returned successfully.

This is how, by working with data and identifying the correct pattern, a mathematician influenced the course of the Second World War. Data has been termed the new oil. What makes it more interesting is that when you have oil, you burn it to produce electricity and energy, and to drive vehicles. But with data, you use it to improve business and make decisions, which, in the future, produce more data. The companies that realized this and took the most benefit out of the data available have seen huge growth in recent times. Let's explore what a few such companies are doing with all of the data available, using AI.

Google

A name that comes to almost every mind as soon as the term AI is mentioned, Google has revolutionized and pushed the edges of AI continuously.

"We are now witnessing a new shift in computing: the move from a mobile-first to an AI-first world." -Sundar Pichai, CEO, Google

Google has been using AI across several of its products; let's go through some of them here.

Google Search

Searching for who is the google ceo on December 14, 2018 brought up a results page resembling the following screenshot:

The preceding feature, which generates answers to commonly asked questions, is known as the Google Knowledge Graph, which we mentioned in an earlier section. Besides this one feature, Google Search has grown exponentially more powerful due to AI techniques such as natural language processing and information extraction.

The ability to come up with exact timings in a video that relate to a query made by the user is possible, all thanks to AI:

Next, we will look at Google Translate.

Google Translate

Supporting over 100 languages, Google Translate is probably the best translation tool publicly available on the internet. From being able to detect the language being fed into it to converting it into the desired language as set by the user, there's a deep mesh of neural networks running in the background to produce the best results. This algorithm, to which Google switched in November 2016, was named the Google Neural Machine Translation algorithm. It is available on the web as an API to web developers who wish to translate their website's content in real time to be able to cater to users of different locales. Also, the service is integrated with Google Chrome, the browser made by Google, and provides real-time translations of web pages as soon as the user visits them in the browser.

Google Assistant

One of the most recent ventures of Google, Google Assistant, is a competitor to Apple's Siri and Microsoft's Cortana and a successor of Google Now. It is an AI-powered virtual assistant available on mobile and smart home devices (branded as Google Home). Currently, it can make searches on the user's Google Drive data, produce results based on the user's preferences, provide reminders of notes given by the user, dial numbers, send text messages, and much more as directed by the user either by normal tap-input on touch screens or by voice input:

Next, we will look at other products.

Other products

AI is one of the primary technologies powering Google Ads. Click fraud, or the problem of fake clicks, has been tackled using neural networks. Further, determining which type of ads perform best, down to the level of each single web page, is efficiently facilitated by the use of AI. These technological advancements of Google's ad services made it rapidly grab the internet advertisement space from the preexisting advertising platforms.

Google projects such as Google Lens, self-driving cars, and many others have been primarily AI-based projects.

Facebook

Being the largest social networking platform on the internet, with a vast number of user profiles, Facebook generates a huge amount of data on a daily basis. Data of its users posting content, reports made by the users, logs of the various APIs provided by Facebook, and so on all add up to nearly 4 petabytes of data generated every day. Needless to say, the tech giant has capitalized on this data gold and come up with ways to make its platform safer for users and to boost user engagement.

Fake profiles

A primary issue faced by Facebook was the presence of fake profiles in huge numbers. To deal with them, Facebook deployed AI-based solutions to automatically mark and challenge such profiles to confirm their identity. In the first quarter of 2018 alone, Facebook disabled nearly 583 million fake or clone accounts.

Fake news and disturbing content

Another issue faced by Facebook and their acquired messaging service, WhatsApp, was the issue of fake news or misleading news. Also, adding to the degradation of user experience was the presence of visually and/or emotionally disturbing content on the platform. And finally, there was something that nearly all online platforms had to fight: spam. Facebook's AI algorithms over the years have become very good at identifying and removing spam. By the application of computer vision solutions facilitated by the usage of CNNs, Facebook has been able to come up with a feature that covers/blurs visually disturbing images and videos and asks for user consent before allowing users to view them.

Work on identifying and taking down fake news is currently in progress and is almost entirely being done by the application of AI.

Other uses

Facebook provides its own Messenger bot platform, which is hugely used by Facebook pages and developers to add rich interaction into the instant messaging service provided by the company.

Amazon

The leading e-commerce platform on the internet, Amazon has incorporated AI in almost all of its products and services. While a late-comer to the AI party being enjoyed by Google, Facebook, Microsoft, and IBM, Amazon quickly grew and attracted attention to the various uses it put AI to. Let's go through some of the major applications that Amazon came out with.

Alexa

Alexa is the name given to the virtual assistant AI that powers all Alexa and Echo devices produced by the company, developed in direct competition with Google Home, which is powered by Google Assistant (formerly Google Now). Without debating which is better, Alexa is a fairly advanced AI, able to produce answers to questions that many users have found interesting and witty. Alexa products have recently seen a rise in adoption with Amazon's move to make the Alexa Skills Kit publicly available to developers, who have added greatly to the actions that Alexa can perform.

Amazon robotics

As soon as a user buys a product from the website, a robot sitting in the sprawling 855,000-square-foot fulfillment center at Kent, Washington (obviously, only for products available there) stirs up, lifts a large crate of products containing the item that was sold, and carries it to a station where a worker picks the item out of the crate to process it further. Amazon recently equipped its Milwaukee fulfillment center with the same technology after a very successful run previously and plans to extend it to 10 other large centers soon.

DeepLens

An artificial intelligence-enabled video camera would have been the ultimate geek fantasy in the early 2000s. With the coming of Amazon's DeepLens, which is exactly that, the possibilities opened up are endless. Imagine a situation where you are a host to a party and you get notified of every guest who comes in, directly on your phone. Surprisingly enough, this has been achieved and experiments have even been done on equipping public places with CCTV cameras that can identify criminals and trigger alerts automatically.
