
Deep Learning and XAI Techniques for Anomaly Detection

By Cher Simon
About this book
Despite promising advances, the opaque nature of deep learning models makes it difficult to interpret them, which is a drawback in terms of their practical deployment and regulatory compliance. Deep Learning and XAI Techniques for Anomaly Detection shows you state-of-the-art methods that’ll help you to understand and address these challenges. By leveraging the Explainable AI (XAI) and deep learning techniques described in this book, you’ll discover how to successfully extract business-critical insights while ensuring fair and ethical analysis. This practical guide will provide you with tools and best practices to achieve transparency and interpretability with deep learning models, ultimately establishing trust in your anomaly detection applications. Throughout the chapters, you’ll get equipped with XAI and anomaly detection knowledge that’ll enable you to embark on a series of real-world projects. Whether you are building computer vision, natural language processing, or time series models, you’ll learn how to quantify and assess their explainability. By the end of this deep learning book, you’ll be able to build a variety of deep learning XAI models and perform validation to assess their explainability.
Publication date:
January 2023
Publisher
Packt
Pages
218
ISBN
9781804617755

 

Understanding Deep Learning Anomaly Detection

Anomaly detection is an active research field widely applied to many commercial and mission-critical applications, including healthcare, fraud detection, industrial predictive maintenance, and cybersecurity. It is a process of discovering outliers, abnormal patterns, and unusual observations that deviate from established normal behaviors and expected characteristics in a system, dataset, or environment.

Many anomaly detection applications require domain-specific knowledge to extract actionable insights in a timely manner for informed decision-making and risk mitigation. For example, early detection of equipment performance degradation prevents unplanned downtime, whereas early discovery of disease threats prevents a pandemic outbreak.

The advent of cloud technologies, unlimited digital storage capacity, and a plethora of data have motivated deep learning research for anomaly detection. Detecting outliers requires large datasets because anomalies are, by nature, rare occurrences hidden among abundant normal data. For example, detecting abnormal machinery vibrations and unusual power consumption or temperature increases allows companies to plan for predictive maintenance and avoid expensive downtime.

Deep learning anomaly detection has shown promising results in addressing challenges with the rare nature of anomalies, complex modeling of high-dimensional data, and identifying novel anomalous classes. The primary interest in anomaly detection is often focused on isolating undesirable data instances, such as product defects and safety risks, from the targeted domain. Other interests include improving model performance by removing noisy data or irrelevant outliers and identifying emerging trends from the dataset for a competitive advantage.

This chapter covers an overview of anomaly detection with the following topics:

  • Exploring types of anomalies
  • Discovering real-world use cases
  • Considering when to use deep learning and what for
  • Understanding challenges and opportunities

By the end of this chapter, you will have an understanding of the basics of anomaly detection, including real-world use cases, and the role of deep learning in accelerating outlier discovery. You will also have gained a sense of existing challenges and growth potential in leveraging deep learning techniques for anomaly detection.

 

Technical requirements

For this chapter, you will need the following components for the example walkthrough:

  • PyOD – An open-source Python library for outlier detection on multivariate data
  • Matplotlib – A plotting library for creating data visualizations
  • NumPy – An open-source library that provides mathematical functions when working with arrays
  • Pandas – A library that offers data analysis and manipulation tools
  • Seaborn – A Matplotlib-based data visualization library
  • TensorFlow – An open-source framework for building deep learning applications

Sample Jupyter notebooks and requirements files for package dependencies discussed in this chapter are available at https://github.com/PacktPublishing/Deep-Learning-and-XAI-Techniques-for-Anomaly-Detection/tree/main/Chapter1.

You can experiment with this example on Amazon SageMaker Studio Lab, https://aws.amazon.com/sagemaker/studio-lab/, a free ML development environment that provides up to 12 hours of CPU or 4 hours of GPU per user session and 15 GiB storage at no cost. Alternatively, you can try this on your preferred Integrated Development Environment (IDE).

Before exploring the sample notebooks, let’s cover the types of anomalies in the following section.

 

Exploring types of anomalies

Before choosing appropriate algorithms, a fundamental understanding of what constitutes an anomaly is essential to enhance explainability. Anomalies manifest in many shapes and sizes, including objects, vectors, events, patterns, and observations. They can exist in static entities or temporal contexts. Here is a comparison of different types of anomalies:

  • A point anomaly is an individual data point that falls outside the boundary of the normal distribution in a dataset. For example, an out-of-norm expensive credit card purchase is a point anomaly.
  • A collective anomaly occurs when a group of related data records or a sequence of observations appears collectively and differs significantly from the rest of the dataset. A spike of errors from multiple systems is a collective anomaly that might indicate problems with downstream e-commerce systems.
  • A contextual anomaly is a data point that is anomalous only when viewed against contextual attributes such as day and time. An example of a temporal contextual anomaly is a sudden increase in online orders outside of expected peak shopping hours.

An anomaly has a single attribute (univariate) or multiple attributes (multivariate) of numerical, binary, continuous, or categorical data types. These attributes describe the characteristics, features, and dimensions of an anomaly. Figure 1.1 shows examples of common anomaly types:

Figure 1.1 – Types of anomalies


Defining an anomaly is not a straightforward task because boundaries between normal and abnormal behaviors can be domain-specific and subject to risk tolerance levels defined by the business, organization, and industry. For example, an irregular heart rhythm from electrocardiogram (ECG) time series data may signal cardiovascular disease risk, whereas stock price fluctuations might be considered normal based on market demand. Thus, there is no universal definition of an anomaly and no one-size-fits-all solution for anomaly detection.
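Before turning to PyOD, the point-versus-contextual distinction can be reproduced on synthetic data with plain NumPy. This is an illustrative sketch, not one of the book's notebooks: the hourly order rhythm, the injected values, and the z-score cutoff of 3 are all assumptions chosen for demonstration. A global z-score catches the point anomaly, while a per-hour profile is needed to surface the contextual one:

```python
import numpy as np

# Synthetic hourly order counts for four weeks: a daily rhythm plus noise.
rng = np.random.default_rng(0)
hours = np.arange(24 * 28)
orders = 100 + 40 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, hours.size)
orders[50] = 400   # point anomaly: extreme in any context
orders[75] = 20    # contextual anomaly: plausible globally, abnormal for 3 a.m.

# Point anomalies: a global z-score flags values extreme for the whole series.
z = np.abs((orders - orders.mean()) / orders.std())
point_anomalies = set(np.where(z > 3)[0])

# Contextual anomalies: compare each reading only against the same hour of day.
by_hour = orders.reshape(28, 24)
z_ctx = np.abs((by_hour - by_hour.mean(axis=0)) / by_hour.std(axis=0))
contextual_anomalies = set(np.where(z_ctx.ravel() > 3)[0])

print(sorted(point_anomalies), sorted(contextual_anomalies))
```

Index 50 (the 400-order spike) is extreme under both views, whereas index 75 looks unremarkable globally and only surfaces once the hour of day is taken into account.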

Let’s look at a point anomaly example using PyOD, https://github.com/yzhao062/pyod, and a diabetes dataset from Kaggle, https://www.kaggle.com/datasets/mathchi/diabetes-data-set. PyOD is an open-source Python library that provides over 40 outlier detection algorithms, covering everything from outlier ensembles to neural network-based methods for multivariate data.

A sample notebook, chapter1_pyod_point_anomaly.ipynb, can be found in the book's GitHub repo. Let’s get started:

  1. First, install the required packages using the provided requirements.txt file:
    import sys
    !{sys.executable} -m pip install -r requirements.txt
  2. Import essential libraries.
    %matplotlib inline
    import pandas as pd
    import numpy as np
    import warnings
    from pyod.models.knn import KNN
    from platform import python_version
    warnings.filterwarnings('ignore')
    print(f'Python version: {python_version()}')
  3. Load and preview the dataset, as shown in Figure 1.2:
    df = pd.read_csv('diabetes.csv')
    df.head()
Figure 1.2 – Preview dataset


  4. The dataset contains the following columns:
    • Pregnancies: Number of times pregnant
    • Glucose: Plasma glucose concentration in an oral glucose tolerance test
    • BloodPressure: Diastolic blood pressure (mm Hg)
    • SkinThickness: Triceps skin fold thickness (mm)
    • Insulin: 2-hour serum insulin (mu U/ml)
    • BMI: Body mass index (weight in kg/(height in m)^2)
    • DiabetesPedigreeFunction: Diabetes pedigree function
    • Age: Age (years)
    • Outcome: Class variable (0 is not diabetic and 1 is diabetic)
  5. Figure 1.3 shows the descriptive statistics about the dataset:
    df.describe()
Figure 1.3 – Descriptive statistics


  6. We will focus on identifying point anomalies using the Glucose and Insulin features. Assign these columns to variables:
    X = df['Glucose']
    Y = df['Insulin']
  7. Figure 1.4 is a scatter plot that shows the original data distribution using the following code:
    import matplotlib.pyplot as plt
    plt.scatter(X, Y)
    plt.xlabel('Glucose')
    plt.ylabel('Insulin')
    plt.show()
Figure 1.4 – Original data distribution


  8. Next, load a K-nearest neighbors (KNN) model from PyOD. Before predicting outliers, we must reshape the target column into the desired input format for KNN:
    from pyod.models.knn import KNN
    Y = Y.values.reshape(-1, 1)
    X = X.values.reshape(-1, 1)
    clf = KNN()
    clf.fit(Y)
    outliers = clf.predict(Y)
  9. List the identified outliers. You should see the output as shown in Figure 1.5:
    anomaly = np.where(outliers==1)
    anomaly
Figure 1.5 – Outliers detected by KNN


Figure 1.6 shows a preview of the identified outliers:

Figure 1.6 – Preview outliers


  10. Visualize the outliers and inliers distribution, as shown in Figure 1.7:
    Y_outliers = Y[np.where(outliers==1)]
    X_outliers = X[np.where(outliers==1)]
    Y_inliers = Y[np.where(outliers==0)]
    X_inliers = X[np.where(outliers==0)]
    plt.scatter(X_outliers, Y_outliers, edgecolor='black',color='red', label= 'Outliers')
    plt.scatter(X_inliers, Y_inliers, edgecolor='black',color='cyan', label= 'Inliers')
    plt.legend()
    plt.ylabel('Insulin')
    plt.xlabel('Glucose')
    plt.savefig('outliers_distribution.png', bbox_inches='tight')
    plt.show()
Figure 1.7 – Outliers versus inliers


  11. PyOD computes anomaly scores using decision_function for the trained model. The larger the anomaly score, the higher the probability that the instance is an outlier:
    anomaly_score = clf.decision_function(Y)
  12. Visualize the calculated anomaly score distribution with a histogram:
    n_bins = 5
    min_outlier_anomaly_score = np.floor(np.min(anomaly_score[np.where(outliers==1)])*10)/10
    plt.figure(figsize=(6, 4))
    values, bins, bars = plt.hist(anomaly_score, bins=n_bins, edgecolor='white')
    plt.axvline(min_outlier_anomaly_score, c='r')
    plt.bar_label(bars, fontsize=12)
    plt.margins(x=0.01, y=0.1)
    plt.xlabel('Anomaly Score')
    plt.ylabel('Number of Instances')
    plt.savefig('outliers_min.png', bbox_inches='tight')
    plt.show()

In Figure 1.8, the red vertical line indicates the minimum anomaly score to flag an instance as an outlier:

Figure 1.8 – Anomaly score distribution


  13. We can change the anomaly score threshold. Increasing the threshold should reduce the number of flagged outliers. In this case, we only have one outlier after increasing the anomaly score threshold to over 250, as shown in Figure 1.9:
    raw_outliers = np.where(anomaly_score >= 250)
    raw_outliers
Figure 1.9 – Outlier with a higher anomaly score


  14. Figure 1.10 shows another outlier distribution with a different threshold:
    n_bins = 5
    min_anomaly_score = 50
    values, bins, bars = plt.hist(anomaly_score, bins=n_bins, edgecolor='white', color='green')
    plt.axvline(min_anomaly_score, c='r')
    plt.bar_label(bars, fontsize=12)
    plt.margins(x=0.01, y=0.1)
    plt.xlabel('Anomaly Score')
    plt.ylabel('Number of Instances')
    plt.savefig('outliers_modified.png', bbox_inches='tight')
    plt.show()
Figure 1.10 – Modified anomaly threshold


You completed a walk-through of point anomaly detection using a KNN model. Feel free to explore other outlier detection algorithms provided by PyOD. With a foundational knowledge of anomaly types, you are ready to explore various real-world use cases for anomaly detection in the following section.
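The scoring idea behind the KNN detector used above can be re-implemented in a few lines of plain NumPy, which helps demystify the threshold plots: score each instance by its distance to its k-th nearest neighbor, then flag the top fraction of scores as outliers. This is an illustrative sketch on synthetic 1-D data, not PyOD's actual implementation; the data, k, and contamination fraction are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(100, 15, size=201)   # glucose-like readings
x[200] = 300.0                      # inject one obvious point anomaly

# KNN-style anomaly score: distance to the k-th nearest neighbor,
# computed directly for 1-D data via a pairwise distance matrix.
k = 5
dists = np.abs(x[:, None] - x[None, :])
dists_sorted = np.sort(dists, axis=1)   # column 0 is the zero self-distance
scores = dists_sorted[:, k]             # distance to the k-th neighbor

# Flag the top 10% of scores as outliers (PyOD's default contamination).
threshold = np.quantile(scores, 0.9)
outliers = np.where(scores > threshold)[0]
print(200 in outliers)
```

The injected point at 300 sits far from its fifth-nearest neighbor, so its score dominates and it lands above the threshold; dense inliers have small k-th neighbor distances.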

 

Discovering real-world use cases

Anomaly detection plays a crucial role in extracting valuable insights for risk management. Over the years, anomaly detection applications have diversified across various domains, including medical diagnosis, fraud discovery, quality control analysis, predictive maintenance, security scanning, and threat intelligence. In this section, let’s look at some practical industry use cases of anomaly detection, including the following:

  • Detecting fraud
  • Predicting industrial maintenance
  • Diagnosing medical conditions
  • Monitoring cybersecurity threats
  • Reducing environmental impact
  • Recommending financial strategies

Detecting fraud

The continued growth of the global economy and increased business demand for real-time, ubiquitous digital payment methods open the door to fraud, leaving electronic commerce systems vulnerable to organized crime. Fraud prevention mechanisms that protect technological systems from potential fraud risks are insufficient to cover all possible fraudulent scenarios. Thus, fraud detection systems provide an additional layer of protection by detecting suspicious and malicious activities.

Discovery sampling is an auditing technique that approves or rejects a sampled audit population depending on whether the observed error rate stays below the defined minimum unacceptable threshold. Manual fraud audit techniques based on discovery sampling require domain knowledge across multiple disciplines and are time-consuming. Leveraging machine learning (ML) in fraud detection systems has proven to produce higher model accuracy and detect novel anomaly classes.
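A quick back-of-the-envelope calculation shows why discovery sampling is laborious (the error rate and confidence level here are assumptions for illustration, not figures from the book): the sample size needed to catch rare errors grows quickly as the error rate shrinks.

```python
# Discovery sampling sketch: probability that a random sample of n records
# uncovers at least one error, given a true error rate p in the population
# (sampling with replacement as an approximation).
def detection_probability(n, p):
    return 1 - (1 - p) ** n

# How large a sample gives 95% confidence of catching a 1% error rate?
n = 1
while detection_probability(n, 0.01) < 0.95:
    n += 1
print(n, round(detection_probability(n, 0.01), 4))
```

Even at a 1% error rate, an auditor must inspect roughly three hundred records to be 95% confident of seeing a single error, which is exactly the kind of manual workload that motivates ML-based detection.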

Fraud detection systems leverage behavioral profiling methods to prevent fraud by modeling individual behavioral patterns and monitoring deviations from the norms, such as daily banking activities, spending velocity, transacted foreign countries, and beneficiaries based on historical transactions. Nevertheless, an individual’s spending habits are influenced by changes in income, lifestyle, and other external factors. Such unpredicted changes can introduce concept drift with the underlying model. Hence, a fraud detection model and an individual’s transaction profiling must be recursively and dynamically updated by correlating input data changes and various parameters to enable adaptive behavioral profiling.
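A minimal sketch of the behavioral profiling idea follows, assuming a single customer's transaction amounts and an invented rolling-window z-score rule; real systems profile many more attributes (merchants, countries, velocity) and update their models to handle drift:

```python
import numpy as np

def flag_transactions(amounts, window=30, z_cut=3.0):
    """Flag amounts that deviate strongly from a rolling spending profile."""
    flags = []
    for i, amount in enumerate(amounts):
        history = amounts[max(0, i - window):i]
        if len(history) < 10:              # not enough history to profile yet
            flags.append(False)
            continue
        mu, sigma = history.mean(), history.std()
        flags.append(abs(amount - mu) > z_cut * max(sigma, 1e-9))
    return np.array(flags)

rng = np.random.default_rng(7)
spend = rng.normal(40, 8, size=100)        # routine purchases around $40
spend[60] = 900.0                          # sudden out-of-profile purchase
print(np.where(flag_transactions(spend))[0])
```

Because the profile is recomputed from a sliding window, it adapts as spending habits shift, which is the simplest form of the adaptive behavioral profiling described above.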

Let’s review a fraud detection example using an anonymized multivariate credit card transactions dataset from https://www.kaggle.com/datasets/whenamancodes/fraud-detection and AutoEncoder provided by PyOD, https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.auto_encoder.

An autoencoder is an unsupervised deep learning algorithm that reconstructs high-dimensional input data from a compressed latent representation. It helps detect abnormalities by calculating reconstruction errors: inputs unlike the training data reconstruct poorly.

Figure 1.11 shows a high-level AutoEncoder architecture that consists of three components:

  • Encoder – Translates high dimensional input data into a low dimensional latent representation
  • Code – Learns the latent-space representation of the input data
  • Decoder – Reconstructs the input data based on the encoder’s output
Figure 1.11 – The AutoEncoder architecture

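The three components above can be sketched as a tiny reconstruction-based detector. This is an illustrative stand-in on synthetic data, using scikit-learn's MLPRegressor trained to reproduce its own input rather than PyOD's AutoEncoder, but it shows the core idea: the narrow middle layer is the code, and reconstruction error serves as the anomaly score.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(0, 1, size=(300, 8))      # "normal" records, 8 features

# An MLP trained to reproduce its own input acts as an autoencoder:
# layers (4, 2, 4) form the encoder, the 2-unit code, and the decoder.
ae = MLPRegressor(hidden_layer_sizes=(4, 2, 4), activation="tanh",
                  max_iter=2000, random_state=0)
ae.fit(X_train, X_train)

def reconstruction_error(X):
    return np.mean((ae.predict(X) - X) ** 2, axis=1)

x_anomaly = np.full((1, 8), 6.0)               # far outside the training cluster
err_normal = reconstruction_error(X_train).mean()
err_anomaly = reconstruction_error(x_anomaly)[0]
print(err_anomaly > err_normal)
```

The 2-unit bottleneck forces the network to learn only the structure of normal data, so the out-of-distribution point reconstructs with a much larger error.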

A sample notebook, chapter1_pyod_autoencoder.ipynb, can be found in the book's GitHub repo.

You can experiment with this example on Amazon SageMaker Studio Lab or your preferred IDE, as described in the Technical requirements section. Let’s get started:

  1. First, install the required packages using the provided requirements.txt file:
    import sys
    !{sys.executable} -m pip install -r requirements.txt
  2. Load the essential libraries:
    %matplotlib inline
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    import os
    from platform import python_version
    import tensorflow as tf
    from pyod.models.auto_encoder import AutoEncoder
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
    print(f'TensorFlow version: {tf.__version__}')
    print(f'Python version: {python_version()}')
  3. Load and preview the anonymized credit card transactions dataset:
    df = pd.read_csv('creditcard.csv')
    df.head()

The result will be as follows:

Figure 1.12 – Preview anonymized credit card transactions dataset


  4. Assign model features and the target label to variables:
    model_features = df.columns.drop('Class')
    X = df[model_features]
    y = df['Class']
  5. View the frequency distribution for target labels. You should have 284,315 non-fraudulent transactions for class 0 and 492 fraudulent transactions for class 1:
    y.value_counts()
  6. Set the contamination rate, that is, the proportion of outliers in the training dataset. The default contamination value is 0.1; here, we are setting contamination to the maximum value, 0.5. PyOD uses this setting to calculate the threshold. Also, fix the number of epochs for training:
    contamination = 0.5
    epochs = 30
  7. Set the number of neurons per hidden layer and initialize AutoEncoder for training:
    hn = [64, 30, 30, 64]
    clf = AutoEncoder(epochs=epochs, contamination=contamination, hidden_neurons=hn)
    clf.fit(X)

Figure 1.13 shows a model summary for AutoEncoder in this example:

Figure 1.13 – The AutoEncoder model summary


  8. Obtain predictions on outliers:
    outliers = clf.predict(X)
  9. Filter outliers from the model’s predictions. The anomaly variable contains the identified outliers:
    anomaly = np.where(outliers==1)
    anomaly
  10. View the output of a particular instance. You should see the output is 1, indicating this is predicted as a fraudulent transaction. Validate the result with the ground truth:
    sample = X.iloc[[4920]]
    clf.predict(sample, return_confidence=False)

The result displayed is shown in Figure 1.14:

Figure 1.14 – Prediction versus ground truth


  11. Evaluate the model’s confidence in its prediction, that is, how likely the model is to make the same prediction under a slightly perturbed dataset:
    clf.predict_confidence(sample)
  12. Obtain the binary labels of the training data, where 0 means inlier and 1 means outlier:
    y_pred = clf.labels_
  13. Access the decision_scores_ attribute to obtain the anomaly scores. Higher values represent a higher severity of abnormality:
    y_scores = clf.decision_scores_
  14. Figure 1.15 shows the anomaly scores from decision_scores_, plotted with the following code, together with the threshold PyOD derived from the contamination rate. The red horizontal line represents the threshold in use:
    plt.rcParams["figure.figsize"] = (15,8)
    plt.plot(y_scores);
    plt.axhline(y=clf.threshold_, c='r', ls='dotted', label='threshold');
    plt.xlabel('Instances')
    plt.ylabel('Decision Scores')
    plt.title('Anomaly Scores with Auto-Calculated Threshold');
    plt.savefig('auto_decision_scores.png', bbox_inches='tight')
    plt.show()

Figure 1.15 – Auto-calculated anomaly scores

  15. Figure 1.16 shows the modified threshold using the following code. The red horizontal line represents the new threshold:
    threshold = 50
    plt.rcParams["figure.figsize"] = (15,8)
    plt.plot(y_scores, color="green");
    plt.axhline(y=threshold, c='r', ls='dotted', label='threshold');
    plt.xlabel('Instances')
    plt.ylabel('Anomaly Scores')
    plt.title('Anomaly Scores with Modified Threshold');
    plt.savefig('modified_threshold.png', bbox_inches='tight')
    plt.show()

Figure 1.16 – Modified threshold

  16. We will use the following code to plot the error loss history:
    plt.rcParams["figure.figsize"] = (15,8)
    pd.DataFrame.from_dict(clf.history_).plot(title='Error Loss');
    plt.savefig('error_loss.png', bbox_inches='tight')
    plt.show()

Figure 1.17 shows the error loss history:

Figure 1.17 – Error loss history

  17. Visualize anomaly scores and outliers by comparing Time and Amount with a scatter plot:
    sns.scatterplot(x="Time", y="Amount", hue=y_scores, data=df, palette="RdBu_r", size=y_scores);
    plt.xlabel('Time (seconds elapsed from first transaction)')
    plt.ylabel('Amount')
    plt.legend(title='Anomaly Scores')
    plt.savefig('pca_anomaly_score.png', bbox_inches='tight')
    plt.show()

The result is shown in Figure 1.18:

Figure 1.18 – Anomaly scores and outlier
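It is worth seeing how the contamination setting became the threshold used above. PyOD derives threshold_ as the (1 - contamination) percentile of the training decision scores and labels everything above it an outlier; here is a minimal sketch of that mechanism with made-up scores:

```python
import numpy as np

# Ten made-up decision scores: two are clearly more severe than the rest.
scores = np.array([0.10, 0.12, 0.15, 0.18, 0.20, 0.22, 0.25, 0.30, 4.2, 5.0])
contamination = 0.2

# Threshold = (1 - contamination) percentile of the training scores,
# so the top `contamination` fraction of instances is labeled 1 (outlier).
threshold = np.percentile(scores, 100 * (1 - contamination))
labels = (scores > threshold).astype(int)
print(threshold, labels)
```

With contamination at 0.2, the two largest of the ten scores fall above the threshold, which is why raising contamination in the walkthrough widens the set of flagged transactions.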

You completed a walk-through of a fraud detection example using AutoEncoder. The following section discusses a few more real-world anomaly detection examples.

Predicting industrial maintenance

The rise of Industry 4.0 transformed manufacturing technologies focusing on interconnectivity between machines and industrial equipment using the Internet of Things (IoT). Real-time data produced by interconnected devices presents enormous opportunities for predictive analytics in structural health checks and anomaly detection.

Inadequate machine maintenance is the primary cause of unplanned downtime in manufacturing. Improving equipment availability and performance is critical in preventing unplanned downtime, avoiding unnecessary maintenance costs, and increasing productivity in industrial workloads.

Although equipment health can deteriorate over time due to regular use, early discovery of abnormal symptoms helps optimize performance and uptime over a machine’s life expectancy and ensure business continuity.

Predictive maintenance techniques have evolved from reactive mode to ML approaches. Anomaly detection for predictive maintenance is challenging due to a lack of domain knowledge in defining anomaly classes and the absence of past anomalous behaviors in the available data. Many existing manufacturing processes can only detect a subset of anomalies, leaving the remaining anomalies undetected before the equipment goes into a nonfunctional state. Anomaly detection in predictive maintenance aims to predict the onset of equipment failure and perform prompt maintenance to avoid unnecessary downtime.

You can try implementing a predictive maintenance problem using PyOD with this dataset: https://www.kaggle.com/datasets/shivamb/machine-predictive-maintenance-classification.

Diagnosing medical conditions

Physiological data collected through medical diagnosis applications, such as magnetic resonance imaging (MRI), and wearable devices, such as glucose monitors, enables healthcare professionals to use anomaly detection approaches to highlight abnormal readings that may be precursors of potential health risks to patients. Besides medical diagnosis, anomaly detection helps healthcare providers predict recovery rates and escalate medical risks by forecasting physiological signals, such as heart rate and blood pressure.

Detection and prediction accuracy are critical in medical anomaly detection because they involve time-sensitive decisions and life-and-death situations. Besides the common challenges of class imbalance and scarcity of anomaly samples, medical anomaly detection also faces the challenge of distinguishing patient-specific and demographic-specific characteristics.

Deep learning techniques have gained popularity in medical anomaly detection due to their feature learning and non-linearity modeling capabilities. However, current deep medical anomaly detection methods mainly correlate patients’ symptoms with a known disease category based on annotated data. Medical experts will be skeptical of trusting decisions made by black-box models without quantifiable causal estimation or explanation. Hence, the role of explainable artificial intelligence (XAI) is crucial in providing end users visibility into how a deep learning model derives a prediction that leads to informed decision-making.

Monitoring cybersecurity threats

Detecting zero-day attacks or unforeseen threats is highly desirable in security applications. Therefore, unsupervised deep learning techniques using unlabeled datasets are widely applied in security-related anomaly detection applications such as intrusion detection systems (IDSs), web attack detection, video surveillance, and advanced persistent threat (APT) detection.

IDSs fall into two categories: host-based and network-based. Host-based IDSs detect collective anomalies such as malicious applications, policy violations, and unauthorized access by analyzing sequential system call traces at the operating system level. Network-based IDSs analyze high-dimensional network data to identify external attacks seeking unauthorized network access.

Web applications have become an appealing target for cybercriminals as data becomes ubiquitous. Existing signature-based techniques using static rules no longer provide sufficient web attack protection because the quality of a ruleset depends on the known attacks in its signature dataset. Anomaly-based web attack detection methods distinguish anomalous web requests by measuring the probability of a request’s attributes against established profiles of normal requests and flagging requests that fall below a threshold.
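A minimal sketch of that profile-based approach, assuming we profile a single request attribute (the HTTP method) from hypothetical normal logs; real systems would profile many attributes, such as path tokens, parameter lengths, and character distributions:

```python
from collections import Counter

def build_profile(normal_attrs):
    """Estimate attribute probabilities from normal traffic."""
    counts = Counter(normal_attrs)
    total = sum(counts.values())
    return {attr: c / total for attr, c in counts.items()}

def is_anomalous(request_attr, profile, threshold=0.01):
    """Flag a request whose attribute probability under the normal
    profile falls below the threshold (unseen attributes score 0)."""
    return profile.get(request_attr, 0.0) < threshold

# Hypothetical HTTP methods observed in normal request logs.
normal = ["GET"] * 80 + ["POST"] * 19 + ["HEAD"]
profile = build_profile(normal)

print(is_anomalous("GET", profile))    # common method, not flagged
print(is_anomalous("TRACE", profile))  # never seen in the profile, flagged
```

Unlike a static signature ruleset, the profile is re-estimated from fresh traffic, so it adapts as normal behavior shifts.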

Reducing environmental impact

Widespread climate change driven by human activities has raised the earth’s temperature by 0.14 degrees Fahrenheit (0.08 degrees Celsius) per decade since 1880, and 2020 was the second-warmest year on record according to the National Oceanic and Atmospheric Administration (NOAA). Irreversible consequences of climate change include intense droughts, water shortages, catastrophic wildfires, and severe flooding. Detecting abnormal weather patterns and climatic events, such as the frequency of heat and cold waves, cyclones, and floods, provides a scientific understanding of the behaviors and relationships of climatological variables.

With almost 80% of the world’s energy produced by fossil fuels, it is crucial to develop green energy sources and reduce total energy consumption by identifying wasted energy using anomaly detection approaches through smart sensors. For example, buildings contribute 40% of global energy consumption and 33% of greenhouse gas emissions. Thus, reducing building energy consumption is a significant effort toward achieving net-zero carbon emissions by 2050.

Recommending financial strategies

Identifying anomalies in financial data, such as stock market indices, is instrumental for informed decision-making and competitive advantage. Financial data is characterized by volume, velocity, and variety. For example, the New York Stock Exchange (NYSE) generates over one terabyte of stock market data daily, reflecting continuous market changes at low latency. Market participants need a mechanism to identify anomalies in financial data that could otherwise cause misinterpretation of market behavior, leading to poor trading decisions.

Now that we have covered some commercial and environmental use cases for anomaly detection, you are ready to explore various deep learning approaches and their appropriate use for detecting anomalies in the following section.

 

Considering when to use deep learning and what for

Deep learning forms the basis of neural networks in ML. A neural network contains many layers of densely interconnected neurons organized into input, hidden, and output layers. In a feedforward network, information flows in one direction, beginning with the input layer, which receives raw data for model training. During training, backpropagation calculates the gradient of the error with respect to the weights of the hidden layers and optimizes the learning process.

Each neuron applies an activation function, and the output layer produces the prediction. Figure 1.19 shows a basic deep learning architecture:

Figure 1.19 – A basic deep learning architecture

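The flow just described can be sketched in a few lines of plain Python: inputs pass through a hidden layer of sigmoid neurons to a single output neuron. The weights are arbitrary illustrative values, and biases are omitted for brevity:

```python
import math

def sigmoid(x):
    """Classic activation function, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_out):
    """One forward pass: inputs -> hidden layer (sigmoid) -> output neuron.

    w_hidden holds one weight vector per hidden neuron; w_out holds the
    output neuron's weights over the hidden activations.
    """
    hidden = [sigmoid(sum(w * x for w, x in zip(ws, inputs)))
              for ws in w_hidden]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)))

# Two inputs, two hidden neurons, one output; arbitrary weights.
score = forward([0.5, -1.2],
                w_hidden=[[0.4, 0.1], [-0.3, 0.8]],
                w_out=[1.0, -1.0])
print(round(score, 3))
```

Training would then compare this output against a label and use backpropagation to adjust `w_hidden` and `w_out`; that step is deliberately left out here.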

Anomaly detection techniques are generally available in three categories:

  • Supervised anomaly detection: Train an ML model with an imbalanced labeled dataset where each data instance is categorized into a normal or abnormal class. This approach is viable if ground truth or actual observation is available. An anomaly detection model determines the class of unseen data assuming outliers follow the same distribution as the training dataset. Limitations of supervised anomaly detection include scarcity of anomaly samples and challenges in identifying precise representation of the normal class.
  • Semi-supervised anomaly detection: Train an ML model with a large amount of unlabeled data supplemented by a small set of labeled data describing expected behavior. This approach assumes outliers differ from the training dataset distribution. Hence, semi-supervised learning is often more applicable than supervised learning for detecting outliers, since anomalies are rare.
  • Unsupervised anomaly detection: Train an ML model with an unlabeled dataset that contains both normal and abnormal observations. This approach assumes normal and abnormal observations are typically isolated in high-density and low-density regions, respectively. An anomaly detector looks for potential outliers in the low-density region.
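The unsupervised assumption (anomalies sit in low-density regions) can be illustrated with a simple k-nearest-neighbor distance score on hypothetical 1-D data: points whose nearest neighbors are far away receive high scores, marking them as low-density candidates.

```python
def knn_scores(points, k=3):
    """Score each point by the mean distance to its k nearest neighbors.

    Isolated points in low-density regions have distant neighbors and
    therefore receive the highest scores.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(abs(p - q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical 1-D observations; 8.0 sits alone in a low-density region.
data = [1.0, 1.1, 0.9, 1.2, 1.05, 8.0]
scores = knn_scores(data)
print(max(range(len(data)), key=lambda i: scores[i]))  # → 5, the isolated point
```

Production detectors such as PyOD's KNN variant follow the same intuition but scale to many dimensions and large datasets.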

Besides identifying the class of unseen data, anomaly detection algorithms can produce anomaly scores to quantify the severity, help businesses determine an acceptable impact threshold, and manage risk tolerance levels.
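Turning raw anomaly scores into decisions typically means choosing a cutoff that matches the business’s risk tolerance. A minimal sketch, assuming the business expresses that tolerance as an acceptable contamination rate (the fraction of observations it is willing to flag):

```python
def threshold_for(scores, contamination=0.05):
    """Return the score cutoff so that roughly `contamination`
    of observations are flagged as anomalies."""
    ranked = sorted(scores, reverse=True)
    n_flag = max(1, int(len(scores) * contamination))
    return ranked[n_flag - 1]

# Hypothetical anomaly scores; two observations stand out.
scores = [0.1, 0.2, 0.15, 0.12, 0.9, 0.11, 0.13, 0.18, 0.14, 0.16,
          0.17, 0.19, 0.1, 0.12, 0.13, 0.11, 0.14, 0.85, 0.15, 0.16]
cut = threshold_for(scores, contamination=0.10)
flagged = [s for s in scores if s >= cut]
print(flagged)  # → [0.9, 0.85]
```

Raising the contamination rate flags more observations (catching more anomalies at the cost of more false positives), which is exactly the trade-off a risk tolerance level controls.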

The significance of anomaly detection is evident across many mission-critical domains. When choosing deep learning versus traditional anomaly detection methods, consider business objectives, data size, and training time versus trade-offs such as algorithmic scalability, model flexibility, explainability, and interpretability.

Identifying an ML problem that addresses a specific business problem is essential before embarking on an ML journey. Knowing the inputs, outputs, and success criteria, such as accuracy versus interpretability, is critical for choosing appropriate algorithms and tracking lineage. Consider traditional methods such as KNN and decision trees if interpretability is a higher priority for your business due to regulatory compliance or auditability needs. Explore deep learning methods if high accuracy takes precedence over interpretability for your use case, as XAI for deep learning models continues to mature.

Deep learning methods are capable and ideal for handling large datasets and complex problems. Deep learning can extract and correlate relationships across many interdependent features if your business aims to discover hidden patterns in large datasets. Otherwise, traditional methods might be a good start if your data size is small.

Training a deep learning anomaly detection model can be compute-intensive and time-consuming, depending on the number of parameters involved and the available infrastructure, such as graphics processing units (GPUs). More computing power is needed as deep learning models grow in size and complexity. Conversely, traditional anomaly detection methods can run and train faster on cheaper hardware, often within hours.

Traditional rule-based anomaly detection methods created manually by domain experts are not scalable enough to handle high-dimensional data and are difficult to maintain. For instance, it can be challenging to develop security rules for every possible malicious behavior and keep those rules up to date. In contrast, deep learning-based anomaly detection methods are more adaptive by learning and extracting features incrementally from data in a nested hierarchy through hidden layers.

This section covered the basics and general best practices of deep learning. In the following section, we will discuss known challenges of deep learning anomaly detection and future opportunities of XAI in this field.

 

Understanding challenges and opportunities

The abundance of computing resources and available data has accelerated the evolution of anomaly detection techniques over the years. According to the International Data Corporation (IDC) (https://www.statista.com/statistics/871513/worldwide-data-created/), less than 2% of the 64.2 zettabytes of data created in 2020, during the COVID-19 pandemic, was retained into 2021, presenting enormous opportunities for big data analytics and anomaly detection. However, challenges such as high false positive rates, scarcity of anomaly samples, and imbalanced distributions remain prevalent.

Establishing the boundary between normal and abnormal behaviors is vital in anomaly detection. Nevertheless, this is not always a straightforward task due to the dynamic nature of anomalies. For example, defining normal behavior becomes a moving target when malicious adversaries adapt so that their actions appear justifiable to anomaly detection algorithms. Noise and mislabeled data can cause a benign record to appear as an abnormal observation. An anomalous record might go undetected due to data aggregation or the masking of hidden trends. Furthermore, concept drift can occur as input data and features change, rendering the current notion of normal behavior invalid.

Generally, businesses believe ML improves decision-making and operational efficiency. With increased predictive accuracy and complexity, companies struggle to identify optimal trade-offs between model performance and interpretability for auditability and regulatory compliance. For example, individuals are entitled to the Right to Explanation under the European Union (EU) General Data Protection Regulation (GDPR). Therefore, there is a growing awareness of explainability across different maturity levels of ML adoption among enterprises.

Knowing the key challenges, let’s explore some future research focuses and opportunities for the next generation of deep learning anomaly detection practices. Instead of exclusively fitting limited labeled anomaly samples with a supervised technique or training with unlabeled data using unsupervised methods, there is an increasing interest in deep weakly-supervised anomaly detection using partially labeled anomaly data in the hope of getting the best of both worlds. Deep weakly-supervised anomaly detection aims to enhance model learning by training with a small set of accurately labeled anomaly samples and continuing to explore possible anomalies in unseen data.

Most existing deep learning methods focus on point anomalies. Complex interconnected devices such as climate control and electromechanical systems that generate continuous multidimensional data streams pose a new opportunity for multimodal anomaly detection. Multidimensional anomalies can occur when one or more dimensions exceed or fall below the expected range of values, or multiple dimensions no longer correlate.
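One simple instance of the correlation-break case just described: compute a rolling Pearson correlation between two hypothetical sensor streams and flag windows where they stop moving together. The window size and correlation threshold are illustrative:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Two hypothetical streams that normally move together: as temperature
# rises, fan speed rises. Halfway through, the fan decouples.
temp = [20, 21, 22, 23, 24, 25, 26, 27]
fan = [40, 42, 44, 46, 48, 38, 36, 34]

window = 4
for start in range(0, len(temp) - window + 1, window):
    r = pearson(temp[start:start + window], fan[start:start + window])
    if r < 0.8:  # illustrative correlation-break threshold
        print(f"anomalous window starting at {start}: r={r:.2f}")
```

Note that neither stream exceeds its individual range in the flagged window; the anomaly exists only in the broken relationship between dimensions.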

XAI is an emerging research field that studies the tools and frameworks to provide human-legible explanations and increase confidence in model prediction with quantifiable factors. The earlier days of XAI in anomaly detection can be seen in rule-based expert systems where human experts formulated the rules and system knowledge, resulting in its inherent explainability. Further research on interpretable and actionable deep learning anomaly detection is significant in explaining model decisions and mitigating potential bias.

 

Summary

Despite the previously mentioned challenges, anomaly detection is highly applicable to various domains and will remain a diverse research field. In this chapter, you learned about the basics of anomaly detection, practical industry use cases, and considerations of deep learning versus traditional anomaly detection approaches. You also completed two example walkthroughs and explored challenges and exciting opportunities in this space. In the next chapter, we will discuss XAI and its significance for anomaly detection in more depth.

About the Author
  • Cher Simon

    Cher Simon is a principal solutions architect specializing in artificial intelligence, machine learning, and data analytics at AWS. Cher has 20 years of experience in architecting enterprise-scale, data-driven, and AI-powered industry solutions. Besides building cloud-native solutions in her day-to-day role with customers, Cher is also an avid writer and a frequent speaker at AWS conferences.
