You're reading from Hands-On Graph Neural Networks Using Python

Product type: Book
Published in: Apr 2023
Publisher: Packt
ISBN-13: 9781804617526
Edition: 1st Edition
Concepts: Neural Networks
Author: Maxime Labonne

Detecting Anomalies Using Heterogeneous GNNs

In machine learning, anomaly detection is a popular task that aims to identify patterns or observations in data that deviate from the expected behavior. This is a fundamental problem that arises in many real-world applications, such as detecting fraud in financial transactions, identifying defective products in a manufacturing process, and detecting cyber attacks in a computer network.

GNNs can be trained to learn the normal behavior of a network and then identify nodes or patterns that deviate from it. Their ability to model complex relationships makes them particularly well suited to detecting weak signals. GNNs can also scale to large graphs, which makes them practical for network traffic logs that can contain millions of records.

In this chapter, we will build a GNN application for anomaly detection in computer networks. First, we will introduce the CIDDS-001 dataset, which contains attacks and benign traffic in a...

Technical requirements

All the code examples from this chapter can be found on GitHub at https://github.com/PacktPublishing/Hands-On-Graph-Neural-Networks-Using-Python/tree/main/Chapter16.

The installation steps required to run the code on your local machine can be found in the Preface of this book. This chapter requires a large amount of GPU memory. You can reduce this requirement by decreasing the size of the training set in the code.

Exploring the CIDDS-001 dataset

In this section, we will explore the dataset and gain insights into feature importance and scaling.

The CIDDS-001 dataset [1] is designed to train and evaluate anomaly-based network intrusion detection systems. It provides realistic traffic, including up-to-date attacks, to assess these systems. It was created by collecting and labeling 8,451,520 traffic flows in a virtual environment built with OpenStack. More precisely, each row corresponds to a NetFlow record (one traffic flow) describing Internet Protocol (IP) traffic statistics, such as the number of bytes exchanged.

The following figure provides an overview of the simulated network environment in CIDDS-001.

Figure 16.1 – Overview of the virtual network simulated by CIDDS-001

We see four different subnets (developer, office, management, and server) with their respective IP address ranges. All these subnets are linked to a single server connected to the internet through a firewall...

Preprocessing the CIDDS-001 dataset

In the last section, we identified some issues with the dataset that we need to address to improve the accuracy of our model.

The CIDDS-001 dataset includes diverse types of data: we have numerical values such as duration, categorical features such as protocols (TCP, UDP, ICMP, and IGMP), and others such as timestamps or IP addresses. In the following exercise, we will choose how to represent these data types based on the information from the previous section and expert knowledge:

First, we one-hot encode the day of the week, which we extract from the timestamp. We then rename the resulting columns to make them more readable:

df['weekday'] = df['Date first seen'].dt.weekday
df = pd.get_dummies(df, columns=['weekday']).rename(columns = {'weekday_0': 'Monday','weekday_1': 'Tuesday','weekday_2': 'Wednesday', 'weekday_3': 'Thursday...
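Because the rename mapping above is cut off, here is a self-contained sketch of the same weekday encoding on a toy DataFrame. It is an illustration, not the chapter's code: the three timestamps are invented, and the sketch relies on pandas' documented convention that `dt.weekday` returns 0 for Monday through 6 for Sunday. Only the column name `Date first seen` comes from the code above.

```python
import pandas as pd

# Toy stand-in for the CIDDS-001 DataFrame (timestamps are illustrative only)
df = pd.DataFrame({'Date first seen': pd.to_datetime(
    ['2017-03-15 00:01:16', '2017-03-18 12:00:00', '2017-03-20 08:30:00'])})

# pandas encodes Monday as 0 and Sunday as 6; declaring all seven categories
# guarantees one column per weekday, even for days absent from the data
df['weekday'] = pd.Categorical(df['Date first seen'].dt.weekday,
                               categories=list(range(7)))

day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']

# One-hot encode (columns weekday_0 ... weekday_6) and give them readable names
df = pd.get_dummies(df, columns=['weekday'], dtype=int)
df = df.rename(columns={f'weekday_{i}': name for i, name in enumerate(day_names)})
```

Declaring the categories explicitly is a choice made for this sketch: without it, `get_dummies` only creates columns for weekdays that actually occur, so a training subset with no Sunday traffic would produce a different set of features than the full dataset.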

Implementing a heterogeneous GNN

In this section, we will implement a heterogeneous GNN using a GraphSAGE operator. This architecture will allow us to consider both node types (hosts and flows) to build better embeddings. This is done by duplicating and sharing messages across different layers, as shown in the following figure.

Figure 16.5 – Architecture of the heterogeneous GNN

We will implement three SAGEConv layers, each followed by a LeakyReLU activation, for each node type. Finally, a linear layer will output a five-dimensional vector, where each dimension corresponds to a class. We will train this model in a supervised way using the cross-entropy loss and the Adam optimizer:

We import the relevant neural network layers from PyTorch Geometric:

import torch_geometric.transforms as T
from torch_geometric.nn import Linear, HeteroConv, SAGEConv

We define the heterogeneous GNN with three parameters: the number of hidden dimensions, the number of...

Summary

In this chapter, we explored the use of GNNs for detecting anomalies in a new dataset, the CIDDS-001 dataset. First, we preprocessed the dataset and converted it into a graph representation, allowing us to capture the complex relationships between the different components of the network. We then implemented a heterogeneous GNN with GraphSAGE operators. It captured the heterogeneity of the graph and allowed us to classify the flows as benign or malicious.

The application of GNNs to network security has shown promising results and opened up new avenues for research. As technology advances and the volume of network data grows, GNNs are likely to become an increasingly important tool for detecting and preventing security breaches.

In Chapter 17, Recommending Books Using LightGCN, we will explore recommender systems, the most popular application of GNNs. We will implement a lightweight GNN on a large dataset and produce book recommendations for given users.


Maxime Labonne is currently a senior applied researcher at Airbus. He received an M.Sc. degree in computer science from INSA CVL and a Ph.D. in machine learning and cyber security from the Polytechnic Institute of Paris. During his career, he worked on computer networks and the problem of representation learning, which led him to explore graph neural networks. He applied this knowledge to various industrial projects, including intrusion detection, satellite communications, quantum networks, and AI-powered aircraft. He is now an active graph neural network evangelist on Twitter and his personal blog.