You're reading from Hands-On Graph Neural Networks Using Python

Product type: Book
Published in: Apr 2023
Publisher: Packt
ISBN-13: 9781804617526
Edition: 1st Edition
Author: Maxime Labonne

Maxime Labonne is currently a senior applied researcher at Airbus. He received an M.Sc. degree in computer science from INSA CVL, and a Ph.D. in machine learning and cyber security from the Polytechnic Institute of Paris. During his career, he worked on computer networks and the problem of representation learning, which led him to explore graph neural networks. He applied this knowledge to various industrial projects, including intrusion detection, satellite communications, quantum networks, and AI-powered aircraft. He is now an active graph neural network evangelist through Twitter and his personal blog.

Detecting Anomalies Using Heterogeneous GNNs

In machine learning, anomaly detection is a popular task that aims to identify patterns or observations in data that deviate from the expected behavior. This is a fundamental problem that arises in many real-world applications, such as detecting fraud in financial transactions, identifying defective products in a manufacturing process, and detecting cyber attacks in a computer network.

GNNs can be trained to learn the normal behavior of a network and then identify nodes or patterns that deviate from that behavior. Indeed, their ability to understand complex relationships makes them particularly appropriate to detect weak signals. Additionally, GNNs can be scaled to large datasets, making them an efficient tool for processing large amounts of data.

In this chapter, we will build a GNN application for anomaly detection in computer networks. First, we will introduce the CIDDS-001 dataset, which contains attacks and benign traffic in a...

Technical requirements

All the code examples from this chapter can be found on GitHub at https://github.com/PacktPublishing/Hands-On-Graph-Neural-Networks-Using-Python/tree/main/Chapter16.

The installation steps required to run the code on your local machine can be found in the Preface of this book. This chapter requires a large amount of GPU memory. You can lower the requirement by decreasing the size of the training set in the code.
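One possible way to shrink the training set is to subsample it per class, so that rare attack classes do not vanish. The following is a minimal sketch on synthetic data; the `label` column name and the 10% fraction are assumptions for illustration, not values taken from the chapter's code.

```python
import pandas as pd

# Synthetic stand-in for the flow table; the "label" column name is hypothetical.
df = pd.DataFrame({
    "Bytes": range(1000),
    "label": ["benign"] * 900 + ["attack"] * 100,
})

# Keep 10% of the rows of each class so the class balance is preserved.
small_df = df.groupby("label").sample(frac=0.1, random_state=0)

print(len(small_df))
print(small_df["label"].value_counts().to_dict())
```

Sampling within each group keeps the benign-to-attack ratio unchanged, which a plain `df.sample` call does not guarantee on a heavily imbalanced dataset.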

Exploring the CIDDS-001 dataset

In this section, we will explore the dataset to gain insight into feature importance and scaling.

The CIDDS-001 dataset [1] is designed to train and evaluate anomaly-based network intrusion detection systems. It provides realistic traffic that includes up-to-date attacks to assess these systems. It was created by collecting and labeling 8,451,520 traffic flows in a virtual environment using OpenStack. More precisely, each row corresponds to a NetFlow connection and describes Internet Protocol (IP) traffic statistics, such as the number of bytes exchanged.

The following figure provides an overview of the simulated network environment in CIDDS-001.

Figure 16.1 – Overview of the virtual network simulated by CIDDS-001

We see four different subnets (developer, office, management, and server) with their respective IP address ranges. All these subnets are linked to a single server connected to the internet through a firewall...
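When exploring the flows, it can be handy to know which subnet an address belongs to. The sketch below does this with Python's standard `ipaddress` module. The address ranges and the `subnet_of` helper are hypothetical placeholders chosen for illustration; they are not the ranges shown in Figure 16.1 and should be replaced with the real ones.

```python
import ipaddress

# Hypothetical address ranges, for illustration only.
# Replace them with the ranges shown in Figure 16.1.
SUBNETS = {
    "developer": ipaddress.ip_network("10.0.1.0/24"),
    "office": ipaddress.ip_network("10.0.2.0/24"),
    "management": ipaddress.ip_network("10.0.3.0/24"),
    "server": ipaddress.ip_network("10.0.4.0/24"),
}


def subnet_of(ip: str) -> str:
    """Return the name of the subnet containing ip, or 'external' if none does.

    Strings that are not valid IP addresses are also reported as 'external'.
    """
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return "external"
    for name, network in SUBNETS.items():
        if address in network:
            return name
    return "external"


print(subnet_of("10.0.2.17"))
print(subnet_of("8.8.8.8"))
```

Treating unparsable strings as external is a design choice of this sketch; a stricter pipeline might raise an error instead.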

Preprocessing the CIDDS-001 dataset

In the last section, we identified some issues with the dataset that we need to address to improve the accuracy of our model.

The CIDDS-001 dataset includes diverse types of data: we have numerical values such as duration, categorical features such as protocols (TCP, UDP, ICMP, and IGMP), and others such as timestamps or IP addresses. In the following exercise, we will choose how to represent these data types based on the information from the previous section and expert knowledge:

  1. First, we can one-hot-encode the day of the week by retrieving this information from the timestamp. We will rename the resulting columns to make them more readable:
    df['weekday'] = df['Date first seen'].dt.weekday
    df = pd.get_dummies(df, columns=['weekday']).rename(columns = {'weekday_0': 'Monday','weekday_1': 'Tuesday','weekday_2': 'Wednesday', 'weekday_3': 'Thursday...
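The one-hot encoding step can be tried in isolation on a few synthetic rows. The sketch below follows the same `dt.weekday` and `pd.get_dummies` approach and, as an assumption, also encodes the protocol in the same call. Only the `Date first seen` column name comes from the text; the `Proto` column name and the sample values are hypothetical.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
PROTOCOLS = ["TCP", "UDP", "ICMP", "IGMP"]  # protocols named in the text

# Synthetic flows; "Proto" is a hypothetical column name.
df = pd.DataFrame({
    "Date first seen": pd.to_datetime([
        "2017-03-13 08:00:00",  # a Monday
        "2017-03-14 09:30:00",  # a Tuesday
        "2017-03-19 23:59:00",  # a Sunday
    ]),
    "Proto": ["TCP", "UDP", "TCP"],
})

# Fixed category lists guarantee one column per weekday and protocol,
# even when a value is absent from the sample.
df["weekday"] = pd.Categorical(df["Date first seen"].dt.weekday,
                               categories=list(range(7)))
df["Proto"] = pd.Categorical(df["Proto"], categories=PROTOCOLS)

# One-hot-encode both columns, then give the weekday columns readable names.
df = pd.get_dummies(df, columns=["weekday", "Proto"], dtype=int)
df = df.rename(columns={f"weekday_{i}": name
                        for i, name in enumerate(WEEKDAYS)})

print(df.drop(columns="Date first seen"))
```

Declaring the categories up front is a design choice of this sketch: without it, `pd.get_dummies` only creates columns for the values that actually occur, so two data splits could end up with different feature sets.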

Implementing a heterogeneous GNN

In this section, we will implement a heterogeneous GNN using a GraphSAGE operator. This architecture will allow us to consider both node types (hosts and flows) to build better embeddings. This is done by duplicating and sharing messages across different layers, as shown in the following figure.

Figure 16.5 – Architecture of the heterogeneous GNN

We will implement three layers of SAGEConv with LeakyReLU for each node type. Finally, a linear layer will output a five-dimensional vector, where each dimension corresponds to a class. Furthermore, we will train this model in a supervised way using the cross-entropy loss and the Adam optimizer:

  1. We import the relevant neural network layers from PyTorch Geometric:
    import torch_geometric.transforms as T
    from torch_geometric.nn import Linear, HeteroConv, SAGEConv
  2. We define the heterogeneous GNN with three parameters: the number of hidden dimensions, the number of...
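To make the message flow between node types more concrete, here is a minimal NumPy sketch of GraphSAGE-style mean aggregation applied per edge type, followed by a five-class output and a cross-entropy loss. It is not the chapter's PyTorch Geometric model: the toy graph, the feature sizes, the edge layout (each flow linked to its source and destination host), the labels, and all helper names are assumptions, and the weights are random rather than trained with Adam.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph (assumed layout): 3 hosts, 4 flows, each flow touches two hosts.
num_hosts, num_flows = 3, 4
src_host = np.array([0, 0, 1, 2])  # source host of each flow
dst_host = np.array([1, 2, 2, 0])  # destination host of each flow
flow_ids = np.arange(num_flows)

x = {"host": rng.normal(size=(num_hosts, 2)),
     "flow": rng.normal(size=(num_flows, 5))}

# (source type, destination type) -> edge index of shape (2, num_edges)
hosts = np.concatenate([src_host, dst_host])
flows = np.concatenate([flow_ids, flow_ids])
edges = {("host", "flow"): np.stack([hosts, flows]),
         ("flow", "host"): np.stack([flows, hosts])}


def mean_aggregate(x_src, edge_index, num_dst):
    """Average the source features arriving at each destination node."""
    src, dst = edge_index
    summed = np.zeros((num_dst, x_src.shape[1]))
    np.add.at(summed, dst, x_src[src])
    counts = np.bincount(dst, minlength=num_dst).reshape(-1, 1)
    return summed / np.maximum(counts, 1)


def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)


def init_weights(in_dims, dim_h):
    """Random (neighbor, self) weight matrices for every edge type."""
    return {(s, d): (rng.normal(scale=0.1, size=(in_dims[s], dim_h)),
                     rng.normal(scale=0.1, size=(in_dims[d], dim_h)))
            for (s, d) in edges}


def hetero_sage_layer(h, weights):
    """One update per edge type; updates for the same node type are summed."""
    out = {}
    for (s, d), edge_index in edges.items():
        w_neigh, w_self = weights[(s, d)]
        neigh = mean_aggregate(h[s], edge_index, h[d].shape[0])
        out[d] = out.get(d, 0) + neigh @ w_neigh + h[d] @ w_self
    return {node_type: leaky_relu(v) for node_type, v in out.items()}


dim_h, num_classes = 8, 5
h = x
for _ in range(3):  # three layers, as described in the text
    in_dims = {node_type: v.shape[1] for node_type, v in h.items()}
    h = hetero_sage_layer(h, init_weights(in_dims, dim_h))

# Linear output layer: one five-dimensional vector per flow.
logits = h["flow"] @ rng.normal(scale=0.1, size=(dim_h, num_classes))

# Cross-entropy loss against made-up class indices.
labels = np.array([0, 0, 3, 1])
shifted = logits - logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
loss = -log_probs[np.arange(num_flows), labels].mean()

print(logits.shape, float(loss))
```

Each flow receives the mean of its hosts' features, and each host receives the mean of its flows' features, so information from both node types is mixed at every layer.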

Summary

In this chapter, we explored the use of GNNs for detecting anomalies in a new dataset, the CIDDS-001 dataset. First, we preprocessed the dataset and converted it into a graph representation, allowing us to capture the complex relationships between the different components of the network. We then implemented a heterogeneous GNN with GraphSAGE operators. It captured the heterogeneity of the graph and allowed us to classify the flows as benign or malicious.

The application of GNNs in network security has shown promising results and opened up new avenues for research. As technology continues to advance and the amount of network data increases, GNNs will become an increasingly important tool for detecting and preventing security breaches.

In Chapter 17, Recommending Books Using LightGCN, we will explore the most popular application of GNNs: recommender systems. We will implement a lightweight GNN on a large dataset and produce book recommendations for given users.

...

Further reading

  • [1] M. Ring, S. Wunderlich, D. Grüdl, D. Landes, and A. Hotho, Flow-based benchmark data sets for intrusion detection, in Proceedings of the 16th European Conference on Cyber Warfare and Security (ECCWS), ACPI, 2017, pp. 361–369.