
You're reading from  10 Machine Learning Blueprints You Should Know for Cybersecurity

Product type: Book
Published in: May 2023
Publisher: Packt
ISBN-13: 9781804619476
Edition: 1st Edition
Author: Rajvardhan Oak

Rajvardhan Oak is a cybersecurity expert, researcher, and scientist with a focus on machine learning solutions to security issues such as fake news, malware, and botnets. He obtained his bachelor's degree from the University of Pune, India, and his master's degree from the University of California, Berkeley. He has served on the editorial committees of multiple technical conferences and journals. His work has been featured by prominent news outlets such as WIRED magazine and the Daily Mail. In 2022, he received the ISC2 Global Achievement Award for Excellence in Cybersecurity. He is based in the Seattle area and works for Microsoft as an applied scientist in the ads fraud division.

Detecting Fake News with Graph Neural Networks

In the previous chapters, we looked at tabular data, which consisted of individual data points with their own features. While modeling and running our experiments, we did not consider any features of the relationships among the data points. Much real-world data, particularly in the domain of cybersecurity, occurs naturally as graphs and can be represented as a set of nodes, some of which are connected by edges. One example is a social network, where users, photos, and posts can be connected by edges. Another is the internet, which is a large graph of computers connected to each other.

Traditional machine learning algorithms cannot directly learn from graphs. Algorithms such as regression, neural networks, and trees, and optimization techniques such as gradient descent are designed to operate on Euclidean (flat) data structures. This has led to the development of Graph Neural Networks (GNNs), an upcoming area of...

Technical requirements

An introduction to graphs

First, let us understand what graphs are and the key terms related to graphs.

What is a graph?

A graph is a data structure that is represented as a set of nodes connected by a set of edges. Mathematically, we specify a graph G as (V, E), where V represents the nodes or vertices and E represents the edges between them, as shown in Figure 8.1:

Figure 8.1 – A simple graph

In the previous graph, we have the following:

V = {1, 2, 3, 4, 5, 6}

E = {(1,3), (2,3), (2,5), (3,6), (4,6), (5,6)}

Note that the order in which the nodes and edges are listed does not matter. The graph shown in Figure 8.1 is an undirected graph, which means that the edges have no direction: the edge (1,3) is the same as the edge (3,1). There are also directed graphs, in which each edge has a defined source and destination, so its direction carries meaning. For example, a graph depicting the water flow from various cities would have directed edges...
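As a minimal sketch of how such a graph could be held in code, the following builds an adjacency structure from the node and edge sets listed above (reading the final edge as (5,6)). The function name `build_adjacency` and the `directed` flag are illustrative choices, not something the chapter defines.

```python
# Node and edge sets from Figure 8.1.
V = {1, 2, 3, 4, 5, 6}
E = {(1, 3), (2, 3), (2, 5), (3, 6), (4, 6), (5, 6)}


def build_adjacency(nodes, edges, directed=False):
    """Return a dict mapping each node to the set of its neighbors."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        if not directed:
            # In an undirected graph, (u, v) and (v, u) are the same edge.
            adjacency[v].add(u)
    return adjacency


undirected = build_adjacency(V, E)
directed = build_adjacency(V, E, directed=True)

print(sorted(undirected[3]))  # neighbors of node 3: [1, 2, 6]
print(sorted(directed[3]))    # only edges leaving node 3: [6]
```

Treating the same edge set as directed changes the neighborhood of node 3, which is the practical difference between the two kinds of graph.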

Machine learning on graphs

Machine learning techniques (such as classification or clustering) can be applied to nodes, edges, or entire graphs. The concepts remain the same, but the algorithms operate on graph entities, so a task can be framed as node, link (edge), or subgraph classification. For example, in a network of users on social media, identifying abusive or bot users would be a node classification task. Identifying malicious messages or transactions would be an edge classification problem. Detecting groups of hate speech disseminators would be a graph (or subgraph) classification problem.

In graph machine learning, the challenge lies in extracting features from a graph. A possible approach would be using the adjacency matrix and node features as an attribute vector and feeding it to a traditional machine learning algorithm. However, the model produced will not be permutation-invariant, as there is no inherent order within the nodes in a graph; models based...
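The permutation problem can be seen in a small sketch. Here the same three-node path graph is written down under two different node labelings: the flattened adjacency matrix changes, while an order-independent summary does not. The sorted degree list is only an illustrative stand-in for a permutation-invariant feature, and all function names are hypothetical.

```python
def adjacency_matrix(num_nodes, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1
    return matrix


def flatten(matrix):
    """Naive feature vector: the adjacency matrix read row by row."""
    return [cell for row in matrix for cell in row]


def sorted_degrees(matrix):
    """Order-independent summary: the sorted list of node degrees."""
    return sorted(sum(row) for row in matrix)


# A path graph 0 - 1 - 2, and the same graph with nodes 0 and 1 swapped,
# which gives the path 1 - 0 - 2.
original = adjacency_matrix(3, [(0, 1), (1, 2)])
relabeled = adjacency_matrix(3, [(1, 0), (0, 2)])

print(flatten(original) == flatten(relabeled))                # False
print(sorted_degrees(original) == sorted_degrees(relabeled))  # True
```

A model fed the flattened matrix would see two different inputs for what is structurally one graph, which is why features for graph learning need to be insensitive to node ordering.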

Fake news detection with GNN

In this section, we will learn how fake news can be detected using a GNN.

Modeling a GNN

While some problems can naturally be thought of as graphs, as a data scientist, you often need to conceptualize and build the graph yourself. The data may still be available to you in tabular form, but it is up to you to build a meaningful graph from it.

Solving any task with a GNN involves the following high-level steps:

  1. Identifying the entities that will be your nodes.
  2. Defining a rule or metric to connect nodes via edges.
  3. Defining a set of features for nodes and edges.
  4. Determining the kind of graph task the given problem can translate into (node classification, edge classification, or subgraph classification).
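The four steps above can be sketched on a toy table of share events. Everything here is hypothetical: the column layout, the user names, the follower counts, and the `Graph` container are made up for illustration and are not the construction used later in the chapter.

```python
from dataclasses import dataclass, field

# Hypothetical tabular data: one row per share event.
# Columns: (user, shared_from, follower_count).
ROWS = [
    ("alice", None, 120),     # alice posted the article originally
    ("bob", "alice", 45),     # bob reshared it from alice
    ("carol", "alice", 300),
    ("dave", "bob", 12),
]


@dataclass
class Graph:
    node_features: dict = field(default_factory=dict)  # node -> feature list
    edges: list = field(default_factory=list)           # (source, target) pairs
    task: str = "graph classification"                  # step 4: task type


def build_share_graph(rows):
    graph = Graph()
    for user, _, follower_count in rows:
        # Steps 1 and 3: each user is a node with one numeric feature.
        graph.node_features[user] = [float(follower_count)]
    for user, shared_from, _ in rows:
        # Step 2: connect two users when one reshared from the other.
        if shared_from is not None:
            graph.edges.append((shared_from, user))
    return graph


graph = build_share_graph(ROWS)
print(sorted(graph.node_features))  # ['alice', 'bob', 'carol', 'dave']
print(graph.edges)  # [('alice', 'bob'), ('alice', 'carol'), ('bob', 'dave')]
```

Choosing a different edge rule in step 2 (for example, linking users who shared the same article) would produce a different graph from the same table, so the modeling choices matter as much as the model.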

In social media-related domains, such as friend recommendation, post virality, and fake news detection, we have multiple choices for nodes, their features, and the methodology for edges between them, such as the following:

  • Nodes can...

Summary

A lot of real-world data can be naturally represented as graphs. Graphs are especially important in a social network context, where multiple entities (users, posts, or media) are linked together, forming natural graphs. In recent times, the spread of misinformation and fake news has become a problem of growing concern. This chapter focused on detecting fake news using GNNs.

We began by learning some basic concepts about graphs and techniques for learning on graphs. These included static features extracted from graph analytics (such as degrees and path lengths), node and graph embeddings, and finally, neural message passing using GNNs. We looked at the UPFD framework and how a graph can be built for a news article, complete with node features that incorporate historical user behavior. Finally, we trained a GNN model to build a graph classifier that detects whether a news article is fake or not.

In the field of cybersecurity, graphs are especially important. This is because...
