You're reading from 10 Machine Learning Blueprints You Should Know for Cybersecurity

Product type: Book

Published in: May 2023

Publisher: Packt

ISBN-13: 9781804619476

Edition: 1st Edition

Concepts

Machine Learning

Author (1)

Rajvardhan Oak

Detecting Fake News with Graph Neural Networks

In the previous chapters, we looked at tabular data, which consisted of individual data points with their own features. While modeling and running our experiments, we did not consider any features of the relationships among the data points. Much real-world data, particularly in the domain of cybersecurity, occurs naturally as graphs and can be represented as a set of nodes, some of which are connected by edges. Examples include social networks, where users, photos, and posts can be connected by edges. Another example is the internet, which is a large graph of computers connected to each other.

Traditional machine learning algorithms cannot directly learn from graphs. Algorithms such as regression, neural networks, and trees, and optimization techniques such as gradient descent are designed to operate on Euclidean (flat) data structures. This has led to the development of Graph Neural Networks (GNNs), an upcoming area of...

Technical requirements

You can find the code files for this chapter on GitHub at https://github.com/PacktPublishing/10-Machine-Learning-Blueprints-You-Should-Know-for-Cybersecurity/tree/main/Chapter%208.

An introduction to graphs

First, let us understand what graphs are and the key terms related to graphs.

What is a graph?

A graph is a data structure that is represented as a set of nodes connected by a set of edges. Mathematically, we specify a graph G as (V, E), where V represents the nodes or vertices and E represents the edges between them, as shown in Figure 8.1:

Figure 8.1 – A simple graph

In the previous graph, we have the following:

V = {1, 2, 3, 4, 5, 6}

E = {(1,3), (2,3), (2,5), (3,6), (4,6), (5,6)}

Note that the order in which the nodes and edges are listed does not matter. The graph shown in Figure 8.1 is an undirected graph, which means that the direction of the edges does not matter. There can also be directed graphs, in which each edge has a direction that carries meaning. For example, a graph depicting the flow of water from various cities would have directed edges...
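As a minimal sketch, the node and edge sets listed above can be stored as an adjacency list. The helper name `build_adjacency` and its `directed` flag are illustrative assumptions rather than code from the chapter; the directed variant simply treats each pair as pointing from its first node to its second.

```python
# Node and edge sets as listed for Figure 8.1
V = {1, 2, 3, 4, 5, 6}
E = {(1, 3), (2, 3), (2, 5), (3, 6), (4, 6), (5, 6)}


def build_adjacency(nodes, edges, directed=False):
    """Return a dict mapping each node to the set of its neighbours."""
    adjacency = {node: set() for node in nodes}
    for source, target in edges:
        adjacency[source].add(target)
        if not directed:
            # An undirected edge can be traversed in both directions
            adjacency[target].add(source)
    return adjacency


undirected = build_adjacency(V, E)
directed = build_adjacency(V, E, directed=True)

# Degree of a node = number of neighbours in the undirected graph
degrees = {node: len(neighbours) for node, neighbours in undirected.items()}
```

In the undirected version, node 6 lists nodes 3, 4, and 5 as neighbours; in the directed version it has no outgoing edges, because every pair that contains 6 names it as the target.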

Machine learning on graphs

Machine learning techniques (such as classification or clustering) can be applied to nodes, edges, or entire graphs. The concepts remain the same, but because we apply the algorithms to graph entities, tasks can be framed as node, link (edge), or subgraph classification. For example, in a network of users on social media, identifying abusive or bot users would be a node classification task. Identifying malicious messages or transactions would be an edge classification problem. Detecting groups of hate speech disseminators would be a subgraph classification problem.

In graph machine learning, the challenge lies in extracting features from a graph. A possible approach would be to combine the adjacency matrix and the node features into an attribute vector and feed it to a traditional machine learning algorithm. However, a model produced this way will not be permutation-invariant: there is no inherent order among the nodes in a graph, yet the adjacency matrix changes whenever the nodes are reordered; models based...
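The ordering problem can be illustrated with a small sketch. The function names below are hypothetical, and the sorted degree list is only one simple example of an order-independent summary, not the approach the chapter goes on to use. The same graph is encoded under two node orderings: the flattened adjacency matrices differ, while the sorted degrees agree.

```python
def adjacency_matrix(order, edges):
    """Build a 0/1 adjacency matrix with rows and columns in the given node order."""
    index = {node: i for i, node in enumerate(order)}
    size = len(order)
    matrix = [[0] * size for _ in range(size)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # undirected graph: symmetric matrix
    return matrix


def flatten(matrix):
    """Naive feature vector: concatenate the rows of the matrix."""
    return [value for row in matrix for value in row]


def sorted_degrees(matrix):
    """Order-independent summary: the sorted list of node degrees."""
    return sorted(sum(row) for row in matrix)


edges = [(1, 3), (2, 3), (2, 5), (3, 6), (4, 6), (5, 6)]

# Same graph, two different (arbitrary) node orderings
matrix_a = adjacency_matrix([1, 2, 3, 4, 5, 6], edges)
matrix_b = adjacency_matrix([6, 5, 4, 3, 2, 1], edges)

print(flatten(matrix_a) == flatten(matrix_b))                # False
print(sorted_degrees(matrix_a) == sorted_degrees(matrix_b))  # True
```

A model trained on the flattened vector would treat the two encodings as different inputs, even though they describe the same graph.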

Fake news detection with GNN

In this section, we will learn how fake news can be detected using a GNN.

Modeling a GNN

While some problems can naturally be thought of as graphs, as a data scientist, you still need to conceptualize and build the graph yourself. Data may still be available to you in tabular form, but it will be up to you to build a meaningful graph from it.

Solving any task with a GNN involves the following high-level steps:

Identifying the entities that will be your nodes.
Defining a rule or metric to connect nodes via edges.
Defining a set of features for nodes and edges.
Determining the kind of graph task the given problem can translate into (node classification, edge classification, or subgraph classification).
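The steps above can be sketched on a toy table. Everything in this example is hypothetical: the user names, the attribute values, the choice of retweets as the edge rule, and the choice of node classification as the task are assumptions made purely for illustration.

```python
# Hypothetical tabular data: one row per retweet event
retweets = [
    {"user": "alice", "retweeted_from": "bob"},
    {"user": "carol", "retweeted_from": "bob"},
    {"user": "dave", "retweeted_from": "carol"},
]

# Hypothetical per-user attributes (made-up values)
profiles = {
    "alice": {"followers": 120, "account_age_days": 900},
    "bob": {"followers": 5000, "account_age_days": 30},
    "carol": {"followers": 80, "account_age_days": 400},
    "dave": {"followers": 15, "account_age_days": 10},
}

# Step 1: the entities that become nodes are the users
nodes = sorted(profiles)
node_index = {name: i for i, name in enumerate(nodes)}

# Step 2: the edge rule connects the original poster to the retweeter
edge_list = [
    (node_index[row["retweeted_from"]], node_index[row["user"]])
    for row in retweets
]

# Step 3: each node gets a feature vector built from its profile
node_features = [
    [profiles[name]["followers"], profiles[name]["account_age_days"]]
    for name in nodes
]

# Step 4: with one label per user, this would be a node classification task
task = "node classification"
```

The result is an edge list of integer node indices and a feature matrix with one row per node, which is a common way to hand a graph to a learning library.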

In social media-related domains, such as friend recommendation, post virality, and fake news detection, we have multiple choices for nodes, their features, and the methodology for edges between them, such as the following:

Nodes can...

Summary

A lot of real-world data can be naturally represented as graphs. Graphs are especially important in a social network context, where multiple entities (users, posts, or media) are linked together, forming natural graphs. In recent times, the spread of misinformation and fake news has become a problem of growing concern. This chapter focused on detecting fake news using GNNs.

We began by learning some basic concepts about graphs and techniques for learning on graphs. These included using static features extracted from graph analytics (such as degrees and path lengths), node and graph embeddings, and finally, neural message passing using GNNs. We looked at the UPFD framework and how a graph can be built for a news article, complete with node features that incorporate historical user behavior. Finally, we trained a GNN model to build a graph classifier that detects whether a news article is fake or not.

In the field of cybersecurity, graphs are especially important. This is because...

Author (1)

Rajvardhan Oak

Rajvardhan Oak is a cybersecurity expert, researcher, and scientist with a focus on machine learning solutions to security issues such as fake news, malware, and botnets. He obtained his bachelor's degree from the University of Pune, India, and his master's degree from the University of California, Berkeley. He has served on the editorial committees of multiple technical conferences and journals. His work has been featured by prominent news outlets such as WIRED magazine and the Daily Mail. In 2022, he received the ISC2 Global Achievement Award for Excellence in Cybersecurity. He is based in the Seattle area and works for Microsoft as an applied scientist in the ads fraud division.