You're reading from Graph Machine Learning

Product type: Book

Published in: Jun 2021

Publisher: Packt

ISBN-13: 9781800204492

Edition: 1st Edition

Concepts

Machine Learning

Authors (3):

Claudio Stamile

Aldo Marzullo

Enrico Deusebio

Chapter 8: Graph Analysis for Credit Card Transactions

Analysis of financial data is one of the most common and important domains in big data and data analysis. Indeed, due to the increasing number of mobile devices and the introduction of a standard platform for online payment, the amount of transactional data that banks are producing and consuming is increasing exponentially.

As a consequence, new tools and techniques are needed to extract as much as we can from this huge amount of information, in order to better understand customers' behavior and support data-driven decisions in business processes. Data can also be used to build better mechanisms for securing the online payment process: as online payment systems become increasingly popular thanks to e-commerce platforms, cases of fraud are increasing as well. An example of a fraudulent transaction is a transaction performed with a stolen credit card. Indeed, in this case, the fraudulent...

Technical requirements

We will be using Jupyter notebooks with Python 3.8 for all of our exercises. The following Python libraries need to be installed for this chapter using pip (for example, by running pip install networkx==2.5 on the command line):

Jupyter==1.0.0
networkx==2.5
scikit-learn==0.24.0
pandas==1.1.3
node2vec==0.3.3
numpy==1.19.2
communities==2.2.0

In the rest of this book, unless clearly stated to the contrary, nx refers to the result of the Python command import networkx as nx.

All code files relevant to this chapter are available at https://github.com/PacktPublishing/Graph-Machine-Learning/tree/main/Chapter08.
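Before starting, it can help to confirm that the environment matches the pinned versions listed above. The following is a minimal sketch using only the standard library (importlib.metadata is available from Python 3.8). The helper names installed_versions and mismatches are hypothetical and are not part of the book's code.

```python
from importlib import metadata

# Pins copied from the list of libraries for this chapter.
PINNED = {
    "jupyter": "1.0.0",
    "networkx": "2.5",
    "scikit-learn": "0.24.0",
    "pandas": "1.1.3",
    "node2vec": "0.3.3",
    "numpy": "1.19.2",
    "communities": "2.2.0",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(pinned):
    """Return {name: (expected, installed)} for every pin that is not met."""
    installed = installed_versions(pinned)
    return {
        name: (expected, installed[name])
        for name, expected in pinned.items()
        if installed[name] != expected
    }


if __name__ == "__main__":
    # Print one line per package whose installed version differs from the pin.
    for name, (expected, actual) in mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {actual}")
```

A newer version of a library will often work as well, so a reported mismatch is a hint to check rather than an error.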

Overview of the dataset

The dataset used in this chapter is the Credit Card Transactions Fraud Detection Dataset available on Kaggle at the following URL: https://www.kaggle.com/kartik2112/fraud-detection?select=fraudTrain.csv.

The dataset is made up of simulated credit card transactions containing legitimate and fraudulent transactions for the period January 1, 2019 – December 31, 2020. It includes the credit cards of 1,000 customers performing transactions with a pool of 800 merchants. The dataset was generated using Sparkov Data Generation. More information about the generation algorithm is available at the following URL: https://github.com/namebrandon/Sparkov_Data_Generation.

For each transaction, the dataset contains 23 different features. In the following table, we will show only the information that will be used in this chapter:

Table 8.1 – List of variables used in the dataset

For the purposes of our analysis, we will use the fraudTrain...

Network topology and community detection

In this section, we are going to analyze some graph metrics to have a clear picture of the general structure of the graph. We will be using networkx to compute most of the useful metrics we have seen in Chapter 1, Getting Started with Graphs. We will try to interpret the metrics to gain insights into the graph.

Network topology

A good starting point for our analysis is the extraction of simple graph metrics to have a general understanding of the main properties of bipartite and tripartite transaction graphs.
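To make the two graph types concrete, here is a small sketch that builds toy versions of both with networkx. It is an illustration under stated assumptions, not the chapter's actual construction: the four rows are made up, the helper names build_bipartite and build_tripartite are hypothetical, and merging repeated customer-merchant pairs by summing their amounts is an assumed design choice. In the bipartite graph, customers and merchants are nodes and a transaction is an edge between them. In the tripartite graph, each transaction is also a node, linked to its customer and its merchant.

```python
import networkx as nx

# Hypothetical toy rows: (customer, merchant, amount, is_fraud).
TRANSACTIONS = [
    ("c1", "m1", 20.0, 0),
    ("c1", "m2", 35.5, 0),
    ("c2", "m1", 990.0, 1),
    ("c1", "m1", 5.0, 0),
]


def build_bipartite(rows):
    """Customers and merchants are nodes; one edge per customer-merchant pair."""
    G = nx.Graph()
    for customer, merchant, amount, is_fraud in rows:
        G.add_node(customer, kind="customer")
        G.add_node(merchant, kind="merchant")
        if G.has_edge(customer, merchant):
            # Assumption: repeated pairs are merged by summing the amounts,
            # and the edge is flagged if any merged transaction is fraudulent.
            data = G[customer][merchant]
            data["weight"] += amount
            data["label"] = max(data["label"], is_fraud)
        else:
            G.add_edge(customer, merchant, weight=amount, label=is_fraud)
    return G


def build_tripartite(rows):
    """Each transaction becomes its own node, linked to its customer and merchant."""
    G = nx.Graph()
    for i, (customer, merchant, amount, is_fraud) in enumerate(rows):
        tx = f"t{i}"
        G.add_node(customer, kind="customer")
        G.add_node(merchant, kind="merchant")
        G.add_node(tx, kind="transaction")
        G.add_edge(customer, tx, weight=amount, label=is_fraud)
        G.add_edge(tx, merchant, weight=amount, label=is_fraud)
    return G


# Toy stand-ins that reuse the names of the chapter's graphs.
G_bu = build_bipartite(TRANSACTIONS)
G_tu = build_tripartite(TRANSACTIONS)
```

Every transaction node in the tripartite graph has degree 2 by construction, which is worth keeping in mind when reading its degree distribution.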

We start by looking at the distribution of the degree for both bipartite and tripartite graphs using the following code:

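import matplotlib.pyplot as plt
import pandas as pd

# G_bu and G_tu are the bipartite and tripartite graphs built earlier in the chapter
for G in [G_bu, G_tu]:
    plt.figure(figsize=(10, 10))
    degrees = pd.Series(dict(nx.degree(G)))  # node -> degree
    degrees.plot.hist()
    plt.yscale("log")  # log-scale the counts so the tail stays visible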

As a result, we get the plot in the following diagram:

Figure 8.3 – Degree...

Embedding for supervised and unsupervised fraud detection

In this section, we describe how the bipartite and tripartite graphs introduced earlier can be used by graph machine learning algorithms to build automatic fraud detection procedures, using both supervised and unsupervised approaches. As discussed at the beginning of this chapter, transactions are represented by edges, so the task is to classify each edge as either fraudulent or genuine.

The pipeline we will use to perform the classification task is the following:

A sampling procedure for the imbalanced task
The use of an unsupervised embedding algorithm to create a feature vector for each edge
The application of supervised and unsupervised machine learning algorithms to the feature space defined in the previous point
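The first two steps of this pipeline can be sketched in plain Python. This is only an illustration under several assumptions: random undersampling of the majority class is one common sampling choice and may not be the one used in the chapter, the node vectors are hand-written placeholders standing in for the output of an unsupervised embedding algorithm, the element-wise (Hadamard) product is just one way to combine two node vectors into an edge vector, and the function names undersample and hadamard_edge_features are hypothetical.

```python
import random


def undersample(edges, labels, seed=0):
    """Randomly drop edges from the larger class until both classes are equal in size."""
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for edge, label in zip(edges, labels):
        by_class[label].append(edge)
    n = min(len(by_class[0]), len(by_class[1]))
    sampled = [
        (edge, label)
        for label, group in by_class.items()
        for edge in rng.sample(group, n)
    ]
    rng.shuffle(sampled)
    return [e for e, _ in sampled], [y for _, y in sampled]


def hadamard_edge_features(edges, node_vectors):
    """Build one feature row per edge as the element-wise product of its endpoint vectors."""
    return [
        [a * b for a, b in zip(node_vectors[u], node_vectors[v])]
        for u, v in edges
    ]


# Toy edges (customer, merchant) with labels: 1 = fraudulent, 0 = genuine.
edges = [("c1", "m1"), ("c1", "m2"), ("c2", "m1"), ("c2", "m2"), ("c3", "m1")]
labels = [0, 0, 0, 0, 1]

# Placeholder node embeddings; in practice these would be learned from the graph.
node_vectors = {
    "c1": [1.0, 2.0],
    "c2": [0.5, -1.0],
    "c3": [2.0, 0.0],
    "m1": [3.0, 1.0],
    "m2": [-1.0, 4.0],
}

# Step 1: balance the classes. Step 2: turn each remaining edge into a feature vector.
bal_edges, bal_labels = undersample(edges, labels, seed=42)
X = hadamard_edge_features(bal_edges, node_vectors)
# Step 3 (not shown): fit any classifier that accepts rows of features on X and bal_labels.
```

Undersampling discards most of the genuine transactions, so it trades information for balance. Evaluation should therefore still be carried out on data that keeps the original class proportions.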

Supervised approach to fraudulent transaction identification

Since our dataset is strongly imbalanced, with fraudulent transactions representing 2.83%...

Summary

In this chapter, we showed how a classical fraud detection task can be framed as a graph problem and how the techniques presented in the previous chapter can be used to tackle it. More specifically, we introduced the dataset and described the procedure for transforming the transactional data into two types of graph: bipartite and tripartite undirected graphs. We then computed local metrics (along with their distributions) and global metrics for both graphs and compared the results.

Moreover, a community detection algorithm was applied to the graphs in order to spot and plot specific regions of the transaction graph where the density of fraudulent transactions is higher than in the other communities.

Finally, we solved the fraud detection problem using supervised and unsupervised algorithms, comparing the performance obtained on the bipartite and tripartite graphs. As the first step, since the problem was imbalanced with a higher presence of genuine transactions...

Authors (3)

Claudio Stamile

Claudio Stamile received an M.Sc. degree in computer science from the University of Calabria (Cosenza, Italy) in September 2013 and, in September 2017, he received his joint Ph.D. from KU Leuven (Leuven, Belgium) and Université Claude Bernard Lyon 1 (Lyon, France). During his career, he has developed a solid background in artificial intelligence, graph theory, and machine learning, with a focus on the biomedical field. He is currently a senior data scientist in CGnal, a consulting firm fully committed to helping its top-tier clients implement data-driven strategies and build AI-powered solutions to promote efficiency and support new business models.

Aldo Marzullo

Aldo Marzullo received an M.Sc. degree in computer science from the University of Calabria (Cosenza, Italy) in September 2016. During his studies, he developed a solid background in several areas, including algorithm design, graph theory, and machine learning. In January 2020, he received his joint Ph.D. from the University of Calabria and Université Claude Bernard Lyon 1 (Lyon, France), with a thesis entitled Deep Learning and Graph Theory for Brain Connectivity Analysis in Multiple Sclerosis. He is currently a postdoctoral researcher at the University of Calabria and collaborates with several international institutions.

Enrico Deusebio

Enrico Deusebio is currently the chief operating officer at CGnal, a consulting firm that helps its top-tier clients implement data-driven strategies and build AI-powered solutions. He has been working with data and large-scale simulations using high-performance facilities and large-scale computing centers for over 10 years, both in an academic and industrial context. He has collaborated and worked with top-tier universities, such as the University of Cambridge, the University of Turin, and the Royal Institute of Technology (KTH) in Stockholm, where he obtained a Ph.D. in 2014. He also holds B.Sc. and M.Sc. degrees in aerospace engineering from Politecnico di Torino.
