You're reading from  Graph Machine Learning

Product type: Book
Published in: Jun 2021
Publisher: Packt
ISBN-13: 9781800204492
Edition: 1st Edition
Authors (3):
Claudio Stamile

Claudio Stamile received an M.Sc. degree in computer science from the University of Calabria (Cosenza, Italy) in September 2013 and, in September 2017, he received his joint Ph.D. from KU Leuven (Leuven, Belgium) and Université Claude Bernard Lyon 1 (Lyon, France). During his career, he has developed a solid background in artificial intelligence, graph theory, and machine learning, with a focus on the biomedical field. He is currently a senior data scientist at CGnal, a consulting firm fully committed to helping its top-tier clients implement data-driven strategies and build AI-powered solutions to promote efficiency and support new business models.

Aldo Marzullo

Aldo Marzullo received an M.Sc. degree in computer science from the University of Calabria (Cosenza, Italy) in September 2016. During his studies, he developed a solid background in several areas, including algorithm design, graph theory, and machine learning. In January 2020, he received his joint Ph.D. from the University of Calabria and Université Claude Bernard Lyon 1 (Lyon, France), with a thesis entitled Deep Learning and Graph Theory for Brain Connectivity Analysis in Multiple Sclerosis. He is currently a postdoctoral researcher at the University of Calabria and collaborates with several international institutions.

Enrico Deusebio

Enrico Deusebio is currently the chief operating officer at CGnal, a consulting firm that helps its top-tier clients implement data-driven strategies and build AI-powered solutions. He has been working with data and large-scale simulations using high-performance facilities and large-scale computing centers for over 10 years, in both academic and industrial contexts. He has collaborated and worked with top-tier universities, such as the University of Cambridge, the University of Turin, and the Royal Institute of Technology (KTH) in Stockholm, where he obtained a Ph.D. in 2014. He also holds B.Sc. and M.Sc. degrees in aerospace engineering from Politecnico di Torino.


Chapter 8: Graph Analysis for Credit Card Transactions

Analysis of financial data is one of the most common and important domains in big data and data analysis. Indeed, due to the increasing number of mobile devices and the introduction of a standard platform for online payment, the amount of transactional data that banks are producing and consuming is increasing exponentially.

As a consequence, new tools and techniques are needed to extract as much value as possible from this huge amount of information, in order to better understand customers' behavior and support data-driven decisions in business processes. Data can also be used to build better mechanisms to improve security in the online payment process. As online payment systems become increasingly popular thanks to e-commerce platforms, cases of fraud are increasing as well. An example of a fraudulent transaction is one performed with a stolen credit card. Indeed, in this case, the fraudulent...

Technical requirements

We will be using Jupyter notebooks with Python 3.8 for all of our exercises. The following is a list of Python libraries that will be installed for this chapter using pip. For example, run pip install networkx==2.5 on the command line:

Jupyter==1.0.0
networkx==2.5
scikit-learn==0.24.0
pandas==1.1.3
node2vec==0.3.3
numpy==1.19.2
communities==2.2.0
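As an optional check, the pinned versions above can be compared with what is installed in the current environment. The following is a minimal sketch rather than part of the chapter's code: the helper names parse_pins and find_mismatches are hypothetical, and the only library call used is importlib.metadata.version from the Python 3.8 standard library.

```python
from importlib import metadata

# The pins listed above, copied verbatim.
PINS = """
Jupyter==1.0.0
networkx==2.5
scikit-learn==0.24.0
pandas==1.1.3
node2vec==0.3.3
numpy==1.19.2
communities==2.2.0
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dictionary."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, version = line.split("==")
        pins[name] = version
    return pins


def find_mismatches(pins, lookup=metadata.version):
    """Return {name: (wanted, found)} for packages whose version differs.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(parse_pins(PINS)).items():
        print(f"{name}: wanted {wanted}, found {found}")
```

An empty output means every pinned package is installed at the listed version.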

In the rest of this book, unless clearly stated to the contrary, nx refers to the module imported with the Python command import networkx as nx.

All code files relevant to this chapter are available at https://github.com/PacktPublishing/Graph-Machine-Learning/tree/main/Chapter08.

Overview of the dataset

The dataset used in this chapter is the Credit Card Transactions Fraud Detection Dataset available on Kaggle at the following URL: https://www.kaggle.com/kartik2112/fraud-detection?select=fraudTrain.csv.

The dataset is made up of simulated credit card transactions containing legitimate and fraudulent transactions for the period January 1, 2019 – December 31, 2020. It includes the credit cards of 1,000 customers performing transactions with a pool of 800 merchants. The dataset was generated using Sparkov Data Generation. More information about the generation algorithm is available at the following URL: https://github.com/namebrandon/Sparkov_Data_Generation.
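A first look at a dataset of this kind usually starts with counting customers, merchants, and the share of fraudulent transactions. The following is a minimal sketch under stated assumptions: it reads a hypothetical four-row sample from memory instead of the real fraudTrain.csv, and the column names (cc_num, merchant, amt, is_fraud) are assumptions that should be checked against the downloaded file. The helper summarize_transactions is illustrative only.

```python
import io

import pandas as pd

# Hypothetical four-row sample standing in for fraudTrain.csv.
# Column names are assumptions; verify them against the real file.
SAMPLE_CSV = """cc_num,merchant,amt,is_fraud
1111,shop_a,12.50,0
1111,shop_b,80.00,1
2222,shop_a,5.25,0
3333,shop_c,42.00,0
"""


def summarize_transactions(df):
    """Count transactions, distinct cards and merchants, and the fraud share."""
    return {
        "n_transactions": len(df),
        "n_customers": int(df["cc_num"].nunique()),
        "n_merchants": int(df["merchant"].nunique()),
        "fraud_rate": float(df["is_fraud"].mean()),
    }


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = summarize_transactions(df)
print(summary)
```

With the real file, the same function could be applied to the result of pd.read_csv on the downloaded path.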

For each transaction, the dataset contains 23 different features. In the following table, we will show only the information that will be used in this chapter:

Table 8.1 – List of variables used in the dataset

For the purposes of our analysis, we will use the fraudTrain...

Network topology and community detection

In this section, we analyze some graph metrics to get a clear picture of the general structure of the graph. We will use networkx to compute most of the metrics introduced in Chapter 1, Getting Started with Graphs, and then interpret them to gain insights into the graph.

Network topology

A good starting point for our analysis is the extraction of simple graph metrics to have a general understanding of the main properties of bipartite and tripartite transaction graphs.
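To make the idea of simple graph metrics concrete, the following is a minimal sketch, not the chapter's own code, that builds a toy bipartite transaction graph and reads off a few basic properties. It assumes that customers and merchants are nodes and that each transaction is an edge; the column names and values are hypothetical.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy transactions; names and values are illustrative only.
toy = pd.DataFrame({
    "customer": ["c1", "c1", "c2", "c3"],
    "merchant": ["m1", "m2", "m1", "m1"],
    "amt": [10.0, 25.5, 7.2, 99.0],
    "is_fraud": [0, 0, 0, 1],
})

# One edge per transaction, carrying the amount and the fraud label.
G_toy = nx.from_pandas_edgelist(
    toy, source="customer", target="merchant", edge_attr=["amt", "is_fraud"]
)

print("nodes:", G_toy.number_of_nodes())
print("edges:", G_toy.number_of_edges())
print("bipartite:", nx.is_bipartite(G_toy))
print("density:", nx.density(G_toy))
print("degrees:", dict(G_toy.degree()))
```

Note that an undirected Graph keeps a single edge per customer and merchant pair, so repeated transactions between the same pair would need to be aggregated or stored in a MultiGraph.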

We start by looking at the distribution of the degree for both bipartite and tripartite graphs using the following code:

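import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

for G in [G_bu, G_tu]:
    plt.figure(figsize=(10, 10))
    # Map each node to its degree, then plot the histogram on a log scale
    degrees = pd.Series(dict(nx.degree(G)))
    degrees.plot.hist()
    plt.yscale("log")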

As a result, we get the plot in the following figure:

Figure 8.3 – Degree...

Embedding for supervised and unsupervised fraud detection

In this section, we will describe how the bipartite and tripartite graphs described previously can be used by graph machine learning algorithms to build automatic procedures for fraud detection using supervised and unsupervised approaches. As discussed at the beginning of this chapter, transactions are represented by edges, so the task is to classify each edge into the correct class: fraudulent or genuine.

The pipeline we will use to perform the classification task is the following:

  • A sampling procedure for the imbalanced task
  • The use of an unsupervised embedding algorithm to create a feature vector for each edge
  • The application of supervised and unsupervised machine learning algorithms to the feature space defined in the previous point
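The three steps above can be sketched end to end on synthetic data. This is a hedged illustration under explicit assumptions, not the chapter's implementation: random undersampling is used as one possible sampling procedure, random vectors stand in for node embeddings that a real pipeline would learn with an unsupervised algorithm, the Hadamard (element-wise) product is one common way to combine two node vectors into an edge vector, and a random forest is one possible supervised classifier. All data and names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Hypothetical edge list: 200 genuine and 10 fraudulent transactions
# between 20 customers (ids 0..19) and 10 merchants (ids 20..29).
edges = pd.DataFrame({
    "source": rng.integers(0, 20, size=210),
    "target": rng.integers(20, 30, size=210),
    "is_fraud": [0] * 200 + [1] * 10,
})

# Step 1: random undersampling of the majority (genuine) class.
majority = edges[edges["is_fraud"] == 0]
minority = edges[edges["is_fraud"] == 1]
majority_down = resample(
    majority, n_samples=len(minority), replace=False, random_state=0
)
balanced = pd.concat([majority_down, minority])

# Step 2: one feature vector per edge. The node vectors are random
# placeholders for learned embeddings; the edge vector is their
# element-wise (Hadamard) product.
node_vectors = rng.normal(size=(30, 8))
X = (
    node_vectors[balanced["source"].to_numpy()]
    * node_vectors[balanced["target"].to_numpy()]
)
y = balanced["is_fraud"].to_numpy()

# Step 3: a supervised classifier on the edge feature space.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
predictions = clf.predict(X)
print(X.shape, predictions.shape)
```

Because the placeholder vectors carry no real signal, the fitted model is not meaningful here; the sketch only shows how the data flows through the three steps. In practice, the classifier would be evaluated on held-out edges.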

Supervised approach to fraudulent transaction identification

Since our dataset is strongly imbalanced, with fraudulent transactions representing 2.83%...

Summary

In this chapter, we showed how a classical fraud detection task can be framed as a graph problem and how the techniques described in the previous chapter can be used to tackle it. In more detail, we introduced the dataset we used and described the procedure to transform the transactional data into two types of graph, namely, bipartite and tripartite undirected graphs. We then computed local metrics (along with their distributions) and global metrics for both graphs, comparing the results.

Moreover, a community detection algorithm was applied to the graphs in order to spot and plot specific regions of the transaction graph where the density of fraudulent transactions is higher compared to the other communities.

Finally, we solved the fraud detection problem using supervised and unsupervised algorithms, comparing the performance of the bipartite and tripartite graphs. As the first step, since the problem was imbalanced, with a higher presence of genuine transactions...
