You're reading from Codeless Deep Learning with KNIME
(Book, 1st Edition, Packt, published Nov 2020, ISBN-13: 9781800566613, reading level: Intermediate)
Authors:

Kathrin Melcher

Kathrin Melcher is a data scientist at KNIME. She holds a master's degree in mathematics from the University of Konstanz, Germany. She joined the evangelism team at KNIME in 2017 and has a strong interest in data science and machine learning algorithms. She enjoys teaching and sharing her data science knowledge with the community, for example, in the book From Excel to KNIME, as well as on various blog posts and at training courses, workshops, and conference presentations.

Rosaria Silipo

Rosaria Silipo, Ph.D., now head of data science evangelism at KNIME, has spent 25+ years in applied AI, predictive analytics, and machine learning at Siemens, Viseca, Nuance Communications, and private consulting. Sharing her practical experience in a broad range of industries and deployments, including IoT, customer intelligence, financial services, social media, and cybersecurity, Rosaria has authored 50+ technical publications, including her recent books Guide to Intelligent Data Science (Springer) and Codeless Deep Learning with KNIME (Packt).

Chapter 5: Autoencoder for Fraud Detection

At this point in the book, you should already know the basic math and concepts behind neural networks and some deep learning paradigms. You should also know the most useful KNIME nodes for data preparation, and how to build a neural network, train it, test it, and, finally, evaluate it. In Chapter 4, Building and Training a Feedforward Neural Network, we built together two examples of fully connected feedforward neural networks: one to solve a multiclass classification problem on the Iris dataset and one to solve a binary classification problem on the Adult dataset.

Those were two simple examples using quite small datasets, in which all the classes were adequately represented, with just a few hidden layers in the network and a straightforward encoding of the output classes. However, they served their purpose: to teach you how to assemble, train, and apply a neural network in KNIME Analytics Platform.

Now, the time has come to...

Introducing Autoencoders

In previous chapters, we have seen that neural networks are very powerful algorithms. The power of each network lies in its architecture, activation functions, and regularization terms, plus a few other features. Among the varieties of neural architectures, there is a very versatile one, especially useful for three tasks: detecting unknown events, detecting unexpected events, and reducing the dimensionality of the input space. This neural network is the autoencoder.

Architecture of the Autoencoder

The autoencoder (or autoassociator) is a multilayer feedforward neural network, trained to reproduce the input vector onto the output layer. Like many neural networks, it is trained using the gradient descent algorithm, or one of its modern variations, against a loss function, such as the Mean Squared Error (MSE). It can have as many hidden layers as desired. Regularization terms and other general parameters that are useful for avoiding overfitting or for improving...
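In the book, this architecture is assembled with KNIME's Keras integration nodes rather than code. Purely as an illustration of the training objective, the same idea can be sketched in plain Python with NumPy: a tiny feedforward network with a bottleneck hidden layer, trained by batch gradient descent to minimize the MSE between input and reconstruction. The layer sizes, learning rate, and toy data below are illustrative assumptions, not values from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples of 4 correlated features (illustrative only).
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 4))

n_in, n_hidden = X.shape[1], 2                       # bottleneck smaller than input
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden))    # encoder weights
W2 = rng.normal(scale=0.1, size=(n_hidden, n_in))    # decoder weights
lr = 0.05

def forward(X):
    H = np.tanh(X @ W1)      # encoder: compress the input to the bottleneck
    return H, H @ W2         # decoder: reconstruct the input from the code

def mse(A, B):
    return float(np.mean((A - B) ** 2))

err_before = mse(X, forward(X)[1])

# Plain batch gradient descent on the MSE reconstruction loss.
for _ in range(1000):
    H, X_hat = forward(X)
    d_out = 2 * (X_hat - X) / X.size           # dLoss/dX_hat
    gW2 = H.T @ d_out
    d_hidden = (d_out @ W2.T) * (1 - H ** 2)   # backprop through tanh
    gW1 = X.T @ d_hidden
    W1 -= lr * gW1
    W2 -= lr * gW2

err_after = mse(X, forward(X)[1])
print(err_before, err_after)   # the reconstruction error decreases
```

The target of the training is the input itself; everything else (optimizer variant, regularization, number of hidden layers) is a design choice layered on top of this core loop.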

Why is Detecting Fraud so Hard?

Fraud detection is the set of activities undertaken to prevent money or property from being obtained through false pretenses. It is applied in many industries, such as banking and insurance. In banking, fraud may include forging checks or using stolen credit cards. In this example, we will focus on fraud in credit card transactions.

Credit card fraud is a huge problem for card issuers as well as for the final payers. The European Central Bank reported that in 2016, the total number of card fraud cases using cards issued in the Single Euro Payments Area (SEPA) amounted to 17.3 million, while the total number of card transactions using cards issued in SEPA amounted to 74.9 billion (https://www.ecb.europa.eu/pub/cardfraud/html/ecb.cardfraudreport201809.en.html#toc1).
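Dividing the two ECB figures quoted above gives a feel for how rare fraud is; this back-of-the-envelope ratio is our own calculation, not a number stated in the report:

```python
fraud_cases = 17.3e6      # card fraud cases, SEPA-issued cards, 2016
transactions = 74.9e9     # total card transactions, SEPA-issued cards, 2016
fraud_rate = fraud_cases / transactions
print(f"{fraud_rate:.6%}")   # about 0.023%, i.e. roughly 1 fraud in 4,300 transactions
```

With roughly two frauds in every ten thousand transactions, the two classes are extremely imbalanced, which is one of the reasons the problem is hard.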

However, the amount of fraud is not the only problem. From a data science perspective, fraud detection is also a very hard task to solve...

Building and Training the Autoencoder

Let's go into detail about the particular application we will build to tackle fraud detection with a neural autoencoder. Like all data science projects, it includes two separate applications: one to train and optimize the whole strategy on dedicated datasets, and one to set it in action to analyze real-world credit card transactions. The first application is implemented with the training workflow; the second application is implemented with the deployment workflow.

Tip

Often, training and deployment are separate applications since they work on different data and have different goals.

The training workflow uses a lab dataset to produce a model with acceptable performance on the task, sometimes after a few different trials. The deployment workflow no longer changes the model or the strategy; it just applies them to real-world transactions to raise fraud alarms.

In this section, we will focus on the training phase, including the following...

Optimizing the Autoencoder Strategy

What is the best value to use for the threshold? In the last section, we adopted a value based on our experience. However, is this the best possible value? The threshold, in this case, is not automatically optimized by the training procedure; it is just a static parameter external to the training algorithm. In KNIME Analytics Platform, it is also possible to optimize such static parameters outside of the Learner nodes.

Optimizing the Threshold

The threshold is defined on a separate subset of data, called the optimization set. There are two options here:

  • If an optimization set with labeled fraudulent transactions is available, the value of the threshold is optimized against an accuracy measure for fraud detection.
  • If no labeled fraudulent transactions are available in the dataset, the value of the threshold is defined as a high percentile of the reconstruction errors on the optimization set.
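The second option can be sketched in a few lines of Python outside of KNIME. The array of reconstruction errors and the 99th percentile below are illustrative assumptions, not values from the book:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical reconstruction errors on the optimization set: mostly small
# values, with a few large outliers standing in for unusual transactions.
errors = np.concatenate([rng.exponential(0.05, size=990),
                         rng.exponential(1.0, size=10)])

# Option 2: with no fraud labels, place the threshold at a high percentile
# of the reconstruction errors (the exact percentile is a design choice).
threshold = float(np.percentile(errors, 99))

flagged = errors > threshold
print(threshold, int(flagged.sum()))   # about 1% of transactions exceed it
```

The percentile directly controls the alarm rate: a higher percentile means fewer, more confident alarms, at the risk of missing subtler frauds.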

During the data preparation phase, we generated three data subsets...

Deploying the Fraud Detector

At this point, we have an autoencoder network and a rule with acceptable performance for fraud detection. In this section, we will implement the deployment workflow.

The deployment workflow (Figure 5.11), like all deployment workflows, takes in new transaction data, passes it through the autoencoder, calculates the distance, applies the fraud detection rule, and finally, flags the input transaction as fraud or legitimate.
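In code, this per-transaction deployment logic amounts to a reconstruction-error rule. The sketch below uses a dummy stand-in for the trained network and a made-up threshold; both are assumptions, since in the workflow they come from the saved Keras model and the optimization step:

```python
import numpy as np

# Hypothetical stand-ins: in the workflow, the network is read from the saved
# Keras file and the threshold comes from the optimization step. Here, a dummy
# "autoencoder" reconstructs each transaction imperfectly.
def autoencoder_predict(X):
    return X * 0.9               # placeholder for the real network's output

threshold = 0.05                 # placeholder for the optimized threshold

def detect_fraud(X):
    X_hat = autoencoder_predict(X)
    # Distance between input and reconstruction: one error per transaction.
    errors = np.mean((X - X_hat) ** 2, axis=1)
    labels = np.where(errors > threshold, "fraud", "legitimate")
    return labels, errors

X_new = np.array([[0.1, 0.2, 0.1],     # well reconstructed -> small error
                  [3.0, 2.5, 4.0]])    # poorly reconstructed -> large error
labels, errors = detect_fraud(X_new)
print(labels)                          # ['legitimate' 'fraud']
```

Note that the deployment path contains no training: the network and the threshold are fixed, and the workflow only scores incoming transactions.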

This workflow, named 02_Autoencoder_for_Fraud_Detection_Deployment, is downloadable from the KNIME Hub: https://hub.knime.com/kathrin/spaces/Codeless%20Deep%20Learning%20with%20KNIME/latest/Chapter%205/:

Figure 5.11 – The deployment workflow


Let's have a look at the different parts of the workflow in detail.

Reading Network, New Transactions, and Normalization Parameters

In this workflow, first the autoencoder model is read from the previously saved Keras file, using the Keras...

Summary

In this chapter, we discussed an approach for building a fraud detector for credit card transactions in the desperate case when no, or almost no, examples of the fraud class are available. The solution trains a neural autoencoder to reproduce legitimate transactions from the input onto the output layer. Some postprocessing is then necessary to raise an alarm for fraud candidates based on the reconstruction error.

In describing this solution, we introduced the concepts of training and deployment applications, as well as components, optimization loops, and switch blocks.

In the next chapter, we will discuss a special family of neural networks, so-called recurrent neural networks, and how they can be trained on sequential data.

Questions and Exercises

Check your level of understanding of the concepts presented in this chapter by answering the following questions:

  1. What is the goal of an autoencoder during training?

    a) To reproduce the input to the...

