You're reading from R Machine Learning By Example

Product type: Book
Published in: Mar 2016
Reading level: Intermediate
ISBN-13: 9781784390846
Edition: 1st Edition
Tools: RStudio
Concepts: Machine Learning
Author: Raghav Bali

Chapter 5. Credit Risk Detection and Prediction – Descriptive Analytics

In the last two chapters, you saw some interesting problems revolving around the retail and e-commerce domains. You now know how to detect and predict shopping trends from shopping patterns as well as how to build recommendation systems. As you may remember from Chapter 1, Getting started with R and Machine Learning, the applications of machine learning are diverse, and we can apply the same concepts and techniques to solve a wide variety of real-world problems. We will be tackling a completely new problem here, but hold on to what you have learnt, because several concepts from the previous chapters will come in handy soon!

In the next couple of chapters, we will be tackling a new problem related to the financial domain. We will be looking at the bank customers of a particular German bank who could be credit risks for the bank, based on some data that has been previously collected. We will perform descriptive and exploratory...

Types of analytics

Before we start tackling our next challenge, it will be useful to get an idea of the different types of analytics which broadly encompass the data science domain. We use a variety of data mining and machine learning techniques to solve different data problems. However, depending on the mechanism of the technique and its end result, we can broadly classify analytics into four different types which are explained next:

Descriptive analytics: This is what we use when we have some data to analyze. We start by looking at the different attributes of the data, extract meaningful features, and use statistics and visualizations to understand what has already happened. The main aim of descriptive analytics is to get a broad idea of what kind of data we are dealing with and to summarize what has happened in the past. Almost 80% of all analytics in businesses today is descriptive.
Diagnostic analytics: This is sometimes clubbed together with descriptive analytics. Here the main...

Our next challenge

We have dealt with some interesting applications of machine learning in the e-commerce domain in the last couple of chapters. For the next two chapters, our big challenge will be in the financial domain. We will be using data analysis and machine learning techniques to analyze financial data from a German bank. This data contains a lot of information about the customers of that bank. We will analyze it in two stages: descriptive analytics and predictive analytics.

Descriptive: Here we will look closely at the data and its various attributes. We will perform descriptive analysis and build visualizations to see the kind of features we are dealing with and how they might be related to credit risk. The data we will be dealing with here is already labeled, so we will be able to see how many customers were credit risks and how many weren't. We will also look closely at each feature in the data and understand its significance which will be useful in...

What is credit risk?

We have been using the term credit risk since the start of this chapter, and many of you might be wondering what exactly it means, even though you might have guessed it after reading the previous section. Here, we will explain the term clearly so that you will have no problem understanding the data and its features when we analyze them in the subsequent sections.

The standard definition of credit risk is the risk of default on a debt, which occurs when the borrower fails to make the required debt payments on time. This risk is borne by the lender, since the lender loses both the principal amount and the interest on it.

In our case, we will be dealing with a bank which acts as the financial organization giving out loans to customers who apply for them. Hence, customers who might default on the loan payment would be credit risks for the bank. By analyzing customer data and applying machine learning algorithms on it...

Getting the data

The first step in our data analysis pipeline is to get the dataset. We have actually cleaned the data and provided meaningful names to the data attributes and you can check that out by opening the german_credit_dataset.csv file. You can also get the actual dataset from the source which is from the Department of Statistics, University of Munich through the following URL: http://www.statistik.lmu.de/service/datenarchiv/kredit/kredit_e.html.

You can download the data and then run the following commands by firing up R in the same directory with the data file, to get a feel of the data we will be dealing with in the following sections:

> # load the data into a data frame
> credit.df <- read.csv("german_credit_dataset.csv", header = TRUE, sep = ",") 
> # class should be data.frame
> class(credit.df)
[1] "data.frame"
> 
> # get a quick peek at the data
> head(credit.df)

The following figure shows the first six rows of the data. Each column indicates...
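Besides head, a few other base R functions give a quick overview of a freshly loaded data frame. The following is only a sketch under stated assumptions: it builds a tiny made-up data frame (the column names credit.rating, credit.amount, and age are hypothetical and may not match those in german_credit_dataset.csv), writes it to a temporary CSV file, and reads it back in the same way as shown previously.

```r
# Toy data for illustration only; these values and column names are
# assumptions, not taken from the real dataset
toy.df <- data.frame(
  credit.rating = c(1, 0, 1, 1),
  credit.amount = c(1049, 2799, 841, 2122),
  age           = c(21, 36, 23, 39)
)

# Write to a temporary CSV and read it back, mirroring the read.csv call
tmp.file <- tempfile(fileext = ".csv")
write.csv(toy.df, tmp.file, row.names = FALSE)
sample.df <- read.csv(tmp.file, header = TRUE, sep = ",")

dim(sample.df)      # number of rows and columns
str(sample.df)      # column names, types, and the first few values
summary(sample.df)  # per-column summary statistics
```

The same three calls can be run on credit.df once the real file has been loaded.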

Data preprocessing

In this section, we will be focusing on data preprocessing which includes data cleaning, transformation, and normalizations if required. Basically, we perform operations to get the data ready before we start performing any analysis on it.

Dealing with missing values

There will be situations when the data you are dealing with has missing values, which are often represented as NA in R. There are several ways to detect them, and we will show you a couple of them next.

> # check if data frame contains NA values
> sum(is.na(credit.df))
[1] 0
> 
> # count the rows that have no NA values; this should match
> # the total number of records
> sum(complete.cases(credit.df))
[1] 1000

The is.na function is really useful as it helps in finding out if any element has an NA value in the dataset. There is another way of doing the same by using the complete.cases function, which essentially returns a logical vector saying whether...
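Because the dataset used here has no missing values, the two checks are easier to see on data that does contain some. The following is a minimal sketch on a made-up data frame (toy.df and its columns are hypothetical, with NA values injected on purpose); it is not part of the original analysis.

```r
# Toy data frame with deliberately injected missing values (illustration only)
toy.df <- data.frame(
  credit.amount = c(1049, NA, 841, 2122),
  age           = c(21, 36, NA, 39)
)

total.na   <- sum(is.na(toy.df))      # total number of NA cells
na.per.col <- colSums(is.na(toy.df))  # NA count for each column
complete   <- complete.cases(toy.df)  # TRUE for rows that have no NA
clean.df   <- toy.df[complete, ]      # keep only the complete rows

total.na
na.per.col
nrow(clean.df)
```

Here is.na works cell by cell, while complete.cases works row by row, so the second check tells you how many records would survive if rows with missing values were dropped.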

Data analysis and transformation

Now that we have processed our data, it is ready for analysis. We will be carrying out descriptive and exploratory analysis in this section, as mentioned earlier. We will analyze the different dataset attributes and talk about their significance, semantics, and relationship with the credit risk attribute. We will be using statistical functions, contingency tables, and visualizations to depict all of this.
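As a rough illustration of what a contingency table looks like, the following sketch cross-tabulates a categorical feature against the credit rating using base R. The data is made up and the column names (credit.rating, account.status) are assumptions chosen for the example, not necessarily those in the dataset.

```r
# Made-up values for illustration; column names are assumptions
toy.df <- data.frame(
  credit.rating  = factor(c(1, 0, 1, 1, 0, 1, 0, 1)),
  account.status = factor(c("a", "b", "a", "c", "b", "a", "c", "a"))
)

# Counts of each credit rating within each account status class
rating.by.status <- table(toy.df$account.status, toy.df$credit.rating,
                          dnn = c("account.status", "credit.rating"))
rating.by.status

# Row-wise proportions: the share of each rating within a status class
rating.props <- prop.table(rating.by.status, margin = 1)
round(rating.props, 2)
```

Reading the proportions row by row shows whether a given class of the feature is dominated by one credit rating, which is the kind of relationship the analysis looks for.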

Besides this, we will also transform some of the features in our dataset, namely the categorical variables. We do this to combine category classes that have similar semantics and to remove classes that account for a very small proportion of the data by merging them with a similar class. Some reasons for doing this include preventing the overfitting of our predictive models, which we will be building in Chapter 6, Credit Risk Detection and Prediction – Predictive Analytics, linking semantically similar classes together and also because modeling...
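One way to merge classes of a categorical variable in base R is to assign a named list to levels. The sketch below is only an illustration under assumptions: the purpose factor, its classes, and the choice to merge the rare classes into a class called other are all hypothetical and do not reproduce the transformations applied to the actual dataset.

```r
# Hypothetical categorical feature with two rare classes (illustration only)
purpose <- factor(c("car", "car", "furniture", "car", "repairs",
                    "furniture", "car", "retraining"))
table(purpose)

# Merge the rare classes into a single "other" class.
# Assigning a named list to levels() maps each new level to its old levels.
purpose.merged <- purpose
levels(purpose.merged) <- list(
  car       = "car",
  furniture = "furniture",
  other     = c("repairs", "retraining")
)
table(purpose.merged)
```

Working on a copy (purpose.merged) keeps the original variable available, so the class counts before and after the merge can be compared.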

Next steps

We have analyzed our dataset, performed the necessary feature engineering and statistical tests, built visualizations, and gained substantial domain knowledge about credit risk analysis. We analyzed each feature in detail for two reasons: to show you what kind of features banks consider when they assess the credit rating of customers, and to help you get familiar with the techniques of exploratory and descriptive analysis so that you can apply them to any dataset in the future. So, what next? Now comes the really interesting part of using this dataset: building feature sets from this data and feeding them into predictive models to predict which customers are potential credit risks and which are not. As mentioned previously, there are two steps to this: data and algorithms. In fact, we will go a step further and say...

Summary

Congratulations on staying until the end of this chapter! You have learnt several important things along the way. You now have an idea about one of the most important areas in the financial domain, that is, credit risk analysis. Besides this, you also gained significant domain knowledge about how banks analyze customers for their credit ratings and what kind of attributes and features they consider. Descriptive and exploratory analysis of the dataset also gave you an insight into how to start working from scratch when all you have is a problem to solve and a dataset. You now know how to perform feature engineering, build publication-quality visualizations using ggplot2, and perform statistical tests to check feature associations. Finally, we wrapped up our discussion by talking about feature sets and gave a brief introduction to several supervised machine learning algorithms which will help us in the next step of predicting...
