Predicting Failures of Banks - Multivariate Analysis

In this chapter, we are going to apply different algorithms with the aim of obtaining a good model from combinations of our predictors. The most common algorithm used in credit risk applications, such as credit scoring and rating, is logistic regression. In this chapter, we will also see how other algorithms can be applied to address some of the weaknesses of logistic regression.

In this chapter, we will be covering the following topics:

  • Logistic regression
  • Regularized methods
  • Testing a random forest model
  • Gradient boosting
  • Deep learning in neural networks
  • Support vector machines
  • Ensembles
  • Automatic machine learning

Logistic regression

Mathematically, a binary logistic model has a dependent variable with two categorical values. In our example, these values relate to whether or not a bank is solvent.

In a logistic model, the log odds, that is, the logarithm of the odds of belonging to a class, is modeled as a linear combination of one or more independent variables, as follows:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$$

Here, $p$ is the probability of default and $x_1, \dots, x_k$ are the independent variables.

The coefficients (beta values, β) of the logistic regression model are estimated using maximum likelihood estimation. This involves finding the coefficient values that maximize the likelihood of the observed outcomes or, equivalently, that minimize the discrepancy between the probabilities predicted by the model and the classes actually observed.

Logistic regression is very sensitive to the presence of outliers, and highly correlated variables (multicollinearity) should also be avoided, as they make the estimated coefficients unstable. Logistic regression in R can be applied as follows:

set.seed(1234)
# Fit a binary logistic regression of the Default flag on the predictors;
# the exact formula is truncated in this excerpt, so Default ~ . is a sketch
LogisticRegression <- glm(Default ~ ., data = train, family = binomial(link = "logit"))
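
Once fitted, the model returns probabilities through predict(). The following is a minimal sketch; the validation data frame valid is an assumed name, not shown in this excerpt:

# Predicted probability of failure for each bank in a hold-out sample
# (valid is a hypothetical data frame with the same columns as train)
pred_probs <- predict(LogisticRegression, newdata = valid, type = "response")
# Turn probabilities into a 0/1 classification with a 0.5 cut-off
pred_class <- ifelse(pred_probs > 0.5, 1, 0)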

Regularized methods

There are three common regularized methods:

  • Lasso
  • Ridge
  • Elastic net

In this section, we will see how these methods can be implemented in R. For these models, we will use the h2o package. This provides an open source, in-memory, distributed, fast, and scalable predictive analytics platform for machine learning. It helps in creating models that are built on big data, and it is well suited to enterprise applications because it supports production-quality deployments.

For more information on the h2o package, please visit its documentation at https://cran.r-project.org/web/packages/h2o/index.html.

This package is very useful because it brings together several common machine learning algorithms in one package. Moreover, these algorithms can be executed in parallel on our own computer, which makes them very fast. The package includes...
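
As an illustration of the three methods, here is a minimal, hedged sketch using h2o.glm, in which the alpha parameter selects the penalty. The column names Default and ID_RSSD come from the data frame shown later in this chapter; everything else is an assumption:

library(h2o)
h2o.init()  # start a local h2o cluster

# Move the training data into h2o and mark the target as categorical
train_h2o <- as.h2o(train)
train_h2o$Default <- as.factor(train_h2o$Default)
predictors <- setdiff(colnames(train_h2o), c("ID_RSSD", "Default"))

# alpha mixes the two penalties: 1 = lasso, 0 = ridge, anything in
# between = elastic net; lambda_search tries a grid of penalty strengths
lasso   <- h2o.glm(x = predictors, y = "Default", training_frame = train_h2o,
                   family = "binomial", alpha = 1, lambda_search = TRUE)
ridge   <- h2o.glm(x = predictors, y = "Default", training_frame = train_h2o,
                   family = "binomial", alpha = 0, lambda_search = TRUE)
elastic <- h2o.glm(x = predictors, y = "Default", training_frame = train_h2o,
                   family = "binomial", alpha = 0.5, lambda_search = TRUE)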

Testing a random forest model

A random forest is an ensemble of decision trees. In a decision tree, the training sample is recursively split, based on the independent variables, into two or more increasingly homogeneous sets. The algorithm handles both categorical and continuous variables: at each node, the best attribute is selected and the sample is split on it, and this continues until a stopping criterion is met. Each tree in the forest is a weak learner built on a random subset of the rows and a random subset of the columns, and the higher the number of trees, the lower the variance of the ensemble. For regression, the forest averages the predictions of all of the trees to make a final prediction; for classification, it aggregates them by majority vote (or by averaging the predicted class probabilities).

When a random forest is trained, some different parameters can be set...
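
The exact settings used in the book are truncated here; the following hedged sketch shows the main parameters of h2o.randomForest, reusing the train_h2o frame and predictors vector from the previous section (the parameter values are illustrative):

rf_model <- h2o.randomForest(x = predictors, y = "Default",
                             training_frame = train_h2o,
                             ntrees = 500,    # number of trees in the forest
                             max_depth = 20,  # maximum depth of each tree
                             mtries = -1,     # -1 = default number of sampled columns
                             seed = 1234)
h2o.performance(rf_model, train = TRUE)  # performance on the training sample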

Gradient boosting

Gradient boosting combines many weak predictors, typically decision trees, into one strong predictor, which makes the result robust. It is similar to a random forest in that it is based on decision trees, but with a key difference: the sample is not resampled from one tree to another; instead, the weight given to each observation (equivalently, the error left by the previous trees) changes.

Boosting trains trees sequentially, using information from the previously trained trees. We first fit a decision tree to the training dataset, and then fit another model whose only task is to correct the errors made by the current ensemble. This process is repeated until the specified number of trees, or some other stopping rule, is reached.

More specific details about the algorithm can be found in the documentation of the h2o package. While training the algorithm, we will need to define...
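
As a hedged sketch of those choices, the main h2o.gbm parameters look as follows; the values shown are illustrative assumptions, not the book's settings:

gbm_model <- h2o.gbm(x = predictors, y = "Default",
                     training_frame = train_h2o,
                     ntrees = 200,       # number of boosting iterations
                     learn_rate = 0.05,  # shrinkage applied to each new tree
                     max_depth = 5,      # depth of the individual trees
                     seed = 1234)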

Deep learning in neural networks

For this kind of machine learning problem, we need systems that can model nonlinear relationships in the data. This is very important when making predictions for bankruptcy problems, since the relationship between default and the explanatory variables is rarely linear. Neural networks are therefore a natural choice.

Artificial neural networks (ANNs) have long been used on bankruptcy problems. An ANN is a computing system made up of a number of interconnected processing units, which produce outputs by transforming the information they receive and responding dynamically to their inputs. A prominent and basic example of an ANN is the multilayer perceptron (MLP). An MLP can be represented as follows:

Except for the input nodes, each node is a neuron that applies a nonlinear activation function to the weighted sum of the inputs it receives.

As is evident from...
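
A minimal, hedged sketch of an MLP trained with h2o.deeplearning follows; the architecture and training length are illustrative assumptions:

dl_model <- h2o.deeplearning(x = predictors, y = "Default",
                             training_frame = train_h2o,
                             hidden = c(50, 50),        # two hidden layers of 50 neurons
                             activation = "Rectifier",  # ReLU activation
                             epochs = 50,               # passes over the training data
                             seed = 1234)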

Support vector machines

The support vector machine (SVM) algorithm is a supervised learning technique. To understand this algorithm, take a look at the following diagram for the optimal hyperplane and maximum margin:

In this classification problem, there are only two classes, but many possible separating hyperplanes exist. As shown in the preceding diagram, the SVM classifies the objects by finding the optimal hyperplane, the one that maximizes the margin between the two classes and therefore separates them to the greatest possible extent. The samples lying closest to this hyperplane, on the edges of the margin, are known as support vectors. Finding the hyperplane is then treated as an optimization problem and can be solved by optimization techniques, the most common one being the use of Lagrange multipliers.

Even in a linearly separable problem, such as the one shown in the preceding diagram, it is not always...
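
The SVM code itself is truncated in this excerpt. As a hedged sketch, one common choice in R is the e1071 package (the package choice and the settings here are assumptions):

library(e1071)

# Radial-basis-kernel SVM on the training data frame; probability = TRUE
# enables class-probability estimates such as the SVM column shown later
svm_model <- svm(as.factor(Default) ~ . - ID_RSSD, data = train,
                 kernel = "radial", probability = TRUE)

svm_pred  <- predict(svm_model, newdata = train, probability = TRUE)
svm_probs <- attr(svm_pred, "probabilities")[, "1"]  # P(Default = 1)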

Ensembles

At this point, we have trained five different models. The predictions are stored in two data frames, one for training and the other for the validation samples:

head(summary_models_train)
## ID_RSSD Default GLM RF GBM deep
## 4 37 0 0.0013554364 0 0.000005755001 0.000000018217172
## 21 242 0 0.0006967876 0 0.000005755001 0.000000002088871
## 38 279 0 0.0028306028 0 0.000005240935 0.000003555978680
## 52 354 0 0.0013898732 0 0.000005707480 0.000000782777042
## 78 457 0 0.0021731695 0 0.000005755001 0.000000012535539
## 81 505 0 0.0011344433 0 0.000005461855 0.000000012267744
## SVM
## 4 0.0006227083
## 21 0.0002813123
## 38 0.0010763298
## 52 0.0009740568
## 78 0.0021555739
## 81 0.0005557417

Let's summarize the accuracy of the previously trained models...
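
The chapter's own comparison is truncated here. As a hedged sketch, a simple ensemble can be built by averaging the class probabilities of the five models and comparing AUCs, here with the pROC package (the package choice is an assumption):

model_cols <- c("GLM", "RF", "GBM", "deep", "SVM")

# A naive ensemble: the average of the five predicted probabilities
summary_models_train$ensemble <- rowMeans(summary_models_train[, model_cols])

# AUC of each individual model and of the ensemble on the training sample
library(pROC)
sapply(c(model_cols, "ensemble"), function(m)
  as.numeric(auc(summary_models_train$Default, summary_models_train[[m]])))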

Automatic machine learning

Now that we have learned how to develop a powerful model to predict bank failures, we will test a final option for developing different models. Specifically, we will try out automatic machine learning (autoML), which is included in the h2o package. The process that we carried out by hand, building many models and finding the best one without any prior knowledge, is done automatically by the autoML function. This function trains different models by trying different grids of parameters. Moreover, it also trains stacked ensembles, that is, models built on top of the previously trained models, in search of a more accurate or more predictive combination.

In my opinion, using this function before launching any model is highly recommended, as it gives you a reference starting point. Using an automatic approach, we can assess the most reliable algorithms, the most important potential variables to be...
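
A minimal, hedged sketch of the call follows, reusing the earlier train_h2o frame and predictors vector; the budgets given to max_models and max_runtime_secs are illustrative:

aml <- h2o.automl(x = predictors, y = "Default",
                  training_frame = train_h2o,
                  max_models = 20,         # cap on the number of base models
                  max_runtime_secs = 600,  # overall time budget in seconds
                  seed = 1234)

print(aml@leaderboard)    # all models, including stacked ensembles, ranked by metric
best_model <- aml@leader  # the top model on the leaderboard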

Summary

In this chapter, we used different models and algorithms to try to improve our results. All of the algorithms obtained good results here, but this will not always be the case in other problems. You can apply different algorithms to your own problems and test which combinations of parameters solve your specific problem best. A combination of different algorithms, or an ensemble, might be a good option as well.

In the next chapter, we will continue by looking at other real problems—specifically, data visualization of economic imbalances in European countries.
