You're reading from Machine Learning with R Quick Start Guide

Product type: Book
Published in: Mar 2019
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781838644338
Edition: 1st Edition
Author (1)
Iván Pastor Sanz

Iván Pastor Sanz is a lead data scientist and machine learning enthusiast with extensive experience in finance, risk management, and credit risk modeling. Iván has always endeavored to find solutions to make banking more comprehensible, accessible, and fair. Thus, in his thesis to obtain his PhD in economics, Iván tried to identify the origins of the 2008 financial crisis and suggest ways to avoid a similar crisis in the future.

Taking further steps

We will use the US bank bankruptcy problem to help you understand the machine learning process in depth and to give you hands-on experience in solving real-world problems. The following chapters describe each step in detail.

The objective of the following chapters is to describe all the steps, and the alternatives at each step, involved in developing a model based on machine learning techniques.

We will cover several steps, from extracting the information and generating new variables through to validating the model. As we will see, each step of the development offers several alternatives. In most cases, the best alternative is the one that yields the better predictive model, but sometimes another alternative is chosen because of restrictions imposed by the future use of the model or by the kind of problem we want to solve.

Background on the financial crisis

In this book, we will solve two different problems related to the financial crisis: the bankruptcy of US banks and the assessment of the solvency of European countries. Why have I chosen such specific problems for this book? The first reason is my concern about the financial crisis and my aim of helping to avoid future crises. The second is that these are interesting problems for which a large amount of data is available, which makes them very appropriate for learning machine learning techniques.

Most of the chapters in this book cover the development of a predictive model to detect bank failures. To solve this problem, we will use a large dataset that exhibits some of the most typical difficulties you can encounter when working with different algorithms: for example, a large number of observations and variables, and an unbalanced sample, meaning that one of the categories in the classification model is much larger than the other.
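To make the idea of an unbalanced sample concrete, here is a minimal sketch in base R. It does not use the book's dataset: the number of banks, the number of failures, and all variable names are hypothetical values chosen only for illustration.

```r
# Hypothetical toy data, NOT the book's dataset.
# Assumption: 10,000 banks, of which 200 failed (1) and 9,800 survived (0).
n_failed   <- 200
n_survived <- 9800
failed <- c(rep(1, n_failed), rep(0, n_survived))

# Absolute and relative class frequencies show the imbalance.
class_counts <- table(failed)
class_shares <- prop.table(class_counts)
print(class_counts)
print(class_shares)

# A naive "model" that always predicts "not failed" (0) ...
naive_prediction <- rep(0, length(failed))

# ... reaches a high accuracy simply because the majority class dominates,
naive_accuracy <- mean(naive_prediction == failed)

# ... yet it detects none of the failed banks (recall on the minority class).
naive_recall <- sum(naive_prediction == 1 & failed == 1) / sum(failed == 1)

print(naive_accuracy)
print(naive_recall)
```

In this toy setting, accuracy alone is misleading: the naive predictor is right for every surviving bank and wrong for every failed one. This is one reason why an unbalanced sample needs attention when a model is developed and evaluated.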

Some of the steps we will see during the following chapters are as follows:

  • Data collection
  • Feature generation
  • Descriptive analysis
  • Treatment of missing information
  • Univariate analysis
  • Multivariate analysis
  • Model selection

The last chapter focuses on developing models to detect economic imbalances in European countries, and also covers some basic text mining and clustering techniques.

Although this book is technical, one of the most important aspects of any big data or machine learning solution is understanding the problem that we need to solve.

By the end of this book, you will see that just knowing algorithms is not enough to develop models. There are many important steps that you will need to follow before jumping into running algorithms. If you pay attention to these preliminary steps, you are more likely to get good results.

For this reason, and because I'm passionate about economic theory, the repository that hosts the code for this book includes a summary, from an economic point of view, of the causes of the problems that we will solve. Specifically, it describes the causes of the financial crisis and its contagion and transformation into a sovereign crisis.
