
You're reading from Hands-On Predictive Analytics with Python

Product type: Book
Published in: Dec 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789138719
Edition: 1st Edition
Author: Alvaro Fuentes

Alvaro Fuentes is a senior data scientist with a background in applied mathematics and economics. He has more than 14 years of experience in various analytical roles and is an analytics consultant at one of the 'Big Three' global management consulting firms, leading advanced analytics projects in different industries like banking, technology, and consumer goods. Alvaro is also an author and trainer in analytics and data science and has published courses and books, such as 'Become a Python Data Analyst' and 'Hands-On Predictive Analytics with Python'. He has also taught data science and related topics to thousands of students both on-site and online through different platforms such as Springboard, Simplilearn, Udemy, and BSG Institute, among others.

Problem Understanding and Data Preparation

In the last chapter, we learned about the predictive analytics process; we also learned about some of the fundamental definitions and the main libraries in the Python data ecosystem. In this chapter, we will start getting our hands on a couple of datasets and delve deeper into the first and second phases of the predictive analytics process: Problem understanding and definition and Data collection and preparation.

In the first part of this chapter, we talk about some of the most important considerations when defining and understanding the problem: having enough context and domain knowledge about the problem, and defining what is being predicted and the data that we have to work with. This phase also includes proposing a solution; we talk about some of the main topics to consider.

We put this idea into practice in the second part of the...

Technical requirements

  • Python 3.6 or higher
  • Jupyter Notebook
  • Recent versions of the following Python libraries: NumPy, pandas, and matplotlib
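A quick way to confirm that an environment meets these requirements is to check the interpreter version and report which versions of the listed libraries are installed. This is a minimal sketch: the 3.6 floor comes from the list above, the `report_versions` helper is hypothetical, and the script only reports library versions rather than enforcing a particular minimum, since "recent" is not pinned to a specific release.

```python
import importlib
import sys

# Interpreter requirement from the list above: Python 3.6 or higher
assert sys.version_info >= (3, 6), "Python 3.6 or higher is required"


def report_versions(packages=("numpy", "pandas", "matplotlib")):
    """Return {package name: installed version}, or None for a missing package."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # package is not installed
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Importing the packages through `importlib` lets the check list every missing library in one run instead of stopping at the first `ImportError`.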

Understanding the business problem and proposing a solution

In this section, we talk about problem understanding and definition, and other aspects related to the activity of defining the problem that will be solved using predictive analytics. Of course, the specifics of this stage depend entirely on the project, so we will provide only very generic guidance about this. However, when discussing the practical examples, we will touch on some of the important considerations when understanding the problem in a predictive analytics project.

Context is everything

What we call Problem understanding and definition is the first stage in the process, and as we mentioned in the last chapter, this is a key stage because here is where we...

Practical project – diamond prices

In this section, we introduce the diamond prices dataset. Let's start implementing the predictive analytics process we discussed in the first chapter. We begin with the stage we just discussed in the last section, Problem understanding and definition.

Diamond prices – problem understanding and definition

A new company, Intelligent Diamond Reseller (IDR), wants to get into the business of reselling diamonds. They want to innovate in the business, so they will use predictive modeling to estimate how much the market will pay for diamonds. Of course, to sell diamonds in the market, first they have to buy them from the producers; this is where predictive modeling becomes useful...

Practical project – credit card default

This is our second practical project, in which we will solve a classification problem. As we did with the diamonds dataset, let's begin the predictive analytics process for this new project by understanding and defining the problem.

Credit card default – problem understanding and definition

TFI is the Taiwanese Financial Institution, and it offers credit cards. It has been detecting an increase in defaults among its customers; a default is defined as a customer missing a payment for a single month. This situation is negatively affecting the company's revenue, and they know they could do something about it if they could anticipate which credit card holders are going...
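Because each customer either defaults or does not, the target is binary, which is what makes this a classification task. As a sketch of how the stated definition (missing a payment for a single month counts as a default) could be turned into a 0/1 target with pandas: the month-by-month column layout, the `make_default_target` helper, and the values below are all hypothetical and are not the actual columns of the dataset used in this project.

```python
import pandas as pd


def make_default_target(missed: pd.DataFrame) -> pd.Series:
    """Return 1 for customers who missed at least one monthly payment, else 0.

    `missed` is assumed to hold one row per customer and one 0/1 column per
    month, where 1 means that month's payment was missed (hypothetical layout).
    """
    return (missed.sum(axis=1) >= 1).astype(int).rename("default")


# Hypothetical example: four customers, two months of payment history
missed = pd.DataFrame(
    {"month_1": [0, 1, 0, 0], "month_2": [0, 0, 0, 1]},
    index=pd.Index([101, 102, 103, 104], name="customer_id"),
)
y = make_default_target(missed)
print(y.tolist())                      # [0, 1, 0, 1]
print(y.value_counts(normalize=True))  # share of customers in each class
```

Checking the class proportions with `value_counts(normalize=True)` is a useful first step for any classification target, since an imbalanced split between defaulters and non-defaulters affects how models are later evaluated.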

Summary

In this chapter, we have covered two stages in the predictive analytics process: Problem understanding and definition and Data collection and preparation. We learned about important considerations for understanding the problem and proposing the solution; we also introduced the concepts of regression tasks and classification tasks. We got our hands dirty with a couple of datasets that we will continue working with in the following chapters, and in going through the second phase, Data collection and preparation, with these datasets, we introduced important concepts such as one-hot encoding, outliers, missing values, collinearity, and feature engineering. In addition, we practiced using pandas to load, explore, transform, and prepare a dataset for the next stages of the predictive analytics process.
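Several of the data-preparation concepts listed in this summary map onto single pandas calls. A minimal sketch on a small hypothetical frame (the column names and values are illustrative, loosely modeled on diamond attributes rather than taken from the project's data): `isnull().sum()` counts missing values, `pd.get_dummies` performs one-hot encoding, and `DataFrame.corr()` gives pairwise correlations, where a value close to 1 or -1 between two features signals collinearity.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one categorical column, one numeric column with a
# missing value, and a second numeric column that is exactly 2 * carat.
df = pd.DataFrame(
    {
        "cut": ["Ideal", "Premium", "Good", "Ideal"],
        "carat": [0.23, 0.21, np.nan, 0.29],
        "carat_x2": [0.46, 0.42, np.nan, 0.58],
    }
)

# Missing values: number of NaNs in each column
print(df.isnull().sum())

# One-hot encoding: replace "cut" with one 0/1 column per category
encoded = pd.get_dummies(df, columns=["cut"], prefix="cut", dtype=int)
print(encoded.columns.tolist())

# Collinearity: pairwise correlation (rows with NaN are skipped pairwise)
corr_matrix = df[["carat", "carat_x2"]].corr()
print(corr_matrix)
```

Here `carat_x2` is deliberately an exact multiple of `carat`, so their correlation is 1; in practice, one feature of such a pair is often dropped before fitting a linear model.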

In the next chapter, we will study the goals of...

Further reading
