You're reading from  Hands-On Predictive Analytics with Python

Product type: Book
Published in: Dec 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789138719
Edition: 1st Edition
Author (1)
Alvaro Fuentes

Alvaro Fuentes is a senior data scientist with a background in applied mathematics and economics. He has more than 14 years of experience in various analytical roles and is an analytics consultant at one of the ‘Big Three' global management consulting firms, leading advanced analytics projects in different industries like banking, technology, and consumer goods. Alvaro is also an author and trainer in analytics and data science and has published courses and books, such as 'Become a Python Data Analyst' and 'Hands-On Predictive Analytics with Python'. He has also taught data science and related topics to thousands of students both on-site and online through different platforms such as Springboard, Simplilearn, Udemy, and BSG Institute, among others.

Preface

Predictive analytics is one of the most important technologies of our time. Every day, companies in all industries and institutions of all types use predictive techniques to solve a wide range of problems. Although many of the main ideas and techniques have been around for decades, the use of predictive analytics has exploded recently due to the increased ability to capture and store data, which is the raw material from which we build predictive models. Two other big factors explain the increasing adoption of this technology: the first is the astonishing increase in computing power, and the second is the availability of many open source software projects that have given professionals outside academia access to many of the most powerful predictive analytics techniques. The Python programming language, together with its ecosystem of analytical libraries (also known as Python's data science stack), is one such project and has democratized the use of advanced analytical techniques.

This is a book about predictive analytics, but rather than focusing exclusively on explaining in detail the algorithms and techniques, this book is more about the process of doing predictive analytics in the real world. The main goal of this book is to make you familiar with all the stages in the process of solving a business problem using predictive modeling and to show, with hands-on examples, how to use Python and its data analytics ecosystem to implement many of the main techniques and approaches used in real-world predictive analytics projects. We use two main projects in this book and walk you through the entire predictive analytics process: from business and problem understanding to model deployment, all through hands-on examples.

There are many techniques that can be used for predictive analytics: statistical models, time series analysis, and spatial statistics to mention a few; however, in this book, we focus on the most widely applicable and successful set of techniques: machine learning, specifically the branch of supervised learning.

In my view, a predictive model is only a means to an end. The goal of using predictive analytics is to solve problems; therefore, a good predictive model is not one that uses the latest and most fashionable techniques, nor is it the most complicated or the simplest one. A good predictive model is one that can be used to solve a real-world problem in a satisfactory way. My goal is that by the end of this book you will have the foundations that you need to start solving real-world problems using predictive analytics.

Who this book is for

This book is aimed at data scientists, data engineers, software engineers, and business analysts. Students and professionals who work with data in quantitative fields such as finance, economics, and business, and who would like to build models to make predictions, will also find this book useful. In general, this book is aimed at all professionals who would like to focus on the practical implementation of predictive analytics with Python.

What this book covers

Chapter 1, The Predictive Analytics Process, presents the foundational concepts of the field, explains at a high level the different stages in the predictive analytics process, and gives an overview of the libraries we will use in the book.

Chapter 2, Problem Understanding and Data Preparation, introduces the problems and datasets we will be using throughout the book and shows the basics of how to collect and prepare a dataset for modeling.

Chapter 3, Dataset Understanding – Exploratory Data Analysis, shows how to get important information from a dataset using visualizations and other numerical techniques.

Chapter 4, Predicting Numerical Values with Machine Learning, introduces the main ideas and concepts of machine learning and some of the most popular regression models.

Chapter 5, Predicting Categories with Machine Learning, introduces some of the most important classification machine learning models.

Chapter 6, Introducing Neural Nets for Predictive Analytics, shows how to build neural network models. These have become very popular because they are very powerful and are capable of producing highly accurate models.

Chapter 7, Model Evaluation, shows the main metrics and approaches you need to evaluate how good the predictions produced by a predictive model are.

Chapter 8, Model Tuning and Improving Performance, presents important techniques such as K-fold cross-validation that will improve the performance of our predictive model.

Chapter 9, Implementing a Model with Dash, shows how to build an interactive web application that will take input from the user and will use a trained predictive model to provide predictions.

To get the most out of this book

To get the most out of this book, these are the prerequisites:

  • Fluency in Python programming
  • Knowledge of basic statistical concepts

Knowledge of Python's data science stack is an advantage but is not essential. We will also be using Python 3.6 and many of the main analytical libraries. The easiest way to get them is by installing the Anaconda distribution. This is not required, but it will make your life easier. Go to https://www.anaconda.com/download/ to learn more about this software.
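Whichever way you install the software, it can help to confirm that your environment is ready before starting. The following is a minimal sketch of such a check, not part of the book's code bundle. The function name check_environment and the list of library names are assumptions made for illustration; the preface only says "many of the main analytical libraries", so adjust the list to the chapters you are working through.

```python
import importlib.util
import sys

# Hypothetical list of libraries (an assumption, not taken from the book):
# edit it to match the packages the chapters you are reading actually use.
LIBRARIES = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn"]


def check_environment(libraries=LIBRARIES, min_python=(3, 6)):
    """Return (python_ok, missing) for the current interpreter."""
    # Compare only the (major, minor) part of the running Python version.
    python_ok = sys.version_info[:2] >= min_python
    # find_spec returns None when a top-level package cannot be located,
    # so nothing is actually imported during the check.
    missing = [name for name in libraries
               if importlib.util.find_spec(name) is None]
    return python_ok, missing


if __name__ == "__main__":
    python_ok, missing = check_environment()
    print("Python version OK:", python_ok)
    print("Missing libraries:", ", ".join(missing) if missing else "none")
```

If any libraries are reported as missing, installing the Anaconda distribution mentioned above is one way to obtain most of them at once.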

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at www.packt.com.
  2. Select the SUPPORT tab.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR/7-Zip for Windows
  • Zipeg/iZip/UnRarX for Mac
  • 7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Predictive-Analytics-with-Python. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

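import numpy as np
import pandas as pd
# first_ml_model is assumed to be defined earlier (it is not shown in this excerpt)
carat_values = np.arange(0.5, 5.5, 0.5)
preds = first_ml_model(carat_values)
pd.DataFrame({"Carat": carat_values, "Predicted price": preds})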

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

numerator = ((ccd['default']==1) & (ccd['male']==1)).sum()/N
denominator = Prob_B
Prob_A_given_B = numerator/denominator
print("P(A|B) = {:0.4f}".format(Prob_A_given_B))

Any command-line input or output is written as follows:

dim_features.corr()

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at customercare@packtpub.com.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.
