
You're reading from Learning Predictive Analytics with Python

Product type: Book

Published in: Feb 2016

Reading Level: Intermediate

Publisher

ISBN-13: 9781783983261

Edition: 1st Edition

Languages

Python

Concepts

Predictive Analytics

Authors (2):

Ashish Kumar

Gary Dougan


Chapter 9. Best Practices for Predictive Modelling

As we have seen in the chapters on modelling techniques, a predictive model is nothing but a set of mathematical equations derived using a few lines of code. In essence, this code, together with a slide deck highlighting the high-level results from the model, constitutes a project. However, the users of our solution are more interested in solving the problem they face in the business context. It is the responsibility of the analyst or data scientist to offer the solution in a way that is user-friendly and maximizes output or insights.

There are some general guidelines that can be followed to get the best results from a predictive modelling project. As predictive modelling combines computer science techniques, algorithms, statistics, and business context, the best practices in predictive modelling are the sum of the best practices in these individual fields.

In this chapter, we...

Best practices for coding

When using Python for predictive modelling, one writes small snippets of code. To get the most out of these snippets and to make the work reproducible, one should know and aspire to follow the best practices in coding. Some of them are as follows.

Commenting the code

There is a tradeoff between the elegance and the understandability of a code snippet. As a snippet becomes more elegant, it often becomes harder for a new user (someone other than its author) to understand. Some users are interested only in the end results, but most like to understand what is going on under the hood and want a good understanding of the code.

For the code snippet to be understandable by a new person or the user of the code, it is a common practice to comment on the important lines, if not all the lines, and write the headings for the major chunks of the code. Some of the properties of a comment...
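As a rough illustration of headings for the major chunks and comments on the important lines, a snippet might be laid out as follows. This is only a sketch: the constant `TRAIN_FRACTION`, the function `split_rows`, and the stand-in data are hypothetical names chosen for this example.

```python
# ------------------------------------------------------------
# Section 1: Configuration (hypothetical values for illustration)
# ------------------------------------------------------------
TRAIN_FRACTION = 0.8  # share of rows used for training


# ------------------------------------------------------------
# Section 2: Helper functions
# ------------------------------------------------------------
def split_rows(rows, train_fraction=TRAIN_FRACTION):
    """Split rows into a training part and a test part, preserving order."""
    # Compute the cut-off index once so both slices share the same boundary.
    cutoff = int(len(rows) * train_fraction)
    return rows[:cutoff], rows[cutoff:]


# ------------------------------------------------------------
# Section 3: Main steps
# ------------------------------------------------------------
rows = list(range(10))          # stand-in for real records
train, test = split_rows(rows)  # uses the default training share
print(len(train), len(test))
```

The section headings let a reader jump to the relevant chunk, while the inline comments explain why a line exists rather than restating what it does.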

Best practices for data handling

Data cleaning and manipulation constitute the framework of any analytics project. To ensure that this important step is executed efficiently, the following best practices should be followed:

After importing the dataset, one should ensure that the dataset (all the variables and rows) has been read correctly. This means reading all the variables in their correct or required format. Sometimes, due to a limitation of the data or of the IDE, some variables are read incorrectly and need to be converted to the correct format.
For example, if a variable holds a numerical ID (say, 10 digits long), it will often be read and displayed in scientific notation. This is wrong, because it is an ID and should be displayed in full. Sometimes, a variable containing long strings is truncated. These issues should be taken care of before performing any operation on the data.
After every data manipulation step such as transposing...
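The import checks described above can be sketched with the standard library alone; the same checks apply when the data is loaded into a pandas DataFrame. The CSV content, the column names, and the helper `load_rows` are assumptions made up for this example.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from a file.
RAW_CSV = """customer_id,name,balance
0012345678,Asha,120.50
9876543210,Ben,80.00
"""

EXPECTED_COLUMNS = ["customer_id", "name", "balance"]


def load_rows(text):
    """Read CSV text into a list of dicts, keeping IDs as strings."""
    reader = csv.DictReader(io.StringIO(text))
    # Check that every expected variable was read, in the expected order.
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for record in reader:
        rows.append({
            "customer_id": record["customer_id"],  # an ID: keep as text
            "name": record["name"],
            "balance": float(record["balance"]),   # a true numeric variable
        })
    return rows


rows = load_rows(RAW_CSV)

# Check that all rows were read (RAW_CSV has 2 data lines).
assert len(rows) == 2

# An ID stored as text keeps every digit, including leading zeros.
print(rows[0]["customer_id"])                      # 0012345678
# The same kind of ID treated as a number loses digits in compact display.
print(format(float(rows[1]["customer_id"]), "g"))  # 9.87654e+09
```

Reading the ID as text sidesteps both the scientific-notation display and the loss of leading zeros.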

Best practices for algorithms

The choice of which algorithm to deploy to answer a business question depends on a variety of parameters, and there is no single right answer. The choice generally depends on the nature of the predictor and output variables, and on the overarching nature of the business problem at hand: whether it is a numerical prediction, classification, or an aggregation problem. Based on these preliminary criteria, one can shortlist a few existing methods to apply to the dataset.

Each method has its own pros and cons, and the final decision should be made keeping the business context in mind. The choice of the best-suited algorithm usually rests on one of the following two requirements:

Sometimes, the user of the result is interested only in the accuracy of the results. In such cases, the algorithm is chosen based on accuracy alone. All the qualifying models are run and the one with the maximum accuracy is finalized.
At other times...
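The accuracy-driven case can be sketched as follows: run every qualifying model on the same hold-out data and keep the one with the highest accuracy. The hold-out data and the three candidate models below are hypothetical stand-ins (simple threshold rules rather than fitted models), used only to keep the example self-contained.

```python
def accuracy(model, features, labels):
    """Fraction of observations for which the model predicts the label."""
    correct = sum(1 for x, y in zip(features, labels) if model(x) == y)
    return correct / len(labels)


# Hypothetical hold-out data: one numeric predictor and a binary outcome.
X_test = [1, 2, 3, 4, 5, 6, 7, 8]
y_test = [0, 0, 0, 1, 0, 1, 1, 1]

# Hypothetical candidate models, each a function from predictor to class.
candidates = {
    "threshold_at_3": lambda x: int(x > 3),
    "threshold_at_6": lambda x: int(x > 6),
    "always_one": lambda x: 1,
}

# Score every candidate on the same data, then keep the most accurate one.
scores = {name: accuracy(model, X_test, y_test)
          for name, model in candidates.items()}
best_name = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("selected:", best_name)
```

Scoring all candidates on the same hold-out set keeps the comparison fair; with real models, the callables would be replaced by the fitted models' prediction functions.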

Best practices for statistics

Statistics are an integral part of any predictive modelling assignment. Statistics are important because they help us gauge the efficiency of a model. Each predictive model generates a set of statistics, which suggests how good the model is and how the model can be fine-tuned to perform better. The following is a summary of the most widely reported statistics and their desired values for the predictive models described in this book:

Algorithm	Statistics/parameters	Desired values
Linear regression	R², p-values, F-statistic, and Adj. R²	High Adj. R², high F-statistic, and low p-values
Logistic regression	Sensitivity, specificity, Area Under the Curve (AUC), and KS statistic	High AUC (proximity to 1)
Clustering	Intra-cluster distance and silhouette coefficient	Low intra-cluster distance and high silhouette coefficient (proximity to 1)
Decision trees (classification)	AUC and KS statistics	High AUC (proximity to 1)
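To make a few of these statistics concrete, the sketch below computes R² and adjusted R² for a regression, and sensitivity and specificity for a binary classifier, using their standard textbook definitions. The function names and the small data lists are assumptions for illustration and are not taken from the book.

```python
def r_squared(actual, predicted):
    """R² = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize R² for the number of predictors used by the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def sensitivity_specificity(actual, predicted):
    """Return (true positive rate, true negative rate) for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical regression output: one predictor, five observations.
y_actual = [1.0, 2.0, 3.0, 4.0, 5.0]
y_predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(y_actual, y_predicted)
adj_r2 = adjusted_r_squared(r2, n_obs=len(y_actual), n_predictors=1)
print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}")

# Hypothetical classification output with 0/1 labels.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(labels, predictions)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Adjusted R² is never larger than R² for a model with at least one predictor, which is why it is the preferred figure to report when models with different numbers of predictors are compared.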

While reporting...

Best practices for business contexts

This is the meatiest part of the report created for a predictive modelling project. Some users of the report will navigate directly to this section, as they are primarily interested in the overall effect of the project. Thus, it is imperative to mention the highlights and most important findings of the project in this section. This is different from reporting the statistics, which are, in a way, the raw output of the predictive model. In this section, we will focus on the following:

Findings and insights of the analyses
Major problems identified
Major results from the model
The accuracy or efficiency of the model
Action steps for the user to solve the business problem, and so on

If it is a customer segmentation problem, mention the names and characteristics of the segments identified along with the statistical summary for each segment. Recommend a plan to maximize sales and revenue (or whatever the business objective might be) for each of the segments.
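For the segmentation case, the per-segment statistical summary might be produced along these lines. The segment names, the fields `annual_spend` and `visits`, and the helper `summarize_segments` are all hypothetical; real segments would come from the clustering output.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customers, each already assigned to a named segment.
customers = [
    {"segment": "High spenders", "annual_spend": 1200.0, "visits": 24},
    {"segment": "High spenders", "annual_spend": 1000.0, "visits": 20},
    {"segment": "Occasional buyers", "annual_spend": 200.0, "visits": 3},
    {"segment": "Occasional buyers", "annual_spend": 300.0, "visits": 4},
    {"segment": "Occasional buyers", "annual_spend": 100.0, "visits": 2},
]


def summarize_segments(records):
    """Return the size and mean characteristics of each segment."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["segment"]].append(record)

    summary = {}
    for segment, members in grouped.items():
        summary[segment] = {
            "customers": len(members),
            "mean_annual_spend": mean(m["annual_spend"] for m in members),
            "mean_visits": mean(m["visits"] for m in members),
        }
    return summary


segment_summary = summarize_segments(customers)
for segment, stats in segment_summary.items():
    print(segment, stats)
```

A table like this, placed next to each segment's name and description, gives the reader the evidence behind the recommended plan for that segment.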

If it is a...

Summary

What are the do's and don'ts of a predictive modelling project? This chapter dealt with this pressing question and listed a number of best practices to make a predictive modelling project successful. The following are the important points:

Code should be well-commented, modular, version-controlled, generalized, and free of hard-coded values.
Data should be observed carefully after every import and manipulation in order to check for any errors that might creep in while performing these operations.
The choice of the algorithm is guided by the nature of the predictor and outcome variables. The ultimate selection of the algorithm depends upon whether the user prioritizes accuracy or the understandability of the algorithm.
While reporting the results of a predictive model, the optimal values of the important statistics and their relevance should be clearly stated.
Main business questions should be clearly answered. Major findings should be reported clearly. Some actionable recommendations...


Authors (2)

Ashish Kumar

Ashish Kumar is a seasoned data science professional, a published author, and a thought leader in the field of data science and machine learning. An IIT Madras graduate and a Young India Fellow, he has around 7 years of experience in implementing and deploying data science and machine learning solutions for challenging industry problems in both hands-on and leadership roles. His core areas of expertise include Natural Language Processing, IoT Analytics, R Shiny product development, and Ensemble ML methods. He is fluent in Python and R and teaches a popular ML course at Simplilearn. When not crunching data, Ashish sneaks off to the next hip beach around and enjoys the company of his Kindle. He also trains and mentors data science aspirants and fledgling start-ups.

Gary Dougan
