You're reading from Building Data Science Solutions with Anaconda

Product type: Book

Published in: May 2022

Publisher: Packt

ISBN-13: 9781800568785

Edition: 1st Edition

Tools: Anaconda

Concepts: Data Science

Author: Dan Meador

Chapter 11: Tuning Hyperparameters and Versioning Your Model

The journey of a data scientist is always an iterative one. Understanding how to create a process that is scalable and repeatable ensures that you can smoothly move through all the phases of data cleaning and model discovery.

In this chapter, we will cover how to create a pipeline that combines many of the small steps we have learned throughout the book into an easier flow. We will then use a grid search to uncover the hyperparameters that give the best-scoring model among the candidates we try. Finally, we will show how to save and version models so that you can easily return to a previous one at any point in time. Together, these skills make your end goal of a maintainable process far more accessible and flexible.

Specifically, we will cover the following in this chapter:

Creating a scikit-learn pipeline
Finding optimal hyperparameters with GridSearchCV...

Technical requirements

To get the most out of this chapter, you will need the Anaconda distribution installed. This will include Python, conda, and Navigator.

You will also need a conda environment set up with the following packages installed:

scikit-learn version 0.23.x
pandas
NumPy
joblib

It is preferable to install all of these at the beginning, but you can also install each one when the chapter first needs it.

With these parts in place, it's now time to create a pipeline.

Creating a scikit-learn pipeline

If there is one thing you may have noticed by now in this book, it's that every problem we have looked at shares many common steps. We are now going to make it easier to iterate on the data preparation and model creation steps by using the scikit-learn pipeline to build a simple, repeatable process. In this section, we will take a previous workflow that would ordinarily need to be repeated many times and turn it into a single unit, which gives you much greater flexibility and saves time compared to the previous process. If you are starting with this chapter or jumping to it before going through the others, keep in mind that the underlying concepts covered previously are still incredibly important to understand.

To visualize what the process is going to look like, you can refer to the following diagram. On the left, you will see normal data input being passed into the pipeline object. In that pipeline...
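As a rough illustration of the idea, the sketch below chains an imputer, a scaler, and a model into one scikit-learn `Pipeline`. The synthetic dataset, the step names (`imputer`, `scaler`, `model`), and the choice of a random forest are assumptions made for this example, not the chapter's own code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; the chapter works with its own dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Blank out roughly 5% of the values so the imputer has something to fill in.
rng = np.random.RandomState(0)
X[rng.rand(*X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Each step is a (name, estimator) pair; the last step is the model itself.
pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])

# One fit call runs imputation, scaling, and training in order.
pipeline.fit(X_train, y_train)
accuracy = pipeline.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.3f}")
```

Because the preprocessing steps are fitted only on the training split inside `fit`, the same object can later be applied to new data with a single `predict` call.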

Finding optimal hyperparameters with GridSearchCV

As we have created new models and tried various data processing techniques, we have used many different parameters and function arguments to determine how we set up the problem. One example is the impute method. Mean, median, or some other advanced approach – how do we know which we should take? One naïve approach might be to simply create a for loop and try every technique. We can calculate the score for each and use the best one. We tried a similar approach before when looking at which algorithm would give us the best score in the previous section.
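The naïve for loop described above might look something like the following sketch. The synthetic data, the three imputation strategies, and the random forest are illustrative assumptions; each candidate is scored with `cross_val_score` and the highest mean score wins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Synthetic data with some missing values (an assumption for this sketch).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.RandomState(0)
X[rng.rand(*X.shape) < 0.05] = np.nan

scores = {}
for strategy in ["mean", "median", "most_frequent"]:
    # Build one candidate pipeline per imputation strategy.
    candidate = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy=strategy)),
        ("model", RandomForestClassifier(n_estimators=25, random_state=0)),
    ])
    # Average accuracy over 5 cross-validation folds.
    scores[strategy] = cross_val_score(candidate, X, y, cv=5).mean()

best_strategy = max(scores, key=scores.get)
print(scores)
print("Best strategy:", best_strategy)
```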

This might be naïve, but never overlook the simple. It is such a useful approach that scikit-learn packages it up for you as GridSearchCV. It also performs k-fold cross-validation, so each candidate is scored on several train/validation splits rather than a single one. There are a few different ways to tune hyperparameters, but we're going to focus on a grid search.
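A minimal sketch of a grid search over a pipeline is shown below. The data, step names, and grid values are assumptions chosen to keep the example small; the `step__parameter` naming is how scikit-learn addresses a parameter of a pipeline step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic data with some missing values (an assumption for this sketch).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
rng = np.random.RandomState(0)
X[rng.rand(*X.shape) < 0.05] = np.nan

pipeline = Pipeline(steps=[
    ("imputer", SimpleImputer()),
    ("model", RandomForestClassifier(n_estimators=25, random_state=0)),
])

# Keys use "<step name>__<parameter name>"; 2 x 3 = 6 combinations in total.
param_grid = {
    "imputer__strategy": ["mean", "median"],
    "model__max_depth": [3, 5, None],
}

# cv=5 scores every combination with 5-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best mean CV score: {search.best_score_:.3f}")
```

The number of fits grows multiplicatively with every parameter added to the grid (here 6 combinations times 5 folds, plus one final refit), which is why small grids are a sensible starting point.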

A grid...

Versioning and storing your model

As we have been working through this book, there has been one glaring issue that you might have noticed: when you closed your integrated development environment (IDE), terminal, or Jupyter notebook, your model and data were gone. We won't go into the more involved topics of working with and saving information to databases or other persistence layers, but there are some quite simple things you can do to create save points along the way.
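One simple save point is to write a fitted model to disk with joblib and load it back later. The sketch below assumes a synthetic dataset, a small random forest, and a temporary directory with a made-up file name (`model.joblib`); in practice you would pick your own location.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a small model on synthetic data (an assumption for this sketch).
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
model = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# Save the fitted model; a temporary directory keeps the example tidy.
save_dir = Path(tempfile.mkdtemp())
model_path = save_dir / "model.joblib"
joblib.dump(model, str(model_path))

# Later (for example, in a new session), load it back and use it as before.
restored = joblib.load(str(model_path))
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print("Restored model matches original:", same_predictions)
```

Files saved this way are pickles under the hood, so they should only be loaded from sources you trust, ideally with the same library versions that created them.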

Understanding the value of versioning your model

As you've worked through everything from data engineering to building models in this book, you have realized that there are a lot of iterations that happen. It's called data science, but there is also an art to guessing a path and trying to know where to go next. You've tried to make educated guesses with hyperparameters and model families, and kept the original dataset open to come back to as needed. This was all needed in case you were wrong....
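To make returning to an earlier attempt easy, each saved model can carry a version number in its file name. The helper below, `save_versioned`, is hypothetical and written only for this sketch, as are the `model_v<N>.joblib` naming scheme and the logistic regression models used to exercise it.

```python
import re
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def save_versioned(model, directory, name="model"):
    """Hypothetical helper: save as <name>_v<N>.joblib using the next free N."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    pattern = re.compile(rf"^{re.escape(name)}_v(\d+)\.joblib$")

    # Collect version numbers already present in the directory.
    versions = []
    for path in directory.iterdir():
        match = pattern.match(path.name)
        if match:
            versions.append(int(match.group(1)))

    target = directory / f"{name}_v{max(versions, default=0) + 1}.joblib"
    joblib.dump(model, str(target))
    return target


X, y = make_classification(n_samples=200, n_features=6, random_state=0)
store = Path(tempfile.mkdtemp())

# Two iterations of the same model with different regularization strengths.
first = save_versioned(LogisticRegression(C=1.0).fit(X, y), store)
second = save_versioned(LogisticRegression(C=0.1).fit(X, y), store)

# If the second attempt turns out worse, version 1 is still available.
rolled_back = joblib.load(str(first))
print(first.name, second.name, rolled_back.C)
```

A scheme like this never overwrites an earlier file, so a wrong guess about hyperparameters costs nothing more than loading the previous version.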

Summary

In this final chapter, we covered the last batch of skills you will need to get up to speed as a data scientist using Anaconda as a base.

We started by seeing how scikit-learn pipelines let you take discrete parts of the data science workflow and create a cohesive unit in a much more elegant way by putting estimators together, like pieces of a puzzle. We also saw how these can include things such as your scalers and imputers, finally ending in an algorithm type.

We then learned that many of the arguments we have been using throughout this book, such as the depth of a random forest, are called hyperparameters and that they are vital to get right. Looking at GridSearchCV from scikit-learn, we put together a grid search over possible combinations, being careful to balance the speed of discovery with the best attributes.

Finally, we looked at the value of versioning our model with pickling and joblib. We packaged up our optimized model into...

Close

Whether you are a seasoned veteran looking to brush up on your skills or brand new to the field, by now, you understand that the landscape is vast, but the journey starts small. Throughout this book, we have covered a lot of ground, including types of algorithms, how to avoid bias, and even evaluating open source tools. This is but a taste of all there is to discover, and I think you'll agree that there isn't a better skill to know or place to explore in the modern age than the wonderful world of data science.

Author (1)

Dan Meador

Dan Meador is an Engineering Manager at Anaconda and is the creator of Conda as well as a champion of open source at Anaconda. With a history of engineering and client-facing roles, he has the ability to jump into any position. He has a track record of delivering as a leader and a follower in companies from the Fortune 10 to startups.