You're reading from Building Data Science Solutions with Anaconda

Product type: Book

Published in: May 2022

Publisher: Packt

ISBN-13: 9781800568785

Edition: 1st Edition

Tools: Anaconda

Concepts: Data Science

Author: Dan Meador

Preface

When Marc Andreessen (https://www.crunchbase.com/person/marc-andreessen) wrote his famous article Why Software Is Eating The World in the Wall Street Journal (https://bit.ly/MarcAndreessen), he described a reality in which every company would be required to become a software company. The power of software was too great, its reach too vast. Companies could ignore it only at their own peril. We are at the same inflection point now with Artificial Intelligence (AI).

The field of AI is complex enough to be daunting for newcomers and challenging even for those already in it, who must ensure they have all the different areas covered. Aspects such as bias in models and data, interpretability/explainability, and even managing data science packages are skills that often aren't well understood, even though they are critical to building the AI systems that will power our world. These concepts and more are no longer going to be optional. Too many resources leave these and many other areas of practical data science out.

After you are done reading this book, you'll wonder how anyone can be in this field and not have an understanding of core concepts such as proximity bias, using Anaconda Distribution, and how Shapley values tell you how features influence a model. All of this is knowledge that you will soon possess. We'll focus on the pragmatic and applicable as we use analogies to solidify your understanding. By the end, you'll be well positioned to take your knowledge of data science to the next level.
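As a small preview of the Shapley idea mentioned above, here is a minimal sketch in plain Python. It is not taken from the book: the toy_price function, its feature names, and its numbers are hypothetical, chosen only so the arithmetic is easy to follow. The sketch computes exact Shapley values by averaging each feature's marginal contribution over every order in which the features could be added, which is only practical for a handful of features.

```python
from itertools import permutations


def shapley_values(features, value):
    """Exact Shapley values for a small feature set.

    `value` maps a frozenset of known features to a model output.
    Each feature's score is its marginal contribution averaged over
    every possible ordering of the features.
    """
    totals = {f: 0.0 for f in features}
    orders = list(permutations(features))
    for order in orders:
        present = set()
        for f in order:
            before = value(frozenset(present))
            present.add(f)
            totals[f] += value(frozenset(present)) - before
    return {f: total / len(orders) for f, total in totals.items()}


# Hypothetical toy "model": a house price (in $1000s) that depends on
# which features are known. The numbers are made up for illustration.
def toy_price(known):
    price = 100.0
    if "rooms" in known:
        price += 40.0
    if "ocean_view" in known:
        price += 20.0
    if "rooms" in known and "ocean_view" in known:
        price += 10.0  # interaction bonus shared between both features
    return price


contributions = shapley_values(["rooms", "ocean_view"], toy_price)
print(contributions)  # {'rooms': 45.0, 'ocean_view': 25.0}
```

The two contributions add up to 70, which is exactly the gap between the price with both features known (170) and the baseline with none (100). The 10-unit interaction bonus is split evenly between the two features. Libraries such as SHAP approximate this calculation for real models, where enumerating every ordering would be far too slow.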

Who this book is for

This book is for anyone who wants to better understand the world of data science, as well as for those who already have a decent grasp of it and want to become more well rounded in their knowledge of things such as Anaconda tools and Open Source Software (OSS). We assume that you don't have a grasp of areas such as bias or interpretability and that you don't yet know all the various types of algorithms you can use to create AI/ML models. We've designed this book to be as self-contained as possible, so you'll only need outside resources when you want to go deeper.

Some basic technical knowledge is expected, but being a developer or even knowing much about data science is not a necessity. You can read this book from beginning to end, or you can jump to the chapters that seem most relevant to you. While each chapter does build on the previous ones, we have structured it in such a way that you won't be lost if you choose to navigate to a specific topic.

What this book covers

Chapter 1, Understanding the AI/ML Landscape, provides an overview of the current state of data science as well as what tools you'll need to succeed.

Chapter 2, Analyzing Open Source Software, delves into the role of OSS in data science and how to decide which new OSS tool to use. You'll get a systematic checklist to apply to the next tool you evaluate.

Chapter 3, Using Anaconda Distribution to Manage Packages, covers how to manage packages with conda and Navigator. This includes how to create environments and create channels.

Chapter 4, Working with Jupyter Notebooks and NumPy, covers how to successfully turn notebooks into your daily driver to create data science value. We'll also go deeper into the powerful NumPy library to vastly speed up our operations.

Chapter 5, Cleaning and Visualizing Data, looks at the core techniques you'll need to shape data coming in to prepare it for model training. We'll cover areas such as imputing and also how we can visualize our data to gain a greater understanding.

Chapter 6, Overcoming Bias in AI/ML, looks at the many ways that naive ignorance can be present in our data and what we can do to avoid or correct these issues. You'll see the real-world impacts of a biased AI model.

Chapter 7, Choosing the Best AI Algorithm, goes into some of the major problem families that AI/ML models can help with, including regression and anomaly detection. We'll check out the algorithms you can use as well as the comparative rating for each.

Chapter 8, Dealing with Common Data Problems, looks at how you can identify and correct errors in your datasets, such as incorrect data entries. You'll also see how to scale your data and encode categorical features.

Chapter 9, Building a Regression Model with scikit-learn, walks you through a complete flow of building a regression model and how you can evaluate the results.

Chapter 10, Explainable AI – Using LIME and SHAP, goes further into the results of a model so that you can interpret and explain how the model arrived at them. Both models that are interpretable by design and black-box models are covered.

Chapter 11, Tuning Hyperparameters with scikit-learn Pipelines, takes a more holistic approach and shows you how to leverage pipelines to create a flexible and repeatable process for data preparation and model creation. We'll cover how to use these tools to tune your hyperparameters to create a better model.

To get the most out of this book

All the software used in this book is open source, meaning you will not have to pay for any of it. If you do find it useful, you are encouraged to find ways to give back to these communities, either financially or by contributing to the code base directly. NumFOCUS sponsors many of the tools used; you can find more about them at their website: https://numfocus.org/.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Building-Data-Science-Solutions-with-Anaconda. If there's an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800568785_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

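from sklearn.model_selection import train_test_split

# cali_data is assumed to be a dataset object loaded earlier in the book
# (one exposing .data and .target, as scikit-learn dataset loaders do).
training_data = cali_data.data
target_value = cali_data.target
# Hold out 20% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    training_data, target_value, test_size=0.2, random_state=5
)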

Any command-line input or output is written as follows:

conda install numpy

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "At the top right of the screen, there will be a Fork button."

Tips or Important Notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at customercare@packtpub.com and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packt.com with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Author (1)

Dan Meador

Dan Meador is an Engineering Manager at Anaconda and is the creator of Conda as well as a champion of open source at Anaconda. With a history of engineering and client facing roles, he has the ability to jump into any position. He has a track record of delivering as a leader and a follower in companies from the Fortune 10 to startups.