Building Data Science Solutions with Anaconda


Product type: Book
Published: May 2022
Publisher: Packt
ISBN-13: 9781800568785
Pages: 330
Edition: 1st Edition
Author: Dan Meador

Table of Contents

Preface
Part 1: The Data Science Landscape – Open Source to the Rescue
Chapter 1: Understanding the AI/ML landscape
Chapter 2: Analyzing Open Source Software
Chapter 3: Using the Anaconda Distribution to Manage Packages
Chapter 4: Working with Jupyter Notebooks and NumPy
Part 2: Data Is the New Oil, Models Are the New Refineries
Chapter 5: Cleaning and Visualizing Data
Chapter 6: Overcoming Bias in AI/ML
Chapter 7: Choosing the Best AI Algorithm
Chapter 8: Dealing with Common Data Problems
Part 3: Practical Examples and Applications
Chapter 9: Building a Regression Model with scikit-learn
Chapter 10: Explainable AI - Using LIME and SHAP
Chapter 11: Tuning Hyperparameters and Versioning Your Model
Other Books You May Enjoy

Chapter 3: Using the Anaconda Distribution to Manage Packages

If software packages are the tools, then a package manager is the toolbox. A package manager allows you to quickly and effectively find the packages you need and ensures that the tools work seamlessly with one another. It is key to being able to clean data, build models, and create functioning software.

In this chapter, we will take a deeper look at what these packages are, where they live, and how you can use conda and Anaconda tools to incorporate them into your projects. Knowing how to pull in the packages that you need will be the first step in any data science project, so it's vital that you can do this easily.

We'll also create two conda environments and see how you can make use of these to make repeatable projects, share them with your colleagues, or maybe just keep them for yourself.

Finally, you'll discover some more advanced conda features, such as setting up your .condarc configuration file, which...

Technical requirements

To follow the instructions in this chapter, make sure Anaconda Individual Edition, which includes conda and Navigator, is installed. It can be downloaded here: https://www.anaconda.com/products/individual.

The conda.yml file, which contains the config information for the conda environment discussed in this chapter, can be found in the GitHub repository: https://github.com/PacktPublishing/Building-Data-Science-Solutions-with-Anaconda/tree/main/Chapter03.

Learning how dependency resolution works

Dependencies are a fundamental part of software development and data science. Back in Chapter 1, Understanding the AI/ML Landscape, we gave an overview of dependencies using a cooking example: the things you need to make a certain dish, and how those requirements can conflict with one another, resulting in a tricky situation. We could provide another food analogy, but let's look at something a little more concrete: why the alternative, not using anyone else's packages and libraries, can be a challenge.

Let's take a look and see whether we can get by without any dependencies for building a simple web application where users can create accounts and pick their favorite movies so you can recommend ones they might like:

  1. First, you will need to get some form of authentication for your project. You will need to brush up on your security skills, such as seeding, hashing functions...
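The conflicting requirements described at the start of this section can be made concrete with a small sketch. The package names and versions below are invented for illustration, and the code only detects clashes between exact version pins; it is not how conda's solver works, which also handles version ranges and searches for a compatible set.

```python
# Hypothetical package index: each package maps to the exact versions of the
# packages it requires. All names and versions are made up for illustration.
REQUIREMENTS = {
    "movie_app": {"web_framework": "2.0", "recommender": "1.4"},
    "web_framework": {"json_lib": "3.1"},
    "recommender": {"json_lib": "2.9"},
    "json_lib": {},
}


def collect_requirements(package, requirements=REQUIREMENTS):
    """Walk direct and transitive dependencies, recording every version requested."""
    requested = {}  # dependency name -> set of (version, requested_by)
    stack = [package]
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for dep, version in requirements.get(current, {}).items():
            requested.setdefault(dep, set()).add((version, current))
            stack.append(dep)
    return requested


def find_conflicts(package, requirements=REQUIREMENTS):
    """Return the dependencies that are requested at more than one version."""
    conflicts = {}
    for dep, requests in collect_requirements(package, requirements).items():
        versions = {version for version, _ in requests}
        if len(versions) > 1:
            conflicts[dep] = sorted(requests)
    return conflicts


if __name__ == "__main__":
    for dep, requests in find_conflicts("movie_app").items():
        print(f"Conflict on {dep}:")
        for version, requested_by in requests:
            print(f"  {requested_by} needs {dep}=={version}")
```

Here the application never asks for json_lib directly; the clash only appears through its two dependencies, which is what makes transitive dependencies hard to track by hand.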

Discovering what conda environments are and how to use them

In this section, you'll learn how to use a key feature of Anaconda that allows you to create separate spaces for your code: environments. You'll gain an understanding of what they are and how to create them in conda and in Navigator, as well as how to share those environments with others.

Let's start by making sure you have a clear understanding of what environments are and how they fit in with the other parts of conda.

There are three core pillars of conda: environments, packages, and channels. Each is vital to know in order to master conda. We have already learned about packages in Chapter 2, Analyzing Open Source Software. Here, we will go through environments and channels using the analogy of getting food at the grocery store.

The first of the three pillars is environments. With our food analogy, let's say you are making spaghetti, with brownies for dessert. You need the milk and...
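Sharing an environment usually comes down to sharing a small text file that lists its name, channels, and packages. The following is a minimal sketch of what such a file contains, written as a hypothetical helper function (render_environment_file is not part of conda). The environment name ch_3_env comes from this chapter, while the package list is an assumption chosen for illustration.

```python
def render_environment_file(name, dependencies, channels=("defaults",)):
    """Return the text of a minimal conda environment file."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dependency}" for dependency in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_environment_file(
        name="ch_3_env",
        dependencies=["python=3.9", "numpy", "pandas"],  # assumed packages
    )
    print(text)
    # Saving this text as environment.yml lets a colleague recreate the
    # environment with: conda env create -f environment.yml
```

In practice, you would rarely write this file by hand; conda env export > environment.yml produces one from an existing environment.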

Managing channels with Anaconda Navigator and conda

It's time to look at the last pillar of the Anaconda landscape, which is channels. By the end of this section, you'll know what channels are and how to specify them, what the .condarc file is and how to set it up, and how to get the exact package version that you need.

A quick note before we begin: some of this section might not apply to you if you are only grabbing basic packages. Setting the priority of channels, for example, isn't something you have to deal with at first, so don't worry too much about knowing these things in detail. It can become incredibly useful, however, as your software grows more complex.

Let's start with making sure we have a clear idea of what a channel is.

Understanding what a channel is

A channel is simply a repo that contains a specific and intentional group of packages created by an individual or company. It allows you to have a set group of...
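To see why the order of channels can matter, here is a simplified model of strict channel priority: the first channel in your list that carries a package wins, even if a later channel has a newer version. The channel contents below are invented, and conda's actual solver also weighs dependencies and other settings, so treat this as a sketch of the idea only.

```python
# Hypothetical channel contents: channel name -> {package: available versions}.
CHANNELS = {
    "conda-forge": {"numpy": ["1.21.0", "1.22.0"]},
    "defaults": {"numpy": ["1.21.0", "1.23.0"], "scipy": ["1.7.0"]},
}


def version_key(version):
    """Turn '1.21.0' into (1, 21, 0) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_package(package, channel_order, channels=CHANNELS):
    """Pick a package under a simplified strict-priority rule.

    The first channel in channel_order that carries the package wins, and
    the newest version available in that channel is chosen.
    """
    for channel in channel_order:
        versions = channels.get(channel, {}).get(package)
        if versions:
            return channel, max(versions, key=version_key)
    return None  # no configured channel carries the package


if __name__ == "__main__":
    print(pick_package("numpy", ["conda-forge", "defaults"]))
    print(pick_package("numpy", ["defaults", "conda-forge"]))
    print(pick_package("scipy", ["conda-forge", "defaults"]))
```

Swapping the order of the two channels changes which numpy build is selected, which is the kind of behavior the channel priority setting controls.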

Using advanced conda info and settings

There are many layers to conda, and we've just covered the ones you will use first. There are many other operations that you'll find extremely useful and that will, in turn, enable you to work much more easily.

Let's cover how you can find out where conda is looking for its operations and set up a settings file to keep your preferred way of using conda intact through multiple sessions.

Using conda info to see configuration information

You will find yourself troubleshooting and needing to reference where conda looks for certain settings, and for this, conda info has you covered.

Let's run conda info in our ch_3_env environment and see what we get. Some of the output here has been omitted to keep the code a bit more concise:

conda info 
    active environment : base
    active env location : C:\Users\Dan\anaconda3
       user config file...
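When troubleshooting from a script, it can help to turn this "key : value" output into a dictionary. The following is a minimal sketch that assumes each setting sits on one line in that form; parse_conda_info is a hypothetical helper, and lines that do not match (such as continuation lines for multi-value settings) are simply skipped.

```python
def parse_conda_info(text):
    """Parse 'key : value' lines from conda info output into a dict."""
    info = {}
    for line in text.splitlines():
        key, separator, value = line.partition(" : ")
        if separator:  # skip lines that are not 'key : value' pairs
            info[key.strip()] = value.strip()
    return info


if __name__ == "__main__":
    # Sample lines taken from the output shown above.
    sample = r"""
        active environment : base
        active env location : C:\Users\Dan\anaconda3
    """
    info = parse_conda_info(sample)
    print(info["active environment"])
    print(info["active env location"])
```

Splitting on " : " (with the surrounding spaces) keeps Windows paths such as C:\Users intact. For anything beyond a quick check, conda info --json is likely the more robust option, since its output is meant to be read by programs.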

Conda cheat sheet

Here, as a quick reference, are the conda commands that you'll find yourself using most often, along with a few extra ones that will prove to be handy.

Conda general commands

The general commands are as follows:

  • conda install <package>: Searches for and installs the specified package from your channels
  • conda info: Shows the basic information about conda
  • conda config --add channels conda-forge: Adds the conda-forge channel to the list of channels searched for packages
  • conda update --all: Updates every package in the current environment that can be updated
  • conda search <package>: Searches your configured channels for the specified package
  • conda create --name <environment_name> python=<python version>: Creates a new conda environment and installs the specified Python version
  • conda activate <environment_name>: Activates the specified conda environment so that you can use it
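If you want to drive these commands from a Python script, one option is to build the argument lists in code and hand them to subprocess.run. The helper functions below are hypothetical and only assemble the commands listed above; they do not run conda themselves, and the environment name and versions are example values.

```python
import shlex


def conda_create_command(environment_name, python_version):
    """Build the argument list for creating an environment with a given Python."""
    return ["conda", "create", "--name", environment_name, f"python={python_version}"]


def conda_install_command(package, channel=None):
    """Build the argument list for installing a package, optionally from one channel."""
    command = ["conda", "install"]
    if channel is not None:
        command += ["--channel", channel]
    command.append(package)
    return command


if __name__ == "__main__":
    # Print the commands as they would be typed in a terminal.
    print(shlex.join(conda_create_command("ch_3_env", "3.9")))
    print(shlex.join(conda_install_command("numpy", channel="conda-forge")))
    # To execute one of them: subprocess.run(command, check=True)
```

Keeping each command as a list of arguments avoids quoting problems when an environment name or path contains spaces.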

Conda environment commands

The environment-specific commands...

Summary

By now, you've seen how to create environments from the command line with conda, as well as with the more visual approach that Navigator offers. You understand what dependency management looks like and why it's important to take into account transitive dependencies and their versions.

You've seen how simple it can be to upload and share your environments so that others can benefit from them.

In addition, we've wrapped up the three pillars of conda by looking at channels and how conda makes use of many different sources for the packages it needs. We've touched on conda-forge as one of those locations.

Much of what you have learned in this chapter will prove to be invaluable in your everyday work, and I encourage you to refer to the conda cheat sheet here until you no longer need it. You'll be using some commands so often that it won't take long.

Now that we know the Anaconda tools...
