You're reading from Building Data Science Solutions with Anaconda

Product type: Book
Published in: May 2022
Publisher: Packt
ISBN-13: 9781800568785
Edition: 1st Edition
Author (1)
Dan Meador

Dan Meador is an Engineering Manager at Anaconda and is the creator of Conda as well as a champion of open source at Anaconda. With a history of engineering and client-facing roles, he has the ability to jump into any position. He has a track record of delivering as a leader and a follower in companies from the Fortune 10 to startups.

Chapter 4: Working with Jupyter Notebooks and NumPy

Data comes up any time data science is discussed, and that data will rarely be in the exact format you need to create your models. In this chapter, we will learn the core skill of data cleaning using NumPy while working in a Jupyter notebook, two of the foundational tools for any data scientist.

Python by default does not include many of the operations needed for multidimensional arrays, and that's where NumPy comes in. With it, you can perform linear algebra, apply operations to each element, and do it all quickly, which was a challenge before. These core features are what make this package one of the fundamental tools for scientific computing that many other packages are built upon, including pandas and scikit-learn.
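As a rough illustration of those two capabilities, the following minimal sketch applies an element-wise operation and a couple of linear algebra routines to a small array. The matrix and vector values are made up for the example and are not taken from the chapter.

```python
import numpy as np

# A small 2x2 matrix and a vector (illustrative values only).
a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([5.0, 6.0])

# Element-wise operation: every entry is doubled without an explicit loop.
doubled = a * 2

# Linear algebra: a matrix-vector product, then solving a @ x = b for x.
product = a @ b
x = np.linalg.solve(a, b)

print(doubled)
print(product)
print(x)
```

Plain Python lists would need nested loops for each of these steps; with NumPy, each one is a single expression.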

We'll also take a visual approach to this work by getting to know Jupyter notebooks. Jupyter notebooks make it incredibly easy to work...

Technical requirements

Luckily, all you need for this chapter is Anaconda installed on your local machine; this includes Navigator and Conda. You'll find the installer here: https://bit.ly/3NolaVn.

Once you do that, you will need an Anaconda environment created with Python 3.8 or greater.
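One way to confirm that the active environment meets this requirement is a quick check from Python itself. This is only a sketch; the helper name `python_is_supported` is hypothetical and not part of Anaconda or the book.

```python
import sys

# Minimum version named in the requirement above.
REQUIRED = (3, 8)

def python_is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= required

# Print the running interpreter's version and whether it qualifies.
print(sys.version.split()[0], "supported:", python_is_supported())
```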

Now that the quick setup is done, let's start working with Jupyter notebooks.

Working with Jupyter notebooks

Jupyter notebooks are another core tool that every data scientist needs to know. In Jupyter notebooks, you can easily get small blocks of code working and also share what you are doing with others. It is a very common way to present the technical implementation of code.

The summary from the official source (https://bit.ly/3NqwksY) does a great job of describing what a Jupyter notebook is:

The Jupyter notebook is an open source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

While it's not the best tool for creating larger and more scalable software applications, it's a fantastic choice when learning a new library, working on smaller subsections of a code base to test things out, and in general anything that...

Using NumPy to perform calculations quickly

As we talked about in the How OSS and Anaconda create the data science landscape section in Chapter 2, Analyzing Open Source Software, open source builds on itself. One library uses another to do some basic operations, and that library can in turn be used by something else to accomplish a different task or do the same thing in a more abstract way. NumPy is one of those base libraries that is used by a huge number of tools and frameworks to handle fast mathematical operations for arrays.

Created by Travis Oliphant (who later went on to help found Anaconda, Inc.), NumPy is used by scikit-learn, SciPy, and pandas so that each can focus on the problem it is trying to solve and let NumPy do what it's good at. Getting a good grasp of NumPy allows you to better understand those higher-level abstractions that use NumPy later on, as well as to use it directly when you are cleaning and creating datasets.
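To give a feel for using NumPy directly when cleaning a dataset, here is a minimal sketch that fills missing values with the mean of the valid ones. The readings are hypothetical, and mean imputation is just one common choice, not necessarily the approach the chapter takes.

```python
import numpy as np

# Hypothetical sensor readings; np.nan marks missing values.
readings = np.array([21.5, np.nan, 22.1, 19.8, np.nan, 20.6])

# Boolean mask that is True wherever a value is missing.
missing = np.isnan(readings)

# Mean of the valid entries only (np.nanmean ignores NaN).
fill_value = np.nanmean(readings)

# Build a cleaned copy, leaving the original array untouched.
cleaned = np.where(missing, fill_value, readings)

print(int(missing.sum()), "values replaced")
print(cleaned)
```

The mask, the mean, and the replacement are each one vectorized call, which is the style that pandas and similar libraries build on.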

...

Summary

JupyterLab and NumPy are for data scientists what a hacksaw and nail gun are for carpenters. Is using those two things by themselves carpentry? Not exactly, but they are vital tools that you will need in order to do the work you want to do. The same is true for data science: JupyterLab and NumPy don't cover everything, but they are going to play an important role in what you are trying to get done.

In this chapter, we discovered how to launch Jupyter notebooks from Anaconda Navigator and how to easily break down work into small chunks and evaluate the parts bit by bit. We saw that you can use a bit of line and cell magic to perform some special actions such as timing a function or an operation. We also looked at some ways to speed up operations to save you valuable time. Finally, we saw how execution order matters and that you can use that as a powerful tool to explore.
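As a reminder of what timing an operation looks like, the sketch below uses the standard library's `timeit` module so that it also runs outside a notebook; inside Jupyter, the `%timeit` line magic serves the same purpose. The two functions and the array size are assumptions made for illustration, and the measured times will vary by machine.

```python
import timeit
import numpy as np

values = np.arange(10_000, dtype=np.float64)

def square_with_loop(arr):
    """Square each element one at a time in pure Python."""
    return [v * v for v in arr]

def square_vectorized(arr):
    """Square every element in a single NumPy operation."""
    return arr * arr

# Both approaches should produce the same numbers.
same_result = np.allclose(square_with_loop(values), square_vectorized(values))

# Time each approach over a few repetitions.
loop_time = timeit.timeit(lambda: square_with_loop(values), number=5)
vec_time = timeit.timeit(lambda: square_vectorized(values), number=5)

print(f"same result: {same_result}")
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```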

We also looked at how NumPy can basically be used as a...
