You're reading from Building Data Science Solutions with Anaconda

Product type: Book

Published in: May 2022

Publisher: Packt

ISBN-13: 9781800568785

Edition: 1st Edition

Tools: Anaconda

Concepts: Data Science

Author: Dan Meador

Chapter 4: Working with Jupyter Notebooks and NumPy

Data comes up whenever data science is discussed, yet it will rarely arrive in the exact format you need to build your models. In this chapter, we will learn the core skill of data cleaning using NumPy while working in a Jupyter notebook; both are foundational tools for any data scientist.

Python on its own doesn't include many of the operations you need for multidimensional arrays, and that's where NumPy comes in. With it, you can perform linear algebra, apply an operation to every element of an array, and do it all quickly, which was a challenge before. These core features make NumPy one of the fundamental tools for scientific computing, and many other packages are built upon it, including pandas and scikit-learn.
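As a rough illustration of the element-wise operations and linear algebra mentioned above, here is a minimal sketch. The array values and variable names are arbitrary choices made for this example and do not come from the chapter.

```python
import numpy as np

# A small 2-D array (matrix) and a vector; the values are arbitrary.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([1.0, 1.0])

# Element-wise operation: every entry is doubled without an explicit loop.
doubled = a * 2

# Linear algebra: a matrix-vector product, then solving a @ x = v for x.
product = a @ v
x = np.linalg.solve(a, v)

print(doubled)
print(product)
print(x)
```

The same expressions work unchanged on much larger arrays, which is where the speed benefit becomes noticeable.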

We'll also take a visual approach to this work by getting to know Jupyter notebooks. Jupyter notebooks make it incredibly easy to work...

Technical requirements

Luckily, all you need for this chapter is Anaconda installed on your local machine; this includes Navigator and Conda. You'll find the installer here: https://bit.ly/3NolaVn.

Once Anaconda is installed, you will need to create an Anaconda environment with Python 3.8 or later.

Now that the quick setup is done, let's start working with Jupyter notebooks.

Working with Jupyter notebooks

Jupyter notebooks are another core tool that every data scientist needs to know. In a Jupyter notebook, you can easily get small blocks of code working and share what you are doing with others. Notebooks are a very common way to present the technical implementation of code.

The summary from the official source (https://bit.ly/3NqwksY) does a great job of describing what a Jupyter notebook is:

The Jupyter notebook is an open source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. Uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

While it's not the best tool for creating larger and more scalable software applications, it's a fantastic choice when learning a new library, working on smaller subsections of a code base to test things out, and in general anything that...

Using NumPy to perform calculations quickly

As we talked about in the How OSS and Anaconda create the data science landscape section in Chapter 2, Analyzing Open Source Software, open source builds on itself. One library uses another to do some basic operations, and that library can in turn be used by something else to accomplish a different task or to do the same thing in a more abstract way. NumPy is one of those base libraries, used by a huge number of tools and frameworks to handle fast mathematical operations on arrays.

Created by Travis Oliphant (who later went on to help found Anaconda, Inc.), NumPy is used by scikit-learn, SciPy, and pandas so that they can focus on the problems they are each trying to solve and let NumPy do what it's good at. Getting a good grasp of NumPy helps you better understand the higher-level abstractions built on top of it, and it lets you use NumPy directly when you are cleaning and creating datasets.
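To get a feel for what "fast mathematical operations" means in practice, the sketch below times the same calculation done with a plain Python list and with a NumPy array. It uses the standard library's timeit module so that it runs anywhere; the array size, repeat count, and function names are assumptions made for this illustration, and the actual timings will vary from machine to machine.

```python
import timeit

import numpy as np

n = 100_000
values = list(range(n))               # plain Python list
array = np.arange(n, dtype=np.int64)  # NumPy array holding the same numbers


def square_with_loop():
    # Squares each element one at a time in the Python interpreter.
    return [x * x for x in values]


def square_with_numpy():
    # Squares the whole array in a single vectorized operation.
    return array * array


# Run each version 20 times and report the total elapsed time.
loop_seconds = timeit.timeit(square_with_loop, number=20)
numpy_seconds = timeit.timeit(square_with_numpy, number=20)

print(f"list comprehension: {loop_seconds:.4f} s")
print(f"NumPy array:        {numpy_seconds:.4f} s")
```

Inside a notebook, the %timeit line magic offers a similar measurement for a single statement.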

...

Summary

JupyterLab and NumPy are for data scientists what a hacksaw and nail gun are for carpenters. Is using those two things by themselves carpentry? Not exactly, but they are vital tools that you will need in order to do the work you want to do. The same is true for data science: JupyterLab and NumPy don't cover everything, but both will play an important role in what you are trying to get done.

In this chapter, we discovered how to launch Jupyter notebooks from Anaconda Navigator and how to break work down into small chunks and evaluate the parts bit by bit. We saw that you can use a bit of line and cell magic to perform special actions, such as timing a function or an operation. We also looked at some ways to speed up operations to save you valuable time. Finally, we saw that execution order matters and that you can use it as a powerful tool for exploration.

We also looked at how NumPy can basically be used as a...


About the author

Dan Meador is an Engineering Manager at Anaconda and is the creator of Conda as well as a champion of open source at Anaconda. With a history of engineering and client-facing roles, he has the ability to jump into any position. He has a track record of delivering as a leader and a follower in companies from the Fortune 10 to startups.