You're reading from Building Data Science Applications with FastAPI

Product type: Book

Published in: Oct 2021

Reading level: Beginner

Publisher: Packt

ISBN-13: 9781801079211

Edition: 1st Edition

Languages

Python

Tools

Fastlane

Concepts

Data Science

Author (1)

François Voron

Chapter 11: Introduction to NumPy and pandas

In recent years, Python has gained a lot of popularity in the data science field. Its concise and readable syntax makes the language a good choice for scientific research while remaining suitable for production workloads: research projects can be deployed into real applications that bring value to users. Thanks to this growing interest, many specialized Python libraries have emerged. The best known are probably NumPy and pandas. Their goal is to provide tools for manipulating large sets of data far more efficiently than we could with standard Python alone, and we'll show how and why in this chapter. NumPy and pandas are at the heart of most data science applications in Python; knowing them is therefore the first step on your journey into Python for data science.

In this chapter, we're going to cover the following main topics:

Getting started with...

Technical requirements

You'll need a Python virtual environment, as we set up in Chapter 1, Python Development Environment Setup.

You'll find all the code examples of this chapter in the dedicated GitHub repository: https://github.com/PacktPublishing/Building-Data-Science-Applications-with-FastAPI/tree/main/chapter11.

Getting started with NumPy

In Chapter 2, Python Programming Specificities, we stated that Python is a dynamically typed language. This means that the interpreter automatically detects the type of a variable at runtime, and this type can even change throughout the program. For example, you can do something like this in Python:

$ python
>>> x = 1
>>> type(x)
<class 'int'>
>>> x = "hello"
>>> type(x)
<class 'str'>

The interpreter was able to determine the type of x at each assignment.

Under the hood, the standard implementation of Python, CPython, is written in C. C is a compiled, statically typed language: the type of each variable is fixed at compile time and can't change during execution. Thus, in the Python implementation, a variable doesn't consist only of its value: it's actually a structure containing information about the variable, including...
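The point that a Python variable carries more than its raw value can be observed directly. The following is a minimal sketch, assuming CPython with NumPy installed; exact byte counts depend on the platform and Python version, so none are shown here.

```python
import sys

import numpy as np

# Size in bytes of a single Python int object, including its object header
python_int_size = sys.getsizeof(1)

# Size in bytes of one element of a 64-bit integer NumPy array: the raw value only
numpy_item_size = np.dtype(np.int64).itemsize

print(f"Python int object: {python_int_size} bytes")
print(f"NumPy int64 element: {numpy_item_size} bytes")
```

On a typical installation, the Python object should be noticeably larger than the 8 bytes needed for the raw 64-bit value.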

Manipulating arrays with NumPy – computation, aggregations, comparisons

As we said, NumPy is all about manipulating large arrays with great performance and controlled memory consumption. Let's say, for example, that we want to compute the double of each element in a large array. In the following example, you can see an implementation of such a function with a standard Python loop:

chapter11_compare_operations.py

import numpy as np
np.random.seed(0)  # Set the random seed to make examples reproducible
m = np.random.randint(10, size=1000000)  # An array with a million elements
def standard_double(array):
    output = np.empty(array.size)
    for i in range(array.size):
        output[i] = array[i] * 2
    return output

https://github.com/PacktPublishing/Building-Data-Science-Applications-with-FastAPI/blob/main/chapter11/chapter11_compare_operations...
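For comparison, here is a minimal sketch of the same doubling done with NumPy's element-wise multiplication, timed against the loop above. The function name numpy_double and the timing harness are assumptions made for illustration, not taken from the repository file. Timings depend on your machine, so none are shown.

```python
import timeit

import numpy as np

np.random.seed(0)
m = np.random.randint(10, size=100000)  # Smaller than above to keep the timing quick


def standard_double(array):
    # Element-by-element loop, as in the example above
    output = np.empty(array.size)
    for i in range(array.size):
        output[i] = array[i] * 2
    return output


def numpy_double(array):
    # Hypothetical name: NumPy applies the multiplication to the whole array at once
    return array * 2


loop_time = timeit.timeit(lambda: standard_double(m), number=3)
vectorized_time = timeit.timeit(lambda: numpy_double(m), number=3)
print(f"loop: {loop_time:.4f}s, vectorized: {vectorized_time:.4f}s")
```

Both functions produce the same values; the vectorized version simply delegates the loop to NumPy's compiled code instead of running it in the Python interpreter.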

Getting started with pandas

In the previous section, we introduced NumPy and its ability to efficiently store and work with a large array of data. We'll now introduce another widely used library in data science: pandas. This library is built on top of NumPy to provide convenient data structures able to efficiently store large datasets with labeled rows and columns. This is, of course, especially handy when working with most datasets representing real-world data that we want to analyze and use in data science projects.

To get started, we will, of course, install the library with the usual command:

$ pip install pandas

Once done, we can start to use it in a Python interpreter:

$ python
>>> import pandas as pd

Just like we alias numpy as np, the convention is to alias pandas as pd when importing it.

Using pandas Series for one-dimensional data

The first pandas data structure we'll introduce is Series. This data structure behaves very similarly to...
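As a minimal sketch of what a Series looks like in practice, the snippet below builds one from a list with an explicit label for each value. The values and labels are made up for illustration.

```python
import pandas as pd

# A one-dimensional Series with an explicit label for each value
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s["b"])        # Access a value by its label
print(s.to_numpy())  # The underlying data as a NumPy array
print(s * 2)         # Arithmetic applies to every element
```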

Summary

Great! You now have a grasp of the basics of NumPy and pandas. Those libraries are essential tools for data scientists in Python. By relying on optimized and compiled code, they allow you to load and manipulate large sets of data in Python without sacrificing performance. To allow this, they define fixed-type data structures, meaning each value in an array or column should be of the same type. This is what enables efficient memory consumption and fast computations.
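The fixed-type constraint can be seen in a short sketch: when values of different types are mixed, NumPy and pandas convert them all to one common type. The sample values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Mixing an integer and a float: every element is stored as a float
mixed_numbers = np.array([1, 2.5])
print(mixed_numbers.dtype)

# Mixing a number and a string: every element is stored as a string
mixed_text = np.array([1, "a"])
print(mixed_text.dtype)

# A pandas Series follows the same rule for numeric values
series = pd.Series([1, 2.5])
print(series.dtype)
```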

Even though those basics should be enough for you to get started, we recommend that you spend some time with the official user guides and experiment with both libraries to discover more of their features.

As we said in the introduction, NumPy and pandas are at the heart of most data science applications in Python. In the next chapter, we'll see how they will help us in machine learning tasks, along with the well-known machine learning library scikit-learn.

Author (1)

François Voron

François Voron graduated from the University of Saint-Étienne (France) and the University of Alicante (Spain) with a master's degree in machine learning and data mining. A full stack web developer and a data scientist, François has a proven track record working in the SaaS industry, with a special focus on Python backends and REST APIs. He is also the creator and maintainer of FastAPI Users, the #1 authentication library for FastAPI, and is one of the top experts in the FastAPI community.