You're reading from  Building Data Science Applications with FastAPI

Product type: Book
Published in: Oct 2021
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781801079211
Edition: 1st Edition
Author (1)
François Voron

François Voron graduated from the University of Saint-Étienne (France) and the University of Alicante (Spain) with a master's degree in machine learning and data mining. A full stack web developer and a data scientist, François has a proven track record working in the SaaS industry, with a special focus on Python backends and REST APIs. He is also the creator and maintainer of FastAPI Users, the #1 authentication library for FastAPI, and is one of the top experts in the FastAPI community.

Chapter 11: Introduction to NumPy and pandas

In recent years, Python has gained a lot of popularity in the data science field. Its concise and readable syntax makes the language a good choice for scientific research while remaining suitable for production workloads: research projects can be deployed into real applications that bring value to users. Thanks to this growing interest, many specialized Python libraries have emerged. The best known are probably NumPy and pandas. Their goal is to provide a set of tools to manipulate large datasets far more efficiently than we could with standard Python, and we'll show how and why in this chapter. NumPy and pandas are at the heart of most data science applications in Python; knowing them is therefore the first step on your journey into Python for data science.

In this chapter, we're going to cover the following main topics:

  • Getting started with...

Technical requirements

You'll need a Python virtual environment, as we set up in Chapter 1, Python Development Environment Setup.

You'll find all the code examples of this chapter in the dedicated GitHub repository: https://github.com/PacktPublishing/Building-Data-Science-Applications-with-FastAPI/tree/main/chapter11.

Getting started with NumPy

In Chapter 2, Python Programming Specificities, we stated that Python is a dynamically typed language. This means that the interpreter automatically detects the type of a variable at runtime, and this type can even change throughout the program. For example, you can do something like this in Python:

$ python
>>> x = 1
>>> type(x)
<class 'int'>
>>> x = "hello"
>>> type(x)
<class 'str'>

The interpreter was able to determine the type of x at each assignment.

Under the hood, the standard implementation of Python, CPython, is written in C. C is a compiled and statically typed language. This means that the type of each variable is fixed at compile time and can't change during execution. Thus, in the Python implementation, a variable doesn't only consist of its value: it's actually a structure containing information about the variable, including...
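To make this per-object overhead tangible, the following sketch compares the memory taken by a plain Python integer with the per-element size of a NumPy array. It assumes CPython and an installed numpy package; the variable names are illustrative, and the exact size of a Python object varies by platform and version.

```python
import sys

import numpy as np

# A Python int is a full object: besides its value, it carries
# bookkeeping data such as a reference count and a type pointer.
python_int_size = sys.getsizeof(1)

# A NumPy array stores raw fixed-size values; with an explicit
# 64-bit integer dtype, each element takes exactly 8 bytes.
array = np.array([1, 2, 3], dtype=np.int64)
element_size = array.itemsize

print(f"Python int object: {python_int_size} bytes")
print(f"NumPy int64 element: {element_size} bytes")
```

The gap between the two numbers is the price paid for dynamic typing on every single value.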

Manipulating arrays with NumPy – computation, aggregations, comparisons

As we said, NumPy is all about manipulating large arrays with great performance and controlled memory consumption. Let's say, for example, that we want to double each element in a large array. In the following example, you can see an implementation of such a function with a standard Python loop:

chapter11_compare_operations.py

import numpy as np

np.random.seed(0)  # Set the random seed to make examples reproducible
m = np.random.randint(10, size=1000000)  # An array with a million elements


def standard_double(array):
    # Pre-allocate the output, then fill it one element at a time
    output = np.empty(array.size)
    for i in range(array.size):
        output[i] = array[i] * 2
    return output
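For comparison, here is a hedged sketch of a vectorized counterpart, which this excerpt does not show. The names numpy_double and m_small are assumptions introduced for illustration, and the loop version is repeated so that the block runs on its own.

```python
import timeit

import numpy as np

np.random.seed(0)
m_small = np.random.randint(10, size=100_000)  # Smaller array to keep timing short


def standard_double(array):
    # Element-by-element loop executed by the Python interpreter
    output = np.empty(array.size)
    for i in range(array.size):
        output[i] = array[i] * 2
    return output


def numpy_double(array):
    # Vectorized multiplication: the loop runs in NumPy's compiled code
    return array * 2


# Both approaches produce the same values
same_values = np.array_equal(standard_double(m_small), numpy_double(m_small))

# Time a few runs of each; absolute numbers depend on the machine
loop_time = timeit.timeit(lambda: standard_double(m_small), number=3)
vector_time = timeit.timeit(lambda: numpy_double(m_small), number=3)
print(f"same values: {same_values}")
print(f"loop: {loop_time:.4f}s, vectorized: {vector_time:.4f}s")
```

The timings are only indicative, but the vectorized version avoids running the loop body in the interpreter for every element.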

Getting started with pandas

In the previous section, we introduced NumPy and its ability to efficiently store and work with a large array of data. We'll now introduce another widely used library in data science: pandas. This library is built on top of NumPy to provide convenient data structures able to efficiently store large datasets with labeled rows and columns. This is, of course, especially handy when working with most datasets representing real-world data that we want to analyze and use in data science projects.

To get started, we will, of course, install the library with the usual command:

$ pip install pandas

Once done, we can start to use it in a Python interpreter:

$ python
>>> import pandas as pd

Just like we alias numpy as np, the convention is to alias pandas as pd when importing it.

Using pandas Series for one-dimensional data

The first pandas data structure we'll introduce is Series. This data structure behaves very similarly to...
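As a minimal sketch of what a Series looks like, assuming pandas is installed, the example below builds one from a plain Python list. The values and labels are made up for illustration.

```python
import pandas as pd

# Hypothetical data: three values, each with a text label as its index
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Values can be looked up by label
value_b = s["b"]

# Arithmetic applies to every element at once
doubled = s * 2

print(s)
print(value_b)
print(doubled.tolist())
```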

Summary

Great! You now have a grasp of the ins and outs of NumPy and pandas. Those libraries are essential tools for data scientists in Python. By relying on optimized and compiled code, they allow you to load and manipulate large sets of data in Python without sacrificing performance. To allow this, they define fixed-type data structures, meaning each value in the dataset should be of the same type. This is what enables efficient memory consumption and fast computations.
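The fixed-type constraint can be observed directly. In this sketch, which uses illustrative variable names, NumPy picks a single dtype for the whole array and stores the integer as a float when the two kinds of values are mixed.

```python
import numpy as np

# All integers: NumPy picks one integer dtype for the whole array
ints = np.array([1, 2, 3])

# Mixing an int with a float: every element is stored as a float
mixed = np.array([1, 2.5])

print(ints.dtype)
print(mixed.dtype)
print(mixed.tolist())
```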

Even though those basics should be enough for you to get started, we recommend that you spend some time on the official user guides and tinker with those a bit to discover all their aspects.

As we said in the introduction, NumPy and pandas are at the heart of most data science applications in Python. In the next chapter, we'll see how they will help us in machine learning tasks, along with the well-known machine learning library scikit-learn.
