You're reading from Data Ingestion with Python Cookbook

Product type: Book

Published in: May 2023

Publisher: Packt

ISBN-13: 9781837632602

Edition: 1st Edition

Concepts: Data Engineering

Author: Gláucia Esppenchutz

Setting up Python and its environment

In the data world, languages such as Java, Scala, or Python are commonly used. The first two languages are used due to their compatibility with the big data tools environment, such as Hadoop and Spark, the central core of which runs on a Java Virtual Machine (JVM). However, in the past few years, the use of Python for data engineering and data science has increased significantly due to the language’s versatility, ease of understanding, and many open source libraries built by the community.

Getting ready

Let’s create a folder for our project:

First, open your system command line. Since I use the Windows Subsystem for Linux (WSL), I will open the WSL application.
Go to your home directory and create a folder as follows:
```
$ mkdir my-project
```
Go inside this folder:
```
$ cd my-project
```
Check your Python version on your operating system as follows:
```
$ python --version
```

Depending on your operating system, you might or might not see output here. For example, users of Ubuntu 20.04 on WSL might see the following:

Command 'python' not found, did you mean:
 command 'python3' from deb python3
 command 'python' from deb python-is-python3

If your Python path is configured to use the python command, you will see output similar to this:

Python 3.9.0

Sometimes, Python is configured to be invoked with python3 instead. You can try it using the following command:

$ python3 --version

The output will be similar to that of the python command, as follows:

Python 3.9.0
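If some Python interpreter is already available, the same lookup can be done from a script. The following is a minimal sketch, not part of the original recipe: it uses the standard library's shutil.which to report which of the two command names resolve on your PATH. The function name find_python_commands is a hypothetical helper chosen for illustration.

```python
import shutil


def find_python_commands(names=("python", "python3")):
    """Map each candidate command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_python_commands().items():
        # shutil.which returns None when the command cannot be found
        print(f"{name}: {path if path else 'not found on PATH'}")
```

A value of None for python together with a real path for python3 corresponds to the "Command 'python' not found" situation described earlier.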

Now, let’s check our pip version. This check is essential, since some operating systems have more than one Python version installed:
```
$ pip --version
```

You should see output similar to this:

pip 20.0.2 from /usr/lib/python3/dist-packages/pip (python 3.9)

If your operating system (OS) uses a Python version below 3.8 or doesn’t have the language installed, proceed to the How to do it… steps; otherwise, you are ready to move on to the Installing PySpark recipe.
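The version check above can also be expressed in code. This is a small sketch under the assumption that 3.8 is the minimum version, as stated in this recipe; MINIMUM_VERSION and meets_minimum are illustrative names, not something the book defines.

```python
import sys

# Assumption: 3.8 is the lowest version this recipe treats as acceptable.
MINIMUM_VERSION = (3, 8)


def meets_minimum(version_info=None, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} is recent enough for this recipe.")
    else:
        print(f"Python {current} is older than 3.8; follow the installation steps.")
```

Comparing tuples avoids the pitfalls of comparing version strings, where "3.10" would sort before "3.8".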

How to do it…

We are going to use the official installer from Python.org, which you can find here: https://www.python.org/downloads/

Note

For Windows users, it is important to check your OS version and processor type (32-bit or 64-bit), since Python 3.10 may not be compatible with Windows 7.

Download one of the stable versions.

At the time of writing, the stable recommended versions compatible with the tools and resources presented here are 3.8, 3.9, and 3.10. I will use the 3.9 version and download it using the following link: https://www.python.org/downloads/release/python-390/. Scrolling down the page, you will find a list of links to Python installers according to OS, as shown in the following screenshot.

Figure 1.1 – Python.org download files for version 3.9

After downloading the installation file, double-click it and follow the instructions in the wizard window. To avoid complexity, choose the recommended settings displayed.

The following screenshot shows how it looks on Windows:

Figure 1.2 – The Python Installer for Windows

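If you are a Linux user, you can build and install it from source using the following commands. The altinstall target installs the new build alongside any system Python instead of replacing it:

$ wget https://www.python.org/ftp/python/3.9.1/Python-3.9.1.tgz
$ tar -xf Python-3.9.1.tgz
$ cd Python-3.9.1
$ ./configure --enable-optimizations
$ make -j 9
$ sudo make altinstall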

After installing Python, you should be able to execute the pip command. If not, refer to the pip official documentation page here: https://pip.pypa.io/en/stable/installation/.
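One way to confirm that pip is available to a specific interpreter, rather than to whichever pip happens to be first on your PATH, is to check from inside that interpreter. The sketch below is an illustration only; pip_is_available and pip_command are hypothetical helper names. It relies on the standard importlib.util.find_spec function and on the python -m pip invocation style.

```python
import importlib.util
import subprocess
import sys


def pip_is_available():
    """Return True if the pip package can be imported by the running interpreter."""
    return importlib.util.find_spec("pip") is not None


def pip_command(*args):
    """Build a pip command line bound to the running interpreter (python -m pip ...)."""
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    if pip_is_available():
        # Runs the same check as `pip --version`, but for this exact interpreter
        subprocess.run(pip_command("--version"), check=False)
    else:
        print("pip is not installed for this interpreter; see the pip installation guide.")
```

Running pip through sys.executable is helpful on machines with several Python versions, because it removes any doubt about which installation pip belongs to.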

How it works…

Python is an interpreted language, and its interpreter can be extended with functions written in C or C++. The language distribution also comes with several built-in libraries and, of course, the interpreter itself.

The interpreter works somewhat like a Unix shell and, on Unix-like systems, is usually found in the /usr/local/bin directory: https://docs.python.org/3/tutorial/interpreter.html.
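To see where the interpreter actually lives on your own machine, and which modules are compiled directly into it, you can ask the interpreter itself. This is a minimal sketch using only the standard sys module; interpreter_info is a hypothetical helper name, and the paths it reports will differ between systems.

```python
import sys


def interpreter_info():
    """Collect basic facts about the running interpreter."""
    return {
        "executable": sys.executable,  # full path of the interpreter binary
        "prefix": sys.prefix,          # root directory of this Python installation
        "builtin_modules": sorted(sys.builtin_module_names),  # modules compiled into the interpreter
    }


if __name__ == "__main__":
    info = interpreter_info()
    print("Interpreter:", info["executable"])
    print("Install prefix:", info["prefix"])
    print("Modules compiled into the interpreter:", len(info["builtin_modules"]))
```

On a source install done with the default configure options, the executable path is typically under /usr/local/bin, while installers and package managers may place it elsewhere.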

Lastly, note that many of the third-party Python packages in this book are installed with the pip command. pip (a recursive acronym for Pip Installs Packages) is the default package manager for Python and is used to install, upgrade, and manage Python packages and their dependencies from the Python Package Index (PyPI).
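Packages installed with pip can be inspected from Python through the standard importlib.metadata module (available from Python 3.8). The following sketch lists what the current interpreter can see; installed_packages is a hypothetical helper name, and the output depends entirely on your environment.

```python
from importlib import metadata


def installed_packages():
    """Return a {name: version} dict for the distributions visible to this interpreter."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata is incomplete
            packages[name] = dist.version
    return packages


if __name__ == "__main__":
    found = installed_packages()
    for name in sorted(found, key=str.lower):
        print(f"{name}=={found[name]}")
```

The printed name==version lines follow the same pinning format that pip accepts in a requirements file.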

There’s more…

Even if you don’t have any Python version on your machine, you can still install one using the command line or Homebrew (for macOS users). Windows users can also download it from the Microsoft Store.

Note

If you choose to download Python from the Microsoft Store, ensure you use the application published by the Python Software Foundation.

Gláucia Esppenchutz is a data engineer with expertise in managing data pipelines and vast amounts of data using cloud and on-premises technologies. She has worked at companies such as Globo, BMW Group, and Cloudera. Currently, she works at AiFi, specializing in the field of data operations for autonomous systems. She comes from the biomedical field and shifted her career ten years ago to chase the dream of working closely with technology and data. She is in constant contact with the open source community, mentoring people and helping to manage projects, and has collaborated with the Apache, PyLadies, FreeCodeCamp, Udacity, and MentorColor communities.