You're reading from The Data Wrangling Workshop - Second Edition

Product type: Book

Published in: Jul 2020

Reading level: Intermediate

Publisher: Packt

ISBN-13: 9781839215001

Edition: 2nd Edition

Languages: Python

Tools: Jupyter

Concepts: Data Processing

Authors (3):

Brian Lipp

Shubhadeep Roychowdhury

Dr. Tirthajyoti Sarkar


8. RDBMS and SQL

Overview

This chapter introduces the basics of using Python to query a database managed by an RDBMS and to load the query results into a pandas DataFrame. It explains the concepts of databases, including their creation, manipulation, and control, and shows how to transform tables into pandas DataFrames. By the end of this chapter, you will have learned some basic SQL commands. This knowledge will make you adept at adding, updating, retrieving, and deleting data from databases, another valuable skill in a budding data wrangling expert's repertoire.

Introduction

This chapter of our data journey focuses on Relational Database Management Systems (RDBMSes) and Structured Query Language (SQL). In the previous chapter, we stored and read data from a file. In this chapter, we will read structured data, design access to the data, and create query interfaces for databases.

For years, the RDBMS format has been the conventional way to store data. An RDBMS is one of the safest ways to store, manage, and retrieve data. It is backed by a solid mathematical foundation (relational algebra and calculus) and exposes an efficient and intuitive declarative language – SQL – for easy interaction. Almost every language has a rich set of libraries to interact with different RDBMS, and the tricks and methods of using them are well tested and well understood.

Scaling an RDBMS is a well-understood task, and there is a group of well-trained, experienced professionals who do this job (DBAs, or database administrators).

So, it...

Refresher of RDBMS and SQL

An RDBMS is a piece of software that manages data (represented for the end user in tabular form) on physical hard disks and is built using Codd's relational model. Most of the databases that we encounter today are RDBMSes. In recent years, there has been a huge industry shift toward a newer kind of database management system, called NoSQL (MongoDB, CouchDB, Riak, and so on). These systems follow some of the rules of an RDBMS in certain aspects, but in most cases they reject or modify them.

How Is an RDBMS Structured?

The RDBMS structure consists of three main elements, namely the storage engine, the query engine, and log management. Here is a diagram that demonstrates the structure of an RDBMS:

Figure 8.2: RDBMS structure

The following are the main concepts of any RDBMS structure:

Storage engine: This is the part of the RDBMS that is responsible for storing data in an efficient way and also retrieving it,...

Relation Mapping in Databases

We have been working with a single table and altering it, as well as reading back the data. However, the real power of an RDBMS comes from the handling of relationships among different objects (tables). In this section, we are going to create a new table called comments and link it with the user table in a 1:N relationship. This means that one user can have multiple comments. We will do this by adding the user table's primary key as a foreign key in the comments table.
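The linking step described above can be sketched as follows. This is a minimal illustration, not the chapter's own exercise code: it uses an in-memory database, and the column names (first_name, comment, id) are assumptions made for the example. The choice of email as the user table's primary key is inferred from the join condition used later in the chapter.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind (assumption:
# the chapter itself works against a file on disk).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
# SQLite only enforces foreign keys when this pragma is switched on.
cursor.execute("PRAGMA foreign_keys = 1")

# Parent table; the column layout is hypothetical.
cursor.execute("""
    CREATE TABLE user (
        email TEXT PRIMARY KEY,
        first_name TEXT
    )
""")
# Child table: user_id stores the parent's primary key as a foreign key.
cursor.execute("""
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        comment TEXT,
        FOREIGN KEY (user_id) REFERENCES user (email)
    )
""")

# One user, many comments: the "1" side and the "N" side.
cursor.execute("INSERT INTO user VALUES (?, ?)", ("bob@example.com", "Bob"))
cursor.executemany(
    "INSERT INTO comments (user_id, comment) VALUES (?, ?)",
    [
        ("bob@example.com", "first comment"),
        ("bob@example.com", "second comment"),
    ],
)
conn.commit()

comment_count = cursor.execute(
    "SELECT COUNT(*) FROM comments WHERE user_id = ?", ("bob@example.com",)
).fetchone()[0]

# A comment that points at a user who does not exist is rejected.
try:
    cursor.execute(
        "INSERT INTO comments (user_id, comment) VALUES (?, ?)",
        ("nobody@example.com", "orphan"),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(comment_count, orphan_rejected)
conn.close()
```

The rejected insert is the practical benefit of declaring the foreign key: the database itself refuses child rows that have no parent.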

When we link two tables, we need to tell the database engine what to do when a parent row that has many children in the other table is deleted. As the following diagram shows, we are asking what happens at the place of the question marks when we delete row1 of the user table:

Figure 8.6: Illustration of relations

In a non-RDBMS setting, this situation can quickly become difficult...
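One common answer to the question posed above is to let the database delete the child rows automatically. The sketch below assumes ON DELETE CASCADE as the chosen rule; the excerpt does not show which rule the chapter's exercise uses, and SQLite also offers alternatives such as SET NULL and RESTRICT. Table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
# Foreign key actions, including cascades, only run with this pragma on.
cursor.execute("PRAGMA foreign_keys = 1")

cursor.execute("CREATE TABLE user (email TEXT PRIMARY KEY)")
# ON DELETE CASCADE: deleting a user also deletes that user's comments.
cursor.execute("""
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        comment TEXT,
        FOREIGN KEY (user_id) REFERENCES user (email)
            ON DELETE CASCADE
    )
""")

cursor.executemany(
    "INSERT INTO user VALUES (?)",
    [("bob@example.com",), ("tom@example.com",)],
)
cursor.executemany(
    "INSERT INTO comments (user_id, comment) VALUES (?, ?)",
    [
        ("bob@example.com", "hello from Bob"),
        ("bob@example.com", "more from Bob"),
        ("tom@example.com", "hello from Tom"),
    ],
)

# Delete the parent row; the engine removes Bob's comments for us.
cursor.execute("DELETE FROM user WHERE email = ?", ("bob@example.com",))
conn.commit()

remaining = cursor.execute(
    "SELECT user_id, comment FROM comments ORDER BY id"
).fetchall()
print(remaining)
conn.close()
```

Only the comment belonging to the surviving user is left, so no orphaned rows remain to be cleaned up by application code.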

Joins

Now, we will learn how to exploit the relationship we just built. This means that if we have the primary key from one table, we can recover all the data needed from that table and also all the linked rows from the child table. To achieve this, we will use something called a join.

A join is a way to retrieve linked rows from two tables using the primary key/foreign key relation between them. There are many types of join, including INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER, and CROSS. They are used in different situations. However, most of the time, in simple 1:N relations, we end up using an INNER join. In Chapter 1, Introduction to Data Wrangling with Python, we learned about sets. We can view an INNER join as an intersection of two sets. The following diagram illustrates the concept:

Figure 8.7: A diagram representing the intersection join

Here, A represents one table, and B represents another. The meaning of having...
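The intersection view of an INNER join can be checked with a small experiment. The following sketch uses hypothetical tables and data: one user has comments and another has none. The INNER join keeps only rows with a match on both sides, while a LEFT OUTER join (shown for contrast) also keeps the unmatched user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("PRAGMA foreign_keys = 1")
cursor.execute("CREATE TABLE user (email TEXT PRIMARY KEY)")
cursor.execute("""
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        comment TEXT,
        FOREIGN KEY (user_id) REFERENCES user (email)
    )
""")
# Tom exists as a user but has written no comments.
cursor.executemany(
    "INSERT INTO user VALUES (?)",
    [("bob@example.com",), ("tom@example.com",)],
)
cursor.executemany(
    "INSERT INTO comments (user_id, comment) VALUES (?, ?)",
    [("bob@example.com", "first"), ("bob@example.com", "second")],
)

# INNER JOIN: only users that have at least one matching comment.
inner_rows = cursor.execute("""
    SELECT user.email, comments.comment
    FROM user
    INNER JOIN comments ON comments.user_id = user.email
    ORDER BY comments.id
""").fetchall()

# LEFT OUTER JOIN: every user, with NULL (None) where no comment matches.
left_rows = cursor.execute("""
    SELECT user.email, comments.comment
    FROM user
    LEFT OUTER JOIN comments ON comments.user_id = user.email
    ORDER BY user.email, comments.id
""").fetchall()

print(inner_rows)
print(left_rows)
conn.close()
```

Tom appears only in the LEFT OUTER result, paired with None, because he falls outside the overlap of the two tables.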

Retrieving Specific Columns from a JOIN Query

In the previous exercise, we saw that we can use a JOIN to fetch the related rows from two tables. However, if we look at the results, we will see that the query returned all the columns of both tables combined. This is not very concise. What if we only want to see the emails and the related comments, and not all the data?

There is some nice shorthand code that lets us do this:

import sqlite3
with sqlite3.connect("../lesson.db") as conn:
    cursor = conn.cursor()
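    # Ask SQLite to enforce foreign key constraints on this connection
    cursor.execute("PRAGMA foreign_keys = 1")
    # comments.* limits the output to the columns of the comments table
    sql = """
    SELECT comments.* FROM comments
    JOIN user ON comments.user_id = user.email
    WHERE user.email='bob@example.com'
    """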
    rows = cursor.execute(sql)
    for...
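As a self-contained sketch of the same idea, the query below names individual columns instead of using a table-wide wildcard. The schema, the data, and the comment column name are assumptions for illustration, and the example uses an in-memory database rather than the lesson.db file. It also passes the email as a query parameter, which is a swapped-in alternative to embedding the literal in the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("PRAGMA foreign_keys = 1")
# Hypothetical schema: the real tables in the chapter may have more columns.
cursor.execute("CREATE TABLE user (email TEXT PRIMARY KEY, first_name TEXT)")
cursor.execute("""
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        comment TEXT,
        FOREIGN KEY (user_id) REFERENCES user (email)
    )
""")
cursor.execute("INSERT INTO user VALUES (?, ?)", ("bob@example.com", "Bob"))
cursor.executemany(
    "INSERT INTO comments (user_id, comment) VALUES (?, ?)",
    [("bob@example.com", "first"), ("bob@example.com", "second")],
)

# List only the columns we want, qualified by table name.
sql = """
    SELECT user.email AS email, comments.comment AS comment
    FROM comments
    JOIN user ON comments.user_id = user.email
    WHERE user.email = ?
    ORDER BY comments.id
"""
cursor.execute(sql, ("bob@example.com",))
# cursor.description holds one entry per result column; item 0 is its name.
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()

print(column_names)
for row in rows:
    print(row)
conn.close()
```

Each result row now carries just two values, which keeps the output compact and makes it easier to load into a two-column DataFrame later.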

Summary

We have come to the end of the database chapter. We have learned how to connect to SQLite using Python. We have brushed up on the basics of relational databases and how to open and close a database. We then learned how to export the contents of a relational database into pandas DataFrames.

In the next chapter, we will perform data wrangling on datasets that are used in business use cases. We will use different types of datasets and then clean and process the data in a meaningful way. We will be able to apply all the skills and tricks we have learned so far in this book to process data and get valuable insights from it.


Authors (3)

Brian Lipp

Brian Lipp is a Technology Polyglot, Engineer, and Solution Architect with a wide skillset in many technology domains. His programming background has ranged from R, Python, and Scala, to Go and Rust development. He has worked on Big Data systems, Data Lakes, data warehouses, and backend software engineering. Brian earned a Master of Science, CSIS from Pace University in 2009. He is currently a Sr. Data Engineer working with large Tech firms to build Data Ecosystems.

Shubhadeep Roychowdhury

Shubhadeep Roychowdhury holds a master's degree in computer science from West Bengal University of Technology and certifications in machine learning from Stanford. He works as a senior software engineer at a Paris-based cybersecurity startup, where he is applying state-of-the-art computer vision and data engineering algorithms and tools to develop cutting-edge products. He often writes about algorithm implementation in Python and similar topics.

Dr. Tirthajyoti Sarkar

Dr. Tirthajyoti Sarkar works as a senior principal engineer in the semiconductor technology domain, where he applies cutting-edge data science/machine learning techniques for design automation and predictive analytics. He writes regularly about Python programming and data science topics. He holds a Ph.D. from the University of Illinois and certifications in artificial intelligence and machine learning from Stanford and MIT.
