
You're reading from  Data Ingestion with Python Cookbook

Product type: Book
Published in: May 2023
Publisher: Packt
ISBN-13: 9781837632602
Edition: 1st Edition
Author (1)
Gláucia Esppenchutz

Gláucia Esppenchutz is a data engineer with expertise in managing data pipelines and vast amounts of data using cloud and on-premises technologies. She worked in companies such as Globo, BMW Group, and Cloudera. Currently, she works at AiFi, specializing in the field of data operations for autonomous systems. She comes from the biomedical field and shifted her career ten years ago to chase the dream of working closely with technology and data. She is in constant contact with the open source community, mentoring people and helping to manage projects, and has collaborated with the Apache, PyLadies group, FreeCodeCamp, Udacity, and MentorColor communities.

Creating schemas

Schemas are considered blueprints of a database or table. While some databases strictly require schema definition, others can work without it. However, in some cases, it is advantageous to work with data schemas to ensure that the application data architecture is maintained and can receive the desired data input.

Getting ready

Let’s imagine we need to create a database for a school to store information about the students, the courses, and the instructors. With this information, we know we have at least three tables so far.

Figure 1.13 – A table diagram for three entities

In this recipe, we will cover how schemas work using the Entity Relationship Diagram (ERD), a visual representation of relationships between entities in a database, to exemplify how schemas are connected.

How to do it…

Here are the steps to try this:

  1. We define the type of schema. The following figure helps us understand how to go about this:

Figure 1.14 – A diagram to help you decide which schema to use

  2. Then, we define the fields and the data type for each table column:

Figure 1.15 – A definition of the columns of each table

  3. Next, we define which fields can be empty or NULL:

Figure 1.16 – A definition of which columns can be NULL

  4. Then, we create the relationship between the tables:

Figure 1.17 – A relationship diagram of the tables

How it works…

When designing data schemas, the first thing we need to do is define their type. As we can see in the diagram in step 1, which schema architecture to apply depends on the data’s purpose.

After that, the tables are designed. How data types are defined can vary depending on the project or purpose, but deciding what values a column can receive is important. For instance, the officeRoom column in the Teacher table can be an Integer type if we know the room’s identification is always numeric, or a String type if we are unsure how rooms are identified (for example, Room 3-D).
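The officeRoom trade-off can be sketched in plain Python. This is only an illustration, not the book's code: the column names teacherId and name, the sample row, and the conforms helper are assumptions, since only officeRoom is named in the text.

```python
# Two hypothetical definitions of the Teacher table, differing only in
# the declared type of officeRoom (teacherId and name are assumed columns).
TEACHER_SCHEMA_INT = {"teacherId": int, "name": str, "officeRoom": int}
TEACHER_SCHEMA_STR = {"teacherId": int, "name": str, "officeRoom": str}


def conforms(row: dict, schema: dict) -> bool:
    """Return True if the row has exactly the schema's columns and
    every value is an instance of the declared type."""
    if row.keys() != schema.keys():
        return False
    return all(isinstance(row[col], typ) for col, typ in schema.items())


# A room identifier that is not purely numeric.
row = {"teacherId": 1, "name": "Ana", "officeRoom": "Room 3-D"}

fits_int = conforms(row, TEACHER_SCHEMA_INT)  # officeRoom declared as Integer
fits_str = conforms(row, TEACHER_SCHEMA_STR)  # officeRoom declared as String
```

Under these assumptions, the row is rejected by the Integer variant and accepted by the String variant, which is why the choice of type should follow what is known about the real values.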

Another important topic, covered in step 3, is how to define which columns can accept NULL values. Can a field for a student’s name be empty? If not, we need to create a constraint to forbid this type of insert.
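As a minimal sketch of such a constraint, the following uses Python's built-in sqlite3 module with an in-memory database. The student table and its columns (student_id, name, email) are hypothetical, chosen only to illustrate a NOT NULL column next to a nullable one.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical student table: name must be filled in, email may be NULL.
conn.execute(
    "CREATE TABLE student ("
    " student_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " email TEXT)"
)

# A NULL email is allowed by the schema.
conn.execute("INSERT INTO student (name, email) VALUES (?, ?)", ("Maria", None))

# A NULL name violates the NOT NULL constraint and is refused.
try:
    conn.execute(
        "INSERT INTO student (name, email) VALUES (?, ?)",
        (None, "nobody@example.com"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Here the database itself refuses the insert, so the rule holds no matter which application writes to the table.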

Finally, based on the type of schema, a definition of the relationship between the tables is made.
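One common way to express a relationship between tables is a foreign key. The sketch below again uses sqlite3 and assumes a one-to-many relationship in which each course is taught by one teacher. The table and column names are hypothetical, and the actual relationships in Figure 1.17 may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE teacher ("
    " teacher_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL)"
)
# Each course row must point to an existing teacher row (assumed relationship).
conn.execute(
    "CREATE TABLE course ("
    " course_id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " teacher_id INTEGER NOT NULL REFERENCES teacher (teacher_id))"
)

conn.execute("INSERT INTO teacher (teacher_id, name) VALUES (1, 'Ana')")
conn.execute("INSERT INTO course (title, teacher_id) VALUES ('Biology', 1)")

# A course that references a teacher that does not exist is refused.
try:
    conn.execute("INSERT INTO course (title, teacher_id) VALUES ('Chemistry', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

course_count = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]
```

With the relationship declared in the schema, a course can never reference a teacher that is missing from the teacher table.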

See also

If you want to know more about database schema designs and their application, read this article by Mark Smallcombe: https://www.integrate.io/blog/database-schema-examples/.
