Reader small image

You're reading from  Database Design and Modeling with Google Cloud

Product typeBook
Published inDec 2023
PublisherPackt
ISBN-139781804611456
Edition1st Edition
Concepts
Right arrow
Author (1)
Abirami Sukumaran
Abirami Sukumaran
author image
Abirami Sukumaran

Abirami Sukumaran is a lead developer advocate at Google, focusing on databases and data to AI journey with Google Cloud. She has over 17 years of experience in data management, data governance, and analytics across several industries in various roles from engineering to leadership, and has 3 patents filed in the data area. She believes in driving social and business impact with technology. She is also an international keynote, tech panel, and motivational speaker, including key events like Google I/O, Cloud NEXT, MLDS, GDS, Huddle Global, India Startup Festival, Women Developers Academy, and so on. She founded Code Vipassana, an award-winning, non-profit, tech-enablement program powered by Google and she runs with the support of Google Developer Communities GDG Cloud Kochi, Chennai, Mumbai, and a few developer leads. She is pursuing her doctoral research in business administration with artificial intelligence, is a certified Yoga instructor, practitioner, and an Indian above everything else.
Read more about Abirami Sukumaran

Right arrow

Business aspect

Business requirements are the starting point for your application and also for choosing your database system. There are four stages in the life cycle of data in its business application that help determine the choice of database system:

  • Data ingestion
  • Storage
  • Process
  • Visualize

The following diagram represents the attributes in the four stages of data and the categories of questions in each stage in the life cycle of your data:

Figure 1.1 – Representation of the four stages of data and the categories of questions in each stage

Figure 1.1 – Representation of the four stages of data and the categories of questions in each stage

Let’s look at some of these attributes in detail. Some of them are in the business attributes category, while others are technical.

Ingestion

This is the first stage in the data life cycle and it is all about acquiring (bringing in) data from different sources in one place into your system. In this stage, the questions that arise are bucketed into three categories:

  • What type of data are you bringing in?
  • What is the purpose of this data?
  • What is the structure of your data?

Let’s take a look at each in detail.

Types of data

There are broadly three types of data we will be dealing with that highly influence the choice of database and storage.

Application data

This is the kind of data that is generated or downloaded as part of the application’s content and can contain transactional data that is generated by users and applications – for example, online retail applications, log data from applications, event data, and clickstream data. Let’s take a look at a specific example – consider a banking application in which user A transfers money from their account to user B’s account. In this case, the user data, such as the account ID, name, bank details, the recipient’s name, and transaction date, constitute the application data.

Live stream and real-time stream data

This data comes from real-time sources such as streaming data, which comes in continuously from data sources such as sensor data. These can also be event data responses and can be very frequent compared to batch data processing. It refers to data that is immediately available and not delayed by a system or process. The term real-time stream refers to streams of real-time data that are gathered and stored or processed as they come in. This includes monitoring data such as CPU utilization, memory consumption, Internet of Things (IoT) devices data such as humidity and pressure, and automated real-time environmental temperature monitoring data.

Batch data

This is data that comes in as bulk at scheduled intervals and could be event-triggered. For example, batch data is transactional data that comes in from applications after a transaction and is stored for use in later stages of the data life cycle. This can include data extracted from one application for use in another at a later point, data migration use cases, and file uploads for processing later. Such applications may not be designed for real-time operations on the data.

The purpose of data

The specific use case and the nature of implementing applications using the data being ingested is a critical factor in determining the choice and design of the database. There may be cases where the type and ingestion mode of data fall into a different choice of database design, whereas its functional use case would imply a different purpose. For example, you could have data streamed in from live events or housekeeping data coming in real-time from transactions, but the specific use case you are designing for might only involve visualization, analytical, or ML functionalities. So, make sure you understand what purpose you are solving with the data that is being ingested in a specific mode and type.

The structure of data

The structure of the data is a crucial factor in deciding the choice and design of a database. There are three widely recognized categories:

  • Structured
  • Semi-structured
  • Unstructured

Let’s briefly explore these three categories.

Structured data

This type of data is typically composed of rows and columns; rows are entities or records and columns are attributes. Structured data is organized in such a way that you can be sure that the data structure will be consistent for the most part throughout the life cycle of that data, except for the possible addition or removal of some attributes altogether. This kind of data is mostly transactional or analytical.

Semi-structured data

Semi-structured data does not follow a fixed tabular format – that is, a column-row structure. Instead, it stores schema attributes along with data. The attributes for semi-structured data could vary for each record. The major differentiating factor for each kind of semi-structured data is the way they are accessed.

Unstructured data

Unstructured data includes images, audio files, and so on. Unstructured data does not have a definite schema or data model. The amount of unstructured data is much larger than that of structured data. So, the methods by which we store such data are more important than ever. Here are some examples of unstructured data:

  • Text
  • Audio
  • Video
  • Images
  • Other binary large objects (BLOBs)

Now that we have had a sneak peek into the structure of data, be sure to include functional and design questions based on these categories while designing your database and application model.

Previous PageNext Page
You have been reading a chapter from
Database Design and Modeling with Google Cloud
Published in: Dec 2023Publisher: PacktISBN-13: 9781804611456
Register for a free Packt account to unlock a world of extra content!
A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.
undefined
Unlock this book and the full library FREE for 7 days
Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of
Renews at $15.99/month. Cancel anytime

Author (1)

author image
Abirami Sukumaran

Abirami Sukumaran is a lead developer advocate at Google, focusing on databases and data to AI journey with Google Cloud. She has over 17 years of experience in data management, data governance, and analytics across several industries in various roles from engineering to leadership, and has 3 patents filed in the data area. She believes in driving social and business impact with technology. She is also an international keynote, tech panel, and motivational speaker, including key events like Google I/O, Cloud NEXT, MLDS, GDS, Huddle Global, India Startup Festival, Women Developers Academy, and so on. She founded Code Vipassana, an award-winning, non-profit, tech-enablement program powered by Google and she runs with the support of Google Developer Communities GDG Cloud Kochi, Chennai, Mumbai, and a few developer leads. She is pursuing her doctoral research in business administration with artificial intelligence, is a certified Yoga instructor, practitioner, and an Indian above everything else.
Read more about Abirami Sukumaran