You're reading from  Database Design and Modeling with Google Cloud

Product type: Book
Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781804611456
Edition: 1st Edition
Author (1)
Abirami Sukumaran

Abirami Sukumaran is a lead developer advocate at Google, focusing on databases and the data-to-AI journey with Google Cloud. She has over 17 years of experience in data management, data governance, and analytics across several industries, in roles ranging from engineering to leadership, and has 3 patents filed in the data area. She believes in driving social and business impact with technology. She is also an international keynote, tech panel, and motivational speaker who has spoken at events such as Google I/O, Cloud NEXT, MLDS, GDS, Huddle Global, India Startup Festival, and Women Developers Academy. She founded Code Vipassana, an award-winning, non-profit tech-enablement program powered by Google, which she runs with the support of the Google Developer Communities GDG Cloud Kochi, Chennai, and Mumbai, and a few developer leads. She is pursuing doctoral research in business administration with artificial intelligence, is a certified Yoga instructor and practitioner, and is an Indian above everything else.

Designing for Semi-Structured Data

So far, we have discussed relational databases and analytics design for structured data. Now, we will look at semi-structured data and design considerations for it by looking at a hands-on example. In this chapter, you will learn about the fundamentals of semi-structured data with examples and real-world use cases, the characteristics of semi-structured data, design considerations, and the components of a document database. We will also explore setting up and configuring a serverless document database (Firestore), creating indexes, and querying your data with APIs.

In this chapter, we’ll cover the following topics:

  • Semi-structured data
  • NoSQL for semi-structured data
  • Firestore and its features
  • Firestore setup
  • Security
  • Client libraries and APIs
  • Indexing
  • Data model considerations
  • Easy querying with RunQuery API

Semi-structured data

Semi-structured data does not follow the typical row–column table format or conform to a rigid schema structure; therefore, it cannot fit completely in the relational databases category. However, it is not entirely unstructured either. Semi-structured data is somewhere in between these two categories and is characterized by the following features:

  • Schema flexibility: It adapts to change and expresses structure through keys and hierarchical data formats rather than a fixed schema.
  • Nestedness and hierarchy: Semi-structured data often has a hierarchical or a nested structure and its data elements are often grouped within other elements, forming a tree-like structure. This results in representations such as eXtensible Markup Language (XML) or JavaScript Object Notation (JSON).
  • No fixed data type: Semi-structured data can have elements of varying data types within the same dataset. For example, a JSON document can contain strings, Booleans, numbers, and...
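The characteristics listed above can be seen in a small example. The following is a minimal sketch that uses only Python's standard json module; the two records and their field names are hypothetical and are not taken from the book. It shows nesting, optional fields, and the same key ("id") holding different data types in different records.

```python
import json

# Two hypothetical records in the same dataset (illustrative data only).
# They share some keys, but their shapes and value types differ.
raw = """
[
  {"id": 1, "name": "Asha", "active": true, "tags": ["admin", "ops"],
   "address": {"city": "Kochi", "zip": "682001"}},
  {"id": "u-2", "name": "Ben", "phone": null}
]
"""

records = json.loads(raw)


def describe(record):
    """Map each key of a record to the Python type name of its value."""
    return {key: type(value).__name__ for key, value in record.items()}


shapes = [describe(record) for record in records]
for shape in shapes:
    print(shape)
```

A relational table would force both records into the same columns and types; here each record simply carries its own keys.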

NoSQL for semi-structured data

NoSQL is a type of database management system designed for semi-structured and unstructured data, that is, data modeled in formats other than the tables used by relational databases. NoSQL databases are mainly used to manage big data and real-time web applications. The name is commonly read as "not only SQL": such systems may support SQL-like queries, but they also handle semi-structured and unstructured data and queries. The objectives of NoSQL databases can be summarized as follows:

  • Robust and simple design
  • Flexible in terms of format and source
  • Reduced dependency between object-oriented applications and the database
  • Better control over availability
  • Partition tolerance

NoSQL databases are subject to the CAP theorem, which states that a distributed data store can guarantee at most two of the following three properties at the same time:

  • Consistency: This is different from the consistency addressed by the atomicity, consistency, isolation, and durability...

Firestore and its features

Firestore is a serverless document database that scales easily and flexibly to meet growing demand with no maintenance. In a Firestore database, data is stored as documents that are organized into collections. Documents can contain subcollections, nested objects, and complex objects such as lists. You don't have to create a collection or document up front for your use case; Firestore creates it automatically as needed. The following are some key features that stand out to me in using Firestore as a document database:

  • Firestore, being fully managed and serverless, allows developers to focus fully on application development and effortlessly scale up and down without any maintenance or downtime.
  • Since it supports live sync and offline mode, it is ideal for mobile and web applications in several real-time use cases and remote low-accessibility situations.
  • The powerful query engine allows you to run ACID transactions on document data.
  • ...
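To make the collection and document organization described above concrete, here is a minimal sketch in plain Python (no Firestore client involved). It relies on the fact that a Firestore document path alternates collection IDs and document IDs; the function name parse_document_path and the sample path are hypothetical.

```python
def parse_document_path(path):
    """Split a slash-separated document path into (collection_id, document_id) pairs.

    A document path alternates collection and document IDs, so it must
    contain an even number of segments.
    """
    segments = [segment for segment in path.split("/") if segment]
    if not segments or len(segments) % 2 != 0:
        raise ValueError("a document path needs an even number of segments")
    return list(zip(segments[0::2], segments[1::2]))


# Hypothetical path: an order document in a subcollection of a user document.
pairs = parse_document_path("users/alice/orders/o-1001")
print(pairs)
```

A path with an odd number of segments, such as users/alice/orders, names a collection rather than a document, which is why the sketch rejects it.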

Setting up Firestore

If you are new to Google Cloud, first of all, go to Google Cloud Console (https://console.cloud.google.com/), select your organization, and create a new Google Cloud project with billing enabled.

You can follow the instructions here:

https://cloud.google.com/resource-manager/docs/creating-managing-projects

All the following steps can be done either with Cloud Shell commands or in the Google Cloud Console:

  1. In the Google Cloud Console, search for Firestore in the search bar.
  2. Open the Firestore Viewer page (https://console.cloud.google.com/firestore/) and, on the Select a Cloud Firestore mode screen, select Firestore in Native mode.
  3. Select a location for your Firestore data. Remember that this choice is permanent.
  4. Click on Create database.
  5. When you create the Firestore project, the Firestore API is enabled.
  6. Once you set this up, you should be able to see the database, collection, and document view where you can add the collection, document,...

Security

After we have configured the database and set up sample collections and objects, it is important to provision security rules that control access to your data. Firestore offers two approaches to authentication and access control, depending on which client libraries you use:

  • Mobile and web client libraries: These use Firebase Authentication and security rules to perform serverless authentication, authorization, and data validation
  • Server client libraries: These use Identity and Access Management (IAM) to control access to your database

You should be able to create, edit, and monitor security rules easily from the Firebase interface. Follow this link to get started: https://cloud.google.com/firestore/docs/security/get-started.

You can read more about it in this documentation: https://cloud.google.com/firestore/docs/security/overview.

Remember to always test and monitor your security rules before deploying or rolling out your application...

Client libraries and APIs

Firestore supports several mobile and web SDKs, server client libraries, Admin SDKs, Google Cloud client libraries, REST APIs, and other third-party library integrations.

Refer to the documentation to find the samples and libraries for your language or platform:

https://cloud.google.com/firestore/docs/reference/libraries

If you’d like to try out a sample web application that I built using the Firestore REST API with the Java Spring Boot framework, check out this project:

https://github.com/AbiramiSukumaran/firestore-project

Indexing

A database index maps items to their locations and is used to speed up searches in queries. Without an index, a database typically has to examine items one by one. Firestore supports high-performance queries by using an index for every query, which means the following:

  • Indexes for your basic queries are all automatically created for you
  • Query performance depends on the size of the result set and not on the number of documents in the database
  • There are two types of indexes:
    • A single-field index is an ordered mapping of all the documents in a collection that contain a specific field
    • A composite index is also an ordered mapping of documents, but it is based on an ordered list of fields to index (basically field-combinations as opposed to one specific field)
  • A collection group consists of all the collections and subcollections that share the same collection ID, wherever they sit in the hierarchy of collections, documents, and subcollections; querying such collection groups is possible through collection group indexes
...
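The idea of an index as an ordered mapping can be sketched in a few lines of Python. This is a conceptual model only, not how Firestore implements its indexes; the documents, field names, and helper functions are hypothetical. It shows why a lookup touches only the matching range of index entries, so the cost follows the result size rather than the collection size.

```python
from bisect import bisect_left, bisect_right

# Hypothetical documents in a "cities" collection (illustrative data only).
docs = {
    "SF": {"state": "CA", "population": 860_000},
    "LA": {"state": "CA", "population": 3_900_000},
    "NYC": {"state": "NY", "population": 8_400_000},
    "ALB": {"state": "NY", "population": 100_000},
}


def build_index(documents, fields):
    """Return parallel sorted lists: key tuples and the matching document IDs."""
    entries = sorted(
        (tuple(data[f] for f in fields), doc_id)
        for doc_id, data in documents.items()
        if all(f in data for f in fields)  # documents missing a field are not indexed
    )
    return [key for key, _ in entries], [doc_id for _, doc_id in entries]


def range_scan(index, low, high):
    """Binary-search the ordered keys and return IDs with low <= key <= high."""
    keys, ids = index
    return ids[bisect_left(keys, low):bisect_right(keys, high)]


# Single-field index on population: population >= 1,000,000
single = build_index(docs, ["population"])
big_cities = range_scan(single, (1_000_000,), (float("inf"),))

# Composite index on (state, population): state == "CA" and population >= 500,000
composite = build_index(docs, ["state", "population"])
big_ca_cities = range_scan(composite, ("CA", 500_000), ("CA", float("inf")))

print(big_cities, big_ca_cities)
```

The composite index answers the combined filter with one contiguous scan because its entries are ordered by state first and population second.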

Data model considerations

When it comes to data model design, we often struggle to choose between a hierarchical (collection group) format and a denormalized (top-level collection) format for storing data. Before we get into that, let’s talk about the hierarchical and denormalized formats!

Hierarchical format

The hierarchical format represents information in a structured, nested format. This format is ideal for scenarios where data has clear parent–child relationships or when you want to maintain a well-defined structure. The following is an example of hierarchical data using JSON to represent an organizational hierarchy:

{
  "organization": {
    "name": "XYZ Corporation",
    "departments": [
      {
        "name": "HR",
     ...
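The trade-off between the two formats can be illustrated with a short Python sketch. It reuses the organization and department names from the fragment above; the "employees" field, the employee names, the extra department, and the flatten_employees helper are hypothetical additions made only for illustration.

```python
# Hierarchical form: departments nested inside the organization.
# Everything beyond "XYZ Corporation" and "HR" is assumed for this sketch.
organization = {
    "name": "XYZ Corporation",
    "departments": [
        {"name": "HR", "employees": [{"name": "Asha"}, {"name": "Ben"}]},
        {"name": "Engineering", "employees": [{"name": "Chen"}]},
    ],
}


def flatten_employees(org):
    """Denormalize the hierarchy into flat, self-contained employee records."""
    rows = []
    for department in org["departments"]:
        for employee in department.get("employees", []):
            rows.append(
                {
                    "organization": org["name"],    # parent data repeated per row
                    "department": department["name"],
                    "employee": employee["name"],
                }
            )
    return rows


rows = flatten_employees(organization)
for row in rows:
    print(row)
```

The nested form keeps parent and child data together, while the flat form repeats the parent values in every record so that each record can be queried on its own.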

Easy querying with RunQuery API

Firestore offers a straightforward way to query your data using a REST API-based mechanism called the RunQuery API. This API allows you to retrieve specific data from your Firestore database using HTTP requests.

API endpoint and method

To use the RunQuery API, you make a POST request to the following endpoint:

https://firestore.googleapis.com/v1/{parent=projects/*/databases/*/documents}:runQuery

The parent parameter

In the API request, you’ll need to provide the parent parameter with a value in this format:

projects/{project_id}/databases/{databaseId}/documents

Replace {project_id} and {databaseId} with your specific project and database identifiers.

JSON body format

The request body should be in JSON format and include the structuredQuery object, which defines the specifics of your query. The structuredQuery object contains various query parameters, such as select, from, where, orderBy, startAt, endAt, offset, and limit...
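As a rough illustration of the endpoint, parent parameter, and JSON body described above, the following sketch builds (but does not send) a RunQuery request with Python's standard library. The project, collection, and field names are hypothetical, the filter assumes a string-valued field, and the bearer token is a placeholder for an OAuth 2.0 access token that you would obtain separately.

```python
import json
import urllib.request

BASE_URL = "https://firestore.googleapis.com/v1"


def build_run_query_request(project_id, database_id, collection_id,
                            field_path, value, limit, access_token):
    """Build a POST request for RunQuery without sending it."""
    parent = f"projects/{project_id}/databases/{database_id}/documents"
    body = {
        "structuredQuery": {
            "from": [{"collectionId": collection_id}],
            "where": {
                "fieldFilter": {
                    "field": {"fieldPath": field_path},
                    "op": "EQUAL",
                    "value": {"stringValue": value},  # assumes a string field
                }
            },
            "limit": limit,
        }
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{parent}:runQuery",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # placeholder token
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical identifiers; "(default)" is the ID of the default database.
request = build_run_query_request(
    "my-project", "(default)", "cities", "state", "CA", 5, "ACCESS_TOKEN"
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the call; that step is left out here because it needs real credentials.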

Implementing RunQuery API programmatically

Now, let’s invoke the Firestore RunQuery API programmatically from a standalone application that is deployed serverlessly on Google Cloud. After reading Chapter 4, Setting Up a Fully Managed RDBMS, you should be familiar with the steps involved in setting up Cloud Functions. If not, go to the Create an application with the cloud database section in Chapter 4, Setting Up a Fully Managed RDBMS, and follow the instructions to create a Java Cloud Functions application.

Replace the pom.xml dependencies section with the following dependencies to include the libraries required for this implementation:

<dependency>
  <groupId>com.google.cloud.functions</groupId>
  <artifactId>functions-framework-api</artifactId>
  <version>1.0.4</version>
</dependency>
<dependency>...
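Whatever language the calling application is written in, it has to unpack the typed values that the Firestore REST API returns. The following Python sketch is independent of the Java implementation in this chapter and only illustrates the response shape: the sample response is made up, and decode_value and decode_results are hypothetical helper names covering the most common value types.

```python
import json


def decode_value(value):
    """Convert one typed value from the Firestore REST API to a Python value."""
    if "nullValue" in value:
        return None
    if "integerValue" in value:
        return int(value["integerValue"])  # 64-bit integers arrive as strings
    if "mapValue" in value:
        fields = value["mapValue"].get("fields", {})
        return {key: decode_value(item) for key, item in fields.items()}
    if "arrayValue" in value:
        return [decode_value(item) for item in value["arrayValue"].get("values", [])]
    for kind in ("stringValue", "booleanValue", "doubleValue", "timestampValue"):
        if kind in value:
            return value[kind]
    raise ValueError(f"unsupported value type: {sorted(value)}")


def decode_results(items):
    """Keep only the response items that carry a document and decode their fields."""
    results = []
    for item in items:
        document = item.get("document")
        if document is None:  # some items carry no document, only metadata
            continue
        fields = document.get("fields", {})
        results.append({key: decode_value(val) for key, val in fields.items()})
    return results


# Made-up response for illustration; real responses depend on your data.
sample = json.loads("""
[
  {"document": {"name": "projects/p/databases/(default)/documents/cities/SF",
                "fields": {"state": {"stringValue": "CA"},
                           "population": {"integerValue": "860000"},
                           "capital": {"booleanValue": false}}},
   "readTime": "2023-12-01T00:00:00Z"},
  {"readTime": "2023-12-01T00:00:00Z"}
]
""")

decoded = decode_results(sample)
print(decoded)
```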

Summary

In this chapter, we explored semi-structured data and its fundamental aspects, real-world use cases, and design considerations. Semi-structured data, characterized by its adaptability and hierarchical nature, bridges the gap between structured and unstructured data, making it an invaluable resource in today’s data-driven landscape.

We then talked about NoSQL databases, which are designed to handle semi-structured and unstructured data efficiently. These databases prioritize robust design, flexibility, and high availability. We explored the principles of consistency, availability, and partition tolerance, which are crucial for distributed data stores.

Firestore, a serverless document database, stood out as a prime example of a NoSQL database. Its features, including scalability, strong consistency, and multi-region replication, make it a top choice for various applications. The chapter also shed light on Firestore’s data model, indexing techniques, and collection...
