You're reading from Database Design and Modeling with Google Cloud

Product type: Book

Published in: Dec 2023

Publisher: Packt

ISBN-13: 9781804611456

Edition: 1st Edition

Concepts

Databases

Author (1)

Abirami Sukumaran

Designing for Semi-Structured Data

So far, we have discussed relational databases and analytics design for structured data. Now, we will look at semi-structured data and design considerations for it by looking at a hands-on example. In this chapter, you will learn about the fundamentals of semi-structured data with examples and real-world use cases, the characteristics of semi-structured data, design considerations, and the components of a document database. We will also explore setting up and configuring a serverless document database (Firestore), creating indexes, and querying your data with APIs.

In this chapter, we’ll cover the following topics:

Semi-structured data
NoSQL for semi-structured data
Firestore and its features
Firestore setup
Security
Client libraries and APIs
Indexing
Data model considerations
Easy querying with RunQuery API

Semi-structured data

Semi-structured data does not follow the typical row–column table format or conform to a rigid schema structure; therefore, it cannot fit completely in the relational databases category. However, it is not entirely unstructured either. Semi-structured data is somewhere in between these two categories and is characterized by the following features:

Schema flexibility: It adapts to change and expresses structure through hierarchical data formats and keys rather than a fixed schema.
Nestedness and hierarchy: Semi-structured data often has a hierarchical or a nested structure and its data elements are often grouped within other elements, forming a tree-like structure. This results in representations such as eXtensible Markup Language (XML) or JavaScript Object Notation (JSON).
No fixed data type: Semi-structured data can have elements of varying data types within the same dataset. For example, a JSON document can contain strings, Booleans, numbers, and...
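To make these characteristics concrete, here is a minimal Python sketch. The two records and their field names are invented for illustration; the point is only that documents in the same dataset can differ in shape, nest values, and use different types for the same key.

```python
import json

# Two hypothetical records from the same dataset; all fields are made up.
RAW = """
[
  {"name": "Asha", "active": true, "age": 34,
   "address": {"city": "Chennai", "tags": ["home", "billing"]}},
  {"name": "Ravi", "age": "unknown", "phone": null}
]
"""


def describe(record, prefix=""):
    """Return {dotted path: type name} for every leaf value in a nested record."""
    found = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested object: recurse and extend the dotted path.
            found.update(describe(value, prefix=path + "."))
        else:
            found[path] = type(value).__name__
    return found


records = json.loads(RAW)
shapes = [describe(record) for record in records]
for shape in shapes:
    print(shape)
```

Note that "age" is a number in one record and a string in the other, and that "phone" and "address" each appear in only one of them; a relational table would need a schema change or NULL-heavy columns to hold the same data.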

NoSQL for semi-structured data

NoSQL is a type of database management system designed for semi-structured and unstructured data, modeled in formats other than the tables used by relational databases. NoSQL databases are mainly used to manage big data and real-time web applications. The name is often read as "not only SQL": such systems may support SQL-like queries, but they also handle unstructured and semi-structured data and queries. The objectives of NoSQL databases can be summarized as follows:

Robust and simple design
Flexible in terms of format and source
Limited dependency between object-oriented applications and the database
Better control over availability
Partition tolerance

NoSQL databases are subject to the CAP theorem, which states that a distributed data store can guarantee at most two of the following three properties at the same time:

Consistency: This is different from the consistency addressed by the atomicity, consistency, isolation, and durability...

Firestore and its features

Firestore is a serverless document database that scales flexibly to meet growing demand with no maintenance. In Firestore, data is stored as documents that are organized into collections. Documents can contain subcollections, nested objects, and complex objects such as lists. You don’t need to create a collection or document up front; Firestore creates it implicitly the first time you write data to it. The following are some key features that stand out to me in using Firestore as a document database:

Firestore, being fully managed and serverless, allows developers to focus fully on application development and effortlessly scale up and down without any maintenance or downtime.
Since it supports live sync and offline mode, it is well suited to mobile and web applications in real-time use cases and in situations with limited or intermittent connectivity.
The powerful query engine allows you to run ACID transactions on document data.
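Because documents live in collections and a document can in turn hold subcollections, a path to any piece of data alternates between collection IDs and document IDs. The following sketch illustrates that rule; `classify_path` is a hypothetical helper written for this example (it is not part of any Firestore library), and the collection and document names are invented.

```python
def classify_path(path: str) -> str:
    """Classify a slash-separated, Firestore-style path.

    Paths alternate collection ID / document ID, so an odd number of
    segments names a collection and an even number names a document.
    """
    segments = path.strip("/").split("/")
    if any(segment == "" for segment in segments):
        raise ValueError(f"malformed path: {path!r}")
    return "collection" if len(segments) % 2 == 1 else "document"


for example in (
    "departments",                  # top-level collection
    "departments/hr",               # document in that collection
    "departments/hr/employees",     # subcollection under the document
    "departments/hr/employees/e42", # document in the subcollection
):
    print(example, "->", classify_path(example))
```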

Setting up Firestore

If you are new to Google Cloud, first go to the Google Cloud Console (https://console.cloud.google.com/), select your organization, and create a new Google Cloud project with billing enabled.

You can follow the instructions here:

https://cloud.google.com/resource-manager/docs/creating-managing-projects

All the following steps can be done with Cloud Shell commands or in the Google Cloud Console:

In the Google Cloud Console, search for Firestore in the search bar.
From the Firestore Viewer page, on the Select a Cloud Firestore mode screen, select Firestore in Native mode: https://console.cloud.google.com/firestore/.
Select a location for your Firestore data. Remember that this choice is permanent.
Click on Create database.
When you create the Firestore project, the Firestore API is enabled.
Once you set this up, you should be able to see the database, collection, and document view where you can add the collection, document,...

Security

After we have configured the database and set up sample collections and objects, it is important to provision security rules that control access to your data. Firestore offers two approaches to authentication and access control, depending on which client libraries you use:

Mobile and web client libraries: Firebase Authentication and security rules provide serverless authentication, authorization, and data validation
Server client libraries: Identity and Access Management (IAM) controls access to your database

You should be able to create, edit, and monitor security rules easily from the Firebase interface. Follow this link to get started: https://cloud.google.com/firestore/docs/security/get-started.

You can read more about it in this documentation: https://cloud.google.com/firestore/docs/security/overview.

Remember to always test and monitor your security rules before deploying or rolling out your application...

Client libraries and APIs

Firestore supports several mobile and web SDKs, server client libraries, Admin SDKs, Google Cloud client libraries, REST APIs, and other third-party library integrations.

Refer to the documentation to find the library and samples for your language or platform:

https://cloud.google.com/firestore/docs/reference/libraries

If you’d like to try out a sample web application that I built using the Firestore REST API with the Java Spring Boot framework, check out this project:

https://github.com/AbiramiSukumaran/firestore-project

Indexing

A database index maps items to their locations and is used to speed up searches in queries. Without an index, a database typically has to scan items one by one. Firestore supports high-performance queries by using an index for every query, which means the following:

Indexes for your basic queries are all automatically created for you
Query performance depends on the size of the result set, not on the number of documents in the database
There are two types of indexes:
- A single-field index is an ordered mapping of all the documents in a collection that contain a specific field
- A composite index is also an ordered mapping of documents, but it is based on an ordered list of fields to index (basically field combinations as opposed to one specific field)
A collection group consists of all collections with the same ID, wherever they sit in the hierarchy of collections, documents, and subcollections; querying across such a group is possible through collection group indexes
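The idea of an index as an "ordered mapping" can be illustrated with a toy in-memory model. This is only a sketch of the concept, not how Firestore implements its indexes; the documents, field names, and the helpers `build_index` and `lookup_prefix` are all invented for the example.

```python
from bisect import bisect_left

# Hypothetical documents keyed by document ID.
DOCS = {
    "e1": {"dept": "HR", "salary": 50},
    "e2": {"dept": "IT", "salary": 70},
    "e3": {"dept": "HR", "salary": 65},
    "e4": {"dept": "IT", "salary": 55},
}


def build_index(docs, fields):
    """Return sorted (key, doc_id) entries; docs missing a field are skipped."""
    entries = [
        (tuple(doc[f] for f in fields), doc_id)
        for doc_id, doc in docs.items()
        if all(f in doc for f in fields)
    ]
    return sorted(entries)


def lookup_prefix(index, prefix):
    """Return doc IDs whose key starts with `prefix`.

    A binary search finds the first match, then the scan stops at the
    first non-match, so the work grows with the number of results.
    """
    i = bisect_left(index, (prefix,))
    matches = []
    while i < len(index) and index[i][0][: len(prefix)] == prefix:
        matches.append(index[i][1])
        i += 1
    return matches


single = build_index(DOCS, ["dept"])               # single-field index
composite = build_index(DOCS, ["dept", "salary"])  # composite index

print(lookup_prefix(single, ("HR",)))        # documents with dept == "HR"
print(lookup_prefix(composite, ("IT",)))     # IT documents, ordered by salary
print(lookup_prefix(composite, ("HR", 65)))  # equality on both fields
```

The composite index keeps entries ordered by department first and salary second, which is why the same structure can answer both a filter on one field and a filter-plus-ordering across two.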

...

Data model considerations

When it comes to data model design, a recurring choice is between a hierarchical (collection group) format and a denormalized (top-level collection) format for storing data. Before we get into that, let’s talk about the hierarchical and denormalized formats!

Hierarchical format

The hierarchical format represents information in a structured, nested format. This format is ideal for scenarios where data has clear parent–child relationships or when you want to maintain a well-defined structure. The following is an example of hierarchical data using JSON to represent an organizational hierarchy:

{
  "organization": {
    "name": "XYZ Corporation",
    "departments": [
      {
        "name": "HR",
     ...
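As a rough illustration of the trade-off between the two formats, the sketch below takes a small nested organization and flattens it into denormalized top-level records. The organization, departments, and employees here are invented for this sketch and are not taken from the example above; `denormalize` is a hypothetical helper.

```python
# Hypothetical hierarchical data: employees nested under departments.
ORG = {
    "name": "Example Org",
    "departments": [
        {"name": "HR", "employees": [{"id": "e1", "name": "Asha"}]},
        {"name": "IT", "employees": [{"id": "e2", "name": "Ravi"},
                                     {"id": "e3", "name": "Mei"}]},
    ],
}


def denormalize(org):
    """Flatten nested departments into one top-level list of employees.

    Each record repeats its parent's details, trading some duplication
    for the ability to query all employees without walking the hierarchy.
    """
    return [
        {"organization": org["name"], "department": dept["name"], **emp}
        for dept in org["departments"]
        for emp in dept["employees"]
    ]


employees = denormalize(ORG)
it_staff = [e["name"] for e in employees if e["department"] == "IT"]
print(it_staff)
```

In the nested form, the parent-child relationship is explicit but a cross-department question means visiting every department; in the flattened form, one pass over a single list answers it, at the cost of repeating the department and organization on every record.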

Easy querying with RunQuery API

Firestore offers a straightforward way to query your data using a REST API-based mechanism called the RunQuery API. This API allows you to retrieve specific data from your Firestore database using HTTP requests.

API endpoint and method

To use the RunQuery API, you make a POST request to the following endpoint:

https://firestore.googleapis.com/v1/{parent=projects/*/databases/*/documents}:runQuery

The parent parameter

In the API request, you’ll need to provide the parent parameter with a value in this format:

projects/{project_id}/databases/{databaseId}/documents

Replace {project_id} and {databaseId} with your specific project and database identifiers.
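A small sketch of how the endpoint can be assembled from these two identifiers follows. The function name `run_query_url` and the project ID in the usage line are made up for illustration; "(default)" is the ID Firestore uses for a project's default database.

```python
from urllib.parse import quote

BASE_URL = "https://firestore.googleapis.com/v1"


def run_query_url(project_id: str, database_id: str = "(default)") -> str:
    """Build the RunQuery endpoint for a project and database."""
    parent = (
        f"projects/{quote(project_id, safe='')}"
        f"/databases/{quote(database_id, safe='()')}"
        "/documents"
    )
    return f"{BASE_URL}/{parent}:runQuery"


# "my-project" is a placeholder project ID.
print(run_query_url("my-project"))
```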

JSON body format

The request body should be in JSON format and include the structuredQuery object, which defines the specifics of your query. The structuredQuery object contains various query parameters, such as select, from, where, orderBy, startAt, endAt, offset, and limit...
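As a hedged illustration of what such a body can look like, the sketch below builds a structuredQuery with a from clause, a single equality filter in where, and a limit. The helper `build_run_query_body`, the collection name, and the field name are assumptions made for this example, and the filter handles string values only.

```python
import json


def build_run_query_body(collection_id, field, value, limit):
    """Build a RunQuery body: one collection, one string equality filter."""
    return {
        "structuredQuery": {
            "from": [{"collectionId": collection_id}],
            "where": {
                "fieldFilter": {
                    "field": {"fieldPath": field},
                    "op": "EQUAL",
                    # Values are typed; this sketch only handles strings.
                    "value": {"stringValue": value},
                }
            },
            "limit": limit,
        }
    }


# Hypothetical query: up to 5 documents in "employees" where department is "HR".
body = build_run_query_body("employees", "department", "HR", 5)
print(json.dumps(body, indent=2))
```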

Implementing RunQuery API programmatically

Now, let’s invoke this Firestore RunQuery API programmatically by implementing this in a standalone application that is deployed serverlessly on Google Cloud. After reading Chapter 4, Setting Up a Fully Managed RDBMS, you should be familiar with the steps involved in setting up Cloud Functions. If not, go to the Create an application with the cloud database section in Chapter 4, Setting Up a Fully Managed RDBMS and follow the instructions to create a Java Cloud Functions application.

Replace the pom.xml dependencies section with the following dependencies to include the libraries required for this implementation:

<dependency>
  <groupId>com.google.cloud.functions</groupId>
  <artifactId>functions-framework-api</artifactId>
  <version>1.0.4</version>
</dependency>
<dependency>...
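Independently of the Java Cloud Functions setup, the HTTP exchange itself can be sketched in a few lines. The following Python is an illustration only, not the chapter's implementation: it prepares an authenticated POST request without sending it and decodes a parsed RunQuery response into plain values. The helper names are invented, and the access token is assumed to come from elsewhere (for example, `gcloud auth print-access-token`).

```python
import json
import urllib.request


def build_request(url, body, access_token):
    """Prepare (but do not send) an authenticated RunQuery POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def decode_value(value):
    """Convert one typed Firestore REST value into a plain Python value."""
    kind, raw = next(iter(value.items()))
    if kind == "integerValue":  # 64-bit integers arrive as JSON strings
        return int(raw)
    if kind == "nullValue":
        return None
    if kind == "mapValue":
        return {k: decode_value(v) for k, v in raw.get("fields", {}).items()}
    if kind == "arrayValue":
        return [decode_value(v) for v in raw.get("values", [])]
    return raw  # stringValue, booleanValue, doubleValue, and so on


def decode_results(response_items):
    """Map document name -> decoded fields for a parsed RunQuery response."""
    return {
        item["document"]["name"]: {
            k: decode_value(v)
            for k, v in item["document"].get("fields", {}).items()
        }
        for item in response_items
        if "document" in item  # some items carry only metadata such as readTime
    }
```

To actually run the query, the prepared request would be passed to `urllib.request.urlopen` and the response body parsed with `json.loads` before calling `decode_results`.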

Summary

In this chapter, we explored semi-structured data and its fundamental aspects, real-world use cases, and design considerations. Semi-structured data, characterized by its adaptability and hierarchical nature, bridges the gap between structured and unstructured data, making it an invaluable resource in today’s data-driven landscape.

We then talked about NoSQL databases, which are designed to handle semi-structured and unstructured data efficiently. These databases prioritize robust design, flexibility, and high availability. We explored the principles of consistency, availability, and partition tolerance, which are crucial for distributed data stores.

Firestore, a serverless document database, stood out as a prime example of a NoSQL database. Its features, including scalability, strong consistency, and multi-region replication, make it a top choice for various applications. The chapter also shed light on Firestore’s data model, indexing techniques, and collection...


Author (1)

Abirami Sukumaran

Abirami Sukumaran is a lead developer advocate at Google, focusing on databases and the data-to-AI journey with Google Cloud. She has over 17 years of experience in data management, data governance, and analytics across several industries, in roles ranging from engineering to leadership, and has filed 3 patents in the data area. She believes in driving social and business impact with technology. She is also an international keynote, tech panel, and motivational speaker at events including Google I/O, Cloud NEXT, MLDS, GDS, Huddle Global, India Startup Festival, and Women Developers Academy. She founded Code Vipassana, an award-winning, non-profit tech-enablement program powered by Google, which she runs with the support of the Google Developer Communities GDG Cloud Kochi, Chennai, and Mumbai and a few developer leads. She is pursuing doctoral research in business administration with artificial intelligence, is a certified yoga instructor and practitioner, and is an Indian above everything else.