
You're reading from Database Design and Modeling with Google Cloud

Product type: Book
Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781804611456
Edition: 1st Edition
Author

Abirami Sukumaran
Abirami Sukumaran is a lead developer advocate at Google, focusing on databases and data to AI journey with Google Cloud. She has over 17 years of experience in data management, data governance, and analytics across several industries in various roles from engineering to leadership, and has 3 patents filed in the data area. She believes in driving social and business impact with technology. She is also an international keynote, tech panel, and motivational speaker, including key events like Google I/O, Cloud NEXT, MLDS, GDS, Huddle Global, India Startup Festival, Women Developers Academy, and so on. She founded Code Vipassana, an award-winning, non-profit, tech-enablement program powered by Google and she runs with the support of Google Developer Communities GDG Cloud Kochi, Chennai, Mumbai, and a few developer leads. She is pursuing her doctoral research in business administration with artificial intelligence, is a certified Yoga instructor, practitioner, and an Indian above everything else.

Data to AI – Modeling Your Databases for Analytics and ML

Businesses rely on analytics to gain valuable insights and make informed decisions, and cloud databases have emerged as a powerful platform for storing and analyzing large volumes of data. To leverage the full potential of cloud databases for analytics, effective data modeling is crucial. It demands a deep focus on the needs of data analysts, business intelligence, and data operations teams, and it means designing a database that is optimized for data analysis (querying, reporting, and visualization) and, by extension, for advanced analytics such as machine learning (ML) and artificial intelligence (AI). In this chapter, we’ll explore some key considerations and best practices for data modeling for analytics, ML, and AI in cloud databases.

In this chapter, we’ll cover the following topics:

  • Modeling considerations for analytics, AI, and ML
  • Data to AI
  • Google Cloud ETL services
  • Google...

Modeling considerations for analytics, AI, and ML

As with relational transactional applications, analytics applications require data to be modeled, stored, and accessed in ways that support the application’s design goals. While the business, functional, technical, and regulatory requirements vary for each application, some fundamental operational and design needs are generally considered the baseline for all analytical data modeling. We’ll look at a few of them in this section:

  • Understand the analytical requirements: Before diving into data modeling, it’s important to have a clear understanding of your analytical requirements. Define the specific questions you want to answer or the insights you want to derive from your data. This understanding will guide your data modeling efforts and help you design a database structure that aligns with your analytical goals.
  • Denormalize your data: Normalization is a widely adopted practice in traditional database...
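The denormalization idea above can be sketched in a few lines of plain Python. The table and field names here are invented for illustration: customer attributes are embedded into each order row, producing the wide, join-free shape that analytical queries prefer.

```python
# Denormalization sketch: fold normalized "customers" and "orders"
# records into one wide row per order. Names and data are hypothetical.
customers = {1: {"name": "Asha", "region": "IN"}}
orders = [
    {"order_id": 100, "customer_id": 1, "amount": 250.0},
    {"order_id": 101, "customer_id": 1, "amount": 75.5},
]

def denormalize(orders, customers):
    """Embed customer attributes into each order row."""
    wide_rows = []
    for o in orders:
        c = customers[o["customer_id"]]
        wide_rows.append(
            {**o, "customer_name": c["name"], "customer_region": c["region"]}
        )
    return wide_rows

rows = denormalize(orders, customers)
# Each row now answers "orders by region" questions without a join.
print(rows[0]["customer_region"])  # IN
```

The trade-off is redundancy: customer attributes are repeated on every order row, which is usually acceptable in analytical storage where reads dominate writes.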

Data to AI

This section presents a perspective on data modeling along the journey from data to AI, covering several stages: ingestion, storage, integration, transformation, and archival considerations:

  1. Data ingestion: Data ingestion is the process of acquiring and importing data from various sources into an analytics database or data warehouse. When designing the data model for ingestion, consider the frequency and volume of data updates, data formats, and data integration requirements. Choose appropriate ingestion mechanisms such as batch processing, real-time streaming, or event-based ingestion based on the timeliness and velocity of your data. Ensure data validation and cleansing mechanisms are in place to maintain data quality during ingestion.
  2. Storage: Choosing the right storage infrastructure is crucial for efficiently managing and accessing large volumes of data in AI workflows. Cloud object storage and database services such as Google Cloud Storage and BigQuery...
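The validation and cleansing mechanisms mentioned for ingestion can be sketched as follows. The schema and rules here are illustrative, not from the chapter: incomplete rows are rejected and timestamp/numeric fields are normalized before the batch is loaded.

```python
# Minimal sketch of validation and cleansing during batch ingestion.
# Required fields and coercion rules are hypothetical examples.
from datetime import datetime

REQUIRED = {"event_id", "ts", "value"}

def clean(record):
    """Return a cleansed record, or None if it fails validation."""
    if not REQUIRED.issubset(record):
        return None                                          # reject incomplete rows
    try:
        record["ts"] = datetime.fromisoformat(record["ts"])  # normalize timestamps
        record["value"] = float(record["value"])             # coerce numeric fields
    except (TypeError, ValueError):
        return None
    return record

batch = [
    {"event_id": "e1", "ts": "2023-12-01T10:00:00", "value": "3.5"},
    {"event_id": "e2", "ts": "not-a-date", "value": "1"},
]
valid = [r for r in (clean(r) for r in batch) if r is not None]
print(len(valid))  # 1 record survives cleansing
```

In a real pipeline the rejected rows would typically be routed to a dead-letter destination for inspection rather than silently dropped.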

Google Cloud ETL services

Google Cloud offers a comprehensive set of services to support ETL workflows, enabling organizations to efficiently process and transform data at scale. These services provide integration, scalability, and managed infrastructure for performing ETL tasks in the cloud. Here are some Google Cloud ETL services:

  • Google Cloud Dataflow: Google Cloud Dataflow is a fully managed service for executing parallel data processing pipelines. It enables developers to build and execute batch or streaming ETL jobs using a unified programming model. Dataflow provides automatic scaling, fault tolerance, and data parallelism, allowing efficient processing of large datasets. It integrates with other Google Cloud services, such as BigQuery, Cloud Storage, and Pub/Sub, making it an ideal choice for ETL workflows.
  • Google Cloud Dataproc: Google Cloud Dataproc is a managed Apache Hadoop and Apache Spark service. It offers a scalable and cost-effective environment for processing...

Google Cloud Dataflow at a glance

Google Cloud Dataflow is a powerful and fully managed service for executing ETL pipelines. It allows developers to focus on data processing logic without worrying about infrastructure management. Dataflow offers a unified programming model based on Apache Beam, enabling consistent ETL development across batch and streaming data processing scenarios.

The key features of Google Cloud Dataflow are as follows:

  • Scalability: Dataflow automatically scales resources based on the input data size and processing requirements. It can handle data processing tasks ranging from small to petabyte-scale datasets, ensuring efficient ETL operations without the need for manual resource provisioning.
  • Fault tolerance: Dataflow ensures fault tolerance by automatically recovering from failures and providing reliable data processing. It divides the input data into small, parallelizable chunks and distributes them across multiple compute resources. In case of...
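The chunk-and-distribute idea behind Dataflow's parallelism can be sketched with the standard library: split the input into small independent chunks and process them on a pool of workers. Chunk size and worker count here are arbitrary choices for illustration, and the transform is a stand-in.

```python
# Sketch of dividing input into parallelizable chunks and distributing
# them across workers; any failed chunk could be retried independently.
from concurrent.futures import ThreadPoolExecutor

def chunks(seq, size):
    """Yield successive fixed-size slices of seq."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def process(chunk):
    # Stand-in for a real transform applied to one chunk.
    return sum(x * x for x in chunk)

data = list(range(10))
with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process, chunks(data, 3)))

total = sum(partials)
print(total)  # 285, the sum of squares 0..9
```

Because each chunk's result is independent, the final answer is just the combination of the partials, which is what lets a runner recover by re-executing only the chunks that failed.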

Taking your data to AI

Now that we have taken our data on a journey through a sample ETL pipeline, let’s take it through one last step: performing ML on the output of the previous step, that is, the tokenized words and their counts.

In this section, we will create a model to identify the context from the given list of words using word2vec and cosine similarity techniques. We will use the top 1,000 frequently occurring words (from the output of the previous step) to predict the context of the tokenized words generated from the pipeline we created in the previous section.

In this exercise, we will take the data we have generated through the pipeline as input to the context prediction application we will build in Python. Don’t worry: I have kept the code minimal and simple to understand, so we don’t spend hours explaining the steps. Open a new Colab Notebook from https://colab.research.google.com/. Enter the code snippets in...
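Before the full exercise, here is a toy illustration of the cosine-similarity step. The word vectors and context labels below are invented for the sketch; a real run would use word2vec embeddings learned from the pipeline's top-1,000 words. The predicted context is the label whose vector is closest, by cosine similarity, to the average of the input words' vectors.

```python
# Toy context prediction via cosine similarity over hand-made vectors.
import math

vectors = {
    "dog":    [0.9, 0.1, 0.0],
    "cat":    [0.8, 0.2, 0.0],
    "python": [0.0, 0.1, 0.9],
    "code":   [0.1, 0.0, 0.8],
}
labels = {"animals": [1.0, 0.1, 0.0], "programming": [0.0, 0.1, 1.0]}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def predict_context(words):
    # Average the input words' vectors, then pick the most similar label.
    mean = [sum(dim) / len(words) for dim in zip(*(vectors[w] for w in words))]
    return max(labels, key=lambda name: cosine(mean, labels[name]))

print(predict_context(["dog", "cat"]))      # animals
print(predict_context(["python", "code"]))  # programming
```

The exercise in the chapter follows the same shape, only with embeddings trained on real data instead of these hand-made three-dimensional vectors.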

Summary

Building upon the foundation of previous chapters, where we explored storage solutions for transactions and analytics, this chapter takes a deeper dive into data modeling considerations related to ETL processes and advanced analytics. Through the lens of real-world use cases, we examine how data modeling plays a crucial role in ensuring efficient ETL operations. Furthermore, we highlight the utilization of Google Cloud services as a means to address these considerations effectively with hands-on implementation.

At this point, having covered almost all the foundational aspects of designing for applications driven by data and databases of different types and structures, I look forward to engaging you with the most popular topic of discussion – generative AI. In particular, I would like to discuss the basics of generative AI and dive deep into the world of Large Language Models (LLMs), covering fundamentals, design practices, and a hands-on implementation for extending...
