You're reading from  Database Design and Modeling with Google Cloud

Product type: Book
Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781804611456
Edition: 1st Edition
Author: Abirami Sukumaran

Abirami Sukumaran is a lead developer advocate at Google, focusing on databases and the data-to-AI journey with Google Cloud. She has over 17 years of experience in data management, data governance, and analytics across several industries in various roles from engineering to leadership, and has 3 patents filed in the data area. She believes in driving social and business impact with technology. She is also an international keynote, tech panel, and motivational speaker at events including Google I/O, Cloud NEXT, MLDS, GDS, Huddle Global, India Startup Festival, Women Developers Academy, and so on. She founded Code Vipassana, an award-winning, non-profit, tech-enablement program powered by Google, which she runs with the support of Google Developer Communities GDG Cloud Kochi, Chennai, Mumbai, and a few developer leads. She is pursuing her doctoral research in business administration with artificial intelligence, is a certified Yoga instructor and practitioner, and an Indian above everything else.

Unstructured Data Management

Unstructured data is said to make up a large share of the data generated today; some industry studies say at least 85% of the data available today is unstructured. So, what is unstructured data? It is data with no predefined external structure, whether in the form of a schema or table (rows and columns), an object structure (JSON, XML, and so on), or a data model, even though such data can have an internal structure. It can be generated by machines, humans, or applications.
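
To make the "internal structure" point concrete, here is a minimal sketch using an image file as the example. A PNG has no schema that a database could query, yet its bytes follow a fixed layout: an 8-byte signature followed by an IHDR chunk that carries the width and height. The helper names `make_png_header` and `read_png_size` are hypothetical and exist only for this illustration.

```python
import struct
import zlib
from typing import Tuple

# Every PNG file starts with these 8 bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_png_header(width: int, height: int) -> bytes:
    """Build the first 33 bytes of a PNG file: the signature plus the IHDR chunk."""
    # IHDR payload: width, height, bit depth 8, color type 2 (RGB),
    # compression 0, filter 0, interlace 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    chunk = b"IHDR" + ihdr
    return (
        PNG_SIGNATURE
        + struct.pack(">I", len(ihdr))          # chunk length (13)
        + chunk                                  # chunk type + payload
        + struct.pack(">I", zlib.crc32(chunk))   # CRC over type + payload
    )


def read_png_size(blob: bytes) -> Tuple[int, int]:
    """Read (width, height) from the internal structure of a PNG blob."""
    if blob[:8] != PNG_SIGNATURE or blob[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Width and height are two big-endian 32-bit integers at bytes 16..24.
    return struct.unpack(">II", blob[16:24])


if __name__ == "__main__":
    blob = make_png_header(640, 480)
    print(read_png_size(blob))
```

To a storage service, the blob is just opaque bytes with no rows or columns, and only format-aware code like `read_png_size` can recover the structure inside it.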

Some common examples and sources of unstructured data are images, audio, video, files, and rich media. One of the major advantages of this type of data management is that the data is not constrained by a fixed schema, so raw data can be analyzed and mined for insights as-is. However, industries and organizations are not able to take full advantage of data of this kind for many reasons:

  • The complexity involved in terms of data storage, management, processing...

Use cases

Unstructured data is generated in the real world in the form of social media posts, tweets, IoT data, camera feeds, movies, music, AI-generated data, and more. Organizations and businesses can use this kind of data to solve problems transactionally and also to derive analytical insights. Some transactional applications use unstructured data for purposes such as the following:

  • Biometric verification, such as facial recognition of checked-in passengers at airports
  • Processing of music and video files to identify songs, videos, and faces
  • Image classification and identification
  • Video and audio suggestions and personalization
  • Behavior-based usage prediction
  • Behavior-based fraud detection
  • Posture detection and tracking in sports and fitness applications
  • Movement and symptoms-based health alert generation and tracking

Analytical applications for unstructured data are unlimited as organizations have the potential to scale, evolve, and transform their...

Storage options in Google Cloud

In Google Cloud, there are several options for unstructured data storage, depending on your requirements, the data format, and the purpose of the application or storage. Let’s look at a few:

  • Cloud Storage: A fully managed object storage service in Google Cloud that allows you to store data of any type and volume for any duration. It is mainly used in serving use cases such as streaming videos, hosting images for web applications, and storing data in data lakes.
  • Filestore: A fully managed, high-performance file storage service that supports scalability, high availability, backup, and security.
  • Block Storage: A fully managed, high-performance persistent disk for virtual machines. High scalability, pay-per-use pricing, flexibility, and performance are some of the key features of Block Storage.
  • Storage Transfer Service: To transfer data across multiple services and service providers quickly and securely, Google...

Unstructured data storage with BigQuery

BigQuery supports unstructured data storage and management using object tables. The exciting part for me about this is that you can work with unstructured data much like relational data, referencing it in rows and columns with structured queries.

External sources

External sources such as Cloud Storage house the unstructured data itself, while BigQuery accesses it through metadata fields and references to the unstructured objects. BigQuery uses object tables to achieve this. Object tables are read-only tables over unstructured data that you have stored in Cloud Storage. These tables allow you to analyze the unstructured data just as you would regular structured data. You can perform analytics and ML on this data, apply other ML models to it, and join the results with structured data in BigQuery. This helps you improve the accuracy of your model, gain deep insights, and make informed decisions based on the combination of structured and unstructured...
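
As a rough illustration of what an object table looks like in practice, the following sketch builds the BigQuery DDL for an object table over a Cloud Storage bucket, along with a query over its metadata columns. The table name `my-project.demo_dataset.demo_objects` and the connection name `my-project.us.demo-connection` are hypothetical placeholders, and the sketch assumes that a Cloud resource connection with access to the bucket already exists. The optional `run` helper assumes the `google-cloud-bigquery` package is installed and credentials are configured.

```python
from typing import List


def object_table_ddl(table: str, connection: str, uris: List[str]) -> str:
    """Return a CREATE EXTERNAL TABLE statement that defines an object table."""
    uri_list = ", ".join(f"'{u}'" for u in uris)
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        # object_metadata = 'SIMPLE' is what makes this an object table.
        "  object_metadata = 'SIMPLE',\n"
        f"  uris = [{uri_list}]\n"
        ");"
    )


def metadata_query(table: str) -> str:
    """Return a query over the metadata columns that an object table exposes."""
    return (
        "SELECT uri, content_type, size, updated\n"
        f"FROM `{table}`\n"
        "ORDER BY size DESC"
    )


def run(sql: str):
    """Run a statement in BigQuery (needs google-cloud-bigquery and credentials)."""
    from google.cloud import bigquery  # imported lazily so the builders work offline

    return bigquery.Client().query(sql).result()


if __name__ == "__main__":
    table = "my-project.demo_dataset.demo_objects"   # hypothetical table name
    connection = "my-project.us.demo-connection"     # hypothetical connection
    print(object_table_ddl(table, connection, ["gs://bucket-demo-gc/*"]))
    print(metadata_query(table))
```

Each row of the resulting table describes one object in the bucket, so the output of `metadata_query` can be joined with ordinary structured tables like any other query result.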

Unstructured data analytics with BigQuery

In this section, we will put into practice the theory of storing unstructured data in BigQuery and querying it. If you are wondering why we are storing unstructured data in BigQuery, refer to the Unstructured data storage with BigQuery section:

  1. Go to Cloud Storage from the Google Cloud console and select the bucket we created previously (bucket-demo-gc).
  2. Click Upload Files under the Objects section and select your files.
  3. Once the upload is complete, you should be able to see the files in the bucket, as shown here:

Figure 7.10: The Cloud Storage Bucket page with its objects listed

  4. Head over to the BigQuery console by searching for it in the Google Cloud console and enable the APIs as required (the console prompts are self-explanatory).
  5. Create a dataset by clicking the three dots next to your project name and then clicking Create dataset.
  6. Provide your Dataset ID, Location type, and other...

Summary

Unstructured data has most certainly found its way into the heart and soul of every business these days. To take advantage of the data that is available to your business, irrespective of its size, format, structure, and source, you should ensure you have made the right design decisions for the storage service of your choice and the application that leverages it for your business. Keep in mind that the role of technology and platform providers when handling unstructured data should always be as follows:

  • To enable businesses to get the most value out of their data by leveraging sources of all formats and types
  • To manage and operate these services with efficient and automatic scaling, monitoring, encryption, security, and more, lifting the burden off the shoulders of businesses
  • To smoothly integrate across platforms, programming languages, and services, providing the business with diverse options to choose from while it’s designing its applications...