You're reading from Learning Google BigQuery

Product type: Book

Published in: Dec 2017

Reading Level: Beginner

Publisher: Packt

ISBN-13: 9781787288591

Edition: 1st Edition

Languages

Python

Concepts

Database Administration

Authors (3):

Thirukkumaran Haridass

Mikhail Berlyant

Eric Brown


BigQuery SQL Advanced

This chapter explores advanced options in BigQuery SQL. It explains partition tables and how to query data from them. It presents table sharding as an option for storing data across multiple tables to save on billing, and it covers built-in functions in categories such as datetime, string, and numeric.

Partition tables

Partition tables are special tables that store each day's data in a separate internal table. This improves query performance and reduces billing, because a query that specifies a date range scans only the matching days. The following steps outline how to create a partition table for your projects using a GUI and Google Cloud SDK.
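As a brief aside before the creation steps, the date-range idea can be sketched in Python. This is a minimal sketch, not the book's own code: it assumes an ingestion-time partitioned table with the hypothetical name mydataset.employee_details, and it only builds strings (a partition decorator such as table$20170615 and a query filtered on the _PARTITIONTIME pseudo column) without contacting BigQuery.

```python
from datetime import date


def partition_decorator(table: str, day: date) -> str:
    """Return the name that addresses one daily partition, e.g. table$20170615."""
    return f"{table}${day:%Y%m%d}"


def partition_range_query(table: str, start: date, end: date) -> str:
    """Build a standard SQL query restricted to the partitions from start to end."""
    return (
        "#standardSQL\n"
        f"SELECT COUNT(1) AS row_count FROM `{table}`\n"
        f"WHERE _PARTITIONTIME BETWEEN TIMESTAMP('{start.isoformat()}') "
        f"AND TIMESTAMP('{end.isoformat()}')"
    )


# Hypothetical table name, used only for illustration.
TABLE = "mydataset.employee_details"

print(partition_decorator(TABLE, date(2017, 6, 15)))
print(partition_range_query(TABLE, date(2017, 6, 1), date(2017, 6, 30)))
```

The filter on _PARTITIONTIME is what lets BigQuery skip the days outside the range, which is where the billing saving comes from.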

Creating a partition table using a GUI

Download the sample file from this URL and upload it to a Google Cloud Storage bucket: https://github.com/hthirukkumaran/Learning-Google-BigQuery/blob/master/chapter1/employeedetails.csv. Note down the bucket name.

Click on the Create new table option under the Dataset menu.
To create a partition table, enable the partition option by choosing...

Querying external data sources using BigQuery

BigQuery supports querying and joining of data from external data sources on the Google Cloud Platform. The following are the data sources you can query from BigQuery:

Google Cloud Storage files
Google Bigtable database
Google Drive files

The following demo shows how to query a CSV file in a Google Cloud Storage bucket using the BigQuery engine. Queries against external data sources do not perform as well as queries against data stored in BigQuery, so use them with caution. The following are the steps to integrate Google Cloud Storage bucket files for querying:

Create a table definition file for the file in the Google Cloud Storage bucket
Link the data source as a table in the BigQuery dataset
Query the table in the BigQuery dataset
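The first of these steps can be sketched in Python. This is a minimal sketch under stated assumptions rather than the chapter's own procedure: it assumes the sample CSV has one header row and that schema auto-detection is acceptable, and it reuses the bucket name that appears elsewhere in this chapter as a placeholder. The keys follow the field names of BigQuery's external data configuration (sourceFormat, sourceUris, autodetect, csvOptions.skipLeadingRows).

```python
import json


def csv_table_definition(source_uri: str, skip_leading_rows: int = 1) -> dict:
    """Describe a CSV file in Cloud Storage as an external data source."""
    return {
        "sourceFormat": "CSV",
        "sourceUris": [source_uri],
        # Assumption: let BigQuery infer column names and types.
        "autodetect": True,
        # Assumption: the file starts with a single header row.
        "csvOptions": {"skipLeadingRows": skip_leading_rows},
    }


# Placeholder URI built from the bucket and file names used in this chapter.
definition = csv_table_definition(
    "gs://myfirstprojectbucket201706/employeedetails.csv"
)
print(json.dumps(definition, indent=2))
```

The printed JSON has the shape of a table definition file; linking it to a dataset and querying it are the remaining two steps.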

...

Wildcard tables

A wildcard table is a way of performing a union on tables whose names are similar and whose schemas are compatible. The following queries show how to perform wildcard operations on tables in the public dataset bigquery-public-data:new_york provided by Google.

The following query gets the number of trips per year made by yellow taxis in New York. The query uses UNION ALL on all tables whose names start with tlc_yellow_trips_. If a new table is added for 2017, this query has to be modified to include that table as well. To automatically include tables having similar names in the query, wildcard table syntax can be used. This query uses standard SQL:

#standardSQL
SELECT MAX(EXTRACT(YEAR from pickup_datetime)) as TripYear, count(1) as TripCount FROM `bigquery-public-data.new_york.tlc_yellow_trips_2009`
UNION ALL
SELECT MAX(EXTRACT(YEAR from pickup_datetime)) as TripYear...
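The contrast between the two approaches can be sketched in Python by generating both query texts. This is an illustrative sketch only: the list of years passed to the first helper is an assumption (only the 2009 table is named above), and the wildcard version groups on the _TABLE_SUFFIX pseudo column, which is one possible way to write it and not necessarily the form the book uses. Neither string is sent to BigQuery here.

```python
TABLE_PREFIX = "bigquery-public-data.new_york.tlc_yellow_trips_"


def union_all_query(years) -> str:
    """Build one SELECT per yearly table and join them with UNION ALL."""
    selects = [
        "SELECT MAX(EXTRACT(YEAR FROM pickup_datetime)) AS TripYear, "
        f"COUNT(1) AS TripCount FROM `{TABLE_PREFIX}{year}`"
        for year in years
    ]
    return "#standardSQL\n" + "\nUNION ALL\n".join(selects)


def wildcard_query() -> str:
    """Build a single query over every table that shares the prefix."""
    return (
        "#standardSQL\n"
        "SELECT _TABLE_SUFFIX AS TripYear, COUNT(1) AS TripCount\n"
        f"FROM `{TABLE_PREFIX}*`\n"
        "GROUP BY TripYear"
    )


# The years are illustrative; a new year means editing this list.
print(union_all_query([2009, 2010]))
# The wildcard form needs no edit when a new table appears.
print(wildcard_query())
```

Note that _TABLE_SUFFIX is a string, so TripYear comes back as text such as '2009' in the wildcard form.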

User-defined functions

User-defined functions can be written in JavaScript or SQL in BigQuery. These functions can be called in queries to obtain results. The following are the supported datatypes that can be passed to and returned by the functions:

ARRAY
BOOL
BYTES
DATE
FLOAT64
STRING
STRUCT
TIMESTAMP

The following is a simple function written in JavaScript to return the sum of two numbers, and it is used in the query. This query passes the tip_amount and tolls_amount values for each row from the table to the function and gets the sum:

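#standardSQL
-- Temporary JavaScript UDF: adds the tip and toll amounts for one trip.
-- Both inputs are FLOAT64, so the sum is returned as FLOAT64 as well.
CREATE TEMPORARY FUNCTION GetOtherCharges(tipamount FLOAT64, tollsamount FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS """
  return tipamount + tollsamount;
""";

-- Call the function once per row of the 2013 green taxi trips table.
SELECT vendor_id, GetOtherCharges(tip_amount, tolls_amount) AS other_charges
FROM `bigquery-public-data.new_york.tlc_green_trips_2013`;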

Custom external JavaScript libraries...

Views

BigQuery supports creating views, but they are not materialized views: the underlying query for a view is executed each time someone runs a query on the view. A view can be defined using legacy SQL or standard SQL, with one limitation: if a view is defined in legacy SQL, the queries executed using that view must also be in legacy SQL. The same applies to views defined using standard SQL; they can be used only in standard SQL statements. User-defined functions cannot be used in the query that defines a view.
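The dialect rule can be sketched in Python. This is a minimal sketch, not a client for the BigQuery service: the view body mirrors the query and useLegacySql fields of a view definition in the BigQuery tables resource, the example query is a placeholder, and the can_query helper is a hypothetical name introduced here to express the rule described above.

```python
def view_resource(query: str, use_legacy_sql: bool = False) -> dict:
    """Return a view definition holding the query text and its SQL dialect."""
    return {"view": {"query": query, "useLegacySql": use_legacy_sql}}


def can_query(view: dict, query_uses_legacy_sql: bool) -> bool:
    """A query may use a view only when both are written in the same dialect."""
    return view["view"]["useLegacySql"] == query_uses_legacy_sql


# Placeholder view over one of the public taxi tables, in standard SQL.
standard_view = view_resource(
    "SELECT vendor_id, tip_amount "
    "FROM `bigquery-public-data.new_york.tlc_green_trips_2013`"
)

print(can_query(standard_view, query_uses_legacy_sql=False))  # True
print(can_query(standard_view, query_uses_legacy_sql=True))   # False
```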

The BigQuery web console provides an option to save a query as a view, as shown in the following screenshot. Click on the Save View button, choose the dataset under which the view has to be saved, provide a view name, and save it:

To change the view definition, navigate to the view in the BigQuery web console and open...

Querying nested and repeated records

Google BigQuery supports loading of JSON files into BigQuery tables. JSON format data can contain nested datatypes and repeated datatypes. The example table shown in the following screenshot has an Employee_Names column as RECORD datatype. Each record in that column has two columns, one to store the first name and one to store the last name. Create the table as shown in this screenshot:

Download the following JSON file, which contains the records to be loaded into this new table: https://github.com/hthirukkumaran/Learning-Google-BigQuery/blob/master/chapter1/employeedetails.json.
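To give a feel for what such a file can look like, here is a minimal Python sketch that writes records in newline-delimited JSON, the JSON layout BigQuery loads (one object per line). Everything specific in it is an assumption made for illustration: the field names Employee_ID, FirstName, and LastName, the treatment of Employee_Names as a repeated record, and the sample values are placeholders and are not taken from the downloadable file.

```python
import json

# Placeholder rows: Employee_Names is modeled as a repeated record
# whose entries each hold a first name and a last name.
records = [
    {
        "Employee_ID": 1,
        "Employee_Names": [{"FirstName": "Asha", "LastName": "Rao"}],
    },
    {
        "Employee_ID": 2,
        "Employee_Names": [
            {"FirstName": "Liam", "LastName": "Chen"},
            {"FirstName": "Lee", "LastName": "Chen"},
        ],
    },
]


def to_ndjson(rows) -> str:
    """Serialize each row as one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows) + "\n"


print(to_ndjson(records), end="")
```

Each printed line is a complete record, so a nested or repeated value never spans more than one line.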

Upload the file to your Google Cloud Storage bucket using the gsutil command as shown here:

gsutil cp employeedetails.json gs://myfirstprojectbucket201706/employeedetails.json

Run the following command...

Summary

This chapter covered the practical use of partition tables, wildcard tables, nested and repeated records, and views, as well as how to define and use user-defined functions in JavaScript and SQL. We then covered how to connect to external data sources and query them using the BigQuery engine. More federated data sources will be added to this list, and you can learn how to connect to Bigtable and Google Drive files using the documents provided in the further reading section.

This document outlines the steps to create authorized views in your projects: https://cloud.google.com/bigquery/docs/views
This document outlines how to specify nested and repeated fields when creating the tables: https://cloud.google.com/bigquery/docs/nested-repeated
An overview of partition tables and best practices are outlined here: https://cloud.google.com/bigquery/docs/partitioned-tables
This document outlines how to query wildcard tables using standard SQL: https://cloud.google.com/bigquery/docs/reference/standard-sql/wildcard-table-reference
How to query nested and repeated fields using legacy SQL: https://cloud.google.com/bigquery/docs/legacy-nested-repeated
How to migrate from legacy SQL to standard SQL: https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql
How to connect to external data sources: https://cloud.google...


Authors (3)

Thirukkumaran Haridass

Thirukkumaran Haridass currently works as a lead software engineer at Builder Homesite Inc. in Austin, Texas, USA. He has over 15 years of experience in the IT industry. He has been working on the Google Cloud Platform for more than 3 years. Haridass is responsible for the big data initiatives in his organization that help the company and its customers realize the value of their data. He has played various roles in the IT industry and worked for Fortune 500 companies in various verticals, such as retail, e-commerce, banking, automotive, and presently, real estate online marketing.
