You're reading from Learning Google BigQuery

Product type: Book
Published in: Dec 2017
Reading level: Beginner
Publisher: Packt
ISBN-13: 9781787288591
Edition: 1st Edition
Authors (3):
Thirukkumaran Haridass

Thirukkumaran Haridass currently works as a lead software engineer at Builder Homesite Inc. in Austin, Texas, USA. He has over 15 years of experience in the IT industry. He has been working on the Google Cloud Platform for more than 3 years. Haridass is responsible for the big data initiatives in his organization that help the company and its customers realize the value of their data. He has played various roles in the IT industry and worked for Fortune 500 companies in various verticals, such as retail, e-commerce, banking, automotive, and presently, real estate online marketing.

Eric Brown

Eric Brown currently works as an analytics manager for PMG advertising in Austin, Texas. Eric has over 11 years of experience in the data analytics field. He has been working on the Google Cloud Platform for over 3 years. He oversees client web analytics implementations and implements big data integrations in both Google BigQuery and Amazon Redshift. Eric has a passion for analytics, and especially for visualization and data manipulation through open source tools such as R. He has worked in various roles in various verticals, such as web analytics service providers, media companies, real-estate online marketing, and advertising.

Google Cloud SDK

Google Cloud Platform provides an SDK, developed in Python, for managing resources in the cloud. The SDK is available for Windows, Linux, and macOS. Python 2.7 is a prerequisite for installing it (this was the requirement at the time of writing; later SDK releases run on Python 3). The SDK provides command-line utilities to manage and interact with various services on Google Cloud.

The following are the three command-line utilities available in the SDK:

  • gsutil: This is the command-line utility to interact with Google Cloud Storage
  • bq: This is the command-line utility to interact with Google BigQuery
  • gcloud: This is the command-line utility to interact with all other services on Google Cloud
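
As a quick sanity check after installation, the following sketch reports whether each of the three utilities can be found on the PATH. The helper name tool_status is hypothetical and is not part of the SDK; the script only looks the commands up and does not contact Google Cloud.

```shell
#!/bin/sh
# Report whether a command-line tool can be found on the PATH.
# tool_status is a hypothetical helper, not an SDK command.
tool_status() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: not found"
  fi
}

# Check the three utilities that ship with the Google Cloud SDK.
for tool in gcloud gsutil bq; do
  tool_status "$tool"
done
```

If any of the three is reported as not found, the SDK's bin directory is probably missing from the PATH or the installation did not complete.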

Installing Google Cloud SDK

Installers are available for Windows, Linux, and macOS. Because Linux distributions vary, installing and configuring the Google Cloud SDK on Linux requires running some commands manually.

Installing Google Cloud SDK on Windows

Google Cloud SDK for Windows comes with a graphical installer, which can also install Python, a prerequisite for running the commands in the Google Cloud SDK:

  1. Download the installer from https://cloud.google.com/sdk/docs/quickstart-windows. The installer is a GUI-based utility that installs the SDK and its prerequisites with the default configuration.
  2. In the installation wizard, choose the Bundled Python...

gsutil for Google Cloud Storage

gsutil provides options to manage files, folders, and buckets in Google Cloud Storage. The first step in moving your data to Google Cloud and Google BigQuery is to export the data and upload it to Google Cloud Storage. The upload can be done in one of three ways:

  • Manually via the browser, if the data is small
  • Automatically, for basic scenarios, using gsutil, which comes with the Google Cloud SDK
  • Programmatically, for advanced automation, using the Google Cloud Storage API

Before using gsutil, check that the Google Cloud SDK is configured with the project and account you intend to use by typing the following command:

gcloud info
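
If a script needs to make the same check without a person reading the gcloud info output, one possible approach is sketched below. It assumes gcloud is installed and authenticated; ensure_project is a hypothetical helper name and my-project-id is a placeholder project ID.

```shell
#!/bin/sh
# Switch the SDK's default project only if it differs from the wanted one.
# ensure_project is a hypothetical helper, not an SDK command.
ensure_project() {
  wanted="$1"
  # Read the currently configured default project (empty if unset).
  current="$(gcloud config get-value project 2>/dev/null)"
  if [ "$current" = "$wanted" ]; then
    echo "Already using project ${wanted}"
  else
    gcloud config set project "$wanted"
  fi
}

# Uncomment to run, replacing the placeholder with your own project ID:
# ensure_project "my-project-id"
```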

We will now look at the features available with gsutil:

  • To see the list of options provided by gsutil, type the following command:
gsutil help
  • The available commands section shows the command...
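
To illustrate the basic upload scenario described at the start of this section, here is a minimal sketch that creates a bucket, copies an exported file into it, and lists the result. The function name upload_export, the bucket name, and the file name are all hypothetical; bucket names must be globally unique, so substitute your own.

```shell
#!/bin/sh
# Create a bucket, copy one file into it, then list the bucket contents.
# upload_export is a hypothetical helper; each step runs only if the
# previous one succeeded (gsutil mb fails if the bucket already exists).
upload_export() {
  gsutil mb "gs://$1" &&
  gsutil cp "$2" "gs://$1/" &&
  gsutil ls -l "gs://$1"
}

# Uncomment to run against your own project (placeholder names):
# upload_export "my-example-bucket" "export.csv"
```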

Using the bq utility for BigQuery

The bq command-line utility is used to interact with the Google BigQuery service on Google Cloud Platform:

  • Use the following command to check the version of the bq utility once the SDK is installed:
bq version
  • The bq utility shares its settings with the gcloud utility. Run the first command below to confirm which project the bq utility will use. If you wish to change the project, run the second command and choose the project you want to work on:
gcloud info
gcloud init
If an older version of Google Cloud SDK is installed on the machine, then run the bq init command to choose the default project for the bq utility to use. Use the bq help option to see the complete details of the command, its options, and its switches.
  • The first step in using BigQuery is to create a dataset in a project and then create tables...
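
The dataset-then-table sequence could look roughly like the sketch below. The function name create_sample_table and the two-column schema are assumptions made for illustration, not taken from the book; the commands run against whichever default project the SDK is configured to use.

```shell
#!/bin/sh
# Create a dataset, create a table in it with an inline schema,
# then count the rows in the new (empty) table.
# create_sample_table and the schema are hypothetical examples.
create_sample_table() {
  dataset="$1"
  table="$2"
  bq mk --dataset "${dataset}" &&
  bq mk --table "${dataset}.${table}" "name:STRING,visits:INTEGER" &&
  bq query --use_legacy_sql=false "SELECT COUNT(*) AS row_count FROM ${dataset}.${table}"
}

# Uncomment to run (placeholder dataset and table names):
# create_sample_table "sample_dataset" "page_visits"
```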

Using the gcloud utility

The gcloud utility is used to interact with the services on the Google Cloud Platform other than BigQuery and Google Cloud Storage. The commands in the gcloud utility are grouped by service. The following are the service groups for some of the services on the Google Cloud Platform:

Service group   Google Cloud service
App             App Engine standard and flexible environment
Compute         Compute Engine to manage virtual machines
Container       Container Engine to manage containers and clusters
Dataflow        Manage Cloud Dataflow services for ETL and data processing
Dataproc        Manage Cloud Dataproc service which consists of Apache Hadoop, Spark, Pig, and Hive
Datastore       Manage Cloud Datastore service which creates entities on a NoSQL database
SQL             Manage Cloud SQL service which consists of MySQL or PostgreSQL databases...
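
On the command line, a group name is typed in lowercase right after gcloud, followed by the resource and the action. As a small sketch of that pattern, the hypothetical helper below lists the virtual machines and Cloud SQL instances in the current default project; it assumes the relevant APIs are enabled for that project.

```shell
#!/bin/sh
# List resources from two service groups: compute and sql.
# list_project_resources is a hypothetical helper, not an SDK command.
list_project_resources() {
  echo "== Compute Engine instances =="
  gcloud compute instances list
  echo "== Cloud SQL instances =="
  gcloud sql instances list
}

# Uncomment to run against the default project:
# list_project_resources
```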

Connecting to Cloud SQL using gcloud

There is more than one way to connect to a Cloud SQL instance from your machine; the two most frequently used options are listed below. Both options require the MySQL client libraries to be installed on the local machine:

  • Adding the client machine IP address to authorized networks in Google Cloud Console
  • Installing a proxy script on the client machine and using that to connect to Google Cloud SQL
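
For a quick interactive session, the gcloud sql command group also has a connect command. It is a related shortcut rather than either of the two options as the chapter describes them: to the best of my understanding it temporarily authorizes the client machine's IP address and then starts the local mysql client. In the sketch below, connect_sql and the instance name are hypothetical.

```shell
#!/bin/sh
# Open a mysql session to a Cloud SQL instance through gcloud.
# connect_sql is a hypothetical helper; the user defaults to root.
connect_sql() {
  gcloud sql connect "$1" --user="${2:-root}"
}

# Uncomment to run, replacing the placeholder instance name:
# connect_sql "my-sql-instance"
```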

Authorizing the client machine via Google Cloud Console

Get the IP address of the client machine by opening the browser and navigating to this URL: http://ipv4.whatismyv6.com/. Note down the IP address shown on this page for your machine. Open Google...

Deploying to Google App Engine

To deploy to Google App Engine, use the app command group. App Engine allows us to host websites developed in PHP, Java, Go, or Python in the standard environment. To deploy a sample app to the App Engine instance that was created in Chapter 1, Google Cloud and Google BigQuery, follow the steps listed here:

  1. Download the sample PHP file and app.yaml file from the GitHub URL (https://github.com/hthirukkumaran/Learning-Google-BigQuery/tree/master/chapter2/phpapp) to a folder on the local computer. This PHP code connects to the Cloud SQL instance of the same project, executes a sample query, and displays the result.
  2. Open the index.php file and modify the $dsn variable value to point to your Cloud SQL instance. To get the fully qualified Cloud SQL instance name, open the Cloud SQL instance in your project. Copy the value of the instance connection...
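
Once the files are edited, the deployment itself might be run as in the following sketch. It assumes the folder contains the app.yaml and index.php files from the steps above; deploy_app and the folder name phpapp are hypothetical.

```shell
#!/bin/sh
# Deploy the app described by app.yaml in the given folder, then open
# the deployed site in the default browser.
# deploy_app is a hypothetical helper, not an SDK command.
deploy_app() {
  # Run the deploy from inside the app folder so app.yaml is found;
  # the subshell keeps the caller's working directory unchanged.
  ( cd "$1" && gcloud app deploy app.yaml ) &&
  gcloud app browse
}

# Uncomment to run, pointing at the folder that holds the sample files:
# deploy_app "phpapp"
```

gcloud app deploy asks for confirmation before uploading, so the sketch is meant to be run interactively.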

Summary

This chapter covered the Google Cloud SDK and the utilities that come with it. The samples in this chapter provided an overview of how to interact with Google Cloud services and how to manage your resources on Google Cloud Platform using the Google Cloud SDK. You can use these utilities to write batch programs that run from your on-premises infrastructure.

The next chapter covers data types in Google BigQuery, how to use them when creating custom tables, and how to import and export the data using the bq utility that was covered in this chapter.
