Google Cloud AI Services Quick Start Guide

By Arvind Ravulavaru
About this book

Cognitive services are the new way of adding intelligence to applications and services. Now we can use Artificial Intelligence as a service that can be consumed by any application or other service, to add smartness and make the end result more practical and useful.

Google Cloud AI enables you to consume Artificial Intelligence within your applications through a REST API. Text, video, and speech analysis are among the powerful machine learning features that can be used. This book is the easiest way to get started with the Google Cloud AI services suite and open up the world of smarter applications.

This book will help you build SmartExchange, a forum application that lets users upload videos and images and perform text-to-speech conversion and translation. You will use the power of Google Cloud AI Services to make this simple forum application smart: the images, videos, and text provided by users are validated by Google Cloud AI Services to make sure that uploaded content follows the forum standards, without the involvement of a human curator.

You will learn how to work with the Vision API, Video Intelligence API, Speech Recognition API, Cloud Natural Language API, and Cloud Translation API services to make your application smarter.

By the end of this book, you will have a strong understanding of working with Google Cloud AI Services, and be well on the way to building smarter applications.

Publication date:
May 2018


Introducing Google Cloud AI Services

Cognition as a Service (CAAS) is the new kid on the block. No longer do engineers need to spend time building intelligence on top of their applications. Now, intelligence is borrowed from various services.

Welcome to Google Cloud AI Services Quick Start Guide, where we are going to explore the powerful Google Cloud AI services via a project-based approach. We are going to build a forum application named SmartExchange, similar to Discourse or Stack Overflow, where users start a discussion thread and other users comment on it.

To manage the content that goes into the SmartExchange application, we are going to use Google Cloud AI services such as Cloud Vision API and Video Intelligence API.

In this chapter, we are going to cover the following:

  • What is Google Cloud Platform?
  • Cognition in the cloud
  • What is Google Cloud AI?
  • Overview of Google Cloud AI services

Google Cloud Platform

We are going to start by understanding Google Cloud Platform (GCP). GCP is a collection of services that leverages the power of cloud computing. Along with these services, GCP also offers tools to manage them.

GCP has a command-line interface, the Cloud SDK, which engineers can use to easily manage and monitor these services.

As of March 2018, GCP has the following verticals of services, which we will discuss in the following subsections. You can read more about Google Cloud Platform offerings in the official GCP documentation.


Compute

Compute offers infrastructure to perform user-defined computing. Some of the services in this vertical are the Compute Engine, App Engine, and Cloud Functions.

Big data

As the name suggests, Big data provides tools needed to work with large volumes of data. Some of the services in this vertical are BigQuery, Cloud Datalab, and Genomics.

Identity and security

Identity and security provides tools needed for identity, access, and content security. Some of the services in this vertical are Cloud IAM, Cloud Resource Manager, and Cloud Security Scanner.

Internet of Things (IoT)

Currently, GCP has one core service, named Cloud IoT Core, under Internet of Things (IoT), which provides device management services when working with IoT.

Storage and databases

Storage and databases provides storage services for dealing with large volumes of data. From object storage to block storage, this vertical has all the services needed. Some of the services in this vertical are Cloud Storage, Cloud Bigtable, Cloud SQL, Cloud Spanner, and Persistent Disk.

Data transfer

Data transfer services help us easily import or export data from one service to another. The three services currently in this vertical are Google BigQuery Data Transfer Service, Cloud Storage Transfer Service, and Google Transfer Appliance.

API platform and ecosystem

API platform and ecosystem offers services that help in managing and protecting APIs. From API monetization to API analytics, this vertical offers them all. This vertical also supports Apigee platform services.

Management tools

Management tools offers the tools needed to manage the various cloud services offered by GCP. This vertical has tools for logging, monitoring, and controlling various other GCP services.


Networking

Networking offers Virtual Private Cloud (VPC), Content Delivery Network (CDN), and Domain Name System (DNS) services, to name a few.

Cloud AI

Cloud AI offers various services that are needed to add Artificial Intelligence to our applications. Cloud-based machine learning, Cloud Vision API, and Cloud Speech API are some of these services.

Developer tools

Last but not least, Developer tools provides all the essential tools developers need to quickly bring up an application or a solution on top of GCP. Some of the software under this offering includes Container Registry, Cloud Test Lab, Cloud Tools for Eclipse, and Cloud SDK.

In this book, we are going to work closely with the Cloud AI vertical. In the next section, we are going to look at the what and the why of Cloud AI.


Cognition on cloud

In the last section, we saw the various services offered by Google Cloud Platform. One of the services we saw in that section is Google Cloud AI Services. Before we start exploring Google Cloud AI Services, let's understand why it is important.

We have been using the cloud as a central entity for storing data and providing scalable computing for more than a decade now. Until recently, all applications had intelligence built in locally. Times have changed; we are now using the cloud as a central dispatcher for intelligence. We have separated the application from its intelligence and have hosted the intelligence so that everyone can use it and not just the application.

So, what exactly is AI in the cloud? It is when clients upload data to a cognitive service, and the service responds with a prediction.

Let's take a moment to understand the three previously highlighted terms: the clients, the data, and the cognitive service. This helps us better define AI on the cloud.


Clients

When I say clients, I mean any device that is capable of making an HTTP request to an endpoint and processing the response.

This could be a simple web/mobile/desktop application or a piece of smart internet-enabled hardware such as an IoT device, a voice assistant, a smartwatch, or a smart camera.

If these were defined as clients, what would the data that we are dealing with be?

Data types

As we have seen the different types of clients, let's see what kinds of data they produce.

In today's world, data can be categorized into three types:

  • Structured data: Structured data is well defined, and the entire dataset follows a single schema. Examples of structured data are CSV files and RDBMS databases.
  • Unstructured data: Unstructured data, on the other hand, is not well defined, and its structure varies throughout the dataset. Examples of unstructured data are audio files, video files, and image files.
  • Semi-structured data: Data that is present in emails, log files, text files, or word documents is considered semi-structured data.

So, data has three types, and we need a cognitive service that can consume this data and respond with an intelligent response. So, let's define what a cognitive service is.

Cognitive services

A cognitive service is a piece of computing software that can consume a data type that we have defined previously and respond with a cognitive response.
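To make the three terms concrete, here is a minimal Python sketch of one round trip between a client and a cognitive service. Everything in it is hypothetical: `toy_cognitive_service` is a stand-in that runs in the same process and only counts a few hand-picked words, whereas a real service would sit behind an HTTP endpoint and run a trained model.

```python
import json


def toy_cognitive_service(request_body: str) -> str:
    """Hypothetical stand-in for a hosted cognitive service.

    Accepts a JSON request and returns a JSON "prediction". A real service
    would run a trained model; this stub only counts hand-picked words.
    """
    text = json.loads(request_body)["text"]
    positive = {"good", "great", "love"}
    negative = {"bad", "poor", "hate"}
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return json.dumps({"label": label, "score": score})


def client_request(text: str) -> dict:
    """The client: package the data, call the service, resolve the response."""
    request_body = json.dumps({"text": text})
    # In practice, this call would be an HTTP POST to the service endpoint.
    response_body = toy_cognitive_service(request_body)
    return json.loads(response_body)


result = client_request("I love this phone, the camera is great!")
print(result)
```

The client never needs to know how the prediction is made; it only needs to send data in the agreed format and read the response.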

A simple example of a cognitive service is image intelligence. This is the ability to upload an image to view its contents and label it. Almost all of us have experienced this feature using the camera app, where the camera software can detect faces and detect smiles on those faces. This is image intelligence.

Another type of intelligence is sentiment analysis. Given a few paragraphs of text, the cognitive service can detect the emotions in the text. A simple example could be a product Twitter account feeding all the tweets it's tagged in into a cognitive service to see the overall sentiment of people using the product.

Of late, video intelligence has become even more common. This is the ability to scan a video's contents and label it for rapid detection of content in various frames and scenes, and this is very helpful for navigating and indexing a long video.

Now that we understand what AI on the cloud is, let's look at why we need it.

Why Cognition on Cloud?

This is a very important question that one needs to understand before going further. Here are a few reasons:

  • Distributed global intelligence
  • Process large volumes of data
  • Process different types of data
  • Cognitive accuracy

Distributed global intelligence defines how cognition as a service, when placed in a central location, can be used by many more entities to make them smart, rather than just one application.

Processing large volumes of data defines how the power of cloud computing can handle large volumes of data efficiently, which a normal computer or a human being would find difficult.

Processing different types of data defines how the cognitive service can process various types of data without much effort.

Cognitive accuracy is one of the most important features of all. The more data a machine learning service consumes, the better its accuracy. We will talk more about this in the next sections.

How do machines achieve intelligence?

Accuracy depends on how we train the system. There are two ways for machines to learn something:

  • Rule-based learning
  • Pattern-based learning

In rule-based learning, the developer defines a bunch of rules and the machine parses the incoming data against those rules to come to a conclusion. This approach is good for monotonous systems and where things do not change that often.
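As a rough illustration of rule-based learning, the following Python sketch hard-codes two rules for deciding whether it will rain. The function name and the thresholds are invented for this example; the point is only that the developer writes the rules and the machine never changes them.

```python
def will_rain(temperature_c: float, humidity_pct: float) -> bool:
    """Rule-based prediction: the thresholds are hand-written, not learned."""
    # Rule 1: cool and humid air (assumed thresholds).
    if humidity_pct >= 80 and temperature_c <= 20:
        return True
    # Rule 2: very humid air, regardless of temperature (assumed threshold).
    if humidity_pct >= 90:
        return True
    return False


print(will_rain(17, 85))
print(will_rain(23, 40))
```

If the climate shifts, these thresholds stay exactly where they are until a developer edits the code.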

What if we are trying to build intelligence for a weather prediction system? Will the learning that we have had up to today be enough for us to get an accurate prediction, even after 50 years? Maybe not.

This is where pattern-based learning comes in. Pattern-based learning is more popularly known as machine learning (ML). In today's world, most of the learning by computers happens through machine learning. Let's take a quick look at how ML plays an important role in this.

Cognitive accuracy and machine learning

Machine learning is the process a machine follows to learn about various things. Some things are easier to learn than others.

Artificial Intelligence is a collection of such machine learnings that can be put to use in the real world to make decisions or to predict something.

Here is a diagram that shows how a typical machine learns:

We have a data-gathering source at one end, which gets data from various reliable sources, depending on the solution. This data has both features and labels. Features are columns of data that are taken as input to learning, and labels are the expected outcome for that set of features. Let's take a look at an example of weather station data:

Temperature          Humidity      Wind            Rainfall
17 degrees Celsius   (not shown)   5 km per hour   10 mm
23 degrees Celsius   (not shown)   1 km per hour   0 mm

The columns named Temperature, Humidity, and Wind are features, and Rainfall is a label, in our table. Using this type of supervised learning, we would build a data model from this data and ask a question such as: Given the following features, what is the chance of rain?
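In code, the split between features and labels might look like the following Python sketch. The `Observation` class and its field names are made up for this example, and the humidity readings are invented values, as the table does not give them.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    temperature_c: float  # feature
    humidity_pct: float   # feature (assumed values, not from the table)
    wind_kmph: float      # feature
    rainfall_mm: float    # label: the outcome we want to predict


observations = [
    Observation(17, 85, 5, 10),
    Observation(23, 40, 1, 0),
]

# Features are the inputs to learning; labels are the expected outcomes.
features = [[o.temperature_c, o.humidity_pct, o.wind_kmph] for o in observations]
labels = [o.rainfall_mm for o in observations]

print(features)
print(labels)
```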

The data we gather is the most important part of machine learning, as the quality and quantity of data define the accuracy of prediction.

Once the data has been gathered, this data is then cleaned and normalized. The cleaned data is then split into two parts, training data and testing data. Training data is used to train the data model and testing data is used to cross-validate the accuracy of the data model.
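Here is a minimal Python sketch of these two steps, using only the standard library. The helper names, the 25 percent test fraction, and the sample readings are assumptions for illustration; machine learning libraries ship their own, more complete versions of both helpers.

```python
import random


def min_max_normalize(values):
    """Rescale a column of numbers to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle the rows, then hold back a fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, int(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


# Hypothetical temperature readings in degrees Celsius.
temperatures = [17, 23, 15, 28, 19, 21, 25, 16]
rows = min_max_normalize(temperatures)

training_data, testing_data = train_test_split(rows)
print(len(training_data), "training rows,", len(testing_data), "testing rows")
```

Shuffling before the split keeps the testing data from being, say, only the most recent readings.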

Now, depending on the type of cognitive service we want to provide, we would use a machine learning algorithm and feed the training data to it, building something called a data model.

A data model is a snapshot of the learning and this snapshot is now tested against the testing data. This step is critical in analyzing the accuracy of the data model.
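The train-then-test loop can be sketched in a few lines of Python. This example uses a one-nearest-neighbour approach purely because it is short; it is not the algorithm behind any Google service, and the weather readings (temperature, humidity, wind) are invented for illustration.

```python
import math


def train(training_data):
    """For nearest neighbour, the "data model" is just the stored examples."""
    return list(training_data)


def predict(model, features):
    """Return the label of the training row closest to the given features."""
    nearest = min(model, key=lambda row: math.dist(row[0], features))
    return nearest[1]


def accuracy(model, testing_data):
    """Fraction of testing rows for which the prediction matches the label."""
    correct = sum(predict(model, f) == label for f, label in testing_data)
    return correct / len(testing_data)


# Hypothetical rows: ((temperature, humidity, wind), label)
training_data = [
    ((17, 85, 5), "rain"),
    ((15, 90, 8), "rain"),
    ((23, 40, 1), "dry"),
    ((28, 35, 3), "dry"),
]
testing_data = [
    ((16, 88, 6), "rain"),
    ((25, 38, 2), "dry"),
]

model = train(training_data)
print("accuracy:", accuracy(model, testing_data))
```

With only a handful of rows the accuracy figure means little; it becomes informative once the testing data is large and varied.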

The model can be trained again and again with various sets of data to have a better accuracy. Once the model is completed, we host it as an API for other systems to query it, passing their features. Based on the prediction results from here on, we would refine the data model.

The previous process is how most cognitive services are built. The accuracy of the data model depends heavily on the quality and quantity of the data.

The more accurate the data that is fed to the machine learning algorithm, the higher the quality of the data model.

Imagine a cognitive service such as explicit image detection built by you or your organization. We need data to train this cognitive service to start with. How many images can we feed it, 1 million, 2 million? Imagine the size of infrastructure needed for training about 10 million images.

Once the service is built, how many hits will your users make? 1 million requests per day? And will this be sufficient to know the accuracy of your model and improve it?

Now, on the other hand, consider data models built by the likes of Google, which pretty much has access to almost all the content of the internet. And imagine the number of people using this service, thus helping the cognitive service to learn by experience.

Within no time, a cognitive service like this will be far more accurate, not only for mainstream scenarios, but also corner cases.

In cognitive services, accuracy increases with the quality and quantity of data and this is one of the main things that adds value to cloud-based cognition over local intelligence.

Take a look at the video titled Inside Google Translate, which explains how the Google Translate service works. It re-emphasizes the thought I expressed previously about how machines learn.

This concludes our section on why cognition on the cloud. In the next section, we are going to explore various Google Cloud AI services.


Google Cloud AI

Now that we understand what Cognition/AI on cloud is and why we need it, let's get started with learning the various Google Cloud AI services that are offered.

We have been briefly introduced to Google Cloud AI services in the GCP services section. Now let's dive deep into their offerings.

In the next few subsections, we will be going through each of the services under the Google Cloud AI vertical.

Cloud AutoML Alpha

As of April 2018, Cloud AutoML is in alpha and is only available on request, subject to GCP terms and conditions.

AutoML helps us develop custom machine learning models with minimal ML knowledge and experience, using the power of Google's transfer learning and Neural Architecture Search technology.

Under this service, the first custom service that Google is releasing is named AutoML Vision. This service will help users to train custom vision models for their own use cases.

There are other services that will follow.

Some of the key AutoML features are the following:

  • Integration with human labeling
  • Powered by Google's Transfer Learning and AutoML
  • Fully integrated with other services of Google Cloud

You can read more about AutoML in the official Google Cloud documentation.

Cloud TPU Beta

As of today, this service is in beta, but we need to explicitly request a TPU quota for our processing needs.

Using Cloud TPUs, we can easily request large amounts of computation power to run our own machine learning algorithms. This service not only provides the required computing power, but also lets us accelerate the complete setup by using Google's TensorFlow.

This service can be used to perform heavy-duty machine learning, both training and prediction.

Some of the key Cloud TPU features are the following:

  • High performance
  • Utilizing the power of GCP
  • Referencing data models
  • Fully Integrated with other services of Google Cloud
  • Connecting Cloud TPUs to custom machine types

You can read more about Cloud TPU in the official Google Cloud documentation.

Cloud Machine Learning Engine

Cloud Machine Learning Engine helps us easily build machine learning models that work on any type of data, of any size. Cloud Machine Learning Engine can take any TensorFlow model and perform large-scale training on a managed cluster. Additionally, it can also manage the trained models for large-scale online and batch predictions.

Cloud Machine Learning Engine can seamlessly transition from training to prediction, using online and batch prediction services. Cloud Machine Learning Engine uses the same scalable and distributed infrastructure with GPU acceleration that powers Google ML products.

Some of the key Cloud Machine Learning Engine features are the following:

  • Fully integrated with other Google Cloud services
  • Discover and Share Samples
  • HyperTune your models
  • Managed and Scalable Service
  • Notebook Developer Experience
  • Portable Models

You can read more about Cloud Machine Learning Engine in the official Google Cloud documentation.

Cloud Job Discovery Private Beta

Matching qualified people with the right positions doesn't have to be so hard; that is the premise of Cloud Job Discovery.

Today's job portals and career sites match people to a job role based on keywords. Most of the time, this approach results in a mismatch between the candidate and the role. That is where Cloud Job Discovery comes into the picture, to bridge the gap between employer and employee. Job Discovery provides plug-and-play access to Google's search and machine learning capabilities, enabling the entire recruiting ecosystem (company career sites, job boards, applicant-tracking systems, and staffing agencies) to improve job site engagement and candidate conversion.

Before we continue, you can try out the Job Discovery demo. You should see results based on your selection, similar to the following screenshot:

The key takeaway from the demo is how Job Discovery relates a profile to a keyword.

This diagram explains how Cloud Job Discovery works:

Some of the key capabilities that set Cloud Job Discovery apart from a standard keyword search are the following:

  • Keyword matching
  • Company jargon recognition
  • Abbreviation recognition
  • Commute search
  • Spelling correction
  • Concept recognition
  • Title detection
  • Real-time query broadening
  • Employer recognition
  • Job enrichment
  • Advanced location mapping
  • Location expansion
  • Seniority alignment

Dialogflow Enterprise Edition Beta

Dialogflow is a development suite for building conversational interfaces for websites, mobile applications, popular messaging platforms, and IoT devices.

It is powered by machine learning to recognize the intent and context of what a user says, allowing your conversational interface to provide highly efficient and accurate responses. Natural language understanding recognizes a user's intent and extracts prebuilt entities such as time, date, and numbers. You can train your agent to identify custom entity types by providing a small dataset of examples.

This service offers cross-platform and multi-language support and can work well with the Google Cloud speech service.

You can read more about Dialogflow Enterprise Edition in the official Google Cloud documentation.

Cloud Natural Language

Google's Cloud Natural Language service helps us better understand the structure and meaning of a piece of text by providing powerful machine learning models.

These models can be queried through a Representational State Transfer (REST) API. We can use it to understand sentiment about our product on social media, or to parse intent from customer conversations happening in a call center or through a messaging app.

Before we continue with Cloud Natural Language, I would recommend trying out the API. Here is a quick glimpse of it:

As we can see from the previous screenshot, this service offers various insights regarding a piece of text.

Some of the key features are:

  • Syntax analysis
  • Entity recognition
  • Sentiment analysis
  • Content classification
  • Multi-language
  • Integrated REST API
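As a rough sketch of what a sentiment analysis call looks like, the following Python snippet builds the JSON body for the `documents:analyzeSentiment` method of the v1 REST API. It only constructs the request and does not send it; sending it would require credentials (for example, an API key), which are not covered here. The helper name and the sample sentence are made up for this example.

```python
import json

# v1 REST method for sentiment analysis.
ANALYZE_SENTIMENT_URL = (
    "https://language.googleapis.com/v1/documents:analyzeSentiment"
)


def build_sentiment_request(text: str) -> dict:
    """Build the JSON body for a sentiment analysis request (hypothetical helper)."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


body = build_sentiment_request("The new release is fantastic.")
print("POST", ANALYZE_SENTIMENT_URL)
print(json.dumps(body, indent=2))
```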

You can read more about the Cloud Natural Language service in the official Google Cloud documentation.

Cloud Speech API

Cloud Speech API uses powerful neural network models to convert audio to text in real time. This service is exposed as a REST API, as we have seen with the Google Cloud Natural Language API.

This API recognizes over 110 languages. We can use it to convert speech to text in real time, to recognize audio uploaded in the request, and to process audio stored on Google Cloud Storage, all using the same technology that powers Google's own products.

Before we continue with Cloud Speech API, I would recommend trying out the API. Here is a quick glimpse of it:

I was actually playing a song in the background and tried the speech-to-text. I was very impressed with the results, except for one part, where I said "with a song playing" and the API represented it as "with the song playing"; still, pretty good!

I think it is only a matter of time and continued use of these services that will increase their accuracy.

Some of the key features of Cloud Speech API are:

  • Automatic Speech Recognition (ASR)
  • Global vocabulary
  • Streaming recognition
  • Word hints
  • Real-time or prerecorded audio support
  • Noise robustness
  • Inappropriate content filtering
  • Integrated API
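To show how a few of these features map onto a request, here is a Python sketch that builds the JSON body for the `speech:recognize` method of the v1 REST API. The bucket path and the hint phrases are placeholders, and the request is only constructed, not sent, as a real call needs credentials.

```python
import json

# v1 REST method for synchronous recognition of short audio.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(gcs_uri, phrases, language_code="en-US"):
    """Build the JSON body for a recognition request (hypothetical helper)."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 16000,
            "languageCode": language_code,
            # Inappropriate content filtering.
            "profanityFilter": True,
            # Word hints: phrases the recognizer should favour.
            "speechContexts": [{"phrases": list(phrases)}],
        },
        # Audio stored on Google Cloud Storage (placeholder path).
        "audio": {"uri": gcs_uri},
    }


body = build_recognize_request(
    "gs://your-bucket/recording.raw", ["SmartExchange"]
)
print("POST", RECOGNIZE_URL)
print(json.dumps(body, indent=2))
```

The encoding and sample rate must match the audio file; the values above are assumptions for an uncompressed 16 kHz recording.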

You can read more about Cloud Speech API in the official Google Cloud documentation.

Cloud Translation API

Using the state-of-the-art Neural Machine Translation, the Cloud Translation service converts texts from one language to another.

Translation API is highly responsive, so websites and applications can integrate with Translation API for fast, dynamic translation of source text from the source language to a target language.

Before we continue with Cloud Translation API, I would recommend trying out the API. Here is a quick glimpse of it, as shown in the following screenshot:

Some of the key features of Cloud Translation API are as follows:

  • Programmatic access – REST API-driven
  • Text translation
  • Language detection
  • Continuous updates
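A translation request is small enough to sketch in a few lines of Python. The snippet below builds the JSON body for the v2 REST endpoint; the helper name and sample text are made up, and the request is not sent, since that requires an API key or other credentials.

```python
import json

# v2 REST endpoint for text translation.
TRANSLATE_URL = "https://translation.googleapis.com/language/translate/v2"


def build_translate_request(text, target, source=None):
    """Build the JSON body for a translation request (hypothetical helper)."""
    body = {"q": [text], "target": target, "format": "text"}
    if source is not None:
        # When the source language is left out, the service detects it.
        body["source"] = source
    return body


body = build_translate_request("Hello, world", target="es")
print("POST", TRANSLATE_URL)
print(json.dumps(body, indent=2))
```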

You can read more about Cloud Translation API in the official Google Cloud documentation.

Cloud Vision API

Fred R. Barnard of Printers' Ink stated "A picture is worth ten thousand words".

But no one really knows what those words are. Here comes the Google Cloud Vision API to decipher that for us.

Cloud Vision API takes an image as input and returns a description of its contents as text. In other words, it can understand the contents of the image. This service can be accessed over a REST API.

Before we continue with Cloud Vision API, I would recommend trying out the API. Here is a quick glimpse of it, as shown in the screenshot:

That is a photo of me when I was going through a trying-to-grow-long-hair phase, and after having fun at the beach. What is important is how the vision service was able to look at the image and detect my mood.

The same service can perform label detection as well as detect web entities related to this image among others.

Some of the key features of this service are:

  • Detecting explicit content
  • Detecting logos and labels
  • Landmark detection
  • Optical character recognition
  • Face detection
  • Image attributes
  • Integrated REST API
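Here is a minimal Python sketch of how such a request could be assembled for the `images:annotate` method of the v1 REST API. The helper names, the placeholder image bytes, and the `YOUR_API_KEY` value are assumptions for illustration. The code builds the HTTP request object but deliberately does not send it.

```python
import base64
import json
import urllib.request

# v1 REST method for annotating one or more images.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_body(image_bytes, feature_types, max_results=5):
    """Build the JSON body: one image, several requested features."""
    return {
        "requests": [
            {
                # Image content is sent as a base64-encoded string.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": t, "maxResults": max_results} for t in feature_types
                ],
            }
        ]
    }


def build_http_request(body, api_key):
    """Wrap the body in a POST request; nothing is sent until it is opened."""
    return urllib.request.Request(
        f"{VISION_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_annotate_body(
    b"placeholder-image-bytes", ["LABEL_DETECTION", "SAFE_SEARCH_DETECTION"]
)
request = build_http_request(body, api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` with a valid key and real image bytes would perform the call.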

To find out more about Cloud Vision API, refer to the official Google Cloud documentation.

Cloud Video Intelligence

Cloud Video Intelligence is one of the latest cognitive services released by Google. Cloud Video Intelligence API does almost all the things that the Cloud Vision API can do, but on videos.

This service extracts the metadata from a video frame by frame, and we can search any moment of the video file.

Before we continue with Cloud Video Intelligence, I would recommend trying out the API. Here is a quick glimpse of it, as shown in the screenshot:

I have selected the dinosaur and the bicycle video, and you can see the analysis.

Some of the key features of Cloud Video Intelligence are:

  • Label detection
  • Shot change detection
  • Explicit content detection
  • Video transcription Alpha
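The first three features can be requested together in a single call. The following Python sketch builds the JSON body for the `videos:annotate` method of the v1 REST API; the bucket path is a placeholder and the helper name is made up. The request is only constructed here. A real call needs credentials and returns a long-running operation that has to be polled for the results.

```python
import json

# v1 REST method for annotating a video.
ANNOTATE_URL = "https://videointelligence.googleapis.com/v1/videos:annotate"


def build_video_request(gcs_uri, features):
    """Build the JSON body for a video annotation request (hypothetical helper)."""
    return {"inputUri": gcs_uri, "features": list(features)}


body = build_video_request(
    "gs://your-bucket/your-video.mp4",  # placeholder path
    ["LABEL_DETECTION", "SHOT_CHANGE_DETECTION", "EXPLICIT_CONTENT_DETECTION"],
)
print("POST", ANNOTATE_URL)
print(json.dumps(body, indent=2))
```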

This concludes the overview of the various services offered as part of the Cloud AI vertical.

In this book, we are going to use a few of these to make a simple web application smart.



Summary

In this introductory chapter, we went through what Google Cloud Platform is and what services it offers. Next, we saw what Cloud Intelligence is and why we need it. After that, we went through the various services provided under the Cloud AI vertical.

In the next chapter, we are going to get started with exploring various Cloud AI services and how we can integrate them with the forum application, SmartExchange, which we are going to build.

About the Author
  • Arvind Ravulavaru

    Arvind Ravulavaru is a platform architect at Ubiconn IoT Solutions, with over 9 years of experience in software development and 2 years in hardware and product development. For the last 5 years, he has been working extensively on JavaScript, both on the server side and the client side. For the last couple of years, he has been working in IoT, building a platform for rapidly developing IoT solutions, named The IoT Suitcase. Prior to that, Arvind worked on big data, cloud computing, and orchestration.
