
You're reading from The Definitive Guide to Google Vertex AI

Product type: Book
Published in: Dec 2023
Publisher: Packt
ISBN-13: 9781801815260
Edition: 1st Edition
Authors (2):
Jasmeet Bhatia

Jasmeet is a Machine Learning Architect with over 8 years of experience in Data Science and Machine Learning Engineering at Google and Microsoft, and overall has 17 years of experience in Product Engineering and Technology consulting at Deloitte, Disney, and Motorola. He has been involved in building technology solutions that focus on solving complex business problems by utilizing information and data assets. He has built high performing engineering teams, designed and built global scale AI/Machine Learning, Data Science, and Advanced analytics solutions for image recognition, natural language processing, sentiment analysis, and personalization.

Kartik Chaudhary

Kartik is an Artificial Intelligence and Machine Learning professional with 6+ years of industry experience in developing and architecting large scale AI/ML solutions using the technological advancements in the field of Machine Learning, Deep Learning, Computer Vision and Natural Language Processing. Kartik has filed 9 patents at the intersection of Machine Learning, Healthcare, and Operations. Kartik loves sharing knowledge, blogging, travel, and photography.

ML APIs for Vision, NLP, and Speech

Research teams at Google have put decades of research and experience into creating state-of-the-art solutions for many complex problems. Some of these solutions, including Vision AI, Translation AI, Natural Language AI, and Speech AI, are general-purpose and can be readily used to get insights from complex, unstructured data. They are provided as a service, so as customers we don’t have to worry about managing the infrastructure, availability, or scaling of these products. Many popular Google products, such as Maps, Photos, Gmail, and YouTube, make use of these solutions every day to provide AI-driven experiences.

In this chapter, we will look at some of these popular offerings and understand what kind of problems can be solved using them. The main topics that will be covered in this chapter are as follows:

  • Vision AI on Google Cloud
  • Translation AI on Google Cloud
  • Natural Language...

Vision AI on Google Cloud

Computer vision is a field of artificial intelligence (AI) that enables computers and systems to derive insights from visual data such as digital images and videos. Understanding images and videos is a complex task, but ongoing research by the AI community has produced many effective ways of extracting information from this kind of unstructured data. Businesses can use the extracted information to take action and provide recommendations at scale. Google Cloud provides the following two offerings as a platform to solve computer vision problems:

  • Vision AI
  • Video AI

Now, let’s deep dive into each of these offerings.

Vision AI

Google Vision AI provides a platform for creating vision-based applications with pre-trained APIs, AutoML, or custom models. Using Vision AI, we can create image and video analytics solutions in just a few minutes...
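To make the idea of a pre-trained API concrete, here is a minimal sketch of what a label-detection request could look like. It assumes the public Vision REST endpoint (`images:annotate`, v1) and its JSON request shape, neither of which is given in this chapter; the helper name `build_label_request` is hypothetical. The sketch only builds the request body and does not send it, so no credentials are needed.

```python
import base64
import json

# Assumption: public REST endpoint of the pre-trained Vision API (v1).
# It is shown for orientation only; this sketch never calls it.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes: bytes, max_results: int = 5) -> dict:
    """Build (but do not send) a label-detection request body.

    Hypothetical helper for illustration; the image is sent inline
    as base64-encoded content.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_results}
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file.
    body = build_label_request(b"placeholder image bytes")
    print(json.dumps(body, indent=2))
```

In a real application, this body would be posted to the endpoint with valid Google Cloud credentials, or the official client library would be used instead of raw REST.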

Translation AI on Google Cloud

As its name suggests, Translation AI on Google Cloud is an offering for building applications that serve multilingual content through fast, dynamic machine translation. Multilingual content can help businesses take their products to global markets and engage with global audiences, and real-time translation keeps that experience seamless. Let’s take a look at the translation-related offerings on Google Cloud.

Google Cloud provides three translation products:

  • Cloud Translation API
  • AutoML Translation
  • Translation Hub

Let’s deep dive into each of these products.

Cloud Translation API

Google Research has developed several neural machine translation (NMT) models over time and keeps improving them whenever there is better training data or improved techniques. The Cloud Translation API makes use of these pre-trained models or custom ML models to translate text from various source languages into...
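As a rough illustration of translating text with a pre-trained model, the sketch below builds a request body for the Basic (v2) REST surface of the Cloud Translation API. The endpoint and field names (`q`, `source`, `target`, `format`) are assumptions based on the public REST API, not taken from this chapter, and `build_translate_request` is a hypothetical helper. Nothing is sent over the network.

```python
import json
from typing import Iterable, Optional

# Assumption: Basic (v2) REST endpoint of the Cloud Translation API.
# It is shown for orientation only; this sketch never calls it.
TRANSLATE_ENDPOINT = "https://translation.googleapis.com/language/translate/v2"


def build_translate_request(
    texts: Iterable[str], target: str, source: Optional[str] = None
) -> dict:
    """Build (but do not send) a translation request body.

    Hypothetical helper for illustration. When `source` is omitted,
    the service is expected to detect the source language itself.
    """
    body = {"q": list(texts), "target": target, "format": "text"}
    if source is not None:
        body["source"] = source
    return body


if __name__ == "__main__":
    body = build_translate_request(["Hello, world"], target="es", source="en")
    print(json.dumps(body, indent=2))
```

Leaving `source` optional mirrors a common usage pattern: callers who know the input language can state it, while others can let the service infer it.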

Natural Language AI on Google Cloud

Almost every organization deals with large amounts of text data in the form of text documents, forms, contracts, PDFs, web pages, user reviews, and so on. Google Cloud offers Natural Language AI, which leverages ML models to derive insights from unstructured text data. Natural Language AI is an end-to-end product that can help in extracting, analyzing, and storing text on Google Cloud.

Google offers the following three natural language solutions:

  • AutoML for Text Analysis
  • Natural Language API
  • Healthcare Natural Language API

Let’s take a closer look at each of these solutions.

AutoML for Text Analysis

Imagine that there is an e-commerce company that receives customer queries related to a wide variety of issues, including payment failures, delivery address updates, product quality issues, and so on. As most of these queries are typed by customers in a text box, there is a need to classify these queries into a fixed...

Speech AI on Google Cloud

Speech is another important form of capturing and storing information. Google has done decades of research to come up with state-of-the-art solutions for many speech and audio use cases. A significant amount of critical information exists in the form of audio calls and recorded messages, so it is important to transcribe them and extract useful insights. There are also voice assistant use cases that require text-to-speech functionality. To help organizations tackle these use cases, Google Cloud offers the following speech products:

  • Speech-to-Text
  • Text-to-Speech

Now, let’s learn about each of them in detail.

Speech-to-Text

For many organizations, a good chunk of useful data is present in unstructured form, such as audio recordings, customer voice calls, videos, and so on. Thus...
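A transcription request for a short recording might be sketched as follows. This assumes the public Speech-to-Text REST endpoint (`speech:recognize`, v1) and its JSON request shape, which are not given in this chapter; the helper name `build_recognize_request` and the default values are hypothetical. The sketch only builds the request body and does not send it.

```python
import base64
import json

# Assumption: public REST endpoint for synchronous recognition (v1).
# It is shown for orientation only; this sketch never calls it.
SPEECH_ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(
    audio_bytes: bytes,
    language_code: str = "en-US",
    sample_rate_hertz: int = 16000,
) -> dict:
    """Build (but do not send) a transcription request body.

    Hypothetical helper for illustration; it assumes raw 16-bit PCM
    audio (LINEAR16) sent inline as base64-encoded content.
    """
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


if __name__ == "__main__":
    # Two placeholder bytes stand in for a real audio recording.
    body = build_recognize_request(b"\x00\x01")
    print(json.dumps(body, indent=2))
```

Inline content suits short clips; longer recordings are typically referenced from Cloud Storage rather than embedded in the request.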

Summary

Not all important data is present in a structured format. A significant amount of important information is found in unstructured forms such as audio, videos, documents, recordings, and so on. Progress in ML has enabled us to analyze these unstructured data sources at scale to extract actionable insights and inform key business decisions. Google has worked on these ML research problems extensively to come up with state-of-the-art solutions for vision, NLP, speech, and more.

In this chapter, we learned about different offerings from Google for understanding and extracting information from unstructured data formats, including audio, videos, images, documents, phone call recordings, and more. After reading this chapter, we should now have a good understanding of each of these offerings, including their key features and potential use cases. After discussing them in detail, we should now be able to find new use cases to apply these...
