You're reading from The Definitive Guide to Google Vertex AI

Product type: Book

Published in: Dec 2023

Publisher: Packt

ISBN-13: 9781801815260

Edition: 1st Edition

Concepts

Data Science

Authors (2):

Jasmeet Bhatia

Kartik Chaudhary


ML APIs for Vision, NLP, and Speech

Research teams at Google have put decades of research and experience into creating state-of-the-art solutions for many complex problems. Some of these solutions, including Vision AI, Translation AI, Natural Language AI, and Speech AI, are general-purpose and can be readily leveraged to get insights from complex, unstructured data. Because they are provided as a service, we as customers don’t have to worry about managing the infrastructure, availability, or scaling of these products. Many popular Google products, such as Maps, Photos, Gmail, and YouTube, make use of these solutions every day to provide AI-driven experiences.

In this chapter, we will look at some of these popular offerings and understand what kind of problems can be solved using them. The main topics that will be covered in this chapter are as follows:

Vision AI on Google Cloud
Translation AI on Google Cloud
Natural Language...

Vision AI on Google Cloud

Computer vision is a field of artificial intelligence (AI) that enables computers and systems to derive insights from visual data such as digital images and videos. Understanding images and videos is a complex task, but through ongoing research, the AI community has developed many smart ways of extracting information from this kind of unstructured data. Businesses can use the information extracted from digital images and videos to take action and provide recommendations at scale. Google Cloud provides the following two offerings as a platform to solve computer vision problems:

Vision AI
Video AI

Now, let’s dive deeper into each of these offerings.

Vision AI

Google Vision AI provides a platform for creating vision-based applications with pre-trained APIs, AutoML, or custom models. Using Vision AI, we can create image and video analytics solutions in just a few minutes...
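To make the idea of a pre-trained vision API more concrete, here is a minimal sketch (not taken from the book) of how a label-detection request body could be assembled for the Cloud Vision REST endpoint. The endpoint URL and field names reflect the public REST reference as we understand it and should be checked against the current documentation; the helper name build_label_detection_request is hypothetical. The sketch only builds the JSON payload. Sending it requires authentication, which is not shown here.

```python
import base64
import json

# Assumed REST endpoint for image annotation; verify against current docs.
VISION_ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_label_detection_request(image_bytes: bytes, max_results: int = 5) -> dict:
    """Build a JSON-serializable body asking for label detection on one image.

    Hypothetical helper for illustration only; it does not call the API.
    """
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    # Raw image bytes are sent as base64-encoded text inside the JSON body.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_results}
                ],
            }
        ]
    }


# Placeholder bytes stand in for a real image file read from disk.
payload = build_label_detection_request(b"placeholder image bytes", max_results=3)
print(json.dumps(payload, indent=2))
```

Keeping payload construction separate from the network call makes the request shape easy to inspect and test before any credentials are involved.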

Translation AI on Google Cloud

As its name suggests, Translation AI on Google Cloud is an offering for building applications that serve multilingual content through fast, dynamic machine translation. Multilingual content can help businesses take their products to global markets and engage with global audiences, and real-time translation provides a seamless experience for those audiences. Let’s take a look at the translation-related offerings on Google Cloud.

Google Cloud provides three translation products:

Cloud Translation API
AutoML Translation
Translation Hub

Let’s dive deeper into each of these products.

Cloud Translation API

Google Research has developed several neural machine translation (NMT) models over time and keeps improving them as better training data or improved techniques become available. The Cloud Translation API makes use of these pre-trained models or custom ML models to translate text from various source languages into...
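As a rough illustration (not taken from the book), the following sketch assembles a request body for the Cloud Translation API's v2 REST endpoint. The endpoint URL and the q, source, target, and format fields reflect the public REST reference as we understand it and should be verified against the current documentation; the helper name build_translate_request is hypothetical. No request is sent, and authentication is left out.

```python
import json
from typing import Iterable, Optional

# Assumed v2 REST endpoint; verify against current docs.
TRANSLATE_V2_URL = "https://translation.googleapis.com/language/translate/v2"


def build_translate_request(
    texts: Iterable[str], target: str, source: Optional[str] = None
) -> dict:
    """Build a JSON-serializable translation request body.

    Hypothetical helper for illustration only; it does not call the API.
    """
    items = list(texts)
    if not items:
        raise ValueError("at least one text is required")
    body = {"q": items, "target": target, "format": "text"}
    # When the source language is omitted, the service is expected to detect it.
    if source is not None:
        body["source"] = source
    return body


payload = build_translate_request(["Where is my order?"], target="es", source="en")
print(json.dumps(payload, indent=2))
```

Leaving the source language optional mirrors the common case where incoming text arrives in an unknown language.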

Natural Language AI on Google Cloud

Almost every organization deals with large amounts of text data in the form of text documents, forms, contracts, PDFs, web pages, user reviews, and so on. Google Cloud offers Natural Language AI, which leverages ML models to derive insights from unstructured text data. Natural Language AI is an end-to-end product that can help in extracting, analyzing, and storing text on Google Cloud.

Google offers the following three natural language solutions:

AutoML for Text Analysis
Natural Language API
Healthcare Natural Language API

Let’s take a closer look at each of these solutions.

AutoML for Text Analysis

Imagine an e-commerce company that receives customer queries about a wide variety of issues, including payment failures, delivery address updates, product quality issues, and so on. As most of these queries are typed by customers in a text box, there is a need to classify these queries into a fixed...

Speech AI on Google Cloud

Speech is another important way of capturing and storing information. Google has done decades of research to come up with state-of-the-art solutions for many use cases involving speech and audio data. A significant amount of critical information is present in the form of audio calls and recorded messages, so it is important to be able to transcribe them and extract useful insights. There are also voice assistant use cases that require text-to-speech functionality. To help organizations tackle these use cases, Google Cloud offers the following products for speech data:

Speech-to-Text
Text-to-Speech

Now, let’s learn about each of them in detail.

Speech-to-Text

For many organizations, a good chunk of useful data is present in unstructured form, such as audio recordings, customer voice calls, and videos. Thus...
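To give a feel for what transcribing such a recording involves, here is a minimal sketch (not taken from the book) that assembles a synchronous recognition request body for the Speech-to-Text v1 REST endpoint. The endpoint URL and the config and audio field names reflect the public REST reference as we understand it and should be verified against the current documentation; the helper name build_recognize_request and the default audio settings are assumptions. No request is sent, and authentication is left out.

```python
import base64
import json

# Assumed v1 REST endpoint for short, synchronous recognition; verify against docs.
SPEECH_RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(
    audio_bytes: bytes,
    language_code: str = "en-US",
    sample_rate_hz: int = 16000,
) -> dict:
    """Build a JSON-serializable body for transcribing raw 16-bit PCM audio.

    Hypothetical helper for illustration only; it does not call the API.
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample_rate_hz must be positive")
    return {
        "config": {
            # LINEAR16 means uncompressed 16-bit PCM samples (assumed input format).
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate_hz,
            "languageCode": language_code,
        },
        # Audio bytes are sent as base64-encoded text inside the JSON body.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


# Two placeholder bytes stand in for a real audio recording.
payload = build_recognize_request(b"\x00\x01")
print(json.dumps(payload, indent=2))
```

The configuration must describe the audio that is actually sent, so the encoding, sample rate, and language code would need to match the real recording.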

Summary

Not all important data is available in a structured format. A significant amount of it is found in unstructured forms such as audio, videos, documents, and recordings. Progress in ML has enabled us to analyze these unstructured data sources at scale to extract actionable insights and inform key business decisions. Google has worked on these ML research problems extensively to come up with state-of-the-art solutions for vision, NLP, speech, and more.

In this chapter, we learned about different offerings from Google for understanding and extracting information from unstructured data formats, including audio, videos, images, documents, and phone call recordings. We should now have a good understanding of each of these offerings, including their key features and potential use cases. After discussing them in detail, we should now be able to find new use cases to apply these...

Authors (2)

Jasmeet Bhatia

Jasmeet is a Machine Learning Architect with over 8 years of experience in Data Science and Machine Learning Engineering at Google and Microsoft, and 17 years of overall experience in Product Engineering and Technology consulting at Deloitte, Disney, and Motorola. He has been involved in building technology solutions that focus on solving complex business problems by utilizing information and data assets. He has built high-performing engineering teams and designed and built global-scale AI/Machine Learning, Data Science, and Advanced analytics solutions for image recognition, natural language processing, sentiment analysis, and personalization.

Kartik Chaudhary

Kartik is an Artificial Intelligence and Machine Learning professional with 6+ years of industry experience in developing and architecting large scale AI/ML solutions using the technological advancements in the field of Machine Learning, Deep Learning, Computer Vision and Natural Language Processing. Kartik has filed 9 patents at the intersection of Machine Learning, Healthcare, and Operations. Kartik loves sharing knowledge, blogging, travel, and photography.