Automating OCR and Translation with Google Cloud Functions: A Step-by-Step Guide
Agnieszka Koziorowska, Wojciech Marusiak
900 min read
2024-11-05 13:23:03
This article is an excerpt from the book, "Google Cloud Associate Cloud Engineer Certification and Implementation Guide", by Agnieszka Koziorowska, Wojciech Marusiak. This book serves as a guide for students preparing for ACE certification, offering invaluable practical knowledge and hands-on experience in implementing various Google Cloud Platform services. By actively engaging with the content, you’ll gain the confidence and expertise needed to excel in your certification journey.
Introduction
In this article, we will walk you through an example of implementing Google Cloud Functions for optical character recognition (OCR) on Google Cloud Platform. This tutorial will demonstrate how to automate the process of extracting text from an image, translating the text, and storing the results using Cloud Functions, Pub/Sub, and Cloud Storage. By leveraging Google Cloud Vision and Translation APIs, we can create a workflow that efficiently handles image processing and text translation. The article provides detailed steps to set up and deploy Cloud Functions using Golang, covering everything from creating storage buckets to deploying and running your function to translate text.
Google Cloud Functions Example
Now that you’ve learned what Cloud Functions is, we’d like to show you how to implement a sample Cloud Function.
We will guide you through optical character recognition (OCR) on Google Cloud Platform with Cloud Functions.
Our use case is as follows:
1. An image with text is uploaded to Cloud Storage.
2. A triggered Cloud Function utilizes the Google Cloud Vision API to extract the text and identify the source language.
3. The text is queued for translation by publishing a message to a Pub/Sub topic.
4. A Cloud Function employs the Translation API to translate the text and stores the result in the translation queue.
5. Another Cloud Function saves the translated text from the translation queue to Cloud Storage.
6. The translated results are available in Cloud Storage as individual text files for each translation.
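The hand-off between the functions in the steps above can be sketched as a small message that travels through Pub/Sub. This is a minimal sketch and not the code from the sample repository: the ocrMessage type, its field names, and the wrap and unwrap helpers are assumptions made for illustration. It relies only on the fact that a Pub/Sub event delivers its payload in a data field, which is base64-encoded in the JSON form, and that Go's encoding/json handles that encoding for []byte fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ocrMessage is a hypothetical payload passed between pipeline stages.
// The field names are assumptions, not taken from the sample code.
type ocrMessage struct {
	Text     string `json:"text"`     // text extracted from the image
	FileName string `json:"fileName"` // name of the uploaded image
	Lang     string `json:"lang"`     // target language for translation
	SrcLang  string `json:"srcLang"`  // language detected in the image
}

// pubSubEnvelope holds the only part of a Pub/Sub event this sketch needs.
// encoding/json encodes and decodes []byte fields as base64 strings.
type pubSubEnvelope struct {
	Data []byte `json:"data"`
}

// wrap serializes a message and places it in an envelope, as a publisher would.
func wrap(m ocrMessage) ([]byte, error) {
	payload, err := json.Marshal(m)
	if err != nil {
		return nil, fmt.Errorf("marshal message: %w", err)
	}
	return json.Marshal(pubSubEnvelope{Data: payload})
}

// unwrap reverses wrap, as a Pub/Sub-triggered function would on receipt.
func unwrap(event []byte) (ocrMessage, error) {
	var env pubSubEnvelope
	if err := json.Unmarshal(event, &env); err != nil {
		return ocrMessage{}, fmt.Errorf("decode envelope: %w", err)
	}
	var m ocrMessage
	if err := json.Unmarshal(env.Data, &m); err != nil {
		return ocrMessage{}, fmt.Errorf("decode message: %w", err)
	}
	return m, nil
}

func main() {
	event, err := wrap(ocrMessage{
		Text: "Dzień dobry", FileName: "sign.png", Lang: "en", SrcLang: "pl",
	})
	if err != nil {
		panic(err)
	}
	m, err := unwrap(event)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s (%s -> %s): %s\n", m.FileName, m.SrcLang, m.Lang, m.Text)
}
```

Keeping the file name and both language codes in every message means each stage can work without looking anything up from the previous one.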
We need to download the samples first; we will use Golang as the programming language. The source files can be downloaded from https://github.com/GoogleCloudPlatform/golang-samples. Before working with the OCR function sample, we recommend enabling the Cloud Translation API and the Cloud Vision API. If they are not enabled, your function will throw errors, and the process will not complete. Let’s start by deploying the function:
1. We need to create a Cloud Storage bucket. Create your own bucket with a unique name; please refer to the documentation on bucket naming at the following link: https://cloud.google.com/storage/docs/buckets. We will use the following command:
gsutil mb gs://wojciech_image_ocr_bucket
2. We also need to create a second bucket to store the results:
gsutil mb gs://wojciech_image_ocr_bucket_results
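Because bucket names must follow the naming rules in the documentation, it can help to check a candidate name before running gsutil mb. The following is a rough sketch of such a check, covering only a subset of the documented rules: length of 3 to 63 characters, allowed characters, first and last character, and the reserved "goog" prefix and "google" substring. The function name validateBucketName is our own, and the sketch does not cover the longer limit for dotted names, the IP address rule, or global uniqueness.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// isLowerAlnum reports whether c is a lowercase ASCII letter or a digit.
func isLowerAlnum(c byte) bool {
	return (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
}

// validateBucketName checks a subset of the Cloud Storage naming rules.
// It is a hypothetical helper, not part of any Google library.
func validateBucketName(name string) error {
	if len(name) < 3 || len(name) > 63 {
		return errors.New("name must be 3 to 63 characters long")
	}
	for i := 0; i < len(name); i++ {
		c := name[i]
		if !isLowerAlnum(c) && c != '-' && c != '_' && c != '.' {
			return fmt.Errorf("invalid character %q at position %d", c, i)
		}
	}
	if !isLowerAlnum(name[0]) || !isLowerAlnum(name[len(name)-1]) {
		return errors.New("name must start and end with a letter or digit")
	}
	if strings.HasPrefix(name, "goog") {
		return errors.New(`name must not begin with "goog"`)
	}
	if strings.Contains(name, "google") {
		return errors.New(`name must not contain "google"`)
	}
	return nil
}

func main() {
	candidates := []string{
		"wojciech_image_ocr_bucket",
		"wojciech_image_ocr_bucket_results",
		"My_Bucket",
	}
	for _, name := range candidates {
		if err := validateBucketName(name); err != nil {
			fmt.Printf("%s: rejected (%v)\n", name, err)
			continue
		}
		fmt.Printf("%s: looks valid\n", name)
	}
}
```

A name that passes this check can still be refused at creation time, for example when another project already owns it.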
3. We must create a Pub/Sub topic to publish the finished translation results. We can do so with the following command:
gcloud pubsub topics create YOUR_TOPIC_NAME
We used the following command to create it:
6. From the repository, we need to go to the golang-samples/functions/ocr/app/ directory to be able to deploy the desired Cloud Function.
7. We recommend reviewing the included Go files to understand the code in more detail. Please change the storage bucket and Pub/Sub topic names to your own values.
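How the sample stores bucket and topic names is not shown here, so the following is only an illustrative sketch of collecting such settings in one place and failing early when one is missing. The pipelineConfig type, the loadConfig function, and all of the environment variable names are hypothetical and are not taken from the sample repository.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// pipelineConfig groups the values a reader has to adapt. Hypothetical type.
type pipelineConfig struct {
	ImageBucket    string   // bucket that receives uploaded images
	ResultBucket   string   // bucket that receives translated text
	TranslateTopic string   // topic carrying text waiting for translation
	ResultTopic    string   // topic carrying finished translations
	ToLang         []string // target languages
}

// loadConfig reads settings through getenv so that it can be tested
// without touching the real environment. Variable names are assumptions.
func loadConfig(getenv func(string) string) (pipelineConfig, error) {
	var missing []string
	get := func(key string) string {
		v := strings.TrimSpace(getenv(key))
		if v == "" {
			missing = append(missing, key)
		}
		return v
	}
	cfg := pipelineConfig{
		ImageBucket:    get("IMAGE_BUCKET"),
		ResultBucket:   get("RESULT_BUCKET"),
		TranslateTopic: get("TRANSLATE_TOPIC"),
		ResultTopic:    get("RESULT_TOPIC"),
	}
	langs := get("TO_LANG")
	if len(missing) > 0 {
		return pipelineConfig{}, fmt.Errorf("missing settings: %s", strings.Join(missing, ", "))
	}
	// TO_LANG is a comma-separated list such as "en,de".
	for _, l := range strings.Split(langs, ",") {
		if l = strings.TrimSpace(l); l != "" {
			cfg.ToLang = append(cfg.ToLang, l)
		}
	}
	if len(cfg.ToLang) == 0 {
		return pipelineConfig{}, errors.New("TO_LANG contains no language codes")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Reporting every missing setting at once, rather than stopping at the first, saves a redeploy for each forgotten value.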
8. We will deploy the first function to process images. We will use the following command:
10. The last part of the complete solution is a third Cloud Function that saves results to Cloud Storage. We will use the following snippet of code to do so:
11. We are now free to upload any image containing text. It will be processed first, then translated and saved into our Cloud Storage bucket.
12. We uploaded four sample images containing text that we downloaded from the Internet. We can see many entries in the ocr-extract-go Cloud Function’s logs. Some log entries show the language detected in the image, and others show the extracted text:
Figure 7.22 – Cloud Function logs from the ocr-extract-go function
13. ocr-translate-go translates the text detected by the previous function:
Figure 7.23 – Cloud Function logs from the ocr-translate-go function
14. Finally, ocr-save-go saves the translated text into the Cloud Storage bucket:
Figure 7.24 – Cloud Function logs from the ocr-save-go function
15. If we go to the Cloud Storage bucket, we’ll see the saved translated files:
Figure 7.25 – Translated images saved in the Cloud Storage bucket
16. We can view the content directly from the Cloud Storage bucket by clicking Download next to the file, as shown in the following screenshot:
Figure 7.26 – Translated text from Polish to English stored in the Cloud Storage bucket
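Since each translation ends up as its own text file, one uploaded image can produce several result objects. The sketch below shows one possible way to derive their names from the image name and the target language. The naming scheme (image name without extension, an underscore, the language code, and .txt) is an assumption made for illustration, as are the function names; the sample's actual scheme may differ.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resultObjectName builds the object name for one translation.
// The "<image>_<lang>.txt" scheme is assumed, not taken from the sample.
func resultObjectName(imageName, targetLang string) string {
	base := path.Base(imageName)                    // drop any folder prefix
	base = strings.TrimSuffix(base, path.Ext(base)) // drop the image extension
	return fmt.Sprintf("%s_%s.txt", base, targetLang)
}

// resultObjectNames returns one object name per target language.
func resultObjectNames(imageName string, targetLangs []string) []string {
	names := make([]string, 0, len(targetLangs))
	for _, lang := range targetLangs {
		names = append(names, resultObjectName(imageName, lang))
	}
	return names
}

func main() {
	for _, name := range resultObjectNames("uploads/sign.png", []string{"en", "fr"}) {
		fmt.Println(name)
	}
	// Prints:
	// sign_en.txt
	// sign_fr.txt
}
```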
Cloud Functions is a powerful and fast way to code, deploy, and use advanced features. We encourage you to try out and deploy Cloud Functions to understand the process of using them better.
At the time of writing, Google Cloud Free Tier offers a generous number of free resources we can use. Cloud Functions offers the following with its free tier:
2 million invocations per month (this includes both background and HTTP invocations)
400,000 GB-seconds, 200,000 GHz-seconds of compute time
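To get a feel for these limits, the monthly usage of a function can be estimated as invocations multiplied by the average run time in seconds and the allocated memory in gigabytes. The sketch below applies that arithmetic to the invocation and GB-second allowances listed above. The workload type and the example figures are made up for illustration, and the GHz-second allowance and any other billing details are left out.

```go
package main

import "fmt"

// Free tier allowances quoted in the article.
const (
	freeInvocations = 2000000
	freeGBSeconds   = 400000.0
)

// workload describes a hypothetical monthly usage pattern.
type workload struct {
	Invocations int     // calls per month
	AvgSeconds  float64 // average run time per call, in seconds
	MemoryMB    float64 // memory allocated to the function, in MB
}

// gbSeconds returns invocations x seconds x memory in GB.
func (w workload) gbSeconds() float64 {
	return float64(w.Invocations) * w.AvgSeconds * (w.MemoryMB / 1024)
}

// withinFreeTier checks only the invocation and GB-second allowances.
func (w workload) withinFreeTier() bool {
	return w.Invocations <= freeInvocations && w.gbSeconds() <= freeGBSeconds
}

func main() {
	w := workload{Invocations: 1000000, AvgSeconds: 0.5, MemoryMB: 256}
	fmt.Printf("%.0f GB-seconds, within free tier: %t\n", w.gbSeconds(), w.withinFreeTier())
	// Prints: 125000 GB-seconds, within free tier: true
}
```

In this example, one million half-second calls at 256 MB use 1,000,000 x 0.5 x 0.25 = 125,000 GB-seconds, well under the 400,000 GB-second allowance.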
In conclusion, Google Cloud Functions offers a powerful and scalable solution for automating tasks like optical character recognition and translation. Through this example, we have demonstrated how to use Cloud Functions, Pub/Sub, and the Google Cloud Vision and Translation APIs to build an end-to-end OCR and translation pipeline. By following the provided steps and code snippets, you can replicate this process for your own use cases. Google Cloud's Free Tier resources make it easy to get started with Cloud Functions. We encourage you to explore more by deploying your own Cloud Functions and using the full potential of Google Cloud Platform for serverless computing.
Author Bio
Agnieszka is an experienced Systems Engineer who has been in the IT industry for 15 years. She is dedicated to supporting enterprise customers in the EMEA region with their transition to the cloud and hybrid cloud infrastructure by designing and architecting solutions that meet both business and technical requirements. Agnieszka is highly skilled in AWS, Google Cloud, and VMware solutions and holds certifications as a specialist in all three platforms. She strongly believes in the importance of knowledge sharing and learning from others to keep up with the ever-changing IT industry.
With over 16 years in the IT industry, Wojciech is a seasoned and innovative IT professional with a proven track record of success. Leveraging extensive work experience in large and complex enterprise environments, Wojciech brings valuable knowledge to help customers and businesses achieve their goals with precision, professionalism, and cost-effectiveness. Holding leading certifications from AWS, Alibaba Cloud, Google Cloud, VMware, and Microsoft, Wojciech is dedicated to continuous learning and sharing knowledge, staying abreast of the latest industry trends and developments.