Learn TensorFlow Enterprise
Learn TensorFlow Enterprise: Build, manage, and scale machine learning workloads seamlessly using Google's TensorFlow Enterprise

Chapter 1: Overview of TensorFlow Enterprise

In this introductory chapter, you will learn how to set up and run TensorFlow Enterprise in a Google Cloud Platform (GCP) environment. This will enable you to get some initial hands-on experience of how TensorFlow Enterprise integrates with other services in GCP. One of the most important improvements in TensorFlow Enterprise is the integration with the data storage options in Google Cloud, such as Google Cloud Storage and BigQuery.

This chapter starts by covering how to complete a one-time setup for the cloud environment and enable the necessary cloud service APIs. Then we will see how easy it is to work with these data storage systems at scale.

In this chapter, we'll cover the following topics:

  • Understanding TensorFlow Enterprise
  • Configuring cloud environments for TensorFlow Enterprise
  • Accessing the data sources

Understanding TensorFlow Enterprise

TensorFlow has become an ecosystem consisting of many valuable assets. At the core of its popularity and versatility is a comprehensive machine learning library and model templates that evolve quickly with new features and capabilities. This popularity comes at a cost, and that cost is expressed as complexity, intricate dependencies, and API updates or deprecation timelines that can easily break the models and workflow that were laboriously built not too long ago. It is one thing to learn and use the latest improvement in your code as you build a model to experiment with your ideas and hypotheses, but it is quite another if your job is to build a model for long-term production use, maintenance, and support.

Another problem associated with early TensorFlow concerned debugging. In TensorFlow 1, lazy execution makes it rather tricky to test or debug your code, because operations only define a computation graph and nothing is actually executed until that graph is run inside a session. Starting with TensorFlow 2, eager execution is the default and finally becomes a first-class citizen. Another welcome addition to TensorFlow 2 is the adoption of the Keras high-level API, which makes it much easier to code, experiment with, and maintain your model. It also improves the readability of your code and its training flow.

For enterprise adoption, there are typically these three major challenges that are of concern for stakeholders:

  • The first challenge is scale. A production-grade model has to be trained with large amounts of data, and often it is not practical or even possible to fit that data into the memory of a single-node computer. This raises a related question: how do you pass training data to the model? The natural and instinctive way is to load the entire dataset into a Pythonic structure such as a NumPy array or a pandas DataFrame, as we have seen in so many open source examples. But if the data is too large, it is more reasonable to pass data into a model instance incrementally, similar to a Python iterator. In fact, the TensorFlow I/O library and the tf.data dataset API are provided specifically to address this issue. We will see how they feed data in batches to a model training process in the subsequent chapters.
  • The second challenge that typically arises when an enterprise considers adopting TensorFlow is the manageability of the development environment. Backward compatibility is not a strength of TensorFlow, because historically there have been very quick updates and new API releases that replace or deprecate old ones. This includes but is not limited to library versions, API signatures, and usage style deprecation. As you can imagine, this is a deal-breaker for development, debugging, and maintenance of the codebase; it also doesn't help with managing the stability and reproducibility of a production environment and its scoring results. It can easily become a nightmare for someone who manages and controls a machine learning development infrastructure and the standard practices in an enterprise project.
  • The third challenge is ongoing maintenance: API improvements, patch releases, and bug fixes. TensorFlow rolls these efforts into long-term support. Typically, for any TensorFlow release, Google's TensorFlow team commits to providing these fixes for up to one year only. For an enterprise, however, this is too short to get a proper return on investment from the development cycle. Therefore, for mission-critical enterprise workloads, a longer support commitment for TensorFlow releases is essential.
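The first challenge above (feeding a model incrementally instead of holding the whole dataset in memory) can be sketched in plain Python. This is only an illustration of the iterator idea, not how tf.data is implemented; `iter_batches` and `fake_record_source` are hypothetical names introduced here.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size records."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(records)
    while True:
        # Only one batch is materialized at a time.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def fake_record_source(n: int) -> Iterator[int]:
    """Hypothetical stand-in for rows streamed from remote storage."""
    for i in range(n):
        yield i


batches = list(iter_batches(fake_record_source(10), batch_size=4))
print(batches)
```

Because the source is a generator, memory use is bounded by the batch size rather than by the size of the dataset, which is the same property the dataset libraries mentioned above are meant to provide.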

TensorFlow Enterprise was created to address these challenges. TensorFlow Enterprise is a special distribution of TensorFlow that is exclusively available through Google Cloud's various services. TensorFlow Enterprise is available through the following:

  • Google Cloud AI Notebooks
  • Google Cloud AI Deep Learning VMs
  • Google Cloud AI Deep Learning Containers
  • Partially available on Google Cloud AI Training

Dependencies such as drivers and library version compatibility are managed by Google Cloud. TensorFlow Enterprise also provides optimized connectivity with other Google Cloud services, such as Cloud Storage and the BigQuery data warehouse. At the time of writing, TensorFlow Enterprise on Google Cloud supports TensorFlow versions 1.15, 2.1, and 2.3, and the GCP and TensorFlow teams provide long-term support for up to three years, including bug fixes and updates.

In addition to these exclusive services and managed features, the TensorFlow team also takes enterprise support to another level by offering a white-glove service. This is a separate service from Google Cloud Support. In this case, TensorFlow engineers in Google will work with qualified enterprise customers to solve problems or provide bug fixes in cutting edge AI applications.

TensorFlow Enterprise packages

At the time of writing this book, TensorFlow Enterprise includes the following packages:

Figure 1.1 – TensorFlow packages

We will have more to say about how to launch JupyterLab in Google AI Platform in Chapter 2, Running TensorFlow Enterprise in Google AI Platform, but for now, as a demonstration, the following command can be executed as a CLI command in a JupyterLab cell. It will provide the version for each package in your instance so that you can be sure of version consistency:

!pip list | grep tensorflow

Here's the output:

Figure 1.2 – Google Cloud AI Platform JupyterLab environment

We have now confirmed that the environment is running a TensorFlow Enterprise distribution, and we know the version of every TensorFlow library in it. Knowing this helps with future debugging and collaboration efforts.
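The same version check can also be done from Python rather than through a shell command, which is handy when you want to log versions from inside a script. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper name, and on a machine without TensorFlow it simply prints nothing.

```python
from importlib import metadata


def installed_versions(prefix: str) -> dict:
    """Return {distribution name: version} for installed packages whose name starts with prefix."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with missing metadata.
        if name and name.lower().startswith(prefix.lower()):
            versions[name] = dist.version
    return versions


for name, version in sorted(installed_versions("tensorflow").items()):
    print(f"{name}=={version}")
```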

Configuring cloud environments for TensorFlow Enterprise

Assuming you have a Google Cloud account already set up with a billing method, before you can start using TensorFlow Enterprise, there are some one-time setup steps that you must complete in Google Cloud. This setup consists of the following steps:

  1. Create a cloud project and enable billing.
  2. Create a Google Cloud Storage bucket.
  3. Enable the necessary APIs.

The following are some quick instructions for these steps.

Setting up a cloud environment

Now we are going to take a look at what we need to set up in Google Cloud before we can start using TensorFlow Enterprise. These setups are needed so that essential Google Cloud services can integrate seamlessly into the user tenant. For example, the project ID is used to enable resource creation credentials and access for different services when working with data in the TensorFlow workflow. And by virtue of the project ID, you can read and write data into your Cloud Storage and data warehouse.

Creating a project

This is the first step. It is needed in order to enable billing so you can use nearly all Google Cloud resources. Most resources will ask for a project ID. It also helps you organize and track your spending by knowing which services contribute to each workload. Let's get started:

  1. The URL for the page to create a project ID is https://console.cloud.google.com/cloud-resource-manager.

    After you have signed into the GCP portal, you will see a panel similar to this:

    Figure 1.3 – Google Cloud’s project creation panel

  2. Click on CREATE PROJECT:
    Figure 1.4 – Creating a new project

  3. Then provide a project name, and the platform will instantly generate a project ID for you. You can either accept it or edit it. It may give you a warning regarding how many projects you can create if you already have a few active projects:
    Figure 1.5 – Project name and project ID assignment

  4. Make a note of the project name and project ID. Keep these handy for future use. Hit CREATE and soon you will see the platform dashboard for this project:
Figure 1.6 – The main project management panel

The project ID will frequently be used when accessing data storage. It is also used to keep track of resource consumption and allocation in a cloud tenant.

Creating a Google Cloud Storage bucket

A Google Cloud Storage bucket is a common way to store models and model assets from a model training job. Creating a storage bucket is very easy. Just look for Storage in the left panel and select Browser:

Figure 1.7 – Google Cloud’s Storage options

Click CREATE BUCKET, and follow the instructions as indicated in the panel. In all cases, there are default options selected for you:

  1. Choose where to store your data. This is a trade-off between cost and availability as measured by performance. The default is multi-region to ensure the highest availability.
  2. Choose a default storage class for your data. This choice lets you decide on costs related to retrieval operations. The default is the standard level for frequently accessed data.
  3. Choose how to control access to objects. This offers two different access levels for the bucket. The default is object-level permissions (ACLs) in addition to bucket level permission (IAM).
  4. Advanced settings (optional). Here, you can choose the encryption type, bucket retention policy, and any bucket labels. The default is a Google-managed key with no retention policy or labels:
Figure 1.8 – Storage bucket creation process and choices

Enabling APIs

Now we have a project, but before we start consuming Google Cloud services, we need to enable some APIs. This process needs to be done only once, usually as the project ID is created:

  1. For now, let's enable the Compute Engine API for the project of your choice:
    Figure 1.9 – Google Cloud APIs and Services for the project

    Optional: Then select ENABLE APIS AND SERVICES.

    You may do it here now, or as you go through the exercises in this book. If you need to use a particular cloud service for the first time, you can enable the API as you go along:

    Figure 1.10 – Enabling APIs and Services

  2. In the search box, type Compute Engine API:
    Figure 1.11 – Enabling the Compute Engine API

  3. You will see the status of the Compute Engine API in your project as shown in the following screenshot. Enable it if it's not already enabled:
Figure 1.12 – Google Cloud Compute Engine API

For now, this is good enough. There are more APIs that you'll need as you go through the examples in this book; GCP will ask you to enable the API when relevant. You can do so at that time.

If you wish, you may repeat the preceding procedure to enable several other APIs as well: specifically, the BigQuery API, BigQuery Data Transfer API, BigQuery Connection API, Service Usage API, Cloud Storage, and the Storage Transfer API.
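If you prefer a command line over the console, APIs can also be enabled with the gcloud CLI's `gcloud services enable` command. The sketch below only builds and prints that command; it assumes the gcloud CLI is installed and authenticated if you choose to run it. The service identifiers `compute.googleapis.com` and `bigquery.googleapis.com` and the helper name `enable_services_command` are assumptions added for illustration, not taken from the steps above.

```python
import shlex
from typing import List, Sequence


def enable_services_command(project_id: str, services: Sequence[str]) -> List[str]:
    """Build the argument list for `gcloud services enable` for one project."""
    if not services:
        raise ValueError("at least one service is required")
    return ["gcloud", "services", "enable", *services, f"--project={project_id}"]


cmd = enable_services_command(
    "project-xxxxx",  # placeholder project ID
    ["compute.googleapis.com", "bigquery.googleapis.com"],
)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```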

Next, let's take a look at how to move data in a storage bucket into a table inside a BigQuery data warehouse.

Creating a data warehouse

We will use a simple example of putting data stored in a Google Cloud bucket into a table that can be queried by BigQuery. The easiest way to do so is to use the BigQuery UI. Make sure it is in the right project. We will use this example to create a dataset that contains one table.

You can navigate to BigQuery by searching for it in the search bar of the GCP portal, as in the following screenshot:

Figure 1.13 – Searching for BigQuery

You will see BigQuery being suggested. Click on it and it will take you to the BigQuery portal:

Figure 1.14 – BigQuery and the data warehouse query portal

Here are the steps to create a persistent table in the BigQuery data warehouse:

  1. Select Create dataset:
    Figure 1.15 – Creating a dataset for the project

  2. Make sure you are in the dataset that you just created. Now click CREATE TABLE:
    Figure 1.16 – Creating a table for the dataset

    In the Source section, under Create table from, select Google Cloud Storage:

    Figure 1.17 – Populating the table by specifying a data source

  3. Then it will transition to another dialog box. You may enter the name of the file or use the Browse option to find the file stored in the bucket. In this case, a CSV file has already been put in my Google Cloud Storage bucket. You may either put your own CSV file into the storage bucket, or download the one I used from https://data.mendeley.com/datasets/7xwsksdpy3/1. Also, enter the column names and datatypes as the schema:
    Figure 1.18 – An example of populating a table using an existing CSV file stored in the bucket

  4. In the Schema section, use Auto-detect, and in the Advanced options, since the first row is an array of column names, we need to tell it to skip the first row:
    Figure 1.19 – Handling column names for the table

  5. Once the table is created, you can click QUERY TABLE to populate the query editor with the SQL query syntax, or just enter this query:
    SELECT * FROM `project1-190517.myworkdataset.iris` LIMIT 1000
  6. Click Run to execute the preceding query:
Figure 1.20 – Running a query to examine the table

There are many different data source types, as well as many different ways to create a data warehouse from raw data. This is just a simple example for structured data. For more information on other data sources and types, please refer to the BigQuery documentation at https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#console.

Now you have learned how to create a persistent table in your BigQuery data warehouse using the raw data in your storage bucket.
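The query in step 5 refers to the table by its fully qualified name, which has the form project.dataset.table wrapped in backticks. A small helper can assemble that name so it is not retyped by hand. This is only a sketch; `table_ref` and `preview_query` are hypothetical names, and the IDs are the ones used in this example.

```python
def table_ref(project: str, dataset: str, table: str) -> str:
    """Return a backtick-quoted, fully qualified BigQuery table name."""
    for part in (project, dataset, table):
        if not part or "`" in part:
            raise ValueError(f"invalid identifier part: {part!r}")
    return f"`{project}.{dataset}.{table}`"


def preview_query(project: str, dataset: str, table: str, limit: int = 1000) -> str:
    """Build a SELECT * preview query capped at `limit` rows."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return f"SELECT * FROM {table_ref(project, dataset, table)} LIMIT {int(limit)}"


query = preview_query("project1-190517", "myworkdataset", "iris")
print(query)
```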

We used a CSV file as an example and added it to BigQuery as a table. In the next section, we are going to see how to connect TensorFlow to our data stored in BigQuery and the Cloud Storage bucket. Now we are ready to launch an instance of TensorFlow Enterprise running on AI Platform.

Using TensorFlow Enterprise in AI Platform

In this section, we are going to see firsthand how easy it is to access data stored in one of the Google Cloud Storage options, such as a storage bucket or BigQuery. To do so, we need to configure an environment to execute some example TensorFlow API code and command-line tools in this section. The easiest way to use TensorFlow Enterprise is through the AI Platform Notebook in Google Cloud:

  1. In the GCP portal, search for AI Platform.
  2. Then select NEW INSTANCE, with TensorFlow Enterprise 2.3 and Without GPUs. Then click OPEN JUPYTERLAB:
    Figure 1.21 – The Google Cloud AI Platform and instance creation

  3. Click on Python 3, and it will provide a new notebook to execute the remainder of this chapter's examples:
Figure 1.22 – A JupyterLab environment hosted by AI Platform

An instance of TensorFlow Enterprise running on AI Platform is now ready for use. Next, we are going to use this platform to perform some data I/O.

Accessing the data sources

TensorFlow Enterprise can easily access data sources in Google Cloud Storage as well as BigQuery. Either of these data sources can easily host gigabytes to terabytes of data. Reading training data of this size into the JupyterLab runtime all at once is out of the question, however. Therefore, streaming data in batches during training is the way to handle data ingestion. The tf.data API is the way to build a data ingestion pipeline that aggregates data from files in a distributed system. After this step, the data object can go through transformation steps and evolve into a new data object for training.

In this section, we are going to learn basic coding patterns for the following tasks:

  • Reading data from a Cloud Storage bucket
  • Reading data from a BigQuery table
  • Writing data into a Cloud Storage bucket
  • Writing data into a BigQuery table

After this, you will have a good grasp of reading and writing data to a Google Cloud Storage option and persisting your data or objects produced as a result of your TensorFlow runtime.

Cloud Storage Reader

Cloud Storage Reader is integrated with tf.data, so a tf.data object can easily access data in Google Cloud Storage. For example, the following code snippet demonstrates how to read a tfrecord dataset:

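import tensorflow as tf

# TFRecordDataset takes explicit filenames, so expand the wildcard first.
files = tf.io.gfile.glob('gs://<BUCKET_NAME>/<FILE_NAME>*.tfrecord')
my_train_dataset = tf.data.TFRecordDataset(files)
my_train_dataset = my_train_dataset.repeat()
my_train_dataset = my_train_dataset.batch(32)  # batch() requires a batch size; 32 is only an example
…
model.fit(my_train_dataset, …)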

In the preceding example, the files stored in the bucket are serialized as tfrecord, which is a binary format of your original data. This is a very common way of storing and serializing large amounts of data or files in the cloud for TensorFlow consumption. This format enables a more efficient read for data being streamed over a network. We will discuss tfrecord in more detail in a future chapter.
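To make the tfrecord round trip concrete, here is a minimal sketch that writes a few records to a local temporary file and reads them back through tf.data. The feature names `x` and `y`, the helper names, and the values are made up for illustration; a local path is used so the sketch runs without a bucket, on the assumption that a `gs://` path would be handled the same way in a TensorFlow Enterprise environment.

```python
import os
import tempfile

import tensorflow as tf


def make_example(feature_value: float, label: int) -> bytes:
    """Serialize one (x, y) pair as a tf.train.Example."""
    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                "x": tf.train.Feature(float_list=tf.train.FloatList(value=[feature_value])),
                "y": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
            }
        )
    )
    return example.SerializeToString()


def parse_example(serialized):
    """Decode one serialized Example back into a dict of tensors."""
    spec = {
        "x": tf.io.FixedLenFeature([], tf.float32),
        "y": tf.io.FixedLenFeature([], tf.int64),
    }
    return tf.io.parse_single_example(serialized, spec)


path = os.path.join(tempfile.mkdtemp(), "sample.tfrecord")
with tf.io.TFRecordWriter(path) as writer:
    for i in range(5):
        writer.write(make_example(float(i), i % 2))

# Stream the records back in batches of two.
dataset = tf.data.TFRecordDataset([path]).map(parse_example).batch(2)
labels = [int(v) for batch in dataset for v in batch["y"].numpy()]
print(labels)
```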

BigQuery Reader

Likewise, BigQuery Reader is also integrated into the TensorFlow Enterprise environment, so training data or derived datasets stored in BigQuery can be consumed by TensorFlow Enterprise.

There are three commonly used methods to read a table stored in a BigQuery data warehouse. The first way is the %%bigquery magic command. The second way is using the BigQuery API in a general Python runtime, and the third way is to use TensorFlow I/O. Each has its advantages.

The BigQuery magic command

This method is perfect for running SQL statements directly in a JupyterLab cell. This is equivalent to switching the cell's command interpreter. The %%bigquery interpreter executes a standard SQL query and the results are returned as a pandas DataFrame.

The following code snippet shows how to use the %%bigquery interpreter and assign a pandas DataFrame name to the result. Each step is a JupyterLab cell:

  1. Specify a project ID. This JupyterLab cell uses a default interpreter. Therefore, this is a Python variable. If the BigQuery table is in the same project, then you don't need to specify the project ID:
    project_id = '<PROJECT-XXXXX>'
  2. Invoke the %%bigquery magic command, and assign the project ID and a DataFrame name to hold the result:
    %%bigquery --project $project_id mydataframe
    SELECT * from `bigquery-public-data.covid19_jhu_csse.summary` limit 5

    If the table is in the same project that you are currently running from, you don't need the --project argument.

  3. Verify the result is a pandas DataFrame:
    type(mydataframe)
  4. Show the DataFrame:
    mydataframe

The complete code snippet for this example is as follows:

Figure 1.23 – Code snippet for BigQuery and Python runtime integration

Here are the key takeaways:

  • It is required to have a project ID in order to use the BigQuery API.
  • You may pass a Python variable such as the project ID as a value into the cell that runs the %%bigquery interpreter using the $ prefix.
  • In order for the result to be reusable further by the Python preprocessing functionality or for TensorFlow consumption, you need to specify a name for the DataFrame that will hold the query result.

The Python BigQuery API

The second method by which we can invoke the BigQuery API is through Google Cloud's BigQuery client. This will give us direct access to the data, execute the query, and allow us to receive the results right away. This method does not require the user to know about the table schema. In fact, it simply wraps a SQL statement inside the BigQuery client instantiated through a library call.

This code snippet demonstrates how to invoke the BigQuery API and use it to return the results in a pandas DataFrame:

from google.cloud import bigquery
project_id = 'project-xxxxx'
client = bigquery.Client(project=project_id)
sample_count = 1000
row_count = client.query('''
  SELECT 
    COUNT(*) as total
  FROM `bigquery-public-data.covid19_jhu_csse.summary`''').to_dataframe().total[0]
df = client.query('''
  SELECT
    *
  FROM
    `bigquery-public-data.covid19_jhu_csse.summary`
  WHERE RAND() < %d/%d
''' % (sample_count, row_count)).to_dataframe()
print('Full dataset has %d rows' % row_count)

The output of the preceding code is as follows:

Figure 1.24 – Code output

Let's take a closer look at the preceding code:

  • An import of the BigQuery library is required to create a BigQuery client.
  • The project ID is required for using this API to create a BigQuery client.
  • This client wraps a SQL statement and executes it.
  • The returned data can be easily converted to a pandas DataFrame right away.
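One detail of the preceding code is worth spelling out: the second query keeps each row with probability sample_count / row_count, so it returns roughly, not exactly, sample_count rows. The plain-Python sketch below mimics that filter on made-up data; `bernoulli_sample` is a hypothetical name, and Python's random module stands in for SQL's RAND().

```python
import random


def bernoulli_sample(rows, sample_count, row_count, seed=None):
    """Keep each row independently with probability sample_count / row_count."""
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    rng = random.Random(seed)
    keep_probability = sample_count / row_count
    return [row for row in rows if rng.random() < keep_probability]


rows = list(range(100_000))  # made-up stand-in for table rows
sample = bernoulli_sample(rows, sample_count=1000, row_count=len(rows), seed=0)
print(len(sample))  # close to 1000, but usually not exactly 1000
```

If you need an exact number of rows, this kind of filter is not enough on its own; it trades exactness for a single cheap pass over the table.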

The pandas DataFrame rendition of the BigQuery table has the following columns:

Figure 1.25 – The pandas DataFrame rendition of the BigQuery table

This is ready for further consumption. It is now a pandas DataFrame that occupies memory space in your Python runtime.

This method is very straightforward, as it can help you explore the data schema and do simple aggregation and filtering, and since it is basically a SQL statement wrapper, it is very easy to just get the data out of the warehouse and start using it. You didn't have to know much about the table schema to do this.

However, the problem with this approach is when the table is big enough to overflow your memory. TensorFlow I/O can help solve this problem.

TensorFlow I/O

For TensorFlow consumption of BigQuery data, it is better to use TensorFlow I/O to invoke the BigQuery API. This is because TensorFlow I/O provides a dataset object that represents the query results, rather than loading the entire result set into memory as in the previous method. A dataset object is the means to stream training data to a model during training. Therefore, not all training data has to be in memory at once. This complements mini-batch training, which is arguably the most common implementation of gradient descent optimization used in deep learning. However, this approach is a bit more complicated than the previous method, because it requires you to know the schema of the table. This example uses a public dataset hosted by Google Cloud.

We need to start with the columns of our interest from the table. We can use the previous method to examine the column names and datatypes, and create a session definition:

  1. Load the required libraries and set up the variables as follows:
    import tensorflow as tf
    from tensorflow_io.bigquery import BigQueryClient
    PROJECT_ID = 'project-xxxxx' # This is from what you created in your Google Cloud Account. 
    DATASET_GCP_PROJECT_ID = 'bigquery-public-data'
    DATASET_ID = 'covid19_jhu_csse'
    TABLE_ID = 'summary'
  2. Instantiate a BigQuery client and specify the batch size:
    batch_size = 2048
    client = BigQueryClient()
  3. Use the client to create a read session and specify the columns and datatypes of interest. Notice that when using the BigQuery client, you need to know the correct column names and their respective datatypes:
    read_session = client.read_session(
        'projects/' + PROJECT_ID,
        DATASET_GCP_PROJECT_ID, TABLE_ID, DATASET_ID,
        ['province_state',
         'country_region',
         'confirmed',
         'deaths',
         'date',
         'recovered'],
        [tf.string,
         tf.string,
         tf.int64,
         tf.int64,
         tf.int32,
         tf.int64],
        requested_streams=10
    )
  4. Now we can use the session object created to execute a read operation:
    dataset = read_session.parallel_read_rows(sloppy=True).batch(batch_size)
  5. Let's take a look at the dataset with type():
    type(dataset)

    Here's the output:

    Figure 1.26 – Output

  6. In order to actually see the data, we need to convert the dataset ops to a Python iterator and use next() to see the content of the first batch:
    itr = tf.compat.v1.data.make_one_shot_iterator(
        dataset
    )
    next(itr)

The output of the preceding command shows it is organized as an ordered dictionary, where the keys are column names and the values are Tensors:

Figure 1.27 – Raw data as an iterator

Here are the key takeaways:

  • TensorFlow I/O's BigQuery Client requires setting up a read session, which consists of column names from your table of interest.
  • This client then executes a read operation that also includes data batching.
  • The output of the read operation is a TensorFlow dataset op rather than the data itself.
  • This op may be converted to a Python iterator, so it can output the actual data read by the op.
  • This improves the efficiency of memory use during training, as data is sent for training in batches.
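To make the shape of one batch concrete, here is a plain-Python sketch (no TensorFlow involved) that groups rows into column-keyed batches, mirroring the ordered dictionary described above. The rows are made up, plain lists stand in for Tensors, and `column_batches` is a hypothetical name.

```python
from collections import OrderedDict
from itertools import islice


def column_batches(rows, columns, batch_size):
    """Yield OrderedDicts mapping each column name to that column's values for one batch."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield OrderedDict(
            (name, [row[i] for row in chunk]) for i, name in enumerate(columns)
        )


columns = ["country_region", "confirmed", "deaths"]
rows = [("A", 10, 1), ("B", 20, 2), ("C", 30, 3)]  # made-up rows
first = next(column_batches(rows, columns, batch_size=2))
print(first)
```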

Persisting data in BigQuery

We have looked at how to read data stored in Google Cloud storage solutions, such as Cloud Storage buckets or a BigQuery data warehouse, and how to make the data available to AI Platform's TensorFlow Enterprise instance running in JupyterLab. Now let's take a look at some ways to write data back, or persist our working data, into our cloud storage.

Our first example concerns writing a file stored in the JupyterLab runtime's directory (in some TensorFlow Enterprise documentation, this is also referred to as a local file). The process in general is as follows:

  1. For convenience, execute a BigQuery SQL read command on a table from a public dataset.
  2. Store the result locally as a comma-separated file (CSV).
  3. Write the CSV file to a table in our BigQuery dataset.

Each step is a code cell. The following step-by-step code snippet applies to JupyterLab in any of the three AI platforms (AI Notebook, AI Deep Learning VM, and Deep Learning Container):

  1. Designate a project ID:
    project_id = 'project1-190517'
  2. Execute the BigQuery SQL command and assign the result to a pandas DataFrame:
    %%bigquery --project $project_id mydataframe
    SELECT * from `bigquery-public-data.covid19_jhu_csse.summary`

    The BigQuery results come back as a pandas DataFrame by default. In this case, we designate the DataFrame name to be mydataframe.

  3. Write the pandas DataFrame to a CSV file in a local directory. In this case, we used the /home directory of this JupyterLab runtime:
    import pandas as pd
    mydataframe.to_csv('my_new_data.csv')
  4. Designate a dataset name:
    dataset_id = 'my_new_dataset'
  5. Use the BigQuery command-line tool to create an empty dataset in this project. This command starts with !bq:
    !bq --location=US mk --dataset $dataset_id

    This command creates a new dataset. This dataset doesn't have any tables yet. We are going to write a new table into this dataset in the next step.

  6. Write the local CSV file to a new table:
    !bq \
        --location=US \
        load \
        --autodetect \
        --skip_leading_rows=1 \
        --source_format=CSV \
        {dataset_id}.my_new_data_table \
        'my_new_data.csv'

    In this command, since the CSV file is stored in the current directory, its filename of 'my_new_data.csv' will suffice. Otherwise, a full path is required. Also, {dataset_id}.my_new_data_table indicates that we want to write the CSV file into this particular dataset and the table name.

  7. Now you can navigate to the BigQuery portal, and you will find the dataset and the table:
    Figure 1.28 – The BigQuery portal and navigation to the dataset

    In this case, we have one dataset, which contains one table.

  8. Then, execute a simple query, as follows:
Figure 1.29 – A query for examining the table

This is a very simple query where we just want to show 1,000 randomly selected rows. You can now execute this query and the output will be as shown in the following screenshot.

The following query output shows the data from the BigQuery table we just created:

Figure 1.30 – Example table output

Here are the key takeaways:

  • Data generated during the TensorFlow workflow in the AI Platform's JupyterLab runtime can be seamlessly persisted as a table in BigQuery.
  • Persisting data in a structured format, such as a pandas DataFrame or a CSV file, in BigQuery can easily be done using the BigQuery command-line tool.
  • When you need to move a data object (such as a table) between the JupyterLab runtime and BigQuery, use the BigQuery command-line tool with !bq to save time and effort.
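Outside a notebook cell, where the ! prefix is not available, the same load step can be driven from a Python script. The sketch below only assembles the argument list using the flags shown in step 6; it assumes the bq tool is installed and authenticated if you choose to run it, and `bq_load_command` is a hypothetical helper name.

```python
import shlex
from typing import List


def bq_load_command(dataset_id: str, table_name: str, csv_path: str,
                    location: str = "US") -> List[str]:
    """Build the `bq load` argument list for a local CSV file with a header row."""
    return [
        "bq",
        f"--location={location}",
        "load",
        "--autodetect",
        "--skip_leading_rows=1",
        "--source_format=CSV",
        f"{dataset_id}.{table_name}",
        csv_path,
    ]


cmd = bq_load_command("my_new_dataset", "my_new_data_table", "my_new_data.csv")
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```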

Persisting data in a storage bucket

In the previous Persisting data in BigQuery section, we saw how a structured data source such as a CSV file or a pandas DataFrame can be persisted in a BigQuery dataset as a table. In this section, we are going to see how to persist working data such as a NumPy array. In this case, the suitable target storage is a Google Cloud Storage bucket.

The workflow for this demonstration is as follows:

  1. For convenience, read a NumPy array from tf.keras.datasets.
  2. Save the NumPy array as a pickle (pkl) file. (FYI: The pickle file format, while convenient and easy to use for serializing Python objects, also has its downsides. For one, it may be slow and can produce a file larger than the original object. Second, loading a pickle file can execute arbitrary code, so a file from an untrusted source is a security risk for any process that opens it. It is used only for convenience here.)
  3. Use the !gsutil storage command-line tool to transfer files from JupyterLab's /home directory (in some documentation, this is referred to as the local directory) to the storage bucket.
  4. Use !gsutil to transfer the content in the bucket back to the JupyterLab runtime. Since we will mix Python code with !gsutil commands, we keep them in separate cells.

Follow these steps to complete the workflow:

  1. Let's use the IMDB dataset because it is already provided in NumPy format:
    import tensorflow as tf
    import pickle as pkl

    # Load the IMDB reviews; the arguments shown are the Keras defaults.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(
        path='imdb.npz',
        num_words=None,
        skip_top=0,
        maxlen=None,
        seed=113,
        start_char=1,
        oov_char=2,
        index_from=3
    )

    # Serialize the training features to the runtime's home directory.
    with open('/home/jupyter/x_train.pkl', 'wb') as f:
        pkl.dump(x_train, f)

    The preceding code loads the IMDB movie review dataset that is distributed as a part of TensorFlow. The dataset is returned as tuples of NumPy arrays, split into training and test partitions: x_train, y_train, x_test, and y_test. We use x_train for this demonstration and save it as a pickle file in the runtime's /home directory. This pickle file will then be persisted in a storage bucket in the next steps.

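  2. Designate a name for the new storage bucket. Bucket names are globally unique across Cloud Storage, so replace this example with a name of your own: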
    bucket_name = 'ai-platform-bucket'
  3. Create a new bucket with the designated name:
    !gsutil mb gs://{bucket_name}/

    Use !gsutil to copy the pkl file from the runtime to the storage bucket:

    !gsutil cp /home/jupyter/x_train.pkl gs://{bucket_name}/
  4. Copy the pkl file from the bucket back to the runtime under a new name:
    !gsutil cp gs://{bucket_name}/x_train.pkl /home/jupyter/x_train_readback.pkl
  5. Now let's inspect the Cloud Storage bucket:
Figure 1.31 – Serializing an object in a bucket from the workflow in AI Platform
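
To confirm that the copy read back from the bucket matches what was written, the two pickle files can be loaded and compared. The following is a minimal sketch, not part of the original workflow: the helper names load_pickle and same_sequences are illustrative, and the demo uses a tiny stand-in list in a temporary directory so that it runs anywhere.

```python
import os
import pickle
import tempfile


def load_pickle(path):
    """Load one pickled object. Only unpickle files you created or trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


def same_sequences(a, b):
    """Return True if a and b hold equal sequences in the same order."""
    return len(a) == len(b) and all(list(x) == list(y) for x, y in zip(a, b))


# Stand-in for x_train: a few variable-length lists of word indices.
sample = [[1, 14, 22], [1, 194], [1, 14, 47, 8]]

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = os.path.join(tmp_dir, "x_train.pkl")
    with open(pkl_path, "wb") as f:
        pickle.dump(sample, f)
    roundtrip_ok = same_sequences(sample, load_pickle(pkl_path))

print("Round trip intact:", roundtrip_ok)
```

In the notebook, the same check would call load_pickle on /home/jupyter/x_train.pkl and /home/jupyter/x_train_readback.pkl and pass both results to same_sequences.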


Here are the key takeaways:

  • Working data generated during the TensorFlow workflow can be persisted as a serialized object in the storage bucket.
  • Google AI Platform's JupyterLab environment provides seamless integration between the TensorFlow runtime and the Cloud Storage command-line tool, gsutil.
  • When you need to transfer content between Google Cloud Storage and AI Platform, use the !gsutil command-line tool.
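
Outside a notebook, where the ! prefix is unavailable, the same gsutil calls can be issued from plain Python. The following is a minimal sketch under that assumption: the helper names gsutil_commands and run_all are illustrative, and the dry-run default only prints the commands, so nothing is executed unless gsutil is installed and dry_run is set to False.

```python
import shlex
import subprocess


def gsutil_commands(bucket_name, local_path, readback_path):
    """Build the three gsutil calls from this section as argument lists."""
    file_name = local_path.rsplit("/", 1)[-1]
    bucket_url = f"gs://{bucket_name}/"
    return [
        ["gsutil", "mb", bucket_url],                                # create the bucket
        ["gsutil", "cp", local_path, bucket_url],                    # upload the file
        ["gsutil", "cp", f"{bucket_url}{file_name}", readback_path], # copy it back
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


commands = gsutil_commands(
    "ai-platform-bucket",
    "/home/jupyter/x_train.pkl",
    "/home/jupyter/x_train_readback.pkl",
)
run_all(commands)  # dry run: prints the commands without calling gsutil
```

Passing argument lists to subprocess.run avoids shell quoting issues, and check=True raises an error if a gsutil call fails.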

Summary

This chapter provided a broad overview of the TensorFlow Enterprise environment hosted by Google Cloud AI Platform. We also saw how this platform integrates command-line tools such as bq and gsutil to make it easy to transfer data or objects between the JupyterLab environment and our storage solutions. These tools simplify access to data stored in BigQuery or in storage buckets, two commonly used data sources for TensorFlow workflows on Google Cloud.

In the next chapter, we will take a closer look at the three ways available in AI Platform to use TensorFlow Enterprise: the Notebook, Deep Learning VM, and Deep Learning Containers.


Key benefits

  • Build scalable, seamless, and enterprise-ready cloud-based machine learning applications using TensorFlow Enterprise
  • Discover how to accelerate the machine learning development life cycle using enterprise-grade services
  • Manage Google’s cloud services to scale and optimize AI models in production

Description

TensorFlow as a machine learning (ML) library has matured into a production-ready ecosystem. This beginner’s book uses practical examples to enable you to build and deploy TensorFlow models using optimal settings that ensure long-term support without having to worry about library deprecation or being left behind when it comes to bug fixes or workarounds. The book begins by showing you how to refine your TensorFlow project and set it up for enterprise-level deployment. You’ll then learn how to choose a future-proof version of TensorFlow. As you advance, you’ll find out how to build and deploy models in a robust and stable environment by following recommended practices made available in TensorFlow Enterprise. This book also teaches you how to manage your services better and enhance the performance and reliability of your artificial intelligence (AI) applications. You’ll discover how to use various enterprise-ready services to accelerate your ML and AI workflows on Google Cloud Platform (GCP). Finally, you’ll scale your ML models and handle heavy workloads across CPUs, GPUs, and Cloud TPUs. By the end of this TensorFlow book, you’ll have learned the patterns needed for TensorFlow Enterprise model development, data pipelines, training, and deployment.

Who is this book for?

This book is for data scientists, machine learning developers or engineers, and cloud practitioners who want to learn and implement various services and features offered by TensorFlow Enterprise from scratch. Basic knowledge of the machine learning development process will be useful.

What you will learn

  • Discover how to set up a GCP TensorFlow Enterprise cloud instance and environment
  • Handle and format raw data that can be consumed by the TensorFlow model training process
  • Develop ML models and leverage prebuilt models using the TensorFlow Enterprise API
  • Use distributed training strategies and implement hyperparameter tuning to scale and improve your model training experiments
  • Scale the training process by using GPU and TPU clusters
  • Adopt the latest model optimization techniques and deployment methodologies to improve model efficiency

Product Details

Publication date: Nov 27, 2020
Length: 314 pages
Edition: 1st
Language: English
ISBN-13: 9781800204874



Table of Contents

14 Chapters
Section 1 – TensorFlow Enterprise Services and Features
Chapter 1: Overview of TensorFlow Enterprise
Chapter 2: Running TensorFlow Enterprise in Google AI Platform
Section 2 – Data Preprocessing and Modeling
Chapter 3: Data Preparation and Manipulation Techniques
Chapter 4: Reusable Models and Scalable Data Pipelines
Section 3 – Scaling and Tuning ML Works
Chapter 5: Training at Scale
Chapter 6: Hyperparameter Tuning
Section 4 – Model Optimization and Deployment
Chapter 7: Model Optimization
Chapter 8: Best Practices for Model Training and Performance
Chapter 9: Serving a TensorFlow Model
Other Books You May Enjoy

Customer reviews

Rating: 5 out of 5 (7 ratings, 100% 5-star)

Kay T, Mar 08, 2021 (5 out of 5 stars)
This book is unlike myriads of Tensorflow or machine learning books in the market. If you are interested in enterprise level use and deployment of Tensorflow models, this is the right book. This book helps me to go beyond the baby-steps of learning how to build Tensorflow ML models. All examples throughout this book uses either datasets or TFRecord data structure. I did not have a good understanding about these data structure before I bought and read this book. I often wonder why we need such data structures. After I read this book, I now understand that in an enterprise or production level, knowing how to handle distributed data is a must-have skill. I am glad that the author chose to use such enterprise-relevant data structures throughout the examples in this book. I would say this is a unique aspect of this book which differentiates it from other Tensorflow books. Another nice touch is that instead of teaching you how to build a ML model, it uses pre-built models Tensorflow Hub, and show me how to make it work for my own data. I learned transfer learning for the first time with book.

To make most use of this book, it is important to clone the accompanying GitHub directory.

For Tensorflow to be useful at enterprise level, it is important to have cloud integration. This book also helped me get started with learning how to use Google Cloud AI Platform with the integration to BigQuery data warehouse. Further, this book also contains step by step instructions on how to leverage cloud TPU and GPU to perform distributed training job. As a matter of fact, I now realize how important it is to use cloud TPU or GPU for time consuming job such as hyperparameter optimization. And the book shows me exactly how to do it. This book also did a very good job helping me learn how to different hyperparameter tuning methods work. I learned how hyperband algorithm works for the first time. That’s a delightful surprise.

This book also describes how model optimization works, and why it is important. I didn’t realize that model size can be reduced by so much and yet retains similar or identical accuracy. Now I realize that once the model is built, optimization is always a good idea to make it more light-weight. And finally, when it comes to deployment, this book helped me understand how to serve the model behind a REST API using Tensorflow Serving. Model serving is a complicated issue in enterprise setting. This book helps me acquire the table stake knowledge about serving a model using Docker container. With the way described by this book, it turned out to be much easier than I thought.

So overall, I would rate this book as a five-star book, and really appreciate the thoughts and works put in this book by the author. I definitely recommend it for anyone who has Tensorflow experience, and looking to take their skills to another level, which is much more relevant and practical for enterprise use. Well done.
Amazon verified review

Christian P., Mar 09, 2021 (5 out of 5 stars)
Cloud computing is a relevant tool for companies to achieve their digital transformation, especially for those developing data-driven business models based on Deep Learning Frameworks as a core technology. Commence in cloud technologies can be overwhelming for beginners due to the tons of online tutorials, material, and documentation that are not always updated, creating frustration and delaying these technologies' adoption.

The book is a useful guide and a great starting point for deploying the first AI-based applications for those who initiate Cloud Computing. Also, it is an excellent complementary material for those currently working as MLOps Engineers that want to understand advanced options in TensorFlow Enterprise and Google Cloud Computing. It includes sections with practical hands-on material easy to read and follow. Chapters handle relevant topics such as creating a data warehouse on the cloud, accessing the data efficiently using Tensorflow from different pythonic formats such as Pandas DataFrames or Numpy arrays, and small but valuable tips about using TPU instead of using a GPU. The hand-on material is available on Github, and such is a good source to start developing your ideas
Amazon verified review

laksh, Apr 30, 2021 (5 out of 5 stars)
One single source of reference material for tensorflow. Practical guide.
Amazon verified review

Aishwary, Feb 23, 2021 (5 out of 5 stars)
This book is a perfect introduction to TensorFlow, from learning basics to advanced features like model deployment in production. The best part about this book is sample code as a part of the explanation and images to illustrate the UI components on Google Cloud Platform. This book has an excellent use case to work google cloud AI notebooks while leveraging big data suite tools (tools like BigQuery). One of the other good things about this book is that it does not leave any concepts half explained. I liked the section on transfer learning and hyperparameter tuning (section 3 - chapter 4 and 6) the most. It also has details about working with TFRecords, which is an essential feature to work with data in the real world. Even if you do not work to deploy models in production, I would recommend every deep learning practitioner to read this book to get a perfect experience with all the concepts that one requires to leverage on a day-to-day basis.
Amazon verified review

SID ALLA, Dec 14, 2020 (5 out of 5 stars)
I have been doing deep learning modeling using tensorflow programming for couple of years and its hard to actually build the model picking right number of GPU machines , setting them up properly, connecting data sources, buidl train models and then take the model to production. I have used the other public clouds but its much much easier on google cloud as they created Tensorflow in the first place. I also bought some beefy gaming machines with Nvidia chipsets but always had the ceremonial steps to do before i could do anything practical at cloud scale.

This books explains clearly how to set up the sources in data warehouse, how to create notebooks, build models and deploy them without breaking the bank as its all taken care by Google Cloud.

I have to point out the toughest parts of Deep Learning are the scaling using TPUs and GPUs and this book covers those aspects too. Not to underestimate, serving models are tricky and there are just too many options to do it. This book walks you through a good way to serve such models.

Frankly this book saves time you will spend going through the many docs online and lets you quickly start from introduction and takes you to production.
Amazon verified review
