You're reading from Generative AI with Python and TensorFlow 2

Product type Book

Published in Apr 2021

Publisher Packt

ISBN-13 9781800200883

Pages 488 pages

Edition 1st Edition

Languages

Concepts

Artificial Intelligence

Authors (2):

Joseph Babcock

Raghav Bali

View More author details

Table of Contents (16) Chapters

Preface

1. An Introduction to Generative AI: "Drawing" Data from Models

2. Setting Up a TensorFlow Lab

3. Building Blocks of Deep Neural Networks

4. Teaching Networks to Generate Digits

5. Painting Pictures with Neural Networks Using VAEs

6. Image Generation with GANs

7. Style Transfer with GANs

8. Deepfakes with GANs

9. The Rise of Methods for Text Generation

10. NLP 2.0: Using Transformers to Generate Text

11. Composing Music with Generative Models

12. Play Video Games with Generative AI: GAIL

13. Emerging Applications in Generative AI

14. Other Books You May Enjoy

15. Index

Kubernetes: Robust management of multi-container applications

The Kubernetes project – sometimes abbreviated as k8s – was born out of an internal container management project at Google known as Borg. Kubernetes comes from the Greek word for navigator, as denoted by the seven-spoke wheel of the project's logo.¹⁸ Kubernetes is written in the Go programming language and provides a robust framework to deploy and manage Docker container applications on the underlying resources managed by cloud providers (such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP)).

Kubernetes is fundamentally a tool to control applications composed of one or more Docker containers deployed in the cloud; this collection of containers is known as a pod. Each pod can have one or more copies (to allow redundancy), which is known as a replicaset. The two main components of a Kubernetes deployment are a control plane and nodes. The control plane hosts the centralized logic for deploying and managing pods, and consists of (Figure 2.4):

Figure 2.4: Kubernetes components¹⁸

Kube-api-server: This is the main application that listens to commands from the user to deploy or update a pod, or manages external access to pods via ingress.
Kube-controller-manager: An application to manage functions such as controlling the number of replicas per pod.
Cloud-controller-manager: Manages functions particular to a cloud provider.
Etcd: A key-value store that maintains the environment and state variables of different pods.
Kube-scheduler: An application that is responsible for finding workers to run a pod.

While we could set up our own control plane, in practice we will usually have this function managed by our cloud provider, such as Google's Google Kubernetes Engine (GKE) or Amazon's Elastic Kubernetes Services (EKS). The Kubernetes nodes – the individual machines in the cluster – each run an application known as a kubelet, which monitors the pod(s) running on that node.

Now that we have a high-level view of the Kubernetes system, let's look at the important commands you will need to interact with a Kubernetes cluster, update its components, and start and stop applications.

Important Kubernetes commands

In order to interact with a Kubernetes cluster running in the cloud, we typically utilize the Kubernetes command-line tool (kubectl). Instructions for installing kubectl for your operating system can be found at (https://kubernetes.io/docs/tasks/tools/install-kubectl/). To verify that you have successfully installed kubectl, you can again run the help command in the terminal:

kubectl --help

Like Docker, kubectl has many commands; the important one that we will use is the apply command, which, like docker-compose, takes in a YAML file as input and communicates with the Kubernetes control plane to start, update, or stop pods:

kubectl apply -f <file.yaml>

As an example of how the apply command works, let us look at a YAML file for deploying a web server (nginx) application:

apiVersion: v1
kind: Service
metadata:
  name: my-nginx-svc
  labels:
    app: nginx
spec:
  type: LoadBalancer
  ports:
  - port: 80
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80

The resources specified in this file are created on the Kubernetes cluster nodes in the order in which they are listed in the file. First, we create the load balancer, which routes external traffic between copies of the nginx web server. The metadata is used to tag these applications for querying later using kubectl. Secondly, we create a set of 3 replicas of the nginx pod, using a consistent container (image 1.7.9), which uses port 80 on their respective containers.

The same set of physical resources of a Kubernetes cluster can be shared among several virtual clusters using namespaces – this allows us to segregate resources among multiple users or groups. This can allow, for example, each team to run their own set of applications and logically behave as if they are the only users. Later, in our discussion of Kubeflow, we will see how this feature can be used to logically partition projects on the same Kubeflow instance.

Kustomize for configuration management

Like most code, we most likely want to ultimately store the YAML files we use to issue commands to Kubernetes in a version control system such as Git. This leads to some cases where this format might not be ideal: for example, in a machine learning pipeline, we might perform hyperparameter searches where the same application is being run with slightly different parameters, leading to a glut of duplicate command files.

Or, we might have arguments, such as AWS account keys, that for security reasons we do not want to store in a text file. We might also want to increase reuse by splitting our command into a base and additions; for example, in the YAML file shown in Code 2.1, if we wanted to run ngnix alongside different databases, or specify file storage in the different cloud object stores provided by Amazon, Google, and Microsoft Azure.

For these use cases, we will make use of the Kustomize tool (https://kustomize.io), which is also available through kubectl as:

kubectl apply -k <kustomization.yaml>

Alternatively, we could use the Kustomize command-line tool. A kustomization.yaml is a template for a Kubernetes application; for example, consider the following template for the training job in the Kubeflow example repository (https://github.com/kubeflow/pipelines/blob/master/manifests/kustomize/sample/kustomization.yaml):

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
  # Or
# github.com/kubeflow/pipelines/manifests/kustomize/env/gcp?ref=1.0.0
  - ../env/gcp
  # Kubeflow Pipelines servers are capable of 
  # collecting Prometheus metrics.
  # If you want to monitor your Kubeflow Pipelines servers 
  # with those metrics, you'll need a Prometheus server 
  # in your Kubeflow Pipelines cluster.
  # If you don't already have a Prometheus server up, you 
  # can uncomment the following configuration files for Prometheus.
  # If you have your own Prometheus server up already 
  # or you don't want a Prometheus server for monitoring, 
  # you can comment the following line out.
  # - ../third_party/prometheus
  # - ../third_party/grafana
# Identifier for application manager to apply ownerReference.
# The ownerReference ensures the resources get garbage collected
# when application is deleted.
commonLabels:
  application-crd-id: kubeflow-pipelines
# Used by Kustomize
configMapGenerator:
  - name: pipeline-install-config
    env: params.env
    behavior: merge
secretGenerator:
  - name: mysql-secret
    env: params-db-secret.env
    behavior: merge
# !!! If you want to customize the namespace,
# please also update 
# sample/cluster-scoped-resources/kustomization.yaml's 
# namespace field to the same value
namespace: kubeflow
#### Customization ###
# 1. Change values in params.env file
# 2. Change values in params-db-secret.env 
# file for CloudSQL username and password
# 3. kubectl apply -k ./
####

We can see that this file refers to a base set of configurations in a separate kustomization.yaml file located at the relative path ../base. To edit variables in this file, for instance, to change the namespace for the application, we would run:

kustomize edit set namespace mykube

We could also add configuration maps to pass to the training job, using a key-value format, for example:

kustomize edit add configmap configMapGenerator --from-literal=myVar=myVal

Finally, when we are ready to execute these commands on Kubernetes, we can build the necessary kubectl command dynamically and apply it, assuming kustomization.yaml is in the current directory.

kustomize build . |kubectl apply -f -

Hopefully, these examples demonstrate how Kustomize provides a flexible way to generate the YAML we need for kubectl using a template; we will make use of it often in the process of parameterizing our workflows later in this book.

Now that we have covered how Kubernetes manages Docker applications in the cloud, and how Kustomize can allow us to flexibly reuse kubectl yaml commands, let's look at how these components are tied together in Kubeflow to run the kinds of experiments we will be undertaking later to create generative AI models in TensorFlow.