Transcending the Image-Text Boundary with Stable Diffusion

The essence of a diffusion model lies in the freedom it has to invent pixels when generating an image. From that perspective, diffusion models have taken text-to-image generation to another level. Instead of trying to reproduce exactly the images they have learned, diffusion models can imagine pixels within the boundaries of the text provided.

Stability AI is a leader in Generative AI. They produced Stable Diffusion, one of the fastest-growing AI projects. From that concept, mind-blowing applications have begun to appear in every direction, including Midjourney’s application on Discord and Runway’s Gen-2, which we will encounter in Chapter 19, On the Road to Functional AGI with HuggingGPT and its Peers, and Chapter 20, Beyond Human-Designed Prompts with Generative Ideation.

The goal of this chapter is not to attempt to analyze the many Stable Diffusion architectures flowing into the...

Transcending image generation boundaries

Let’s begin with a thought experiment. Imagine an art teacher telling your class a story about visiting a wonderful house with a big garden, old trees, and beautiful flowers.

Now, the teacher gives you a strange piece of canvas covered with many dots (the pixels of a noisy image). This mysterious canvas is a potential (latent) space of hidden forms that you must find in your mental representation of the words (text) the teacher spoke. As you erase the dots and replace them with your ideas, you are dispersing them (diffusion). You obtain a small sketch of the objects you imagined. Your drawing is incomplete, and it is a smaller view of what you pictured: you represented only the main forms you saw. You downsampled your representation.

The fun now begins. You show each other your sketches. Although every drawing shows a house, not one is the same! Your teacher now provides incredible oil painting techniques to fill...

Part I: Defining text-to-image with Stable Diffusion

We will explore, at a very low level, the main Python files of the Keras version of Stable Diffusion, as shown in Figure 17.2. The complete code can be found at https://github.com/keras-team/keras-cv/tree/master/keras_cv/models/stable_diffusion:

Figure 17.2: Stable Diffusion, Keras implementation

Figure 17.2 shows the Stable Diffusion architecture of the code we will explore, which can be summed up in five phases:

  1. Text embedding.
  2. Random image creation.
  3. Stable Diffusion downsampling.
  4. Decoder upsampling.
  5. Output image.

The Keras Stable Diffusion code itself is only 500 lines long!

We will describe the role of each function, build a high-level mathematical representation of it, and identify the Python classes that execute the process.
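
As a preview of that high-level mathematical representation (the notation here is ours, not the notebook’s), the forward diffusion process gradually noises an image $x_0$:

$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, with $\epsilon \sim \mathcal{N}(0, I)$ and $\bar{\alpha}_t = \prod_{s=1}^{t}(1-\beta_s)$

The reverse process trains a network $\epsilon_\theta(x_t, t, c)$ to predict the noise $\epsilon$, conditioned on the text embedding $c$, and subtracts it step by step until a clean image emerges. Stable Diffusion runs this loop in a compressed latent space rather than in pixel space, which is part of what keeps the code so compact.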

We will end the analysis by running a Keras notebook that illustrates this remarkably compact code.
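
Before opening the files one by one, here is a minimal sketch, assuming the keras_cv package is installed, of how the five phases are driven end to end through its high-level API (the prompt is our own placeholder):

import keras_cv
from PIL import Image

# Phases 1-4 run inside text_to_image: the prompt is tokenized and
# embedded, a random latent image is created, the diffusion model
# progressively denoises it, and the decoder upsamples the result.
model = keras_cv.models.StableDiffusion(img_width=512, img_height=512)
images = model.text_to_image(
    "a house with a big garden, old trees, and beautiful flowers",
    batch_size=1,
)

# Phase 5: the output image, a (batch, 512, 512, 3) uint8 array.
Image.fromarray(images[0]).save("output.png")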

1. Text embedding using a transformer...

Part II: Running text-to-image with Stable Diffusion

This section takes you to the forefront of Stable Diffusion with Stability AI’s API. To get started, you must sign up and obtain an API key: https://platform.stability.ai/docs/getting-started/python-sdk. Check the pricing policy before running the Stable Diffusion model.

Open Stable_Vision_Stability_AI.ipynb.

We first install the SDK:

!pip install stability-sdk

Then we clone the Stability SDK repository along with its submodules:

!git clone --recurse-submodules https://github.com/Stability-AI/stability-sdk

We define the Stability host, which is the address of the Stability API server, and our API key; the key can also be supplied when making the request:

!export STABILITY_HOST=grpc.stability.ai:443
#!export STABILITY_KEY=[YOUR_KEY]
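
Note that in a notebook, each ! command runs in its own subshell, so exported variables do not persist across cells. A common alternative, sketched here as an assumption about your environment, is to set them from Python instead:

import os

# These persist for the rest of the notebook session.
os.environ["STABILITY_HOST"] = "grpc.stability.ai:443"
# os.environ["STABILITY_KEY"] = "[YOUR_KEY]"  # keep your key out of shared notebooks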

We can now ask Stability to generate an image based on the following prompt:

!python -m stability_sdk generate "...
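
The CLI is one way in; the same request can be made from Python. The following is a minimal sketch based on the SDK’s documented gRPC client; the prompt is our own placeholder, and any extra parameters (engine, steps, dimensions) should be checked against the current Stability AI documentation:

import io
import os
from PIL import Image
from stability_sdk import client
import stability_sdk.interfaces.gooseai.generation.generation_pb2 as generation

# Connect with the key set earlier (STABILITY_KEY).
stability_api = client.StabilityInference(
    key=os.environ["STABILITY_KEY"],
    verbose=True,
)

# Placeholder prompt; replace it with your own.
answers = stability_api.generate(prompt="a house with a big garden")

# Each response may contain one or more image artifacts.
for resp in answers:
    for artifact in resp.artifacts:
        if artifact.type == generation.ARTIFACT_IMAGE:
            Image.open(io.BytesIO(artifact.binary)).save("generation.png")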

Part III: Video

Text-to-video opens new horizons for diffusion models. The models generate a sequence of n frames to produce incredible animations and videos.

Open Stable_Vision_Stability_AI_Animation.ipynb.

Text-to-video with Stability AI animation

First, make sure you have signed up on Stability AI and have your API key: https://platform.stability.ai/docs/features/animation.

We will now install the Stability SDK for animations:

!pip install "stability_sdk[anim_ui]"   # Install the Animation SDK
!git clone --recurse-submodules https://github.com/Stability-AI/stability-sdk

We import the API and initialize the host. We also set our API key:

from stability_sdk import api
STABILITY_HOST = "grpc.stability.ai:443"
STABILITY_KEY = "[ENTER YOUR KEY HERE]"  # replace with your API key
context = api.Context(STABILITY_HOST, STABILITY_KEY)

We now import the modules and configure the parameters. The following code uses the default Stability AI arguments:

from stability_sdk.animation...
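
The truncated import above comes from the SDK’s animation module. As a minimal sketch, assuming the default arguments and a placeholder prompt (verify the names against the SDK’s animation notebook), the rendering loop looks like this:

from stability_sdk.animation import AnimationArgs, Animator

args = AnimationArgs()  # default Stability AI animation arguments
animation_prompts = {0: "a house with a big garden"}  # prompt from frame 0

animator = Animator(
    api_context=context,  # the api.Context created above
    animation_prompts=animation_prompts,
    args=args,
)

# Render each frame and save it to disk.
for idx, frame in enumerate(animator.render()):
    frame.save(f"frame_{idx:05d}.png")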

Summary

Stable Diffusion has transcended the boundaries of classical AI imagery. Introducing creative freedom (“noise”) through diffusion in a latent space has opened the doors to huge generative computer vision possibilities.

We began the chapter by going through the Stable Diffusion process, first with a thought experiment and then with the elegant Keras implementation. We went through encoding a contextualized input text, introducing a “noisy” (open to creativity) image, applying diffusion to reduce (downsample) it to a lower dimension, and then upsampling it to a 512x512 image. The output was astonishing for such compact source code.

We then ran a Stability AI text-to-image notebook that also generated surprising images. We once again saw that diffusion is taking us to levels we never would have imagined, including divergent association tasks.

Stability AI also provided a text-to-animation API to transform one...

Questions

  1. Stable Diffusion requires a text encoder. (True/False)
  2. Stable Diffusion requires diffusion layers. (True/False)
  3. A Keras Stable Diffusion model reduces a noisy image to a lower dimensionality. (True/False)
  4. A Keras Stable Diffusion model upsamples an image once it is downsampled. (True/False)
  5. The final output of a diffusion model is a “noisy” image. (True/False)
  6. OpenAI CLIP cannot produce a text-to-video model yet. (True/False)
  7. Stability AI cannot convert one image to another in a video. (True/False)
  8. Meta’s TimeSformer is a scheduling algorithm, not a computer vision model. (True/False)
  9. It will never be possible to create a complete movie automatically. (True/False)
  10. There is a hardware limit to generating videos automatically beyond 10 seconds. (True/False)

Further reading

Join our community on Discord

Join our community’s Discord space for discussions with the authors and other readers:

https://www.packt.link/Transformers


Author

Denis Rothman

Denis Rothman graduated from Sorbonne University and Paris-Diderot University, designing one of the very first patented word2matrix embeddings and patented AI conversational agents. He began his career authoring one of the first AI cognitive Natural Language Processing (NLP) chatbots, applied as an automated language teacher for Moët et Chandon and other companies. He authored an AI resource optimizer for IBM and apparel producers. He then authored an Advanced Planning and Scheduling (APS) solution used worldwide.