CLIP is trained to understand the relationship between text and images by learning to place matching images and captions near each other in a shared embedding space. When evaluating a generated image, CLIP measures how closely the image aligns with the textual description provided. A higher score indicates a better match, meaning the image accurately represents the text; a lower score signals a deviation from the prompt. This gives us a quantitative measure of how faithfully the generated image adheres to the intended description.
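To make the idea concrete, here is a minimal, self-contained sketch of that shared-space comparison: the text and the image are embedded separately, and their cosine similarity serves as the alignment score. The image path is a hypothetical placeholder, and the full evaluation pipeline we build below uses the model's batched logits instead of this single-pair comparison.
# Minimal sketch: embed a prompt and an image in CLIP's shared space
# and score their alignment with cosine similarity.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("generated_image.png")  # placeholder path
prompt = "a photograph of an astronaut riding a horse"

with torch.no_grad():
    text_emb = clip_model.get_text_features(
        **clip_processor(text=[prompt], return_tensors="pt", padding=True))
    image_emb = clip_model.get_image_features(
        **clip_processor(images=[image], return_tensors="pt"))

# Higher cosine similarity means the image matches the prompt more closely
score = torch.nn.functional.cosine_similarity(text_emb, image_emb).item()
print(f"CLIP similarity: {score:.3f}")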
Again, we will import the necessary libraries:
from typing import List, Tuple
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel
import torch
# display() renders images inline; outside a notebook it needs this import
from IPython.display import display
We begin by defining the model checkpoint and loading the CLIP model and processor:
# Constants
CLIP_REPO = "openai/clip-vit-base-patch32"
def load_model_and_processor(
    model_name: str
) -> Tuple[CLIPModel, CLIPProcessor]:
    """
    Loads the CLIP model and processor.
    """
    model = CLIPModel.from_pretrained(model_name)
    processor = CLIPProcessor.from_pretrained(model_name)
    return model, processor
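As an optional usage sketch, the helper can be combined with a device check so that inference runs on a GPU when one is available; the rest of this walkthrough keeps everything on the CPU, so this step is not required:
# Optional sketch: put the model in evaluation mode and move it to a GPU if present
model, processor = load_model_and_processor(CLIP_REPO)
model.eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
# If you do this, remember to move the processed inputs to the same device later,
# e.g. inputs = {k: v.to(device) for k, v in inputs.items()}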
Next, we define a processing function that prepares the textual prompts and images, ensuring that they are in the correct format for CLIP inference:
def process_inputs(
    processor: CLIPProcessor, prompts: List[str],
    images: List[Image.Image]) -> dict:
    """
    Processes the inputs using the CLIP processor.
    """
    return processor(text=prompts, images=images,
                     return_tensors="pt", padding=True)
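To get a feel for what this returns, here is a small sketch (runnable once the processor has been loaded) with two illustrative prompts and a blank placeholder image; the resulting batch contains the tokenized text and the preprocessed pixel values:
# Sketch: inspecting the processed batch (prompts and image are placeholders)
sample_prompts = ["a red sports car", "a bowl of fruit"]
sample_images = [Image.new("RGB", (512, 512))]
batch = process_inputs(processor, sample_prompts, sample_images)
print(batch.keys())                 # input_ids, attention_mask, pixel_values
print(batch["input_ids"].shape)     # (number of prompts, sequence length)
print(batch["pixel_values"].shape)  # (number of images, 3, 224, 224)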
In this step, we run the evaluation by feeding the images and textual prompts into the CLIP model in a single batched forward pass. The model computes a similarity score, known as a logit, for each image-text pair; these scores indicate how well each image corresponds to each prompt. To interpret them more intuitively, we convert the logits into probabilities with a temperature-scaled softmax, which indicate the likelihood that an image aligns with each of the given prompts:
def get_probabilities(
    model: CLIPModel, inputs: dict) -> torch.Tensor:
    """
    Computes the probabilities using the CLIP model.
    """
    outputs = model(**inputs)
    logits = outputs.logits_per_image
    # Define temperature - higher temperature will make the distribution more uniform.
    T = 10
    # Apply temperature to the logits
    temp_adjusted_logits = logits / T
    probs = torch.nn.functional.softmax(
        temp_adjusted_logits, dim=1)
    return probs
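To see what the temperature does, the following standalone sketch compares the softmax of some example logits at T = 1 and at T = 10; the higher temperature keeps the same ranking but spreads the probability mass more evenly, which makes the scores easier to compare across images:
# Sketch: effect of temperature scaling on the softmax of example logits
example_logits = torch.tensor([[30.0, 25.0, 20.0]])
for T in (1, 10):
    probs = torch.nn.functional.softmax(example_logits / T, dim=1)
    print(f"T={T}: {probs}")
# T=1 puts nearly all of the probability mass on the highest logit;
# T=10 preserves the ordering but softens the gaps between scores.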
Lastly, we display the images along with their scores, visually representing how well each image adheres to the provided prompts:
def display_images_with_scores(
    images: List[Image.Image], scores: torch.Tensor) -> None:
    """
    Displays the images alongside their scores.
    """
    # Set print options for readability
    torch.set_printoptions(precision=2, sci_mode=False)
    for i, image in enumerate(images):
        print(f"Image {i + 1}:")
        display(image)
        print(f"Scores: {scores[i, :]}")
        print()
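The execution step below assumes that prompts and images already exist from the image-generation step earlier in the chapter. For a standalone run, they could be assembled along these lines (the prompts and file names here are purely illustrative placeholders):
# Sketch: assembling inputs for a standalone run (paths and prompts are placeholders)
prompts = [
    "a photograph of an astronaut riding a horse",
    "a watercolor painting of a lighthouse at sunset",
]
images = [
    Image.open("generated_astronaut.png"),
    Image.open("generated_lighthouse.png"),
]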
With everything detailed, let’s execute the pipeline as follows:
# Load CLIP model
model, processor = load_model_and_processor(CLIP_REPO)
# Process image and text inputs together
inputs = process_inputs(processor, prompts, images)
# Extract the probabilities
probs = get_probabilities(model, inputs)
# Display each image with corresponding scores
display_images_with_scores(images, probs)
We now have scores for each of our synthetic images that quantify their fidelity to the text provided. These scores come from the CLIP model, which maps both image and text data into one shared mathematical representation (a common embedding space) in which their similarity can be measured.
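If a single verdict per image is needed, for instance to flag low-fidelity generations automatically, the probability matrix can be reduced further; the sketch below picks the best-matching prompt for each image (the 0.5 threshold is an arbitrary illustration, not a recommendation):
# Sketch: reducing the probability matrix to one verdict per image
best_prompt_idx = probs.argmax(dim=1)        # index of the best-matching prompt
best_prompt_prob = probs.max(dim=1).values   # probability assigned to that prompt
for i in range(len(images)):
    p = best_prompt_prob[i].item()
    flag = "looks faithful" if p > 0.5 else "review manually"
    print(f"Image {i + 1}: best prompt = {best_prompt_idx[i].item()}, "
          f"probability = {p:.2f} ({flag})")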