
You're reading from 3D Deep Learning with Python

Product type: Book
Published in: Oct 2022
Publisher: Packt
ISBN-13: 9781803247823
Edition: 1st Edition
Authors (3):
Xudong Ma

Xudong Ma is a Staff Machine Learning Engineer at Grabango Inc. in Berkeley, California. He was previously a Senior Machine Learning Engineer at Facebook (Meta) Oculus, where he worked closely with the 3D PyTorch team on 3D facial tracking projects. He has many years of experience working on computer vision, machine learning, and deep learning. He holds a Ph.D. in Electrical and Computer Engineering.

Vishakh Hegde

Vishakh Hegde is a Machine Learning and Computer Vision researcher. He has over 7 years of experience in this field, during which he has authored multiple well-cited research papers and published patents. He holds a master's degree from Stanford University, specializing in applied mathematics and machine learning, and a BS and MS in Physics from IIT Madras. He previously worked at Schlumberger and Matroid. He is a Senior Applied Scientist at Ambient.ai, where he helped build their weapon detection system, which is deployed at several Global Fortune 500 companies. He is now leveraging his expertise and passion for solving business challenges to build a technology startup in Silicon Valley. You can learn more about him on his personal website.

Lilit Yolyan

Lilit Yolyan is a machine learning researcher working on her Ph.D. at YSU. Her research focuses on building computer vision solutions for smart cities using remote sensing data. She has 5 years of experience in the field of computer vision and has worked on a complex driver safety solution to be deployed by many well-known car manufacturing companies.


Exploring Controllable Neural Feature Fields

In the previous chapter, you learned how to represent a 3D scene using Neural Radiance Fields (NeRF). We trained a single neural network on posed multi-view images of a 3D scene to learn an implicit representation of it. Then, we used the NeRF model to render the 3D scene from various other viewpoints and viewing angles. With this model, we assumed that the objects and the background are unchanging.

But it is fair to wonder whether it is possible to generate variations of the 3D scene. Can we control the number of objects, their poses, and the scene background? Can we learn about the 3D nature of things without posed images and without knowing the camera parameters?

By the end of this chapter, you will learn that it is indeed possible to do all these things. Concretely, you should have a better understanding of GIRAFFE, a novel method for controllable 3D image synthesis. It combines ideas from the fields of image synthesis...

Technical requirements

In order to run the example code snippets in this book, you ideally need a computer with a GPU that has around 8 GB of memory. Running the code snippets on a CPU alone is not impossible, but it will be extremely slow. The recommended computer configuration is as follows:

  • A GPU device – for example, the Nvidia GTX series or the RTX series with at least 8 GB of memory
  • Python 3.7+
  • Anaconda3

The code snippets for this chapter can be found at https://github.com/PacktPublishing/3D-Deep-Learning-with-Python.
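
If you are not sure whether your machine meets these requirements, a quick check with PyTorch (which the examples in this book build on) reports whether a CUDA GPU is visible and how much memory it has. This is just a convenience snippet, not part of the book's code:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, memory: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA-capable GPU found; the examples will run very slowly on CPU.")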

Understanding GAN-based image synthesis

Deep generative models have been shown to produce photorealistic 2D images when trained on an image distribution from a particular domain. Generative Adversarial Networks (GANs) are one of the most widely used frameworks for this purpose. They can synthesize high-quality photorealistic images at resolutions of 1,024 x 1,024 and beyond. For example, they have been used to generate realistic faces:

Figure 7.1: Randomly generated faces as high-quality 2D images using StyleGAN2

GANs can be trained to generate similar-looking images from any data distribution. The same StyleGAN2 model, when trained on a car dataset, can generate high-resolution images of cars:

Figure 7.2: Randomly generated cars as 2D images using StyleGAN2

GANs are based on a game-theoretic scenario in which a generator neural network synthesizes an image. However, in order to be successful, it must...
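
To recall how this adversarial game looks in code, here is a minimal, illustrative PyTorch sketch of the two players and the standard adversarial objective. It is deliberately far simpler than StyleGAN2: the tiny fully connected networks, image size, and batch size are arbitrary choices made only for illustration.

import torch
import torch.nn as nn

latent_dim = 64  # illustrative latent size, not taken from any specific model

# The generator maps a random latent code to a (here, tiny flattened) fake image.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 32 * 32 * 3), nn.Tanh(),
)

# The discriminator scores an image: high logits for real, low for fake.
discriminator = nn.Sequential(
    nn.Linear(32 * 32 * 3, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

bce = nn.BCEWithLogitsLoss()
z = torch.randn(8, latent_dim)
real_images = torch.rand(8, 32 * 32 * 3)  # stand-in for a batch of real training images

fake_images = generator(z)
# The discriminator tries to tell real images from generated ones...
d_loss = bce(discriminator(real_images), torch.ones(8, 1)) + \
         bce(discriminator(fake_images.detach()), torch.zeros(8, 1))
# ...while the generator tries to make its fakes be classified as real.
g_loss = bce(discriminator(fake_images), torch.ones(8, 1))

In a real training loop, the two losses are minimized alternately with separate optimizers, which is what gives the procedure its game-theoretic character.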

Introducing compositional 3D-aware image synthesis

Our goal is controllable image synthesis. We need control over the number of objects in the image, their position, shape, size, and pose. The GIRAFFE model is one of the first to achieve all these desirable properties while also generating high-resolution photorealistic images. In order to have control over these attributes, the model must have some awareness of the 3D nature of the scene.

Now, let us look at how the GIRAFFE model builds on top of other established ideas to achieve this. It makes use of the following high-level concepts:

  • Learning 3D representation: A NeRF-like model for learning an implicit 3D representation and feature fields. Unlike the standard NeRF model, this model outputs a feature field instead of color intensity. This NeRF-like model is used to enforce 3D consistency in the generated images.
  • Compositional operator: A parameter-free compositional operator to compose the feature fields of multiple... (a short sketch of this operator follows this list)
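
To give a feel for what this compositional operator does, here is a short, illustrative sketch. In the GIRAFFE paper, the densities of the individual objects (and the background) are summed, and their feature vectors are combined by a density-weighted average. The function below assumes that the per-object densities and features at a single 3D point have already been evaluated; the shapes and the value of M_f are arbitrary choices for the example.

import torch

def compose(densities, features, eps=1e-8):
    # densities: tensor of shape (N,), one density per object/background
    # features:  tensor of shape (N, M_f), one feature vector per object
    total_density = densities.sum()
    # Density-weighted average of the individual feature vectors.
    combined_feature = (densities.unsqueeze(-1) * features).sum(dim=0) / (total_density + eps)
    return total_density, combined_feature

# Example: two objects plus a background, evaluated at a single 3D point.
densities = torch.tensor([0.7, 0.1, 0.2])
features = torch.randn(3, 128)  # M_f = 128 here, chosen arbitrarily
sigma, f = compose(densities, features)

Because the operator has no learnable parameters, objects can be added, removed, or moved at test time without retraining anything.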

Generating feature fields

The first step of the scene generation process is generating a feature field. This is analogous to generating an RGB image in the NeRF model. In the NeRF model, the output is a feature field that happens to be an image made up of RGB values. However, a feature field can be any abstract notion of an image; it is a generalization of an image matrix. The difference here is that instead of generating a three-channel RGB image, the GIRAFFE model generates a more abstract image that we refer to as the feature field, with dimensions H_V, W_V, and M_f, where H_V is the height of the feature field, W_V is its width, and M_f is the number of channels in the feature field.
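
To make the difference from a standard NeRF concrete, here is a minimal, hypothetical sketch of a NeRF-like MLP whose head outputs an M_f-dimensional feature vector (plus a density) instead of an RGB color. Positional encodings, latent-code conditioning, and the real GIRAFFE layer sizes are omitted; the names and dimensions below are illustrative only.

import torch
import torch.nn as nn

class FeatureFieldMLP(nn.Module):
    def __init__(self, in_dim=3 + 3, hidden=128, feature_dim=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)             # volume density, as in NeRF
        self.feature_head = nn.Linear(hidden, feature_dim)   # M_f-dim feature instead of RGB

    def forward(self, xyz, view_dir):
        h = self.trunk(torch.cat([xyz, view_dir], dim=-1))
        return self.density_head(h), self.feature_head(h)

# Querying the field at a batch of 3D points and viewing directions.
model = FeatureFieldMLP()
density, feature = model(torch.randn(1024, 3), torch.randn(1024, 3))

Volume rendering these per-point features along camera rays, exactly as NeRF does with colors, is what produces the H_V x W_V x M_f feature field discussed above.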

For this section, let us assume that we have a trained GIRAFFE model. It has been trained on some predefined dataset that we are not going to think about now. To generate a new image, we need to do the following three things:

  1. Specify the camera pose: This defines the viewing angle...

Mapping feature fields to images

After we generate a feature field of dimensions H_V x W_V x M_f, we need to map it to an image of dimensions H x W x 3. Typically, H_V < H, W_V < W, and M_f > 3. The GIRAFFE model uses this two-stage approach because an ablation analysis showed it to be better than a single-stage approach that generates the image directly.

The mapping operation is a parametric function that can be learned from data; since it operates in the image domain, a 2D CNN is best suited for this task. You can think of this function as an upsampling neural network, like the decoder in an autoencoder. The output of this neural network is the rendered image that we can see, understand, and evaluate. Mathematically, this can be defined as a mapping

f: R^(H_V x W_V x M_f) -> R^(H x W x 3)

This neural network consists of n upsampling blocks, each made up of nearest-neighbor upsampling followed by a 3 x 3 convolution and a leaky ReLU activation. This creates a series of n...
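
That description translates almost directly into code. Below is a rough sketch of such a 2D rendering CNN, built from n blocks of nearest-neighbor upsampling, a 3 x 3 convolution, and a leaky ReLU, ending in a final convolution down to 3 RGB channels. The channel widths, block count, and feature-field size are illustrative assumptions and do not match the official implementation.

import torch
import torch.nn as nn

def make_neural_renderer(feature_dim=128, n_blocks=4):
    # Maps an H_V x W_V x M_f feature field to an H x W x 3 image.
    layers = []
    channels = feature_dim
    for _ in range(n_blocks):
        layers += [
            nn.Upsample(scale_factor=2, mode="nearest"),                 # nearest-neighbor upsampling
            nn.Conv2d(channels, channels // 2, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
        ]
        channels //= 2
    layers.append(nn.Conv2d(channels, 3, kernel_size=3, padding=1))      # final RGB output
    return nn.Sequential(*layers)

renderer = make_neural_renderer()
feature_field = torch.randn(1, 128, 16, 16)   # batch of 1, M_f = 128, H_V = W_V = 16
image = renderer(feature_field)               # shape: (1, 3, 256, 256)

Each block doubles the spatial resolution, so with n = 4 blocks a 16 x 16 feature field becomes a 256 x 256 RGB image.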

Exploring controllable scene generation

To truly appreciate what a computer vision model has learned, we need to visualize the outputs of the trained model. Since we are dealing with a generative approach, we can do this simply by visualizing the images generated by the model. In this section, we will explore pre-trained GIRAFFE models and look at how well they can generate controllable scenes. We will use pre-trained checkpoints provided by the creators of the GIRAFFE model. The instructions in this section are based on the open source GitHub repository at https://github.com/autonomousvision/giraffe.

Create the Anaconda environment called giraffe with the following commands:

$ cd chap7/giraffe
$ conda env create -f environment.yml
$ conda activate giraffe

Once the conda environment has been activated, you can start rendering images for various datasets using their corresponding pre-trained checkpoints. The creators of the GIRAFFE model have shared...
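
Once a checkpoint is available, rendering typically amounts to passing the corresponding configuration file to the repository's render script. The command below follows the pattern used in that repository; the exact script name and configuration paths may differ, so check the repository's README for the current instructions:

$ python render.py configs/256res/cars_256_pretrained.yaml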

Training the GIRAFFE model

So far in this chapter, we have seen how a trained GIRAFFE model works and examined the different components that make up the generator part of the model.

But to train the model, there is another part that we have not looked at so far, namely, the discriminator. As in any other GAN, the discriminator is not used during image synthesis, but it is a vital component for training the model. In this section, we will investigate it in more detail and gain an understanding of the loss function used. We will then train a new model from scratch using the training module provided by the authors of GIRAFFE.

The generator takes as input various latent codes corresponding to object rotation, background rotation, camera elevation, horizontal and depth translation, and object size. These are used to first generate a feature field and then map it to RGB pixels using a neural rendering module. This is the generator. The discriminator...
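
The GIRAFFE paper trains this generator-discriminator pair with a standard non-saturating GAN objective plus an R1 gradient penalty on the discriminator. As a rough, simplified sketch (not the authors' code), one training step for the two players could look like the following, where the discriminator and the generated images are placeholders for the actual GIRAFFE modules:

import torch
import torch.nn.functional as F

def discriminator_step(discriminator, real_images, fake_images, r1_weight=10.0):
    real_images.requires_grad_(True)
    real_logits = discriminator(real_images)
    fake_logits = discriminator(fake_images.detach())
    # Non-saturating GAN loss for the discriminator.
    d_loss = F.softplus(-real_logits).mean() + F.softplus(fake_logits).mean()
    # R1 penalty: penalize the discriminator's gradient on real images.
    grad = torch.autograd.grad(real_logits.sum(), real_images, create_graph=True)[0]
    r1_penalty = grad.pow(2).reshape(grad.shape[0], -1).sum(dim=1).mean()
    return d_loss + 0.5 * r1_weight * r1_penalty

def generator_step(discriminator, fake_images):
    # The generator is rewarded when the discriminator scores its images as real.
    return F.softplus(-discriminator(fake_images)).mean()

The r1_weight value above is an illustrative default; the training module in the official repository handles these hyperparameters through its configuration files.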

Summary

In this chapter, you explored controllable 3D-aware image synthesis using the GIRAFFE model. This model borrows concepts from NeRF, GANs, and 2D CNNs to create 3D scenes that are controllable. First, we had a refresher on GANs. Then, we dove deeper into the GIRAFFE model, how feature fields are generated, and how those feature fields are then transformed into RGB images. We then explored the outputs of this model and understood its properties and limitations. Finally, we briefly touched on how to train this model.

In the next chapter, we are going to explore a relatively new technique for generating realistic human bodies in three dimensions, called the SMPL model. Notably, the SMPL model is one of the few models that do not use deep neural networks. Instead, it uses more classical statistical techniques, such as principal component analysis, to achieve its objectives. You will learn the importance of good mathematical problem formulation in building models that...

