
You're reading from 3D Deep Learning with Python

Product type: Book
Published in: Oct 2022
Publisher: Packt
ISBN-13: 9781803247823
Edition: 1st Edition
Authors (3):
Xudong Ma

Xudong Ma is a Staff Machine Learning Engineer at Grabango Inc. in Berkeley, California. He was previously a Senior Machine Learning Engineer at Facebook (Meta) Oculus, where he worked closely with the 3D PyTorch team on 3D facial tracking projects. He has many years of experience working on computer vision, machine learning, and deep learning. He holds a Ph.D. in Electrical and Computer Engineering.

Vishakh Hegde

Vishakh Hegde is a Machine Learning and Computer Vision researcher. He has over 7 years of experience in this field, during which he has authored multiple well-cited research papers and published patents. He holds a master's degree from Stanford University, specializing in applied mathematics and machine learning, and a BS and MS in Physics from IIT Madras. He previously worked at Schlumberger and Matroid. He is a Senior Applied Scientist at Ambient.ai, where he helped build their weapon detection system, which is deployed at several Global Fortune 500 companies. He is now leveraging his expertise and passion for solving business challenges to build a technology startup in Silicon Valley. You can learn more about him on his personal website.

Lilit Yolyan

Lilit Yolyan is a machine learning researcher working on her Ph.D. at YSU. Her research focuses on building computer vision solutions for smart cities using remote sensing data. She has 5 years of experience in the field of computer vision and has worked on a complex driver safety solution to be deployed by many well-known car manufacturing companies.


Exploring Controllable Neural Feature Fields

In the previous chapter, you learned how to represent a 3D scene using Neural Radiance Fields (NeRF). We trained a single neural network on posed multi-view images of a 3D scene to learn an implicit representation of it. Then, we used the NeRF model to render the 3D scene from various other viewpoints and viewing angles. With this model, we assumed that the objects and the background are unchanging.

But it is fair to wonder whether it is possible to generate variations of the 3D scene. Can we control the number of objects, their poses, and the scene background? Can we learn about the 3D nature of things without posed images and without knowing the camera parameters?

By the end of this chapter, you will learn that it is indeed possible to do all these things. Concretely, you should have a better understanding of GIRAFFE, a novel method for controllable 3D image synthesis. It combines ideas from the fields of image synthesis...

Technical requirements

In order to run the example code snippets in this book, you ideally need a computer with a GPU that has around 8 GB of memory. Running the code snippets on a CPU alone is not impossible, but it will be extremely slow. The recommended computer configuration is as follows:

  • A GPU device – for example, the Nvidia GTX series or the RTX series with at least 8 GB of memory
  • Python 3.7+
  • Anaconda3

The code snippets for this chapter can be found at https://github.com/PacktPublishing/3D-Deep-Learning-with-Python.
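
If you are not sure whether your machine meets these requirements, a quick check with PyTorch (which the examples in this book build on) reports whether a CUDA GPU is visible and how much memory it has. This is just a convenience snippet, not part of the book's code:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, memory: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA-capable GPU found; the examples will run very slowly on CPU.")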

Understanding GAN-based image synthesis

Deep generative models have been shown to produce photorealistic 2D images when trained on an image distribution from a particular domain. Generative Adversarial Networks (GANs) are one of the most widely used frameworks for this purpose. They can synthesize high-quality photorealistic images at resolutions of 1,024 x 1,024 and beyond. For example, they have been used to generate realistic faces:

Figure 7.1: Randomly generated faces as high-quality 2D images using StyleGAN2

GANs can be trained to generate similar-looking images from any data distribution. The same StyleGAN2 model, when trained on a car dataset, can generate high-resolution images of cars:

Figure 7.2: Randomly generated cars as 2D images using StyleGAN2

GANs are based on a game-theoretic scenario in which a generator neural network synthesizes an image. However, in order to be successful, it must...
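
To recall how this adversarial game looks in code, here is a minimal, illustrative PyTorch sketch of the two players and the standard adversarial objective. It is deliberately far simpler than StyleGAN2: the tiny fully connected networks, image size, and batch size are arbitrary choices made only for illustration.

import torch
import torch.nn as nn

latent_dim = 64  # illustrative latent size, not taken from any specific model

# The generator maps a random latent code to a (here, tiny flattened) fake image.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 32 * 32 * 3), nn.Tanh(),
)

# The discriminator scores an image: high logits for real, low for fake.
discriminator = nn.Sequential(
    nn.Linear(32 * 32 * 3, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

bce = nn.BCEWithLogitsLoss()
z = torch.randn(8, latent_dim)
real_images = torch.rand(8, 32 * 32 * 3)  # stand-in for a batch of real training images

fake_images = generator(z)
# The discriminator tries to tell real images from generated ones...
d_loss = bce(discriminator(real_images), torch.ones(8, 1)) + \
         bce(discriminator(fake_images.detach()), torch.zeros(8, 1))
# ...while the generator tries to make its fakes be classified as real.
g_loss = bce(discriminator(fake_images), torch.ones(8, 1))

In a real training loop, the two losses are minimized alternately with separate optimizers, which is what gives the procedure its game-theoretic character.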

Introducing compositional 3D-aware image synthesis

Our goal is controllable image synthesis. We need control over the number of objects in the image, their position, shape, size, and pose. The GIRAFFE model is one of the first to achieve all these desirable properties while also generating high-resolution photorealistic images. In order to have control over these attributes, the model must have some awareness of the 3D nature of the scene.

Now, let us look at how the GIRAFFE model builds on top of other established ideas to achieve this. It makes use of the following high-level concepts:

  • Learning 3D representation: A NeRF-like model for learning an implicit 3D representation and feature fields. Unlike the standard NeRF model, this model outputs a feature field instead of color intensity. This NeRF-like model is used to enforce 3D consistency in the generated images.
  • Compositional operator: A parameter-free compositional operator to compose the feature fields of multiple... (a short sketch of this operator follows this list)
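
To give a feel for what this compositional operator does, here is a short, illustrative sketch. In the GIRAFFE paper, the densities of the individual objects (and the background) are summed, and their feature vectors are combined by a density-weighted average. The function below assumes that the per-object densities and features at a single 3D point have already been evaluated; the shapes and the value of M_f are arbitrary choices for the example.

import torch

def compose(densities, features, eps=1e-8):
    # densities: tensor of shape (N,), one density per object/background
    # features:  tensor of shape (N, M_f), one feature vector per object
    total_density = densities.sum()
    # Density-weighted average of the individual feature vectors.
    combined_feature = (densities.unsqueeze(-1) * features).sum(dim=0) / (total_density + eps)
    return total_density, combined_feature

# Example: two objects plus a background, evaluated at a single 3D point.
densities = torch.tensor([0.7, 0.1, 0.2])
features = torch.randn(3, 128)  # M_f = 128 here, chosen arbitrarily
sigma, f = compose(densities, features)

Because the operator has no learnable parameters, objects can be added, removed, or moved at test time without retraining anything.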

Generating feature fields

The first step of the scene generation process is generating a feature field. This is analogous to generating an RGB image in the NeRF model. In the NeRF model, the output is a feature field that happens to be an image made up of RGB values. However, a feature field can be any abstract notion of an image; it is a generalization of an image matrix. The difference here is that instead of generating a three-channel RGB image, the GIRAFFE model generates a more abstract image that we refer to as the feature field, with dimensions H_V, W_V, and M_f, where H_V is the height of the feature field, W_V is its width, and M_f is the number of channels in the feature field.
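
To make the difference from a standard NeRF concrete, here is a minimal, hypothetical sketch of a NeRF-like MLP whose head outputs an M_f-dimensional feature vector (plus a density) instead of an RGB color. Positional encodings, latent-code conditioning, and the real GIRAFFE layer sizes are omitted; the names and dimensions below are illustrative only.

import torch
import torch.nn as nn

class FeatureFieldMLP(nn.Module):
    def __init__(self, in_dim=3 + 3, hidden=128, feature_dim=128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)             # volume density, as in NeRF
        self.feature_head = nn.Linear(hidden, feature_dim)   # M_f-dim feature instead of RGB

    def forward(self, xyz, view_dir):
        h = self.trunk(torch.cat([xyz, view_dir], dim=-1))
        return self.density_head(h), self.feature_head(h)

# Querying the field at a batch of 3D points and viewing directions.
model = FeatureFieldMLP()
density, feature = model(torch.randn(1024, 3), torch.randn(1024, 3))

Volume rendering these per-point features along camera rays, exactly as NeRF does with colors, is what produces the H_V x W_V x M_f feature field discussed above.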

For this section, let us assume that we have a trained GIRAFFE model. It has been trained on some predefined dataset that we are not going to think about now. To generate a new image, we need to do the following three things:

  1. Specify the camera pose: This defines the viewing angle...

Mapping feature fields to images

After we generate a feature field of dimensions H_V x W_V x M_f, we need to map it to an image of dimensions H x W x 3. Typically, H_V < H, W_V < W, and M_f > 3. The GIRAFFE model uses this two-stage approach because an ablation analysis showed it to be better than a single-stage approach that generates the image directly.

The mapping operation is a parametric function that can be learned from data; since it operates in the image domain, a 2D CNN is best suited for this task. You can think of this function as an upsampling neural network, like the decoder in an autoencoder. The output of this neural network is the rendered image that we can see, understand, and evaluate. Mathematically, this can be defined as a mapping

f: R^(H_V x W_V x M_f) -> R^(H x W x 3)

This neural network consists of n upsampling blocks, each made up of nearest-neighbor upsampling followed by a 3 x 3 convolution and a leaky ReLU activation. This creates a series of n...
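
That description translates almost directly into code. Below is a rough sketch of such a 2D rendering CNN, built from n blocks of nearest-neighbor upsampling, a 3 x 3 convolution, and a leaky ReLU, ending in a final convolution down to 3 RGB channels. The channel widths, block count, and feature-field size are illustrative assumptions and do not match the official implementation.

import torch
import torch.nn as nn

def make_neural_renderer(feature_dim=128, n_blocks=4):
    # Maps an H_V x W_V x M_f feature field to an H x W x 3 image.
    layers = []
    channels = feature_dim
    for _ in range(n_blocks):
        layers += [
            nn.Upsample(scale_factor=2, mode="nearest"),                 # nearest-neighbor upsampling
            nn.Conv2d(channels, channels // 2, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2),
        ]
        channels //= 2
    layers.append(nn.Conv2d(channels, 3, kernel_size=3, padding=1))      # final RGB output
    return nn.Sequential(*layers)

renderer = make_neural_renderer()
feature_field = torch.randn(1, 128, 16, 16)   # batch of 1, M_f = 128, H_V = W_V = 16
image = renderer(feature_field)               # shape: (1, 3, 256, 256)

Each block doubles the spatial resolution, so with n = 4 blocks a 16 x 16 feature field becomes a 256 x 256 RGB image.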

Exploring controllable scene generation

To truly appreciate what a computer vision model has learned, we need to visualize the outputs of the trained model. Since we are dealing with a generative approach, we can do this simply by visualizing the images generated by the model. In this section, we will explore pre-trained GIRAFFE models and look at how well they can generate controllable scenes. We will use pre-trained checkpoints provided by the creators of the GIRAFFE model. The instructions in this section are based on the open source GitHub repository at https://github.com/autonomousvision/giraffe.

Create the Anaconda environment called giraffe with the following commands:

$ cd chap7/giraffe
$ conda env create -f environment.yml
$ conda activate giraffe

Once the conda environment has been activated, you can start rendering images for various datasets using their corresponding pre-trained checkpoints. The creators of the GIRAFFE model have shared...
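
Once a checkpoint is available, rendering typically amounts to passing the corresponding configuration file to the repository's render script. The command below follows the pattern used in that repository; the exact script name and configuration paths may differ, so check the repository's README for the current instructions:

$ python render.py configs/256res/cars_256_pretrained.yaml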

Training the GIRAFFE model

So far in this chapter, we have seen how a trained GIRAFFE model works and examined the different components that make up the generator part of the model.

But to train the model, there is another part that we have not looked at so far, namely, the discriminator. As in any other GAN, the discriminator is not used during image synthesis, but it is a vital component for training the model. In this section, we will investigate it in more detail and gain an understanding of the loss function used. We will then train a new model from scratch using the training module provided by the authors of GIRAFFE.

The generator takes as input various latent codes corresponding to object rotation, background rotation, camera elevation, horizontal and depth translation, and object size. These are used to first generate a feature field and then map it to RGB pixels using a neural rendering module. This is the generator. The discriminator...
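
The GIRAFFE paper trains this generator-discriminator pair with a standard non-saturating GAN objective plus an R1 gradient penalty on the discriminator. As a rough, simplified sketch (not the authors' code), one training step for the two players could look like the following, where the discriminator and the generated images are placeholders for the actual GIRAFFE modules:

import torch
import torch.nn.functional as F

def discriminator_step(discriminator, real_images, fake_images, r1_weight=10.0):
    real_images.requires_grad_(True)
    real_logits = discriminator(real_images)
    fake_logits = discriminator(fake_images.detach())
    # Non-saturating GAN loss for the discriminator.
    d_loss = F.softplus(-real_logits).mean() + F.softplus(fake_logits).mean()
    # R1 penalty: penalize the discriminator's gradient on real images.
    grad = torch.autograd.grad(real_logits.sum(), real_images, create_graph=True)[0]
    r1_penalty = grad.pow(2).reshape(grad.shape[0], -1).sum(dim=1).mean()
    return d_loss + 0.5 * r1_weight * r1_penalty

def generator_step(discriminator, fake_images):
    # The generator is rewarded when the discriminator scores its images as real.
    return F.softplus(-discriminator(fake_images)).mean()

The r1_weight value above is an illustrative default; the training module in the official repository handles these hyperparameters through its configuration files.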

Summary

In this chapter, you explored controllable 3D-aware image synthesis using the GIRAFFE model. This model borrows concepts from NeRF, GANs, and 2D CNNs to create 3D scenes that are controllable. First, we had a refresher on GANs. Then, we dove deeper into the GIRAFFE model, how feature fields are generated, and how those feature fields are then transformed into RGB images. We then explored the outputs of this model and understood its properties and limitations. Finally, we briefly touched on how to train this model.

In the next chapter, we are going to explore a relatively new technique for generating realistic human bodies in three dimensions, called the SMPL model. Notably, the SMPL model is one of the few models that do not use deep neural networks. Instead, it uses more classical statistical techniques, such as principal component analysis, to achieve its objectives. You will learn the importance of good mathematical problem formulation in building models that...

