
You're reading from 3D Deep Learning with Python

Product type: Book
Published in: Oct 2022
Publisher: Packt
ISBN-13: 9781803247823
Edition: 1st
Authors (3):

Xudong Ma

Xudong Ma is a Staff Machine Learning Engineer with Grabango Inc. in Berkeley, California. He was previously a Senior Machine Learning Engineer at Facebook (Meta) Oculus, where he worked closely with the PyTorch3D team on 3D facial tracking projects. He has many years of experience working on computer vision, machine learning, and deep learning. He holds a Ph.D. in Electrical and Computer Engineering.

Vishakh Hegde

Vishakh Hegde is a Machine Learning and Computer Vision researcher. He has over 7 years of experience in this field, during which he has authored multiple well-cited research papers and published patents. He holds a master's degree from Stanford University, specializing in applied mathematics and machine learning, and a BS and MS in Physics from IIT Madras. He previously worked at Schlumberger and Matroid. He is a Senior Applied Scientist at Ambient.ai, where he helped build their weapon detection system, which is deployed at several Global Fortune 500 companies. He is now leveraging his expertise and passion for solving business challenges to build a technology startup in Silicon Valley. You can learn more about him on his personal website.

Lilit Yolyan

Lilit Yolyan is a machine learning researcher working on her Ph.D. at YSU. Her research focuses on building computer vision solutions for smart cities using remote sensing data. She has 5 years of experience in the field of computer vision and has worked on a complex driver-safety solution to be deployed by many well-known car manufacturers.


Exploring Neural Radiance Fields (NeRF)

In the previous chapter, you learned about Differentiable Volume Rendering, where you reconstructed a 3D volume from several multi-view images. With this technique, you modeled a volume consisting of N x N x N voxels, so the space required to store the volume scales as O(N³). This is undesirable, especially if we want to transmit this information over a network. Other methods can overcome such large disk space requirements, but they are prone to smoothing out geometry and texture. Therefore, we cannot use them to reliably model very complex or highly textured scenes.
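To get a feel for how quickly this grows, here is a quick back-of-the-envelope sketch (the four float32 values per voxel and the grid sizes are illustrative assumptions, not figures from the previous chapter):

    # Storage cost of a dense N x N x N voxel grid, assuming four
    # float32 values per voxel (r, g, b, density) -- an illustrative
    # layout, not necessarily the one used in the previous chapter.
    def voxel_grid_bytes(n, values_per_voxel=4, bytes_per_value=4):
        return n ** 3 * values_per_voxel * bytes_per_value

    for n in (128, 256, 512):
        print(f"N={n}: {voxel_grid_bytes(n) / 2**20:.0f} MiB")
    # N=128: 32 MiB, N=256: 256 MiB, N=512: 2048 MiB

Doubling the resolution multiplies the storage by eight, which is exactly the O(N³) behavior that makes dense voxel grids impractical to store and transmit.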

In this chapter, we are going to discuss a breakthrough approach to representing 3D scenes, called Neural Radiance Fields (NeRF). This is one of the first techniques to model a 3D scene that requires only a small, constant amount of disk space while, at the same time, capturing the fine geometry and texture of complex scenes.

In this chapter, you will learn about the following topics...

Technical requirements

In order to run the example code snippets in this book, you need a computer, ideally with a GPU that has about 8 GB of memory. Running the code snippets using only CPUs is possible but will be extremely slow. The recommended computer configuration is as follows:

  • A GPU device – for example, Nvidia GTX series or RTX series with at least 8 GB of memory
  • Python 3.7+
  • The PyTorch and PyTorch3D libraries

The code snippets for this chapter can be found at https://github.com/PacktPublishing/3D-Deep-Learning-with-Python.
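Before going further, it may be worth confirming that PyTorch can actually see your GPU. Here is a minimal check (a generic snippet, not part of the book's repository):

    import torch

    # Quick sanity check of the training environment.
    print("PyTorch:", torch.__version__)
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"GPU: {props.name}, {props.total_memory / 2**30:.1f} GB")
    else:
        print("No GPU found; training will be extremely slow.")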

Understanding NeRF

View synthesis is a long-standing problem in 3D computer vision. The challenge is to synthesize new views of a 3D scene from a small number of available 2D snapshots of it. It is particularly difficult because the appearance of a complex scene depends on many factors, such as object artifacts, light sources, reflections, opacity, object surface texture, and occlusions. Any good representation should capture this information either implicitly or explicitly. Additionally, many objects have complex structures that are not completely visible from a given viewpoint, so the challenge is to construct complete information about the world from incomplete and noisy observations.

As the name suggests, NeRF uses neural networks to model the world. As we will learn later in the chapter, NeRF uses neural networks in a very unconventional manner. It was a concept first developed by a team of researchers from UC Berkeley, Google Research, and UC San Diego. Because of...

Training a NeRF model

In this section, we are going to train a simple NeRF model on images generated from the synthetic cow model. We will only instantiate the NeRF model here, without worrying about how it is implemented; the implementation details are covered in the next section. A single neural network (the NeRF model) is trained to represent a single 3D scene. The following code can be found in train_nerf.py in this chapter's GitHub repository; it is modified from a PyTorch3D tutorial. Let us go through the code to train a NeRF model on the synthetic cow scene:

  1. First, let us import the standard modules:
    import torch
    import matplotlib.pyplot as plt
  2. Next, let us import the functions and classes used for rendering. These are PyTorch3D renderer components:
    from pytorch3d.renderer import (
        FoVPerspectiveCameras,
        NDCMultinomialRaysampler,
        MonteCarloRaysampler,
        EmissionAbsorptionRaymarcher,
        ImplicitRenderer,
    )
    from utils.helper_functions import (generate_rotating_nerf...
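The listing above is cut off at the helper import. For orientation, here is a rough sketch of how the imported components are typically wired together into renderers, following the public PyTorch3D NeRF tutorial that this script is modified from (the image size, depth bounds, and ray counts below are assumed values, not necessarily the book's settings; the snippet reuses the imports from step 2):

    # Hedged sketch: combining the imports above into two renderers.
    # All hyperparameters here are assumptions borrowed from the public
    # PyTorch3D tutorial, not necessarily the values in train_nerf.py.
    render_size = 128          # rendered image resolution (assumed)
    volume_extent = 3.0        # far bound of the scene volume (assumed)

    # Full-image sampler, used for visualization and evaluation.
    raysampler_grid = NDCMultinomialRaysampler(
        image_height=render_size,
        image_width=render_size,
        n_pts_per_ray=128,
        min_depth=0.1,
        max_depth=volume_extent,
    )

    # Random-ray sampler, used during training (much cheaper per step).
    raysampler_mc = MonteCarloRaysampler(
        min_x=-1.0, max_x=1.0,
        min_y=-1.0, max_y=1.0,
        n_rays_per_image=750,
        n_pts_per_ray=128,
        min_depth=0.1,
        max_depth=volume_extent,
    )

    # Composites per-point colors and densities along each ray.
    raymarcher = EmissionAbsorptionRaymarcher()

    renderer_grid = ImplicitRenderer(raysampler=raysampler_grid, raymarcher=raymarcher)
    renderer_mc = ImplicitRenderer(raysampler=raysampler_mc, raymarcher=raymarcher)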

Understanding the NeRF model architecture

So far, we have used the NeRF model class without fully knowing what it looks like. In this section, we will first visualize what the neural network looks like and then go through the code in detail and understand how it is implemented.

The neural network takes the harmonic embedding of the spatial location (x, y, z) and the harmonic embedding of the viewing direction (θ, φ) as its input, and outputs the predicted density σ and the predicted color (r, g, b). The following figure illustrates the network architecture that we are going to implement in this section:

Figure 6.5: The simplified model architecture of the NeRF model

Note

The model architecture that we are going to implement differs from the original NeRF model architecture: we implement a simplified version of it, which makes the model faster and easier to train.

Let us start defining the NeuralRadianceField...
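Since the full listing is truncated here, the following is a minimal sketch of what such a simplified class could look like in plain PyTorch (the layer sizes, activations, and number of harmonic functions are illustrative assumptions, not the book's exact implementation, and the real class plugs into PyTorch3D's ImplicitRenderer via a ray-bundle interface omitted here):

    import torch
    import torch.nn as nn

    def harmonic_embedding(x, n_harmonic=6):
        # Maps each coordinate to [sin(2^k * x), cos(2^k * x)], k = 0..n-1.
        freqs = 2.0 ** torch.arange(n_harmonic, device=x.device)
        angles = x[..., None] * freqs                 # (..., dim, n_harmonic)
        emb = torch.cat([angles.sin(), angles.cos()], dim=-1)
        return emb.flatten(start_dim=-2)              # (..., dim * 2 * n_harmonic)

    class NeuralRadianceField(nn.Module):
        # Simplified NeRF MLP; layer sizes are illustrative assumptions.
        def __init__(self, n_harmonic=6, hidden=256):
            super().__init__()
            self.n_harmonic = n_harmonic
            embed_dim = 3 * 2 * n_harmonic            # for both location and direction
            self.backbone = nn.Sequential(
                nn.Linear(embed_dim, hidden), nn.Softplus(),
                nn.Linear(hidden, hidden), nn.Softplus(),
            )
            # Density head: sigma >= 0, independent of viewing direction.
            self.density_head = nn.Sequential(nn.Linear(hidden, 1), nn.Softplus())
            # Color head: depends on features and the viewing direction.
            self.color_head = nn.Sequential(
                nn.Linear(hidden + embed_dim, hidden // 2), nn.Softplus(),
                nn.Linear(hidden // 2, 3), nn.Sigmoid(),
            )

        def forward(self, points, directions):
            features = self.backbone(harmonic_embedding(points, self.n_harmonic))
            sigma = self.density_head(features)
            dir_emb = harmonic_embedding(directions, self.n_harmonic)
            rgb = self.color_head(torch.cat([features, dir_emb], dim=-1))
            return rgb, sigma

Note how the density depends only on the 3D location, while the color also depends on the viewing direction; this is what lets the model capture view-dependent effects such as specular highlights.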

Understanding volume rendering with radiance fields

Volume rendering allows you to create a 2D projection of a 3D image or scene. In this section, we will learn about rendering a 3D scene from different viewpoints. For the purposes of this section, assume that the NeRF model is fully trained and that it accurately maps the input coordinates (x, y, z, dx, dy, dz) to an output (r, g, b, σ). Here are the definitions of these input and output coordinates (a sketch of how they combine into a pixel color follows the list):

  • (x, y, z): A point in the 3D scene in world coordinates
  • (dx, dy, dz): A unit vector representing the direction along which we are viewing the point (x, y, z)
  • (r, g, b): The radiance value (or the emitted color) of the point (x, y, z)
  • σ: The volume density at the point (x, y, z)
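To make the role of σ concrete, here is a standalone sketch of the emission-absorption step that turns the per-point (r, g, b, σ) samples along one ray into a single pixel color (this illustrates the standard NeRF quadrature; it is not the book's or PyTorch3D's exact code, and the toy inputs are assumed):

    import torch

    def composite_ray(rgb, sigma, deltas):
        # rgb:    (n_pts, 3) colors at the sampled points along the ray
        # sigma:  (n_pts,)   volume densities at those points
        # deltas: (n_pts,)   distances between consecutive samples
        alpha = 1.0 - torch.exp(-sigma * deltas)       # opacity of each segment
        # Transmittance: probability the ray reaches sample i unoccluded.
        trans = torch.cumprod(
            torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0)
        weights = trans * alpha                        # each sample's contribution
        return (weights[:, None] * rgb).sum(dim=0)     # final pixel color

    # Toy usage with assumed values:
    n = 64
    pixel = composite_ray(torch.rand(n, 3), torch.rand(n) * 5.0,
                          torch.full((n,), 0.05))

Opaque points (large σ) close to the camera receive high weights and occlude everything behind them, which is exactly the behavior we expect from volume rendering.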

In the previous chapter, you came to understand the concepts underlying volumetric rendering. You used the technique of ray sampling to get volume densities and colors...

Summary

In this chapter, we learned how a neural network can be used to model and represent a 3D scene; this network is called the NeRF model. We trained a simple NeRF model on a synthetic 3D scene, dug deeper into the model architecture and its implementation in code, and examined the main components of the model. We also covered the principles behind rendering volumes with the NeRF model. A NeRF model captures a single scene; once built, it can render that scene from different angles. It is logical to wonder whether there is a way to capture multiple scenes with a single model, and whether we can predictably manipulate certain objects and attributes in the scene. This is our topic of exploration in the next chapter, where we will explore the GIRAFFE model.
