You're reading from 3D Deep Learning with Python

Product type: Book
Published in: Oct 2022
Publisher: Packt
ISBN-13: 9781803247823
Edition: 1st Edition
Authors (3):
Xudong Ma

Xudong Ma is a Staff Machine Learning Engineer with Grabango Inc. in Berkeley, California. He was a Senior Machine Learning Engineer at Facebook (Meta) Oculus and worked closely with the 3D PyTorch Team on 3D facial tracking projects. He has many years of experience working on computer vision, machine learning, and deep learning. He holds a Ph.D. in Electrical and Computer Engineering.

Vishakh Hegde

Vishakh Hegde is a Machine Learning and Computer Vision researcher. He has over 7 years of experience in this field, during which he has authored multiple well-cited research papers and published patents. He holds a master's degree from Stanford University, specializing in applied mathematics and machine learning, and a BS and MS in Physics from IIT Madras. He previously worked at Schlumberger and Matroid. He is a Senior Applied Scientist at Ambient.ai, where he helped build their weapon detection system, which is deployed at several Global Fortune 500 companies. He is now leveraging his expertise and passion for solving business challenges to build a technology startup in Silicon Valley. You can learn more about him on his personal website.

Lilit Yolyan

Lilit Yolyan is a machine learning researcher working on her Ph.D. at YSU. Her research focuses on building computer vision solutions for smart cities using remote sensing data. She has 5 years of experience in the field of computer vision and has worked on a complex driver safety solution to be deployed by many well-known car manufacturing companies.

Introducing 3D Computer Vision and Geometry

In this chapter, we will learn about some basic concepts of 3D computer vision and geometry that will be especially useful for later chapters in this book. We will start by discussing what rendering, rasterization, and shading are. We will go through different lighting models and shading models, such as point light sources, directional light sources, ambient lighting, diffusion, highlights, and shininess. We will go through a coding example for rendering a mesh model using different lighting models and parameters.

We will then learn how to use PyTorch for solving optimization problems. In particular, we will go through stochastic gradient descent over heterogeneous mini-batches, which PyTorch3D makes possible. We will also learn about the different formats for mini-batches in PyTorch3D, including the list, padded, and packed formats, and learn how to convert between them.
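
To make the three mini-batch formats concrete, here is a minimal sketch that mimics their layouts with plain PyTorch tensors. It is an illustration under stated assumptions, not PyTorch3D's own implementation: the variable names (`verts_list`, `verts_padded`, `verts_packed`) and the toy vertex data are made up for this example, and the conversions use generic PyTorch utilities rather than the library's structures.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# List format: one (V_i, 3) vertex tensor per mesh; V_i differs between meshes.
# The vertex values are toy data chosen only so that the shapes are easy to follow.
verts_list = [
    torch.arange(9, dtype=torch.float32).reshape(3, 3),   # mesh 0: 3 vertices
    torch.arange(15, dtype=torch.float32).reshape(5, 3),  # mesh 1: 5 vertices
    torch.arange(6, dtype=torch.float32).reshape(2, 3),   # mesh 2: 2 vertices
]
num_verts = [v.shape[0] for v in verts_list]

# Padded format: a single (N, max(V_i), 3) tensor, zero-filled past each mesh's end.
verts_padded = pad_sequence(verts_list, batch_first=True, padding_value=0.0)

# Packed format: all vertices concatenated into one (sum(V_i), 3) tensor.
verts_packed = torch.cat(verts_list, dim=0)

# Converting back to the list format requires the per-mesh vertex counts.
list_from_padded = [verts_padded[i, :n] for i, n in enumerate(num_verts)]
list_from_packed = list(torch.split(verts_packed, num_verts, dim=0))

print(verts_padded.shape)  # torch.Size([3, 5, 3])
print(verts_packed.shape)  # torch.Size([10, 3])
```

The padded layout wastes some memory but allows ordinary batched tensor operations, while the packed layout stores no padding at the cost of tracking where each mesh starts and ends.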

In the last part of the chapter, we will...

Technical requirements

To run the example code snippets in this book, you need a computer, ideally one with a GPU. However, it is also possible to run the code snippets on a CPU alone.

The recommended computer configuration includes the following:

  • A modern GPU, for example, from the NVIDIA GTX or RTX series, with at least 8 GB of memory
  • Python 3
  • The PyTorch and PyTorch3D libraries

The code snippets for this chapter can be found at https://github.com/PacktPublishing/3D-Deep-Learning-with-Python.

Exploring the basic concepts of rendering, rasterization, and shading

Rendering is a process that takes 3D data models of the world around our camera as input and outputs images. It approximates the physical process by which images are formed in a real-world camera. Typically, the 3D data models are meshes. In this case, rendering is usually done using ray tracing:

Figure 2.1: Rendering by ray tracing (rays are generated from camera origins and go through the image pixels for finding relevant mesh faces)

An example of ray tracing is shown in Figure 2.1. In the example, the world model contains one 3D sphere, which is represented by a mesh model. To form the image of the 3D sphere, we generate one ray for each image pixel, starting from the camera origin and going through the image pixel. If a ray intersects a mesh face, then we know that the mesh face can project its color onto the image pixel. We also need to trace the depth of...
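
The per-pixel procedure described above can be sketched in plain Python. This is a minimal illustration, not PyTorch3D's renderer: it assumes the Möller-Trumbore ray-triangle intersection test (a technique the text does not name), and the helper names `ray_triangle_depth`, `render_pixel`, and `triangle_at` are hypothetical. For each ray, the nearest intersected face supplies the pixel color.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle_depth(origin, direction, triangle, eps=1e-9):
    """Return the ray parameter t at the hit point, or None if the ray misses.

    Assumed technique: the Moller-Trumbore intersection test.
    """
    v0, v1, v2 = triangle
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None  # hit point lies outside the triangle
    q = cross(s, e1)
    v = dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None  # ignore hits behind the camera

def render_pixel(origin, direction, faces, background="background"):
    """Return (color, depth) of the nearest face hit by the ray."""
    nearest_depth, color = float("inf"), background
    for triangle, face_color in faces:
        t = ray_triangle_depth(origin, direction, triangle)
        if t is not None and t < nearest_depth:
            nearest_depth, color = t, face_color
    return color, nearest_depth

def triangle_at(z):
    """Hypothetical helper: a triangle facing the camera at depth z."""
    return ((-1.0, -1.0, z), (1.0, -1.0, z), (0.0, 1.0, z))

# Two overlapping faces; the blue one is listed first but lies farther away.
faces = [(triangle_at(5.0), "blue"), (triangle_at(2.0), "red")]
camera_origin = (0.0, 0.0, 0.0)

print(render_pixel(camera_origin, (0.0, 0.0, 1.0), faces))  # ('red', 2.0)
print(render_pixel(camera_origin, (1.0, 0.0, 1.0), faces))  # ('background', inf)
```

Keeping the smallest ray parameter per pixel is what lets the nearer face hide the farther one, regardless of the order in which faces are tested.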

Coding exercises for 3D rendering

In this section, we will work through a concrete coding exercise that uses PyTorch3D to render a mesh model. We will learn how to define a camera model and a light source in PyTorch3D. We will also learn how to change the incoming light components and material properties, so that more realistic images can be rendered by controlling the three light components (ambient, diffuse, and specular):

  1. First, we need to import all the Python modules that we need:
    import open3d
    import os
    import sys
    import torch
    import matplotlib.pyplot as plt
    from pytorch3d.io import load_objs_as_meshes
    from pytorch3d.renderer import (
        look_at_view_transform,
        PerspectiveCameras,
        PointLights,
        Materials,
        RasterizationSettings,
        MeshRenderer,
        MeshRasterizer...
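
Separately from the PyTorch3D exercise above, the way the three light components combine at a single surface point can be sketched with the classic Phong reflection formula. This is a simplified, library-independent illustration: the function name `phong_intensity` and its default coefficients are hypothetical, light and material strengths are folded into one scalar per component, and only one color channel is computed.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phong_intensity(point, normal, light_pos, camera_pos,
                    ambient=0.2, diffuse=0.7, specular=0.5, shininess=32.0):
    """Single-channel intensity at a surface point lit by one point light.

    Hypothetical helper: each coefficient stands for (light component x material
    property), which a full renderer would keep separate and per color channel.
    """
    n = normalize(normal)
    to_light = normalize(tuple(l - p for l, p in zip(light_pos, point)))
    to_camera = normalize(tuple(c - p for c, p in zip(camera_pos, point)))

    n_dot_l = dot(n, to_light)
    if n_dot_l <= 0.0:
        return ambient  # light is behind the surface: only ambient light remains

    # Diffuse term: depends only on the angle between the normal and the light.
    diffuse_term = diffuse * n_dot_l

    # Specular term: reflect the light direction about the normal and compare it
    # with the viewing direction; a larger shininess gives a tighter highlight.
    reflected = tuple(2.0 * n_dot_l * nc - lc for nc, lc in zip(n, to_light))
    specular_term = specular * max(dot(reflected, to_camera), 0.0) ** shininess

    return ambient + diffuse_term + specular_term

surface_point, surface_normal = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
light = (0.0, 0.0, 5.0)

# Camera looking straight down the reflected ray: full highlight.
print(phong_intensity(surface_point, surface_normal, light, (0.0, 0.0, 5.0)))
# Camera off to the side: the highlight fades faster as shininess grows.
print(phong_intensity(surface_point, surface_normal, light, (3.0, 0.0, 4.0), shininess=8.0))
print(phong_intensity(surface_point, surface_normal, light, (3.0, 0.0, 4.0), shininess=64.0))
```

Raising the ambient coefficient brightens the whole surface uniformly, whereas the diffuse and specular terms depend on where the light and the camera are placed.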

Using PyTorch3D heterogeneous batches and PyTorch optimizers

In this section, we are going to learn how to use the PyTorch optimizer on PyTorch3D heterogeneous mini-batches. In deep learning, we are usually given a list of data examples, such as $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Here, $x_i$ are the observations and $y_i$ are the prediction values. For example, $x_i$ may be some images and $y_i$ the ground-truth classification results, such as “cat” or “dog”. A deep neural network is then trained so that the outputs of the neural network are as close to $y_i$ as possible. Usually, a loss function between the neural network outputs and $y_i$ is defined so that the loss function values decrease as the neural network outputs become closer to $y_i$.

Thus, training a deep learning network is usually done by minimizing the loss function that is evaluated on all training data examples, $x_i$ and $y_i$. A straightforward method used in many optimization algorithms is computing the gradients first...
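
The idea of minimizing a loss over training pairs with gradient steps can be sketched with a PyTorch optimizer on ordinary (homogeneous) mini-batches. This is a minimal sketch under assumptions, not the book's example: the synthetic data, the one-parameter linear model, the learning rate, and the batch size of 20 are all made up for illustration.

```python
import torch

torch.manual_seed(0)

# Synthetic training pairs (x_i, y_i) generated from y = 3x - 1 plus small noise.
x = torch.linspace(-1.0, 1.0, 200).unsqueeze(1)
y = 3.0 * x - 1.0 + 0.01 * torch.randn_like(x)

model = torch.nn.Linear(1, 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

initial_loss = loss_fn(model(x), y).item()

batch_size = 20
for epoch in range(50):
    # Shuffle once per epoch so every mini-batch is a random subset of the data.
    permutation = torch.randperm(x.shape[0])
    for start in range(0, x.shape[0], batch_size):
        idx = permutation[start:start + batch_size]
        optimizer.zero_grad()                      # clear gradients from the last step
        loss = loss_fn(model(x[idx]), y[idx])      # loss on this mini-batch only
        loss.backward()                            # compute gradients
        optimizer.step()                           # move parameters against the gradient

final_loss = loss_fn(model(x), y).item()
print(f"loss: {initial_loss:.4f} -> {final_loss:.6f}")
print(f"weight: {model.weight.item():.3f}, bias: {model.bias.item():.3f}")
```

Each step uses the gradient of the loss on a small random subset rather than on the full dataset, which is what makes the descent stochastic.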

Understanding transformations and rotations

In 3D deep learning and computer vision, we usually need to work with 3D transformations, such as rotations and 3D rigid motions. PyTorch3D provides a high-level encapsulation of these transformations in its pytorch3d.transforms.Transform3d class. One advantage of the Transform3d class is that it is mini-batch based. Thus, as is frequently needed in 3D deep learning, a mini-batch of transformations can be applied to a mini-batch of meshes in just a few lines of code. Another advantage is that gradients can be backpropagated straight through Transform3d.
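
The two properties mentioned here, batching and differentiability, can be illustrated with plain PyTorch tensors. The following is a stand-in sketch, not a use of Transform3d itself: the helper `rotation_about_z` and the toy data are hypothetical, and the sketch uses the column-vector convention (p' = Rp + t), which may differ from the convention the library uses internally.

```python
import math
import torch

def rotation_about_z(angles):
    """Hypothetical helper: (N,) angles in radians -> (N, 3, 3) rotation matrices."""
    c, s = torch.cos(angles), torch.sin(angles)
    zeros, ones = torch.zeros_like(angles), torch.ones_like(angles)
    rows = [
        torch.stack([c, -s, zeros], dim=-1),
        torch.stack([s, c, zeros], dim=-1),
        torch.stack([zeros, zeros, ones], dim=-1),
    ]
    return torch.stack(rows, dim=-2)

# A mini-batch of two rigid transformations, both with learnable parameters.
angles = torch.tensor([0.0, math.pi / 2], requires_grad=True)
translations = torch.tensor([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]], requires_grad=True)

# A mini-batch of two point sets, each with two 3D points: shape (N=2, P=2, 3).
points = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
                       [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])

rotations = rotation_about_z(angles)  # (2, 3, 3)

# Apply transformation n to every point of point set n in one batched operation.
transformed = torch.einsum("nij,npj->npi", rotations, points) + translations[:, None, :]

# Any scalar loss on the transformed points sends gradients back to the parameters.
loss = transformed.pow(2).sum()
loss.backward()

print(transformed)
print(angles.grad, translations.grad)
```

Because the rotation matrices are built from differentiable tensor operations, the angles and translations could be handed directly to an optimizer and fitted to data.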

PyTorch3D also provides many lower-level APIs for computations in the Lie groups SO(3) and SE(3). Here, SO(3) denotes the special orthogonal group in 3D and SE(3) denotes the special Euclidean group in 3D. Informally speaking, SO(3) denotes the set of all the rotation transformations and SE(3) denotes the set of all the rigid transformations in 3D....
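
As an informal illustration of these two groups, the sketch below builds an element of SO(3) from an axis-angle vector using Rodrigues' formula, R = I + sin(θ)K + (1 - cos(θ))K², and then applies an element of SE(3) as a rotation followed by a translation. It is written in plain Python rather than with the PyTorch3D functions, and the names `so3_exp` and `apply_rigid` are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def so3_exp(rotvec):
    """Map an axis-angle vector (unit axis times angle) to a 3x3 rotation matrix."""
    theta = math.sqrt(sum(c * c for c in rotvec))
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if theta < 1e-12:
        return identity  # no rotation
    kx, ky, kz = (c / theta for c in rotvec)
    # K is the skew-symmetric (cross-product) matrix of the unit rotation axis.
    K = [[0.0, -kz, ky],
         [kz, 0.0, -kx],
         [-ky, kx, 0.0]]
    K2 = matmul(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[identity[i][j] + s * K[i][j] + c * K2[i][j] for j in range(3)]
            for i in range(3)]

def apply_rigid(rotation, translation, point):
    """Apply an SE(3) element (rotation, translation) to one 3D point: R p + t."""
    return tuple(sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
                 for i in range(3))

# A quarter turn about the z axis, followed by a shift of one unit along z.
R = so3_exp((0.0, 0.0, math.pi / 2))
moved = apply_rigid(R, (0.0, 0.0, 1.0), (1.0, 0.0, 0.0))
print(moved)  # approximately (0, 1, 1), up to floating-point rounding
```

Every matrix produced this way is orthogonal with determinant 1, which is exactly the membership condition for SO(3).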

Summary

In this chapter, we learned about the basic concepts of rendering, rasterization, and shading, including light source models, the Lambertian shading model, and the Phong lighting model. We learned how to implement rendering, rasterization, and shading using PyTorch3D. We also learned how to change the parameters in the rendering process, such as ambient lighting, shininess, and specular colors, and how these parameters affect the rendering results.

We then learned how to use the PyTorch optimizer. We went through a coding example, where the PyTorch optimizer was used on a PyTorch3D mini-batch. In the last part of the chapter, we learned how to use the PyTorch3D APIs for converting between the different representations of rotations and transformations.

In the next chapter, we will learn some more advanced techniques for using deformable mesh models for fitting real-world 3D data.
