
3D Deep Learning with Python

By Xudong Ma, Vishakh Hegde, Lilit Yolyan
About this book
With this hands-on guide to 3D deep learning, developers working with 3D computer vision will be able to put their knowledge to work and get up and running in no time. Complete with step-by-step explanations of essential concepts and practical examples, this book lets you explore and gain a thorough understanding of state-of-the-art 3D deep learning. You’ll see how to use PyTorch3D for basic 3D mesh and point cloud data processing, including loading and saving PLY and OBJ files, projecting 3D points into camera coordinates using perspective or orthographic camera models, rendering point clouds and meshes to images, and much more. As you implement some of the latest 3D deep learning algorithms, such as differentiable rendering, NeRF, SynSin, and Mesh R-CNN, you’ll see how coding for these deep learning models becomes easier with the PyTorch3D library. By the end of this deep learning book, you’ll be ready to implement your own 3D deep learning models confidently.
Publication date:
October 2022
Publisher
Packt
Pages
236
ISBN
9781803247823

 

Introducing 3D Data Processing

In this chapter, we are going to discuss some basic concepts that are fundamental to 3D deep learning and that will be used frequently in later chapters. We will begin by setting up our development environment and installing all the necessary software packages, including Anaconda, Python, PyTorch, and PyTorch3D. We will then cover the most frequently used ways to represent 3D data – for example, point clouds, meshes, and voxels – along with the ways we can manipulate them and convert between them. Next, we will move on to 3D data file formats, such as PLY and OBJ files, and then discuss 3D coordinate systems. Finally, we will discuss camera models, which describe how 3D data is mapped to 2D images.

After reading this chapter, you will be able to debug 3D deep learning algorithms easily by inspecting their output data files. With a solid understanding of coordinate systems and camera models, you will be ready to build on that knowledge and learn about more advanced 3D deep learning topics.

In this chapter, we’re going to cover the following main topics:

  • Setting up a development environment and installing Anaconda, PyTorch, and PyTorch3D
  • 3D data representation
  • 3D data formats – PLY and OBJ files
  • 3D coordinate systems and conversion between them
  • Camera models – perspective and orthographic cameras
 

Technical requirements

To run the example code snippets in this book, you will ideally need a computer with a GPU. However, running the code snippets on a CPU only is also possible.

The recommended computer configuration includes the following:

  • A GPU such as the GTX series or RTX series with at least 8 GB of memory
  • Python 3
  • The PyTorch and PyTorch3D libraries

The code snippets for this chapter can be found at https://github.com/PacktPublishing/3D-Deep-Learning-with-Python.

 

Setting up a development environment

Let us first set up a development environment for all the coding exercises in this book. We recommend using a Linux machine for all the Python code examples in this book:

  1. We will first set up Anaconda. Anaconda is a widely used Python distribution that bundles the powerful CPython implementation. One advantage of using Anaconda is its package management system, which enables users to create virtual environments easily. The individual edition of Anaconda is free for solo practitioners, students, and researchers. To install Anaconda, we recommend visiting anaconda.com for detailed instructions; the easiest way is usually to run a script downloaded from the website. After setting up Anaconda, run the following command to create a virtual environment with Python 3.7:
    $ conda create -n python3d python=3.7

This command will create a virtual environment with Python version 3.7.

  2. Activate the newly created virtual environment with the following command:
    $ source activate python3d
  3. Install PyTorch. Detailed instructions on installing PyTorch can be found on its web page at www.pytorch.org/get-started/locally/. For example, you can install PyTorch 1.9.1 on an Ubuntu desktop with CUDA 11.1 as follows:
    $ conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
  4. Install PyTorch3D. PyTorch3D is an open source Python library for 3D computer vision released by Facebook AI Research. PyTorch3D provides many utility functions to easily manipulate 3D data. Designed with deep learning in mind, almost all of its data structures, such as cameras, point clouds, and meshes, can be handled in mini-batches. Another key feature of PyTorch3D is its implementation of a very important 3D deep learning technique called differentiable rendering. However, the biggest advantage of PyTorch3D as a 3D deep learning library is its close ties to PyTorch.

PyTorch3D may need some dependencies, and detailed instructions on how to install these dependencies can be found on the PyTorch3D GitHub home page at github.com/facebookresearch/pytorch3d. After all the dependencies have been installed by following the instructions from the website, installing PyTorch3D can be easily done by running the following command:

$ conda install pytorch3d -c pytorch3d

Now that we have set up the development environment, let’s go ahead and start learning data representation.

 

3D data representation

In this section, we will learn about the most frequently used representations of 3D data. Choosing a data representation is a particularly important design decision for many 3D deep learning systems. For example, point clouds do not have grid-like structures, so convolutions usually cannot be applied to them directly. Voxel representations do have grid-like structures; however, they tend to consume a large amount of computer memory. We will discuss the pros and cons of these representations in more detail in this section. The most widely used 3D data representations are point clouds, meshes, and voxels.

Understanding point cloud representation

A 3D point cloud is a very straightforward representation of a 3D object: each point cloud is simply a collection of 3D points, and each 3D point is represented by a three-dimensional tuple (x, y, z). The raw measurements of many depth cameras are usually 3D point clouds.

From a deep learning point of view, 3D point clouds are one of the unordered and irregular data types. Unlike regular images, where we can define neighboring pixels for each individual pixel, there are no clear and regular definitions for neighboring points for each point in a point cloud – that is, convolutions usually cannot be applied to point clouds. Thus, special types of deep learning models need to be used for processing point clouds, such as PointNet: https://arxiv.org/abs/1612.00593.

Another issue for point clouds as training data for 3D deep learning is the heterogeneous data issue – that is, for one training dataset, different point clouds may contain different numbers of 3D points. One approach for avoiding such a heterogeneous data issue is forcing all the point clouds to have the same number of points. However, this may not be always possible – for example, the number of points returned by depth cameras may be different from frame to frame.

The heterogeneous data may create some difficulties for mini-batch gradient descent in training deep learning models. Most deep learning frameworks assume that each mini-batch contains training examples of the same size and dimensions. Such homogeneous data is preferred because it can be most efficiently processed by modern parallel processing hardware, such as GPUs. Handling heterogeneous mini-batches in an efficient way needs some additional work. Luckily, PyTorch3D provides many ways of handling heterogeneous mini-batches efficiently, which are important for 3D deep learning.
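To make this concrete, the following sketch (plain Python, not the actual PyTorch3D API) shows the basic idea behind padded batching: shorter point clouds are padded with a fill value so the whole mini-batch becomes one homogeneous array, while the original sizes are recorded so the padding can be ignored later.

```python
def pad_point_clouds(clouds, pad_value=0.0):
    """Pad a list of point clouds (lists of (x, y, z) tuples) to a
    common length; return the padded batch and the original sizes."""
    max_points = max(len(cloud) for cloud in clouds)
    padded = []
    for cloud in clouds:
        # Append dummy points so every cloud has max_points entries
        padding = [(pad_value,) * 3] * (max_points - len(cloud))
        padded.append(list(cloud) + padding)
    sizes = [len(cloud) for cloud in clouds]
    return padded, sizes

# Two clouds with different numbers of points
batch, sizes = pad_point_clouds([
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],                    # 2 points
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)],   # 3 points
])
print(sizes)                          # [2, 3]
print(len(batch[0]), len(batch[1]))  # 3 3
```

PyTorch3D’s mesh and point cloud structures follow a similar idea, offering padded and packed views of a heterogeneous batch so GPUs can process it efficiently.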

Understanding mesh representation

Meshes are another widely used 3D data representation. Like points in point clouds, each mesh contains a set of 3D points called vertices. In addition, each mesh also contains a set of polygons called faces, which are defined on vertices.

In most data-driven applications, meshes are the result of post-processing raw measurements from depth cameras, or they are manually created during the process of 3D asset design. Compared to point clouds, meshes contain additional geometric information, encode topology, and carry surface-normal information. This additional information becomes especially useful when training learning models. For example, graph convolutional neural networks usually treat meshes as graphs and define convolutional operations using the vertex-neighborhood information.

Just like point clouds, meshes also have similar heterogeneous data issues. Again, PyTorch3D provides efficient ways for handling heterogeneous mini-batches for mesh data, which makes 3D deep learning efficient.

Understanding voxel representation

Another important 3D data representation is the voxel representation. A voxel is the counterpart of a pixel in 3D computer vision. A pixel is defined by dividing a 2D rectangle into smaller rectangles, each of which is one pixel. Similarly, a voxel is defined by dividing a 3D cube into smaller cubes, each of which is called a voxel. The process is shown in the following figure:

Figure 1.1 – Voxel representation is the 3D counterpart of 2D pixel representation, where a cubic space is divided into small volume elements


Voxel representations usually use Truncated Signed Distance Functions (TSDFs) to represent 3D surfaces. A Signed Distance Function (SDF) can be defined at each voxel as the (signed) distance between the center of the voxel to the closest point on the surface. A positive sign in an SDF indicates that the voxel center is outside an object. The only difference between a TSDF and an SDF is that the values of a TSDF are truncated, such that the values of a TSDF always range from -1 to +1.
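As a concrete illustration of these definitions, the following sketch (a hypothetical helper, not code from this book’s repository) fills a voxel grid with TSDF values for a sphere centered at the origin: voxel centers inside the sphere get negative values, centers outside get positive values, and everything is truncated to the range [-1, 1].

```python
import math

def sphere_tsdf(grid_size=8, extent=1.0, radius=0.5, trunc=0.25):
    """Compute TSDF values on a grid covering [-extent, extent]^3 for a
    sphere of the given radius centered at the origin."""
    tsdf = {}
    step = 2 * extent / grid_size
    for i in range(grid_size):
        for j in range(grid_size):
            for k in range(grid_size):
                # Coordinates of this voxel's center
                x = -extent + (i + 0.5) * step
                y = -extent + (j + 0.5) * step
                z = -extent + (k + 0.5) * step
                # SDF: distance to the sphere surface, negative inside
                sdf = math.sqrt(x * x + y * y + z * z) - radius
                # TSDF: scale by the truncation distance and clamp to [-1, 1]
                tsdf[(i, j, k)] = max(-1.0, min(1.0, sdf / trunc))
    return tsdf

grid = sphere_tsdf()
corner = grid[(0, 0, 0)]   # far outside the sphere, so truncated to +1
print(corner)              # 1.0
print(grid[(3, 3, 3)])     # near the center, inside: -1.0
```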

Unlike point clouds and meshes, voxel representation is ordered and regular. This property is like pixels in images and enables the use of convolutional filters in deep learning models. One potential disadvantage of voxel representation is that it usually requires more computer memory, but this can be reduced by using techniques such as hashing. Nevertheless, voxel representation is an important 3D data representation.
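As a simplified illustration of the memory-saving idea behind hashing, the following sketch (a hypothetical helper) stores only the occupied voxels in a dictionary keyed by integer voxel coordinates, rather than allocating a dense grid:

```python
def voxelize(points, voxel_size):
    """Map each 3D point to its voxel index and store occupancy sparsely
    in a dict, instead of a dense grid_size**3 array."""
    occupied = {}
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        occupied[key] = occupied.get(key, 0) + 1  # count points per voxel
    return occupied

points = [(0.1, 0.2, 0.3), (0.15, 0.22, 0.31), (2.5, 2.5, 2.5)]
voxels = voxelize(points, voxel_size=0.5)
print(len(voxels))        # 2 occupied voxels instead of a dense grid
print(voxels[(0, 0, 0)])  # 2 points fell into the first voxel
```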

There are 3D data representations other than the ones mentioned here. For example, multi-view representations use multiple images taken from different viewpoints to represent a 3D scene. RGB-D representations use an additional depth channel to represent a 3D scene. However, in this book, we will not be diving too deep into these 3D representations. Now that we have learned the basics of 3D data representations, we will dive into a few commonly used file formats for point clouds and meshes.

 

3D data file format – PLY files

In this section and the next, we are going to discuss the two most frequently used file formats for representing point clouds and meshes: the PLY file format and the OBJ file format. We are going to discuss the formats themselves and how to load and save them using PyTorch3D. PyTorch3D provides excellent utility functions, so loading from and saving to these file formats is efficient and easy.

The PLY file format was developed in the mid-1990s by a group of researchers at Stanford University and has since evolved into one of the most widely used 3D data file formats. The format has both an ASCII version and a binary version. The binary version is preferred where smaller file sizes and processing efficiency are needed, while the ASCII version makes debugging quite easy. Here, we will discuss the basic format of PLY files and how to use both Open3D and PyTorch3D to load and visualize 3D data from them.

An example, a cube.ply file, is shown in the following code snippet:

ply
format ascii 1.0
comment created for the book 3D Deep Learning with Python
element vertex 8
property float32 x
property float32 y
property float32 z
element face 12
property list uint8 int32 vertex_indices
end_header
-1 -1 -1
1 -1 -1
1 1 -1
-1 1 -1
-1 -1 1
1 -1 1
1 1 1
-1 1 1
3 0 1 2
3 5 4 7
3 6 2 1
3 3 7 4
3 7 3 2
3 5 1 0
3 0 2 3
3 5 7 6
3 6 1 5
3 3 4 0
3 7 2 6
3 5 0 4

As seen here, each PLY file contains a header part and a data part. The first line of every ASCII PLY file is always ply, which indicates that this is a PLY file. The second line, format ascii 1.0, shows that the file is of the ASCII type, with a version number of 1.0. Any line starting with comment is considered a comment line, so anything following comment will be ignored when the PLY file is loaded. The element vertex 8 line means that the first type of data in the PLY file is vertex and that there are eight vertices. property float32 x means that each vertex has a property named x of the float32 type. Similarly, each vertex also has y and z properties. Here, each vertex is one 3D point. The element face 12 line means that the second type of data in this PLY file is of the face type and that there are 12 faces. property list uint8 int32 vertex_indices shows that each face record is a list of vertex indices. The header part of a PLY file always ends with an end_header line.

The first part of the data section consists of eight lines, where each line is the record for one vertex. The three numbers in each line are the x, y, and z properties of the vertex. For example, the line -1 -1 -1 specifies a vertex with an x coordinate of -1, a y coordinate of -1, and a z coordinate of -1.

The second part of the data section consists of 12 lines, where each line is the record for one face. The first number in each line indicates the number of vertices that the face has, and the following numbers are the vertex indices. The vertex indices are determined by the order in which the vertices are declared in the PLY file.
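To make the header/data layout concrete, here is a minimal ASCII PLY reader (a simplified sketch, not the parser used by PyTorch3D or Open3D) that handles files laid out like cube.ply:

```python
def parse_ascii_ply(text):
    """Parse an ASCII PLY string into (vertices, faces)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    assert lines[0] == "ply", "not a PLY file"
    end = lines.index("end_header")
    counts = {}  # element name -> declared count, in declaration order
    for line in lines[1:end]:
        parts = line.split()
        if parts[0] == "element":
            counts[parts[1]] = int(parts[2])
    idx = end + 1
    vertices = []
    for _ in range(counts.get("vertex", 0)):
        vertices.append(tuple(float(v) for v in lines[idx].split()[:3]))
        idx += 1
    faces = []
    for _ in range(counts.get("face", 0)):
        nums = [int(v) for v in lines[idx].split()]
        faces.append(nums[1:1 + nums[0]])  # first number is the vertex count
        idx += 1
    return vertices, faces

sample = """ply
format ascii 1.0
element vertex 3
property float32 x
property float32 y
property float32 z
element face 1
property list uint8 int32 vertex_indices
end_header
0 0 0
1 0 0
0 1 0
3 0 1 2
"""
verts, faces = parse_ascii_ply(sample)
print(len(verts), faces)   # 3 [[0, 1, 2]]
```

A real parser must also handle binary encodings, per-vertex properties beyond x, y, and z, and malformed input, which is why we rely on the library loaders in the rest of this chapter.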

We can use both Open3D and PyTorch3D to open the preceding file. Open3D is a Python package that is very handy for visualizing 3D data, and PyTorch3D is handy for using this data for deep learning models. The following is a code snippet, ply_example1.py, for visualizing the mesh in the cube.ply file and loading the vertices and meshes as PyTorch tensors:

import open3d
from pytorch3d.io import load_ply
mesh_file = "cube.ply"
print('visualizing the mesh using open3D')
mesh = open3d.io.read_triangle_mesh(mesh_file)
open3d.visualization.draw_geometries([mesh],
       mesh_show_wireframe = True,
       mesh_show_back_face = True)
print("Loading the same file with PyTorch3D")
vertices, faces = load_ply(mesh_file)
print('Type of vertices = ', type(vertices))
print("type of faces = ", type(faces))
print('vertices = ', vertices)
print('faces = ', faces)

In the preceding Python code snippet, a cube.ply mesh file is first opened by the open3d package by using the read_triangle_mesh function and all the 3D data is read into the mesh variable. The mesh can then be visualized using the Open3D library draw_geometries function. When you run this function, the Open3D library will pop up a window for interactively visualizing the mesh – that is, you can rotate, zoom into, and zoom out of the mesh using your mouse interactively. The cube.ply file, as you can guess, defines a mesh of a cube with eight vertices and six sides, where each side is covered by two faces.

We can also use the PyTorch3D library to load the same mesh. However, this time, we are going to obtain several PyTorch tensors – for example, one tensor for vertices and one tensor for faces. These tensors can be input into any PyTorch deep learning model directly. In this example, the load_ply function returns a tuple of vertices and faces, both of which are conventionally in the format of PyTorch tensors. When you run this ply_example1.py code snippet, the returned vertices should be a PyTorch tensor with a shape of [8, 3] – that is, there are eight vertices, and each vertex has three coordinates. Similarly, the returned faces should be a PyTorch tensor with a shape of [12, 3], that is, there are 12 faces, and each face has 3 vertex indices.

In the following code snippet, we show another example, the parallel_plane_mono.ply file, which can also be downloaded from our GitHub repository. The only difference between the mesh in this example and the mesh in the cube.ply file is the number of faces. Instead of the 12 faces covering the six sides of a cube, here we have only four faces, which form two parallel planes:

ply
format ascii 1.0
comment created for the book 3D Deep Learning with Python
element vertex 8
property float32 x
property float32 y
property float32 z
element face 4
property list uint8 int32 vertex_indices
end_header
-1 -1 -1
1 -1 -1
1 1 -1
-1 1 -1
-1 -1 1
1 -1 1
1 1 1
-1 1 1
3 0 1 2
3 0 2 3
3 5 4 7
3 5 7 6

The mesh can be interactively visualized by the following ply_example2.py:

  1. First, we import all the needed Python libraries:
    import open3d
    from pytorch3d.io import load_ply
  2. We load the mesh using open3d:
    mesh_file = "parallel_plane_mono.ply"
    print('visualizing the mesh using open3D')
    mesh = open3d.io.read_triangle_mesh(mesh_file)
  3. We use draw_geometries to open a window for visualizing interactively with the mesh:
    open3d.visualization.draw_geometries([mesh],
                      mesh_show_wireframe = True,
                      mesh_show_back_face = True)
  4. We use pytorch3d to open the same mesh:
    print("Loading the same file with PyTorch3D")
    vertices, faces = load_ply(mesh_file)
  5. We can print out the information about the loaded vertices and faces. In fact, they are just ordinary PyTorch3D tensors:
    print('Type of vertices = ', type(vertices), ", type of faces = ", type(faces))
    print('vertices = ', vertices)
    print('faces = ', faces)

For each vertex, we can also define properties other than the x, y, and z coordinates. For example, we can also define colors for each vertex. An example of parallel_plane_color.ply is shown here:

ply
format ascii 1.0
comment created for the book 3D Deep Learning with Python
element vertex 8
property float32 x
property float32 y
property float32 z
property uchar red
property uchar green
property uchar blue
element face 4
property list uint8 int32 vertex_indices
end_header
-1 -1 -1 255 0 0
1 -1 -1 255 0 0
1 1 -1 255 0 0
-1 1 -1 255 0 0
-1 -1 1 0 0 255
1 -1 1 0 0 255
1 1 1 0 0 255
-1 1 1 0 0 255
3 0 1 2
3 0 2 3
3 5 4 7
3 5 7 6

Note that in the preceding example, along with x, y, and z, we also define some additional properties for each vertex – that is, the red, green, and blue properties, all in the uchar data type. Now, each record for one vertex is one line of six numbers. The first three are x, y, and z coordinates. The following three numbers are the RGB values.
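As a quick illustration (a hypothetical helper, not part of the book’s code), parsing one such colored-vertex record just means splitting the line into coordinates and uchar color values, with the colors often rescaled to [0, 1]:

```python
def parse_colored_vertex(line):
    """Parse one 'x y z red green blue' vertex record; uchar colors
    (0-255) are rescaled to floats in [0, 1]."""
    vals = line.split()
    xyz = tuple(float(v) for v in vals[:3])
    rgb = tuple(int(v) / 255.0 for v in vals[3:6])
    return xyz, rgb

xyz, rgb = parse_colored_vertex("-1 -1 -1 255 0 0")
print(xyz)  # (-1.0, -1.0, -1.0)
print(rgb)  # (1.0, 0.0, 0.0)
```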

The mesh can be visualized by using ply_example3.py as follows:

import open3d
from pytorch3d.io import load_ply
mesh_file = "parallel_plane_color.ply"
print('visualizing the mesh using open3D')
mesh = open3d.io.read_triangle_mesh(mesh_file)
open3d.visualization.draw_geometries([mesh],
                     mesh_show_wireframe = True,
                     mesh_show_back_face = True)
print("Loading the same file with PyTorch3D")
vertices, faces = load_ply(mesh_file)
print('Type of vertices = ', type(vertices), ", type of faces = ", type(faces))
print('vertices = ', vertices)
print('faces = ', faces)

We also provide cow.ply, which is a real-world example of a 3D mesh. Readers can visualize the mesh using ply_example4.py.

By now, we have talked about the basic elements of the PLY file format, such as vertices and faces. Next, we will discuss the OBJ 3D data format.

 

3D data file format – OBJ files

In this section, we are going to discuss another widely used 3D data file format, the OBJ file format. The OBJ file format was first developed by Wavefront Technologies Inc. Like the PLY file format, the OBJ format also has both an ASCII version and a binary version. The binary version is proprietary and undocumented. So, we are going to discuss the ASCII version in this section.

Like the previous section, here we are going to learn the file format by looking at examples. The first example, cube.obj, is shown as follows. As you can guess, the OBJ file defines a mesh of a cube.

The first line, mtllib ./cube.mtl, declares the companion Material Template Library (MTL) file. The MTL file describes surface shading properties, which will be explained alongside the next code snippet.

In the o cube line, the starting letter, o, indicates that the line defines an object, whose name is cube. Any line starting with # is a comment line – that is, the rest of the line will be ignored by a computer. Each line starting with v defines a vertex. For example, v -0.5 -0.5 0.5 defines a vertex with an x coordinate of -0.5, a y coordinate of -0.5, and a z coordinate of 0.5. Each line starting with f defines one face. For example, the f 1 2 3 line defines a face whose three vertices are those with indices 1, 2, and 3 (note that vertex indices in OBJ files start from 1).

The usemtl Door line declares that the surfaces declared after this line should be shaded using a material property defined in the MTL file, named Door:

mtllib ./cube.mtl
o cube
# Vertex list
v -0.5 -0.5 0.5
v -0.5 -0.5 -0.5
v -0.5 0.5 -0.5
v -0.5 0.5 0.5
v 0.5 -0.5 0.5
v 0.5 -0.5 -0.5
v 0.5 0.5 -0.5
v 0.5 0.5 0.5
# Point/Line/Face list
usemtl Door
f 1 2 3
f 6 5 8
f 7 3 2
f 4 8 5
f 8 4 3
f 6 2 1
f 1 3 4
f 6 8 7
f 7 2 6
f 4 5 1
f 8 3 7
f 6 1 5
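To make the v and f conventions concrete, here is a minimal OBJ reader (a simplified sketch, not a full parser); note that it converts the 1-based vertex indices used by OBJ files to 0-based indices:

```python
def parse_obj(text):
    """Parse the v and f lines of an ASCII OBJ string."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0] == "#":
            continue  # skip blank lines and comments
        if parts[0] == "v":
            vertices.append(tuple(float(v) for v in parts[1:4]))
        elif parts[0] == "f":
            # Face tokens may look like "1", "1/2", or "1/2/3";
            # keep the vertex index and convert from 1-based to 0-based
            faces.append([int(tok.split("/")[0]) - 1 for tok in parts[1:]])
    return vertices, faces

sample = """o cube
v -0.5 -0.5 0.5
v -0.5 -0.5 -0.5
v -0.5 0.5 -0.5
f 1 2 3
"""
verts, faces = parse_obj(sample)
print(verts[0])   # (-0.5, -0.5, 0.5)
print(faces)      # [[0, 1, 2]]
```

This sketch ignores statements such as mtllib, usemtl, and vt; we rely on the library loaders for real files.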

The cube.mtl companion MTL file is shown as follows. The file defines a material property called Door:

newmtl Door
Ka  0.8 0.6 0.4
Kd  0.8 0.6 0.4
Ks  0.9 0.9 0.9
d  1.0
Ns  0.0
illum 2

We will not discuss these material properties in detail, except for map_Kd. If you are curious, you can refer to a standard computer graphics textbook, such as Computer Graphics: Principles and Practice. For the sake of completeness, rough descriptions of these properties are listed as follows:

  • Ka: Specifies an ambient color
  • Kd: Specifies a diffuse color
  • Ks: Specifies a specular color
  • Ns: Defines the focus of specular highlights
  • Ni: Defines the optical density (a.k.a. the index of refraction)
  • d: Specifies a factor for dissolve
  • illum: Specifies an illumination model
  • map_Kd: Specifies a color texture file to be applied to the diffuse reflectivity of the material
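As a quick illustration (a hypothetical helper, not the book’s code), an MTL file of this shape can be collected into a per-material dictionary as follows:

```python
def parse_mtl(text):
    """Collect MTL statements into a dict of {material: {key: values}}."""
    materials, current = {}, None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "newmtl":
            current = parts[1]          # start a new material definition
            materials[current] = {}
        elif current is not None:
            materials[current][parts[0]] = parts[1:]
    return materials

sample = """newmtl Door
Ka 0.8 0.6 0.4
Kd 0.8 0.6 0.4
d 1.0
"""
mats = parse_mtl(sample)
print(mats["Door"]["Kd"])   # ['0.8', '0.6', '0.4']
```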

The cube.obj file can be opened by both Open3D and PyTorch3D. The following code snippet, obj_example1.py, can be downloaded from our GitHub repository:

import open3d
from pytorch3d.io import load_obj
mesh_file = "cube.obj"
print('visualizing the mesh using open3D')
mesh = open3d.io.read_triangle_mesh(mesh_file)
open3d.visualization.draw_geometries([mesh],
                 mesh_show_wireframe = True,
                 mesh_show_back_face = True)
print("Loading the same file with PyTorch3D")
vertices, faces, aux = load_obj(mesh_file)
print('Type of vertices = ', type(vertices))
print("Type of faces = ", type(faces))
print("Type of aux = ", type(aux))
print('vertices = ', vertices)
print('faces = ', faces)
print('aux = ', aux)

In the preceding code snippet, the defined mesh of a cube can be interactively visualized by using the Open3D draw_geometries function. The mesh will be shown in a window, and you can rotate, zoom into, and zoom out of the mesh using your mouse. The mesh can also be loaded using the PyTorch3D load_obj function. The load_obj function will return the vertices, faces, and aux variables, either in the format of a PyTorch tensor or tuples of PyTorch tensors.

An example output of the obj_example1.py code snippet is shown as follows:

visualizing the mesh using open3D
Loading the same file with PyTorch3D
Type of vertices =  <class 'torch.Tensor'>
Type of faces =  <class 'pytorch3d.io.obj_io.Faces'>
Type of aux =  <class 'pytorch3d.io.obj_io.Properties'>
vertices =  tensor([[-0.5000, -0.5000,  0.5000],
        [-0.5000, -0.5000, -0.5000],
        [-0.5000,  0.5000, -0.5000],
        [-0.5000,  0.5000,  0.5000],
        [ 0.5000, -0.5000,  0.5000],
        [ 0.5000, -0.5000, -0.5000],
        [ 0.5000,  0.5000, -0.5000],
        [ 0.5000,  0.5000,  0.5000]])
faces =  Faces(verts_idx=tensor([[0, 1, 2],
        [5, 4, 7],
        [6, 2, 1],
        ...
        [3, 4, 0],
        [7, 2, 6],
        [5, 0, 4]]), normals_idx=tensor([[-1, -1, -1],
        [-1, -1, -1],
        [-1, -1, -1],
        [-1, -1, -1],
        ...
        [-1, -1, -1],
        [-1, -1, -1]]), textures_idx=tensor([[-1, -1, -1],
        [-1, -1, -1],
        [-1, -1, -1],
        ...
        [-1, -1, -1],
        [-1, -1, -1]]), materials_idx=tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
aux =  Properties(normals=None, verts_uvs=None, material_colors={'Door': {'ambient_color': tensor([0.8000, 0.6000, 0.4000]), 'diffuse_color': tensor([0.8000, 0.6000, 0.4000]), 'specular_color': tensor([0.9000, 0.9000, 0.9000]), 'shininess': tensor([0.])}}, texture_images={}, texture_atlas=None)

From the code snippet output here, we know that the returned vertices variable is a PyTorch tensor with a shape of 8 x 3, where each row is a vertex with its x, y, and z coordinates. The returned faces variable is a named tuple of three PyTorch tensors: verts_idx, normals_idx, and textures_idx. In the preceding example, all the normals_idx and textures_idx entries are -1, and thus invalid, because cube.obj does not define normals or textures. We will see in the next example how normals and textures can be defined in the OBJ file format. verts_idx contains the vertex indices for each face. Note that these vertex indices are 0-indexed in PyTorch3D (the indices start from 0), whereas the vertex indices in OBJ files are 1-indexed (the indices start from 1). PyTorch3D performs this conversion between the two indexing conventions for us.
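The indexing conversion can be sketched in a couple of lines of plain Python (our own illustration, not PyTorch3D code): the face line f 1 2 3 in cube.obj becomes the 0-indexed row [0, 1, 2] in verts_idx.

```python
# Face lines from cube.obj are 1-indexed, as the OBJ format requires
obj_face_lines = [[1, 2, 3], [6, 5, 8]]

# PyTorch3D's load_obj subtracts 1 from every index to produce
# 0-indexed faces, ready to index into the vertices tensor directly
verts_idx = [[i - 1 for i in face] for face in obj_face_lines]
print(verts_idx)  # [[0, 1, 2], [5, 4, 7]]
```

These two rows match the first two rows of verts_idx in the output above.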

The returned variable, aux, contains some extra mesh information. Note that the texture_images field of the aux variable is an empty dictionary here. Texture images are referenced in MTL files to define colors on vertices and faces. Again, we will show how to use this feature in our next example.

In the second example, we will use a cube_texture.obj file to highlight more features of the OBJ file format. This file is like the cube.obj file, except for the following differences (the full file is shown after the list):

  • There are some additional lines starting with vt. Each such line declares a texture vertex with two coordinates, u and v. Each texture vertex defines a color: the color is the pixel color of a so-called texture image, where the pixel location is the u coordinate multiplied by the image width and the v coordinate multiplied by the image height. The texture image itself is declared in the companion cube_texture.mtl file.
  • There are additional lines starting with vn. Each such line declares a normal vector. For example, the vn 0.000000 -1.000000 0.000000 line declares a normal vector pointing along the negative y axis.
  • Each face definition line now contains more information about each vertex. For example, the f 2/1/1 3/2/1 4/3/1 line defines three vertices with the triplets 2/1/1, 3/2/1, and 4/3/1. Each triplet consists of the vertex index, the texture vertex index, and the normal vector index. For example, 2/1/1 defines a vertex whose geometric location is given by the second line starting with v, whose color is given by the first line starting with vt, and whose normal vector is given by the first line starting with vn:
mtllib cube_texture.mtl
v 1.000000 -1.000000 -1.000000
v 1.000000 -1.000000 1.000000
v -1.000000 -1.000000 1.000000
v -1.000000 -1.000000 -1.000000
v 1.000000 1.000000 -0.999999
v 0.999999 1.000000 1.000001
v -1.000000 1.000000 1.000000
v -1.000000 1.000000 -1.000000
vt 1.000000 0.333333
vt 1.000000 0.666667
vt 0.666667 0.666667
vt 0.666667 0.333333
vt 0.666667 0.000000
vt 0.000000 0.333333
vt 0.000000 0.000000
vt 0.333333 0.000000
vt 0.333333 1.000000
vt 0.000000 1.000000
vt 0.000000 0.666667
vt 0.333333 0.333333
vt 0.333333 0.666667
vt 1.000000 0.000000
vn 0.000000 -1.000000 0.000000
vn 0.000000 1.000000 0.000000
vn 1.000000 0.000000 0.000000
vn -0.000000 0.000000 1.000000
vn -1.000000 -0.000000 -0.000000
vn 0.000000 0.000000 -1.000000
g main
usemtl Skin
s 1
f 2/1/1 3/2/1 4/3/1
f 8/1/2 7/4/2 6/5/2
f 5/6/3 6/7/3 2/8/3
f 6/8/4 7/5/4 3/4/4
f 3/9/5 7/10/5 8/11/5
f 1/12/6 4/13/6 8/11/6
f 1/4/1 2/1/1 4/3/1
f 5/14/2 8/1/2 6/5/2
f 1/12/3 5/6/3 2/8/3
f 2/12/4 6/8/4 3/4/4
f 4/13/5 3/9/5 8/11/5
f 5/6/6 1/12/6 8/11/6
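As a sketch of how a texture vertex maps to a pixel in the texture image, consider the helper below. This is our own illustration rather than a PyTorch3D function; note that, in the OBJ convention, the v coordinate is measured from the bottom of the image, so the v axis must be flipped to obtain an image row counted from the top:

```python
def uv_to_pixel(u, v, width, height):
    """Map OBJ texture coordinates (u, v) to an image (row, col).

    (0, 0) is the bottom-left corner of the texture image in the OBJ
    convention, while image rows are usually counted from the top.
    """
    col = min(int(u * width), width - 1)           # clamp u = 1.0 to the last column
    row = min(int((1.0 - v) * height), height - 1) # flip v, clamp v = 0.0 to the last row
    return row, col

# The texture vertex "vt 1.000000 0.333333" on a 250 x 250 texture image
print(uv_to_pixel(1.0, 0.333333, 250, 250))
```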

The companion cube_texture.mtl file is as follows, where the line starting with map_Kd declares the texture image. Here, wal67ar_small.jpg is a 250 x 250 RGB image file in the same folder as the MTL file:

newmtl Skin
Ka 0.200000 0.200000 0.200000
Kd 0.827451 0.792157 0.772549
Ks 0.000000 0.000000 0.000000
Ns 0.000000
map_Kd ./wal67ar_small.jpg
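To see how few moving parts an MTL file has, here is a toy parser (our own sketch, not how PyTorch3D parses MTL files) that extracts the diffuse color and the texture image path:

```python
def parse_mtl(text):
    """Toy MTL parser: collects Kd (diffuse color) and map_Kd (texture path)."""
    materials, current = {}, None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == 'newmtl':    # start a new material definition
            current = parts[1]
            materials[current] = {}
        elif parts[0] == 'Kd':      # diffuse color, three floats
            materials[current]['diffuse'] = tuple(float(p) for p in parts[1:4])
        elif parts[0] == 'map_Kd':  # path to the texture image
            materials[current]['texture'] = parts[1]
    return materials

mtl_text = """newmtl Skin
Kd 0.827451 0.792157 0.772549
map_Kd ./wal67ar_small.jpg"""
print(parse_mtl(mtl_text))
```

The diffuse color extracted here is exactly the diffuse_color tensor that load_obj reports in aux.material_colors in the output below.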

Again, we can use Open3D and PyTorch3D to load the mesh in the cube_texture.obj file – for example, by using the following obj_example2.py file:

import open3d
from pytorch3d.io import load_obj
import torch
mesh_file = "cube_texture.obj"
print('visualizing the mesh using open3D')
mesh = open3d.io.read_triangle_mesh(mesh_file)
open3d.visualization.draw_geometries([mesh],
                  mesh_show_wireframe = True,
                  mesh_show_back_face = True)
print("Loading the same file with PyTorch3D")
vertices, faces, aux = load_obj(mesh_file)
print('Type of vertices = ', type(vertices))
print("Type of faces = ", type(faces))
print("Type of aux = ", type(aux))
print('vertices = ', vertices)
print('faces = ', faces)
print('aux = ', aux)
texture_images = getattr(aux, 'texture_images')
print('texture_images type = ', type(texture_images))
for key in texture_images:
    print(key)
print(texture_images['Skin'].shape)

The output of the obj_example2.py code snippet should be as follows:

visualizing the mesh using open3D
Loading the same file with PyTorch3D
Type of vertices =  <class 'torch.Tensor'>
Type of faces =  <class 'pytorch3d.io.obj_io.Faces'>
Type of aux =  <class 'pytorch3d.io.obj_io.Properties'>
vertices =  tensor([[ 1.0000, -1.0000, -1.0000],
        [ 1.0000, -1.0000,  1.0000],
        [-1.0000, -1.0000,  1.0000],
        [-1.0000, -1.0000, -1.0000],
        [ 1.0000,  1.0000, -1.0000],
        [ 1.0000,  1.0000,  1.0000],
        [-1.0000,  1.0000,  1.0000],
        [-1.0000,  1.0000, -1.0000]])
faces =  Faces(verts_idx=tensor([[1, 2, 3],
        [7, 6, 5],
        [4, 5, 1],
        [5, 6, 2],
        [2, 6, 7],
        [0, 3, 7],
        [0, 1, 3],
        ...
        [3, 3, 3],
        [4, 4, 4],
        [5, 5, 5]]), textures_idx=tensor([[ 0,  1,  2],
        [ 0,  3,  4],
        [ 5,  6,  7],
        [ 7,  4,  3],
        [ 8,  9, 10],
        [11, 12, 10],
        ...
        [12,  8, 10],
        [ 5, 11, 10]]), materials_idx=tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
aux =  Properties(normals=tensor([[ 0., -1.,  0.],
        [ 0.,  1.,  0.],
        [ 1.,  0.,  0.],
        [-0.,  0.,  1.],
        [-1., -0., -0.],
        [ 0.,  0., -1.]]), verts_uvs=tensor([[1.0000, 0.3333],
        ...
        [0.3333, 0.6667],
        [1.0000, 0.0000]]), material_colors={'Skin': {'ambient_color': tensor([0.2000, 0.2000, 0.2000]), 'diffuse_color': tensor([0.8275, 0.7922, 0.7725]), 'specular_color': tensor([0., 0., 0.]), 'shininess': tensor([0.])}}, texture_images={'Skin': tensor([[[0.2078, 0.1765, 0.1020],
         [0.2039, 0.1725, 0.0980],
         [0.1961, 0.1647, 0.0902],
         ...,
          [0.2235, 0.1882, 0.1294]]])}, texture_atlas=None)
texture_images type =  <class 'dict'>
Skin
torch.Size([250, 250, 3])

Note

This is not the complete output; please run the code to see the full output.

Compared with the output of the obj_example1.py code snippet, the preceding output has the following differences:

  • The normals_idx and textures_idx fields of the faces variable now contain valid indices instead of -1 values.
  • The normals field of the aux variable is a PyTorch tensor now, instead of being None.
  • The verts_uvs field of the aux variable is a PyTorch tensor now, instead of being None.
  • The texture_images field of the aux variable is no longer an empty dictionary. The texture_images dictionary contains one entry with the key Skin, whose value is a PyTorch tensor with a shape of (250, 250, 3). This tensor holds exactly the image contained in the wal67ar_small.jpg file, as declared in the cube_texture.mtl file.

We have learned how to use the two basic 3D data file formats, PLY and OBJ. In the next section, we will learn the basic concepts of 3D coordinate systems.

 

Understanding 3D coordinate systems

In this section, we are going to learn about the coordinate systems frequently used in PyTorch3D. This section is adapted from PyTorch3D’s documentation of camera coordinate systems: https://pytorch3d.org/docs/cameras. To understand and use the PyTorch3D rendering system, we usually need to know these coordinate systems and how to convert between them. As discussed in the previous sections, 3D data can be represented by points, faces, and voxels. The location of each point can be represented by a set of x, y, and z coordinates with respect to a certain coordinate system. We usually need to define and use multiple coordinate systems, depending on which one is most convenient.

Figure 1.2 – A world coordinate system, where the origin and axis are defined independently of the camera positions


The first coordinate system we frequently use is called the world coordinate system. This is a 3D coordinate system chosen with respect to all the 3D objects, such that the locations of the 3D objects are easy to determine. Usually, the axes of the world coordinate system do not align with the object orientation or camera orientation; thus, there exist non-zero rotations and translations between the world coordinate system and the object and camera orientations. The world coordinate system is shown in Figure 1.2.

Figure 1.3 – The camera view coordinate system, where the origin is at the camera projection center and the three axes are defined according to the imaging plane


Since the axes of the world coordinate system usually do not align with the camera orientation, in many situations it is more convenient to define and use a camera view coordinate system. In PyTorch3D, the camera view coordinate system is defined such that the origin is at the camera’s projection center, the x axis points to the left, the y axis points upward, and the z axis points to the front.

Figure 1.4 – The NDC coordinate system, in which the volume is confined to the ranges that the camera can render


The normalized device coordinate (NDC) system confines the volume that a camera can render. The x coordinate values in NDC space range from -1 to +1, as do the y coordinate values. The z coordinate values range from znear to zfar, where znear is the nearest depth and zfar is the farthest depth. Any object outside this znear to zfar range will not be rendered by the camera.
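The renderable-volume test described above can be sketched as a small helper (our own illustration, not a PyTorch3D function; the znear and zfar defaults are arbitrary example values that would normally come from the camera definition):

```python
def in_ndc_volume(x, y, z, znear=0.1, zfar=100.0):
    """Check whether an NDC-space point lies inside the renderable volume."""
    return -1.0 <= x <= 1.0 and -1.0 <= y <= 1.0 and znear <= z <= zfar

print(in_ndc_volume(0.5, -0.5, 1.0))  # inside the volume
print(in_ndc_volume(1.5, 0.0, 1.0))   # outside: x is beyond +1
```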

Finally, the screen coordinate system is defined in terms of how rendered images are shown on our screens. Its x coordinate indexes the columns of pixels, its y coordinate indexes the rows of pixels, and its z coordinate corresponds to the depth of the object.

To render a 3D object correctly on our 2D screens, we need to switch between these coordinate systems. Luckily, these conversions can be carried out easily using the PyTorch3D camera models. We will discuss coordinate conversion in more detail after we discuss the camera models.

 

Understanding camera models

In this section, we will learn about camera models. In 3D deep learning, we usually need to use 2D images to detect 3D information: either the 3D information is estimated solely from the 2D images, or the 2D images are fused with depth measurements for higher accuracy. In either case, camera models are essential for building the correspondence between the 2D image space and the 3D world.

In PyTorch3D, there are two major camera models, the orthographic camera defined by the OrthographicCameras class and the perspective camera model defined by the PerspectiveCameras class. The following figure shows the differences between the two camera models.

Figure 1.5 – Two major camera models implemented in PyTorch3D, perspective and orthographic


Orthographic cameras use orthographic projections to map objects in the 3D world to 2D images, while perspective cameras use perspective projections. Orthographic projections map objects to 2D images disregarding object depth; for example, as shown in the figure, two objects with the same geometric size at different depths are mapped to 2D images of the same size. In perspective projections, on the other hand, an object that moves farther away from the camera is mapped to a smaller size in the 2D image.

Now that we have learned about the basic concept of camera models, let us look at some coding examples to see how we can create and use these camera models.

 

Coding for camera models and coordinate systems

In this section, we are going to leverage everything we have learned to build a concrete camera model and convert between different coordinate systems, using a code example written in Python and PyTorch3D:

  1. First, we are going to use the following mesh defined by a cube.obj file. Basically, the mesh is a cube:
    mtllib ./cube.mtl
    o cube
    # Vertex list
    v -50 -50 20
    v -50 -50 10
    v -50 50 10
    v -50 50 20
    v 50 -50 20
    v 50 -50 10
    v 50 50 10
    v 50 50 20
    # Point/Line/Face list
    usemtl Door
    f 1 2 3
    f 6 5 8
    f 7 3 2
    f 4 8 5
    f 8 4 3
    f 6 2 1
    f 1 3 4
    f 6 8 7
    f 7 2 6
    f 4 5 1
    f 8 3 7
    f 6 1 5
    # End of file

The example code snippet is camera.py, which can be downloaded from the book’s GitHub repository.

  2. Let us import all the modules that we need:
    import open3d
    import torch
    import pytorch3d
    from pytorch3d.io import load_obj
    from scipy.spatial.transform import Rotation as Rotation
    from pytorch3d.renderer.cameras import PerspectiveCameras
  3. We can load and visualize the mesh by using Open3D’s draw_geometries function:
    #Load meshes and visualize it with Open3D
    mesh_file = "cube.obj"
    print('visualizing the mesh using open3D')
    mesh = open3d.io.read_triangle_mesh(mesh_file)
    open3d.visualization.draw_geometries([mesh],
                     mesh_show_wireframe = True,
                     mesh_show_back_face = True)
  4. We define a camera variable as a PyTorch3D PerspectiveCameras object. The camera here is actually mini-batched; for example, the rotation matrix, R, is a PyTorch tensor with a shape of [8, 3, 3], which defines eight cameras, each with one of the eight rotation matrices. The same is true for all the other camera parameters, such as image sizes, focal lengths, and principal points:
    #Define a mini-batch of 8 cameras
    image_size = torch.ones(8, 2)
    image_size[:,0] = image_size[:,0] * 1024
    image_size[:,1] = image_size[:,1] * 512
    image_size = image_size.cuda()
    focal_length = torch.ones(8, 2)
    focal_length[:,0] = focal_length[:,0] * 1200
    focal_length[:,1] = focal_length[:,1] * 300
    focal_length = focal_length.cuda()
    principal_point = torch.ones(8, 2)
    principal_point[:,0] = principal_point[:,0] * 512
    principal_point[:,1] = principal_point[:,1] * 256
    principal_point = principal_point.cuda()
    R = Rotation.from_euler('zyx', [
        [n*5, n, n]  for n in range(-4, 4, 1)], degrees=True).as_matrix()
    R = torch.from_numpy(R).cuda()
    T = [ [n, 0, 0] for n in range(-4, 4, 1)]
    T = torch.FloatTensor(T).cuda()
    camera = PerspectiveCameras(focal_length = focal_length,
                                principal_point = principal_point,
                                in_ndc = False,
                                image_size = image_size,
                                R = R,
                                T = T,
                                device = 'cuda')
  5. Once we have defined the camera variable, we can call the get_world_to_view_transform class member method to obtain a Transform3d object, world_to_view_transform. We can then use the transform_points member method to convert from world coordinates to camera view coordinates. Similarly, we can use the get_full_projection_transform member method to obtain a Transform3d object for the conversion from world coordinates to screen coordinates:
    world_to_view_transform = camera.get_world_to_view_transform()
    world_to_screen_transform = camera.get_full_projection_transform()
    #Load meshes using PyTorch3D
    vertices, faces, aux = load_obj(mesh_file)
    vertices = vertices.cuda()
    world_to_view_vertices = world_to_view_transform.transform_points(vertices)
    world_to_screen_vertices = world_to_screen_transform.transform_points(vertices)
    print('world_to_view_vertices = ', world_to_view_vertices)
    print('world_to_screen_vertices = ', world_to_screen_vertices)

The code example shows the basic ways that PyTorch3D cameras can be used and how easy it is to switch between different coordinate systems using PyTorch3D.
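To demystify what transform_points computes, here is a minimal numeric sketch of the underlying math. PyTorch3D applies transforms to row vectors, so a world-to-view transform computes x_view = x_world · R + T; the sketch below uses plain Python rather than PyTorch, with an identity rotation chosen for simplicity:

```python
def world_to_view(point, R, T):
    """Apply x_view = x_world . R + T using the row-vector convention."""
    return tuple(
        sum(point[i] * R[i][j] for i in range(3)) + T[j]
        for j in range(3)
    )

R = [[1.0, 0.0, 0.0],  # identity rotation for simplicity
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
T = [1.0, 0.0, 0.0]    # translate one unit along x

print(world_to_view((0.0, 0.0, 5.0), R, T))
```

transform_points performs the same computation, batched over all eight cameras and all mesh vertices at once on the GPU.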

 

Summary

In this chapter, we first learned how to set up our development environment. We then talked about the most widely used 3D data representations and explored concrete examples by learning about two 3D data file formats, the PLY format and the OBJ format. Next, we learned the basic concepts of 3D coordinate systems and camera models. In the last part of the chapter, we learned how to build camera models and convert between different coordinate systems through a hands-on coding example.

In the next chapter, we will talk about more important 3D deep learning concepts, such as rendering to convert 3D models to 2D images, heterogeneous mini-batching, and several ways to represent rotations.

About the Authors
  • Xudong Ma

    Xudong Ma is a Staff Machine Learning Engineer with Grabango Inc. in Berkeley, California. He was previously a Senior Machine Learning Engineer at Facebook (Meta) Oculus, where he worked closely with the PyTorch3D team on 3D facial tracking projects. He has many years of experience working on computer vision, machine learning, and deep learning. He holds a Ph.D. in Electrical and Computer Engineering.

  • Vishakh Hegde

    Vishakh Hegde is a Machine Learning and Computer Vision researcher. He has over 7 years of experience in this field, during which he has authored multiple well-cited research papers and published patents. He holds a master's degree from Stanford University, specializing in applied mathematics and machine learning, and a BS and MS in Physics from IIT Madras. He previously worked at Schlumberger and Matroid. He is a Senior Applied Scientist at Ambient.ai, where he helped build their weapon detection system, which is deployed at several Global Fortune 500 companies. He is now leveraging his expertise and passion for solving business challenges to build a technology startup in Silicon Valley. You can learn more about him on his personal website.

  • Lilit Yolyan

    Lilit Yolyan is a machine learning researcher working on her Ph.D. at YSU. Her research focuses on building computer vision solutions for smart cities using remote sensing data. She has 5 years of experience in the field of computer vision and has worked on a complex driver safety solution to be deployed by many well-known car manufacturing companies.
