Mastering Graphics Programming with Vulkan

Product type: Book
Published: Feb 2023
Publisher: Packt
ISBN-13: 9781803244792
Pages: 382
Edition: 1st
Authors: Marco Castorina, Gabriel Sassone

Table of Contents (21 chapters)

Preface
1. Part 1: Foundations of a Modern Rendering Engine
2. Chapter 1: Introducing the Raptor Engine and Hydra
3. Chapter 2: Improving Resources Management
4. Chapter 3: Unlocking Multi-Threading
5. Chapter 4: Implementing a Frame Graph
6. Chapter 5: Unlocking Async Compute
7. Part 2: GPU-Driven Rendering
8. Chapter 6: GPU-Driven Rendering
9. Chapter 7: Rendering Many Lights with Clustered Deferred Rendering
10. Chapter 8: Adding Shadows Using Mesh Shaders
11. Chapter 9: Implementing Variable Rate Shading
12. Chapter 10: Adding Volumetric Fog
13. Part 3: Advanced Rendering Techniques
14. Chapter 11: Temporal Anti-Aliasing
15. Chapter 12: Getting Started with Ray Tracing
16. Chapter 13: Revisiting Shadows with Ray Tracing
17. Chapter 14: Adding Dynamic Diffuse Global Illumination with Ray Tracing
18. Chapter 15: Adding Reflections with Ray Tracing
19. Index
20. Other Books You May Enjoy

Unlocking Async Compute

In this chapter, we are going to improve our renderer by allowing compute work to run in parallel with graphics tasks. So far, we have been recording and submitting all of our work to a single queue. We can still submit compute tasks to that queue so that they execute alongside graphics work; in this chapter, for instance, we start using a compute shader for the fullscreen lighting rendering pass. A separate queue isn't needed in that case, as we want to minimize the amount of synchronization between separate queues.

However, it might be beneficial to run other compute workloads on a separate queue and allow the GPU to fully utilize its compute units. In this chapter, we are going to implement a simple cloth simulation using compute shaders that will run on a separate compute queue. To unlock this new functionality, we will need to make some changes to our engine.

In this chapter, we’re going to cover the following main topics:

    ...

Technical requirements

Replacing multiple fences with a single timeline semaphore

In this section, we are going to explain how fences and semaphores are currently used in our renderer and how to reduce the number of objects we must use by taking advantage of timeline semaphores.

Our engine already supports having multiple frames in flight using fences. A fence ensures the GPU has finished using the resources for a given frame: the CPU waits on it before submitting a new batch of commands that reuses those resources.

Figure 5.1 – The CPU is working on the current frame while the GPU is rendering the previous frame

There is a downside, however; we need to create a fence for each frame in flight. This means we will have to manage at least two fences for double buffering and three if we want to support triple buffering.
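With a timeline semaphore, a single monotonically increasing 64-bit value can stand in for all of these fences: each submission signals the next value, and before reusing a frame's resources the CPU waits (with vkWaitSemaphores in real code) for the value that frame's earlier submission signaled. As a minimal sketch of the value bookkeeping only, with a frame count and function names of our own choosing (the actual semaphore would be created with VkSemaphoreTypeCreateInfo and VK_SEMAPHORE_TYPE_TIMELINE):

```cpp
#include <cassert>
#include <cstdint>

// Assumed frame count for double buffering; the engine's constant may differ.
constexpr uint64_t kFramesInFlight = 2;

// Submission for absolute frame F signals the timeline to F + 1
// (a timeline signal must be strictly greater than the current value,
// and a freshly created timeline semaphore starts at 0).
constexpr uint64_t signal_value(uint64_t absolute_frame) {
    return absolute_frame + 1;
}

// Before recording absolute frame F on the CPU, wait until the GPU has
// finished frame F - kFramesInFlight, whose resources we are about to reuse.
constexpr uint64_t wait_value(uint64_t absolute_frame) {
    return absolute_frame < kFramesInFlight
               ? 0 // the first frames in flight have nothing to wait on
               : signal_value(absolute_frame - kFramesInFlight);
}
```

Note that the same single semaphore covers double and triple buffering alike: only kFramesInFlight changes, not the number of synchronization objects.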

We also need multiple semaphores to ensure the GPU waits for certain operations to complete before moving on. For instance, we...

Adding a separate queue for async compute

In this section, we are going to illustrate how to use separate queues for graphics and compute work to make full use of our GPU. Modern GPUs have many generic compute units that can be used both for graphics and compute work. Depending on the workload for a given frame (shader complexity, screen resolution, dependencies between rendering passes, and so on), it’s possible that the GPU might not be fully utilized.
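On most hardware, a good candidate for async compute is a queue family that reports VK_QUEUE_COMPUTE_BIT but not VK_QUEUE_GRAPHICS_BIT, since such dedicated families map to the driver's async compute path. Here is a sketch of that selection logic operating on plain data rather than a live VkPhysicalDevice; the struct and function names are our own, but the flag values match VkQueueFlagBits in the specification:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Flag values from the Vulkan specification (VkQueueFlagBits).
constexpr uint32_t kQueueGraphicsBit = 0x1; // VK_QUEUE_GRAPHICS_BIT
constexpr uint32_t kQueueComputeBit  = 0x2; // VK_QUEUE_COMPUTE_BIT

// Stand-in for VkQueueFamilyProperties::queueFlags.
struct QueueFamily { uint32_t flags; };

// Prefer a compute-only family; fall back to any family that supports
// compute. Returns the family index, or -1 if compute is unsupported.
int pick_compute_family(const std::vector<QueueFamily>& families) {
    int fallback = -1;
    for (int i = 0; i < static_cast<int>(families.size()); ++i) {
        const bool compute  = (families[i].flags & kQueueComputeBit) != 0;
        const bool graphics = (families[i].flags & kQueueGraphicsBit) != 0;
        if (compute && !graphics) return i; // dedicated async compute family
        if (compute && fallback < 0) fallback = i;
    }
    return fallback;
}
```

In real code, the candidate list comes from vkGetPhysicalDeviceQueueFamilyProperties, and the chosen index is then passed in VkDeviceQueueCreateInfo at device creation.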

Moving some of the computation done on the CPU to the GPU using compute shaders can increase performance and lead to better GPU utilization. This is possible because the GPU scheduler can determine if any of the compute units are idle and assign work to them to overlap existing work:

Figure 5.3 – Top: graphics workload is not fully utilizing the GPU; Bottom: compute workload can take advantage of unused resources for optimal GPU utilization

In the remainder of this section, we are going...

Implementing cloth simulation using async compute

In this section, we are going to implement a simple cloth simulation on the GPU as an example use case of a compute workload. We start by explaining why running some tasks on the GPU might be beneficial. Next, we provide an overview of compute shaders. Finally, we show how to port code from the CPU to the GPU and highlight some of the differences between the two platforms.
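Before looking at the shader, it helps to have the CPU version of such a simulation in mind. The following is our own illustrative sketch, not the book's Raptor code: a 1D chain of particles integrated with explicit Euler under gravity and Hooke springs between neighbors. In a compute shader port, the body of the outer loop becomes one invocation per particle, reading from the previous-state buffer and writing to the current one.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative constants, not tuned values from the book.
constexpr float kGravity   = -9.8f;  // acceleration on the y axis
constexpr float kStiffness = 100.0f; // Hooke spring constant
constexpr float kRestLen   = 0.1f;   // rest length between neighbors
constexpr float kDt        = 0.016f; // fixed time step

struct Particle { float x, y, vx, vy; bool pinned; };

// One explicit-Euler step over a chain of particles. On the GPU, the loop
// body maps to a compute shader invocation per particle.
void simulate_step(std::vector<Particle>& p) {
    std::vector<Particle> prev = p; // previous state, like a GPU double buffer
    for (size_t i = 0; i < p.size(); ++i) {
        if (p[i].pinned) continue;
        float fx = 0.0f, fy = kGravity;
        // Accumulate spring forces from the left and right neighbors.
        for (int n : {static_cast<int>(i) - 1, static_cast<int>(i) + 1}) {
            if (n < 0 || n >= static_cast<int>(prev.size())) continue;
            float dx = prev[n].x - prev[i].x;
            float dy = prev[n].y - prev[i].y;
            float len = std::sqrt(dx * dx + dy * dy);
            if (len > 0.0f) {
                float f = kStiffness * (len - kRestLen) / len;
                fx += f * dx;
                fy += f * dy;
            }
        }
        p[i].vx += fx * kDt;
        p[i].vy += fy * kDt;
        p[i].x += p[i].vx * kDt;
        p[i].y += p[i].vy * kDt;
    }
}
```

A real cloth is a 2D grid with structural, shear, and bend springs, but the read-previous/write-current pattern shown here is exactly what carries over to the shader.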

Benefits of using compute shaders

In the past, physics simulations mainly ran on the CPU. GPUs only had enough compute capacity for graphics work, and most stages in the pipeline were implemented by dedicated hardware blocks that could only perform one task. As GPUs evolved, pipeline stages moved to generic compute blocks that could perform different tasks.

This increase in both flexibility and compute capacity has allowed engine developers to move some workloads to the GPU. Aside from raw performance, running some computations on the GPU avoids expensive...

Summary

In this chapter, we have built the foundations to support compute shaders in our renderer. We started by introducing timeline semaphores and how they can replace multiple semaphores and fences. We showed how to wait for a timeline semaphore on the CPU and how a timeline semaphore can be used as part of a queue submission, either to be signaled or to be waited on.

Next, we demonstrated how to use the newly introduced timeline semaphore to synchronize execution across the graphics and compute queues.

In the last section, we showed how to approach porting code written for the CPU to the GPU. We first explained some of the benefits of running computations on the GPU. Next, we gave an overview of the execution model for compute shaders and the configuration of local and global workgroup sizes. Finally, we gave a concrete example of a compute shader for cloth simulation and highlighted the main differences from the same code written for the...

Further reading

Synchronization is likely one of the most complex aspects of Vulkan. We have mentioned some of the concepts in this and previous chapters. If you want to improve your understanding, we suggest reading the following resources:

We have only scratched the surface when it comes to compute shaders. The following resources go into more depth and also provide suggestions for getting the most out of individual devices:

Real-time cloth simulation for computer graphics has been a subject of study...
