Mastering Graphics Programming with Vulkan

Product type: Book
Published: Feb 2023
Publisher: Packt
ISBN-13: 9781803244792
Pages: 382
Edition: 1st
Authors: Marco Castorina, Gabriel Sassone

Table of Contents (21 chapters)

Preface
1. Part 1: Foundations of a Modern Rendering Engine
2. Chapter 1: Introducing the Raptor Engine and Hydra
3. Chapter 2: Improving Resources Management
4. Chapter 3: Unlocking Multi-Threading
5. Chapter 4: Implementing a Frame Graph
6. Chapter 5: Unlocking Async Compute
7. Part 2: GPU-Driven Rendering
8. Chapter 6: GPU-Driven Rendering
9. Chapter 7: Rendering Many Lights with Clustered Deferred Rendering
10. Chapter 8: Adding Shadows Using Mesh Shaders
11. Chapter 9: Implementing Variable Rate Shading
12. Chapter 10: Adding Volumetric Fog
13. Part 3: Advanced Rendering Techniques
14. Chapter 11: Temporal Anti-Aliasing
15. Chapter 12: Getting Started with Ray Tracing
16. Chapter 13: Revisiting Shadows with Ray Tracing
17. Chapter 14: Adding Dynamic Diffuse Global Illumination with Ray Tracing
18. Chapter 15: Adding Reflections with Ray Tracing
19. Index
20. Other Books You May Enjoy

Unlocking Async Compute

In this chapter, we are going to improve our renderer by allowing compute work to run in parallel with graphics tasks. So far, we have been recording and submitting all of our work to a single queue. We can still submit compute tasks to that queue so that they execute alongside graphics work; in this chapter, for instance, we start using a compute shader for the fullscreen lighting rendering pass. A separate queue isn't needed in that case, as we want to minimize the amount of synchronization between separate queues.

However, it might be beneficial to run other compute workloads on a separate queue and allow the GPU to fully utilize its compute units. In this chapter, we are going to implement a simple cloth simulation using compute shaders that will run on a separate compute queue. To unlock this new functionality, we will need to make some changes to our engine.

In this chapter, we’re going to cover the following main topics:

    ...

Technical requirements

Replacing multiple fences with a single timeline semaphore

In this section, we are going to explain how fences and semaphores are currently used in our renderer and how to reduce the number of objects we must use by taking advantage of timeline semaphores.

Our engine already supports having multiple frames in flight using fences. A fence ensures the GPU has finished using the resources for a given frame: the CPU waits on it before submitting a new batch of commands that reuses those resources.

Figure 5.1 – The CPU is working on the current frame while the GPU is rendering the previous frame

There is a downside, however; we need to create a fence for each frame in flight. This means we will have to manage at least two fences for double buffering and three if we want to support triple buffering.
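With a timeline semaphore, a single monotonically increasing 64-bit value can stand in for all of these fences: each submission signals the next value, and before reusing a frame's resources the CPU waits (with vkWaitSemaphores in real code) for the value that frame's earlier submission signaled. As a minimal sketch of the value bookkeeping only, with a frame count and function names of our own choosing (the actual semaphore would be created with VkSemaphoreTypeCreateInfo and VK_SEMAPHORE_TYPE_TIMELINE):

```cpp
#include <cassert>
#include <cstdint>

// Assumed frame count for double buffering; the engine's constant may differ.
constexpr uint64_t kFramesInFlight = 2;

// Submission for absolute frame F signals the timeline to F + 1
// (a timeline signal must be strictly greater than the current value,
// and a freshly created timeline semaphore starts at 0).
constexpr uint64_t signal_value(uint64_t absolute_frame) {
    return absolute_frame + 1;
}

// Before recording absolute frame F on the CPU, wait until the GPU has
// finished frame F - kFramesInFlight, whose resources we are about to reuse.
constexpr uint64_t wait_value(uint64_t absolute_frame) {
    return absolute_frame < kFramesInFlight
               ? 0 // the first frames in flight have nothing to wait on
               : signal_value(absolute_frame - kFramesInFlight);
}
```

Note that the same single semaphore covers double and triple buffering alike: only kFramesInFlight changes, not the number of synchronization objects.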

We also need multiple semaphores to ensure the GPU waits for certain operations to complete before moving on. For instance, we...

Adding a separate queue for async compute

In this section, we are going to illustrate how to use separate queues for graphics and compute work to make full use of our GPU. Modern GPUs have many generic compute units that can be used both for graphics and compute work. Depending on the workload for a given frame (shader complexity, screen resolution, dependencies between rendering passes, and so on), it’s possible that the GPU might not be fully utilized.
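On most hardware, a good candidate for async compute is a queue family that reports VK_QUEUE_COMPUTE_BIT but not VK_QUEUE_GRAPHICS_BIT, since such dedicated families map to the driver's async compute path. Here is a sketch of that selection logic operating on plain data rather than a live VkPhysicalDevice; the struct and function names are our own, but the flag values match VkQueueFlagBits in the specification:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Flag values from the Vulkan specification (VkQueueFlagBits).
constexpr uint32_t kQueueGraphicsBit = 0x1; // VK_QUEUE_GRAPHICS_BIT
constexpr uint32_t kQueueComputeBit  = 0x2; // VK_QUEUE_COMPUTE_BIT

// Stand-in for VkQueueFamilyProperties::queueFlags.
struct QueueFamily { uint32_t flags; };

// Prefer a compute-only family; fall back to any family that supports
// compute. Returns the family index, or -1 if compute is unsupported.
int pick_compute_family(const std::vector<QueueFamily>& families) {
    int fallback = -1;
    for (int i = 0; i < static_cast<int>(families.size()); ++i) {
        const bool compute  = (families[i].flags & kQueueComputeBit) != 0;
        const bool graphics = (families[i].flags & kQueueGraphicsBit) != 0;
        if (compute && !graphics) return i; // dedicated async compute family
        if (compute && fallback < 0) fallback = i;
    }
    return fallback;
}
```

In real code, the candidate list comes from vkGetPhysicalDeviceQueueFamilyProperties, and the chosen index is then passed in VkDeviceQueueCreateInfo at device creation.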

Moving some of the computation done on the CPU to the GPU using compute shaders can increase performance and lead to better GPU utilization. This is possible because the GPU scheduler can determine if any of the compute units are idle and assign work to them to overlap existing work:

Figure 5.3 – Top: graphics workload is not fully utilizing the GPU; Bottom: compute workload can take advantage of unused resources for optimal GPU utilization

In the remainder of this section, we are going...

Implementing cloth simulation using async compute

In this section, we are going to implement a simple cloth simulation on the GPU as an example use case of a compute workload. We start by explaining why running some tasks on the GPU might be beneficial. Next, we provide an overview of compute shaders. Finally, we show how to port code from the CPU to the GPU and highlight some of the differences between the two platforms.
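Before looking at the shader, it helps to have the CPU version of such a simulation in mind. The following is our own illustrative sketch, not the book's Raptor code: a 1D chain of particles integrated with explicit Euler under gravity and Hooke springs between neighbors. In a compute shader port, the body of the outer loop becomes one invocation per particle, reading from the previous-state buffer and writing to the current one.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative constants, not tuned values from the book.
constexpr float kGravity   = -9.8f;  // acceleration on the y axis
constexpr float kStiffness = 100.0f; // Hooke spring constant
constexpr float kRestLen   = 0.1f;   // rest length between neighbors
constexpr float kDt        = 0.016f; // fixed time step

struct Particle { float x, y, vx, vy; bool pinned; };

// One explicit-Euler step over a chain of particles. On the GPU, the loop
// body maps to a compute shader invocation per particle.
void simulate_step(std::vector<Particle>& p) {
    std::vector<Particle> prev = p; // previous state, like a GPU double buffer
    for (size_t i = 0; i < p.size(); ++i) {
        if (p[i].pinned) continue;
        float fx = 0.0f, fy = kGravity;
        // Accumulate spring forces from the left and right neighbors.
        for (int n : {static_cast<int>(i) - 1, static_cast<int>(i) + 1}) {
            if (n < 0 || n >= static_cast<int>(prev.size())) continue;
            float dx = prev[n].x - prev[i].x;
            float dy = prev[n].y - prev[i].y;
            float len = std::sqrt(dx * dx + dy * dy);
            if (len > 0.0f) {
                float f = kStiffness * (len - kRestLen) / len;
                fx += f * dx;
                fy += f * dy;
            }
        }
        p[i].vx += fx * kDt;
        p[i].vy += fy * kDt;
        p[i].x += p[i].vx * kDt;
        p[i].y += p[i].vy * kDt;
    }
}
```

A real cloth is a 2D grid with structural, shear, and bend springs, but the read-previous/write-current pattern shown here is exactly what carries over to the shader.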

Benefits of using compute shaders

In the past, physics simulations mainly ran on the CPU. GPUs only had enough compute capacity for graphics work, and most stages in the pipeline were implemented by dedicated hardware blocks that could only perform one task. As GPUs evolved, pipeline stages moved to generic compute blocks that could perform different tasks.

This increase in both flexibility and compute capacity has allowed engine developers to move some workloads to the GPU. Aside from raw performance, running some computations on the GPU avoids expensive...

Summary

In this chapter, we have built the foundations to support compute shaders in our renderer. We started by introducing timeline semaphores and how they can replace multiple semaphores and fences. We showed how to wait for a timeline semaphore on the CPU and how a timeline semaphore can be used as part of a queue submission, either to be signaled or to be waited on.

Next, we demonstrated how to use the newly introduced timeline semaphore to synchronize execution across the graphics and compute queues.

In the last section, we showed how to approach porting code written for the CPU to the GPU. We first explained some of the benefits of running computations on the GPU. Next, we gave an overview of the execution model for compute shaders and the configuration of local and global workgroup sizes. Finally, we gave a concrete example of a compute shader for cloth simulation and highlighted the main differences from the same code written for the...

Further reading

Synchronization is likely one of the most complex aspects of Vulkan. We have mentioned some of the concepts in this and previous chapters. If you want to improve your understanding, we suggest reading the following resources:

We have only scratched the surface when it comes to compute shaders. The following resources go into more depth and also provide suggestions for getting the most out of individual devices:

Real-time cloth simulation for computer graphics has been a subject of study...
