Overlapping Multiple Operations
An interesting possibility with GPUs is to execute multiple operations at the same time. Remember that NVIDIA GPUs have dedicated hardware, the copy engines, that performs memory transfers independently of the execution cores. By using CUDA streams, we can transfer data to and from the GPU while a kernel is running, thereby overlapping the operations. This means up to three things can happen at once: a copy from host to device, a kernel execution, and a copy from device to host (copying in both directions simultaneously requires a device with two copy engines).
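A minimal sketch of this pattern follows. It assumes CUDA C++ compiled with nvcc, and the names `scale_kernel` and `scale_with_streams` are hypothetical, not taken from the text. The array is split into chunks, and each chunk gets its own stream carrying a host-to-device copy, a kernel, and a device-to-host copy. Operations within one stream run in order, while operations in different streams may overlap. One important assumption: the host buffer must be pinned (page-locked, for example allocated with `cudaMallocHost`), because `cudaMemcpyAsync` cannot overlap with other work when it copies from pageable memory.

```cuda
#include <cuda_runtime.h>

// Element-wise kernel: multiply every element of a chunk by `factor`.
__global__ void scale_kernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical helper: process a pinned host array in chunks, one CUDA
// stream per chunk, so copies and kernels of different chunks can overlap.
// At most kMaxStreams streams are used to keep the sketch simple.
cudaError_t scale_with_streams(float *h_data, int n, float factor, int num_streams) {
    const int kMaxStreams = 8;
    if (num_streams < 1) num_streams = 1;
    if (num_streams > kMaxStreams) num_streams = kMaxStreams;

    float *d_data = nullptr;
    cudaError_t err = cudaMalloc((void **)&d_data, (size_t)n * sizeof(float));
    if (err != cudaSuccess) return err;

    cudaStream_t streams[kMaxStreams];
    for (int s = 0; s < num_streams; ++s) {
        cudaStreamCreate(&streams[s]);
    }

    int chunk = (n + num_streams - 1) / num_streams;
    for (int s = 0; s < num_streams; ++s) {
        int offset = s * chunk;
        int count = (n - offset < chunk) ? n - offset : chunk;
        if (count <= 0) break;
        size_t bytes = (size_t)count * sizeof(float);
        // All three operations go into stream s: they run in order with
        // respect to each other, but may overlap with the other streams.
        cudaMemcpyAsync(d_data + offset, h_data + offset, bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        int threads = 256;
        int blocks = (count + threads - 1) / threads;
        scale_kernel<<<blocks, threads, 0, streams[s]>>>(d_data + offset, count, factor);
        cudaMemcpyAsync(h_data + offset, d_data + offset, bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // Always wait for every stream; report a launch error first if one occurred.
    cudaError_t launch_err = cudaGetLastError();
    cudaError_t sync_err = cudaDeviceSynchronize();
    err = (launch_err != cudaSuccess) ? launch_err : sync_err;

    for (int s = 0; s < num_streams; ++s) {
        cudaStreamDestroy(streams[s]);
    }
    cudaFree(d_data);
    return err;
}
```

A caller would allocate the buffer with `cudaMallocHost`, fill it, call `scale_with_streams(h, n, 2.0f, 4)`, and release it with `cudaFreeHost`. Whether the chunks actually overlap depends on the device; a profiler timeline, such as one from Nsight Systems, shows it directly.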
Another possibility is to have more than one GPU in the same machine, although the higher cost makes this a less common setup. We will touch on this alternative in a brief overview; making the most of several devices requires careful thought about how to divide the algorithm's work among them.
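The simplest way to divide work across several GPUs is to give each device one contiguous slice of the data. The sketch below assumes CUDA C++ compiled with nvcc and independent elements; the names `add_kernel` and `add_on_all_gpus` are hypothetical. The key call is `cudaSetDevice`, which selects the GPU that subsequent runtime calls from the current host thread target. Work is queued on every device before the host waits on any of them, so the GPUs can run concurrently. For the copies to be asynchronous on every device, the host buffer should be pinned for all of them, for instance allocated with `cudaHostAlloc` and the `cudaHostAllocPortable` flag.

```cuda
#include <cuda_runtime.h>

// Element-wise kernel: add `value` to every element of a slice.
__global__ void add_kernel(float *data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += value;
    }
}

// Hypothetical helper: split a host array into one contiguous slice per
// visible GPU (up to kMaxDevices). Each device gets its own stream, so the
// host queues work on every GPU before waiting on any of them.
cudaError_t add_on_all_gpus(float *h_data, int n, float value) {
    const int kMaxDevices = 16;
    int device_count = 0;
    cudaError_t err = cudaGetDeviceCount(&device_count);
    if (err != cudaSuccess) return err;
    if (device_count > kMaxDevices) device_count = kMaxDevices;

    float *d_slices[kMaxDevices] = {};
    cudaStream_t streams[kMaxDevices] = {};
    int chunk = (n + device_count - 1) / device_count;

    // Phase 1: queue copy-in, kernel, and copy-out on every device.
    for (int dev = 0; dev < device_count; ++dev) {
        int offset = dev * chunk;
        int count = (n - offset < chunk) ? n - offset : chunk;
        if (count <= 0) break;
        size_t bytes = (size_t)count * sizeof(float);
        cudaSetDevice(dev);  // later calls from this thread target GPU `dev`
        cudaStreamCreate(&streams[dev]);
        cudaError_t e = cudaMalloc((void **)&d_slices[dev], bytes);
        if (e != cudaSuccess) {
            if (err == cudaSuccess) err = e;
            continue;
        }
        cudaMemcpyAsync(d_slices[dev], h_data + offset, bytes,
                        cudaMemcpyHostToDevice, streams[dev]);
        int threads = 256;
        int blocks = (count + threads - 1) / threads;
        add_kernel<<<blocks, threads, 0, streams[dev]>>>(d_slices[dev], count, value);
        cudaMemcpyAsync(h_data + offset, d_slices[dev], bytes,
                        cudaMemcpyDeviceToHost, streams[dev]);
    }
    cudaError_t launch_err = cudaGetLastError();
    if (err == cudaSuccess) err = launch_err;

    // Phase 2: wait for each device in turn, then release its resources.
    for (int dev = 0; dev < device_count; ++dev) {
        if (streams[dev] == nullptr && d_slices[dev] == nullptr) continue;
        cudaSetDevice(dev);
        if (streams[dev] != nullptr) {
            cudaError_t e = cudaStreamSynchronize(streams[dev]);
            if (err == cudaSuccess) err = e;
            cudaStreamDestroy(streams[dev]);
        }
        cudaFree(d_slices[dev]);  // cudaFree(nullptr) is a no-op
    }
    return err;
}
```

An even split like this only pays off when the devices are comparable and the slices are independent. Algorithms whose slices depend on each other's data, such as stencils that exchange boundary values or global reductions, need a more careful division of work.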
Before we jump into those topics, however, we will start this chapter by exploring how to use VS Code to debug our code. This is a useful skill, especially when multiple operations are happening at the same time...