Summary
In this chapter we’ve covered the internal details we need in order to write GPU programs in a more considered way. These details matter much more here than they do when writing single-threaded programs for CPUs. When dealing with GPUs, we have to keep in mind constraints such as having only 48KB of memory to share between threads, and think about how to use such limited resources wisely to get more speed out of our code. We have seen that by using different streams we can execute several operations at once, and can even copy data to and from the device at the same time; but again, we have to be mindful of the relevant details. Rest assured that although this seems complicated now, it will become easier. Shortly we will be writing programs that use all these resources, and that will help the details stick in our minds.
In the next chapter we’ll cover the practical side of the concepts introduced here. We will discuss what makes a parallel...