Summary
In this chapter, we discussed what GPGPU is, what motivates it, and where it is applied. We explained that GPU computing is valuable for problems that can be solved with data parallelism, i.e., by dividing the problem into small chunks and running the same task on each of those chunks. A GPU has a high theoretical compute capacity because of its massive number of cores, but if a problem cannot be split up and distributed over multiple cores, the GPU will not speed up the computation.
After considering theoretical compute capacity, we refined our estimates of code speedup using Amdahl's law and showed that the non-parallelizable fraction of the code eventually dominates the run time as the number of cores grows, which caps the achievable speedup. We illustrated this with an example of calculating a Julia set fractal: we profiled the code and related the results back to Amdahl's law. We also discussed the limitations of Amdahl's law and other factors that limit parallelism, and we demonstrated the effect of these factors on the performance of the Julia set example by measuring parallelization efficiency. In the process, we briefly learned about Numba, JIT compilation, and parallelization over CPU cores. Finally, we compared our CPU implementation against a GPU implementation, which showed that a CPU thread and a GPU thread cannot simply be compared one-to-one.
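As a compact reminder of the two quantities used in this chapter, the following is a minimal sketch of Amdahl's law and parallelization efficiency in plain Python. The function names `amdahl_speedup` and `parallel_efficiency` and the parallel fraction of 0.95 are assumptions chosen for illustration; they are not the values measured for the Julia set example.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup predicted by Amdahl's law.

    parallel_fraction: share of the run time that can be parallelized (0..1).
    workers: number of cores or threads working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is unaffected; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def parallel_efficiency(speedup, workers):
    """Speedup per worker; 1.0 would mean perfect scaling."""
    return speedup / workers


if __name__ == "__main__":
    p = 0.95  # hypothetical parallel fraction, for illustration only
    for n in (1, 2, 4, 8, 64, 1024):
        s = amdahl_speedup(p, n)
        e = parallel_efficiency(s, n)
        print(f"workers={n:5d}  speedup={s:6.2f}  efficiency={e:5.2f}")
```

With a fixed parallel fraction, the predicted speedup approaches 1 / (1 - parallel_fraction) no matter how many workers are added, while the efficiency keeps falling. The same `parallel_efficiency` function can be applied to a measured speedup (serial run time divided by parallel run time) instead of the theoretical one.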
Even though we have not learned how to program a GPU, we are now well equipped to estimate, measure, and recognize the limitations of using a GPU for speeding up computations. In the next chapter, we will learn how to set up an environment that will allow us to write and execute CUDA code.