Summary
In this chapter we learned what makes an algorithm genuinely parallel, and how to identify the parts of a computation that can be exploited to reduce execution time. We also saw that we are looking for a specific kind of parallelism, one that maps well onto the way GPUs execute programs.
We revisited synchronization, now with more context, and saw it in practice in our example programs. Those examples addressed real-world problems such as matrix multiplication, numerical integration, and parallel reductions. We then tackled the hard challenge of sorting in parallel, and finally we saw how to process millions of sensor data points using a weighted moving average.
In the next chapter we will discuss performance strategies that improve application efficiency, using more advanced techniques to make the most of our hardware.