Introducing optimization
When we read the official documentation, it is easy to feel lost or overwhelmed by the information about different hardware architectures. Every new GPU generation brings hardware changes that affect how we need to write our code, and knowing our hardware is essential if we are to squeeze out every last drop of performance. Knowing the details of our hardware also exposes the limits we have to work within: memory bandwidth, available memory, and clock speed.
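To make the idea of a hardware limit concrete, here is a rough sketch of how a peak memory bandwidth figure can be estimated from two numbers usually found in a device's specification: the memory clock and the memory bus width. The function name and the input figures are illustrative assumptions, not values from this text, and the calculation assumes double-data-rate memory (two transfers per clock).

```python
def theoretical_bandwidth_gb_s(memory_clock_mhz: float,
                               bus_width_bits: int,
                               transfers_per_clock: int = 2) -> float:
    """Estimate peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes double-data-rate memory by default (2 transfers per clock).
    """
    bytes_per_transfer = bus_width_bits / 8          # bus width in bytes
    bytes_per_second = (memory_clock_mhz * 1e6       # clock in Hz
                        * bytes_per_transfer
                        * transfers_per_clock)
    return bytes_per_second / 1e9


# Illustrative figures only: 877 MHz memory clock, 4096-bit bus.
peak = theoretical_bandwidth_gb_s(877, 4096)
print(f"Theoretical peak: {peak:.1f} GB/s")
```

A kernel's measured bandwidth can then be compared against this ceiling. A large gap suggests there may be room to improve memory access patterns, while a small gap suggests the kernel is already limited by the hardware itself.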
It is also our responsibility to gauge the effort needed to achieve greater performance. This is where a profiler is a great advantage, because it shows us which parts of the code are most worth optimizing. Optimized code may be harder to read, because it has to deal with many more hardware details, but that does not mean it will be impossible to maintain. We just need to be careful and follow some software engineering...