Summary
In this chapter we learned how to call GPU kernels from Python. This lets us combine the acceleration that GPU kernels can achieve with the ease of use of Python.
We looked at two distinct approaches to the integration: using ctypes and creating a Python extension. The great advantage of the ctypes approach is that it is purely Python-based and does not involve compiling anything extra. The extension approach allows finer-grained control, but we must compile the extension to make it available. We discussed two variants of the extension approach: the first copies the data, while the second uses the numpy internal data pointers directly, which is much faster.
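The difference between copying data and working through a pointer to the existing data can be illustrated with a minimal sketch. This is not the extension code from the chapter: it assumes the standard-library array module as a stand-in for a numpy array, and the helper names as_c_view and as_c_copy are hypothetical. A plain assignment stands in for a kernel writing its result.

```python
import array
import ctypes


def as_c_view(values):
    """Return a ctypes float array that shares memory with `values` (no copy)."""
    return (ctypes.c_float * len(values)).from_buffer(values)


def as_c_copy(values):
    """Return a ctypes float array holding an independent copy of `values`."""
    return (ctypes.c_float * len(values)).from_buffer_copy(values)


# Host data; array.array("f") stands in for a float32 numpy array (assumption).
host = array.array("f", [1.0, 2.0, 3.0, 4.0])

view = as_c_view(host)  # same memory as `host`
copy = as_c_copy(host)  # separate memory

# Stand-in for native code writing through the pointer it was given.
view[0] = 10.0  # the change is visible in `host`
copy[1] = 20.0  # the change stays in the copy

print(list(host))
```

With the shared view nothing has to be copied back after the call, which is the reason the pointer-based variant avoids the overhead of the copying one.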
Whichever approach we take, we should make the effort to reuse memory; otherwise we lose the gains that the GPU can provide, as we saw in the performance analysis section.
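One way to picture memory reuse is a small workspace object that allocates a buffer once and hands it back on later calls. The sketch below is a host-side illustration only: the Workspace class and the scale function are hypothetical names, a ctypes array stands in for device memory, and a Python loop stands in for a kernel launch.

```python
import ctypes


class Workspace:
    """Hypothetical helper that keeps one buffer alive between calls."""

    def __init__(self):
        self._buf = None
        self.allocations = 0

    def get(self, n):
        # Allocate only when there is no buffer yet or it is too small.
        if self._buf is None or len(self._buf) < n:
            self._buf = (ctypes.c_float * n)()
            self.allocations += 1
        return self._buf


def scale(workspace, values, factor):
    """Stand-in for a kernel call that writes into the reused buffer."""
    buf = workspace.get(len(values))
    for i, v in enumerate(values):
        buf[i] = v * factor
    return buf[:len(values)]  # slicing a ctypes array yields a Python list


ws = Workspace()
for _ in range(100):
    result = scale(ws, [1.0, 2.0, 3.0], 2.0)

print(ws.allocations, result)
```

All one hundred calls share a single allocation; allocating inside scale instead would repeat that cost on every call.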
In the next chapter we discuss where we...