GPU-Accelerated Computing with Python 3 and CUDA: From low-level kernels to real-world applications in scientific computing and machine learning

Product type Paperback

Published in Mar 2026

Publisher Packt

ISBN-13 9781803245423

Length 534 pages

Edition 1st Edition

Languages

Python

Tools

CUDA

Concepts

Programming Language

Authors (2):

Niels Cautaerts

Hossein Ghorbanfekr

Table of Contents (24) Chapters

Preface

Free benefits with your book

1. Part 1: Fundamentals of GPU programming with CUDA in Python 3

2. Chapter 1: Why GPU Programming with CUDA in Python 3?

3. Chapter 2: Setting Up a GPU Programming Environment Locally and in the Cloud

4. Chapter 3: Writing and Executing CUDA Kernels with Numba-CUDA

5. Chapter 4: Profiling and Debugging CUDA Code

6. Part 2: Performance Optimization and Advanced CUDA Topics

7. Chapter 5: Optimizing the Performance of CUDA Code

8. Chapter 6: Enabling Concurrency Using CUDA Streams

9. Chapter 7: Scaling to Multiple GPUs

10. Part 3: Using High-Level Python Libraries for GPU Computation

11. Chapter 8: Bringing NumPy and SciPy to the GPU with CuPy

12. Chapter 9: Bringing pandas and scikit-learn to the GPU with Rapids

13. Chapter 10: Solving Optimization Problems on the GPU with JAX

14. Part 4: Real-World Example Applications

15. Chapter 11: Solving the Heat Equation on the GPU

16. Chapter 12: Image Processing and Computer Vision on the GPU

17. Chapter 13: Simulating Atomic Interactions on the GPU

18. Chapter 14: Implementing Your Own Transformer-Based Language Model

19. Part 5: Beyond This Book

20. Chapter 15: Expanding and Deepening Your GPU Programming Knowledge

21. Chapter 16: Unlock Your Exclusive Benefits

Unlock this Book's Free Benefits in 3 Easy Steps

22. Other Books You May Enjoy

Subscribe to Deep Engineering

23. Index

Summary

In this chapter, we discussed what GPGPU is, what motivates it, and where it is applied. We explained that GPU computing is valuable for problems that can be solved with data parallelism, i.e., dividing the problem into small chunks and running the same task on all those chunks. A GPU has a high theoretical compute capacity due to its massive number of cores. However, if a problem cannot be split up and distributed over multiple cores, the GPU will not speed up the computation.
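As a reminder of what data parallelism looks like in practice, here is a minimal pure-Python sketch. It is an illustration, not the chapter's implementation: the function names, the grid size, the constant `C`, and the chunk count are all assumptions. It splits a grid of points into chunks and applies the same Julia-set escape-time task to every chunk.

```python
def escape_count(z: complex, c: complex, max_iter: int = 50) -> int:
    """Iterate z -> z*z + c and return how many steps pass before |z| exceeds 2."""
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def make_grid(n: int) -> list[complex]:
    """Return n x n points covering the square [-1.5, 1.5] x [-1.5, 1.5], row by row."""
    step = 3.0 / (n - 1)
    return [complex(-1.5 + col * step, -1.5 + row * step)
            for row in range(n) for col in range(n)]


def split(points: list[complex], n_chunks: int) -> list[list[complex]]:
    """Divide the points into at most n_chunks contiguous chunks."""
    size = -(-len(points) // n_chunks)  # ceiling division
    return [points[i:i + size] for i in range(0, len(points), size)]


def process_chunk(chunk: list[complex], c: complex) -> list[int]:
    """The task that is repeated, unchanged, on every chunk."""
    return [escape_count(z, c) for z in chunk]


C = complex(-0.8, 0.156)  # hypothetical Julia set parameter, chosen for illustration
points = make_grid(16)
chunks = split(points, 4)

# No chunk depends on another, so each call could be handed to a separate worker.
# Here they run one after the other to keep the sketch dependency-free.
results = [count for chunk in chunks for count in process_chunk(chunk, C)]

# Chunked processing gives exactly the same answer as processing everything at once.
sequential = process_chunk(points, C)
assert results == sequential
```

Because each point's result depends only on that point, the work can be divided in any way without changing the outcome. That independence is what makes such a problem a good fit for many cores.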

After considering theoretical compute capacity, we refined our estimates of code speedup using Amdahl's law and showed that the fraction of non-parallelizable code eventually dominates the total runtime as more cores are added. We illustrated this with an example of calculating a Julia set fractal. We profiled the code and related the results back to Amdahl's law. We also discussed the limitations of Amdahl's law and other factors that limit parallelism. We demonstrated the effect of these factors on the performance of the Julia set example by measuring parallelization efficiency. In the process, we briefly learned about Numba, JIT compilation, and parallelization over CPU cores. We also compared our CPU implementation against a GPU implementation, which showed that a CPU thread and a GPU thread cannot be compared directly.
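For reference, Amdahl's law in its standard form gives the speedup on N workers as S(N) = 1 / ((1 - p) + p / N), where p is the fraction of the runtime that can be parallelized. The sketch below evaluates that formula and also computes parallelization efficiency as measured speedup divided by worker count. The function names and the example value p = 0.95 are assumptions for illustration and are not taken from the chapter's code.

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Theoretical speedup when only `parallel_fraction` of the runtime scales with workers."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


def parallel_efficiency(t_serial: float, t_parallel: float, n_workers: int) -> float:
    """Measured speedup divided by the number of workers; 1.0 means perfect scaling."""
    return (t_serial / t_parallel) / n_workers


p = 0.95  # assumed parallel fraction, for illustration only
for n in (1, 8, 64, 1024):
    print(f"{n:5d} workers -> speedup {amdahl_speedup(p, n):6.2f}")

# However many workers are added, the speedup stays below 1 / (1 - p),
# which is 20x for p = 0.95: the serial 5% ends up dominating the runtime.
```

The efficiency helper takes measured wall-clock times rather than a theoretical fraction, so it also captures overheads that Amdahl's law leaves out.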

Even though we have not learned how to program a GPU, we are now well equipped to estimate, measure, and recognize the limitations of using a GPU for speeding up computations. In the next chapter, we will learn how to set up an environment that will allow us to write and execute CUDA code.

Authors (2)

Niels Cautaerts

Dr. Niels Cautaerts has 10 years of experience writing Python for scientific applications. Five years ago, he became interested in leveraging hardware acceleration in his code. Soon after, he began contributing CUDA kernels to open-source projects in his field of research. He has since applied his expertise to build GPU-accelerated code in various projects, including a low-latency framework for object detection in continuous image streams. Niels maintains a small following on YouTube and Medium, where he shares educational content about tech. Currently, Niels works as a research software developer and data scientist. He has also worked as a big-data engineer. Niels has a background in materials science and holds a Ph.D. in applied physics.

Hossein Ghorbanfekr

Hossein Ghorbanfekr is a computational physicist with over a decade of expertise in scientific programming for material modeling, specializing in C/C++ and Python. During his Ph.D., he wrote various codes that used parallel computing and GPU acceleration. Since 2020, he has been working as a data scientist, focusing on machine learning and high-performance computing in research projects. Hossein has contributed to the development of an object detection framework for waste stream analysis and created GEOBERTje, a domain-specific large language model in geology. His recent work includes Pantea, an open-source, GPU-accelerated machine learning framework for molecular simulations.
