Considering other algorithms
Before we close this chapter, let’s look at some interesting results from other algorithms that might seem to be better alternatives to our first, naive implementation. Recalling the discussion of coalesced memory access from Chapter 6, it may seem natural to suppose that we would obtain higher performance by changing our kernel so that each CUDA thread processes an entire row of the matrix. Conversely, we might expect that having each CUDA thread compute an entire column would yield the worst possible performance. Let’s see the code for both kernels.
__global__ void matrixMulKernel_row(float *A, float *B, float *C,
                                    int width) {
    // Each thread computes one entire row of the output matrix C.
    int row = threadIdx.x + blockIdx.x * blockDim.x;
    if (row < width) {
        for (int col = 0; col < width; col++) {
            float sum = 0.0f;
            // Dot product of row `row` of A with column `col` of B.
            for (int i = 0; i < width; i++) {
                sum += A[row * width + i] * B[i * width + col];
            }
            C[row * width + col] = sum;
        }
    }
}
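
The column-per-thread variant follows the same pattern. The sketch below is a hedged reconstruction: the kernel name matrixMulKernel_col and its signature are assumptions, chosen to mirror the row kernel above.

__global__ void matrixMulKernel_col(float *A, float *B, float *C,
                                    int width) {
    // Each thread computes one entire column of the output matrix C.
    // (Hypothetical kernel name, mirroring matrixMulKernel_row.)
    int col = threadIdx.x + blockIdx.x * blockDim.x;
    if (col < width) {
        for (int row = 0; row < width; row++) {
            float sum = 0.0f;
            for (int i = 0; i < width; i++) {
                sum += A[row * width + i] * B[i * width + col];
            }
            C[row * width + col] = sum;
        }
    }
}

Note that either kernel needs only width threads in total, one per row or per column, so a one-dimensional launch such as matrixMulKernel_row<<<(width + 255) / 256, 256>>>(A, B, C, width), with an assumed block size of 256, is enough to cover the whole matrix.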