Streams, Events, Contexts, and Concurrency

In the prior chapters, we saw that there are two primary operations we perform from the host when interacting with the GPU:

  • Copying memory data to and from the GPU
  • Launching kernel functions

We know that within a single kernel, there is one level of concurrency among its many threads; however, there is another level of concurrency available to us across multiple kernels and GPU memory operations. This means that we can launch multiple memory and kernel operations at once, without waiting for each operation to finish before starting the next. On the other hand, we have to stay organized enough to ensure that all inter-dependent operations are synchronized: we shouldn't launch a particular kernel until its input data has been fully copied to device memory, and we shouldn't copy the output data of a launched kernel back to the host until the kernel has finished executing.
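To make this idea concrete before we look at each mechanism in detail, here is a minimal sketch (our own illustration, not the chapter's example code) of queuing an independent copy-kernel-copy sequence on each of two CUDA streams with PyCUDA; the kernel and the array sizes are arbitrary placeholders:

# A hedged sketch of two independent streams of work; the kernel and sizes
# are placeholders chosen for illustration only.
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda import gpuarray
from pycuda.compiler import SourceModule

ker = SourceModule("""
__global__ void square(float *a)
{
    int i = threadIdx.x;
    a[i] *= a[i];
}
""")
square = ker.get_function("square")

streams = [drv.Stream() for _ in range(2)]
host_data = [np.float32(np.random.rand(64)) for _ in range(2)]
gpu_data = []

# Queue a copy and a kernel launch on each stream; none of these calls
# block the host, so the two sequences are free to overlap on the device.
for data, stream in zip(host_data, streams):
    d = gpuarray.to_gpu_async(data, stream=stream)
    square(d, block=(64, 1, 1), grid=(1, 1, 1), stream=stream)
    gpu_data.append(d)

# Copy the results back asynchronously, then synchronize before using them.
results = [d.get_async(stream=s) for d, s in zip(gpu_data, streams)]
for s in streams:
    s.synchronize()

print([np.allclose(r, h * h) for r, h in zip(results, host_data)])

Because none of the queued calls block the host, the two sequences can overlap on the device; the explicit synchronize calls at the end are what guarantee the results are ready before we read them.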

Technical requirements

A Linux or Windows 10 PC with a modern NVIDIA GPU (2016 onward) is required for this chapter, with all necessary GPU drivers and the CUDA Toolkit (version 9.0 onward) installed. A suitable Python 2.7 installation (such as Anaconda Python 2.7) with the PyCUDA module is also required.

This chapter's code is also available on GitHub:

https://github.com/PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA

For more information about the prerequisites, check the Preface of this book; for the software and hardware requirements, check the README at https://github.com/PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA.

CUDA device synchronization

Before we can use CUDA streams, we need to understand the notion of device synchronization. This is an operation where the host blocks any further execution until all operations issued to the GPU (memory transfers and kernel executions) have completed. This is required to ensure that operations dependent on prior operations are not executed out-of-order—for example, to ensure that a CUDA kernel launch is completed before the host tries to read its output.

In CUDA C, device synchronization is performed with the cudaDeviceSynchronize function. This function blocks further execution on the host until all GPU operations have completed. cudaDeviceSynchronize is so fundamental that it is usually one of the very first topics covered in most books on CUDA C; we haven't seen it yet because PyCUDA has been invisibly performing this synchronization for us where necessary, such as when we copy results back from the GPU with gpuarray.get.
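As a rough illustration (a sketch of our own, not the chapter's listing), the driver-API counterpart of cudaDeviceSynchronize is exposed in PyCUDA as pycuda.driver.Context.synchronize; the kernel below is a placeholder:

# A hedged sketch (placeholder kernel, not the book's example) of explicit
# device synchronization from PyCUDA via the driver API's context synchronize.
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda import gpuarray
from pycuda.compiler import SourceModule

ker = SourceModule("""
__global__ void double_elements(float *a)
{
    int i = threadIdx.x;
    a[i] *= 2.0f;
}
""")
double_elements = ker.get_function("double_elements")

a = np.float32(np.random.rand(64))
a_gpu = gpuarray.to_gpu(a)                                # host-to-device copy
double_elements(a_gpu, block=(64, 1, 1), grid=(1, 1, 1))  # kernel launch returns immediately
drv.Context.synchronize()                                 # block the host until all GPU work is done
print(np.allclose(a_gpu.get(), a * 2))                    # now safe to read the output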

Events

Events are objects that exist on the GPU, whose purpose is to act as milestones or progress markers for a stream of operations. Events are generally used to precisely measure durations on the device side; the timing we have done so far has relied on host-based Python profilers and standard Python library functions such as time. Additionally, events can be used to give the host a status update on the state of a stream and which operations it has already completed, as well as for explicit stream-based synchronization.

Let's start with an example that uses no explicit streams, and uses events to measure only a single kernel launch. (If we don't explicitly use streams in our code, CUDA invisibly defines a default stream into which all operations are placed.)
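The following is a hedged sketch of what such event-based timing looks like in PyCUDA (the kernel here is a stand-in of our own, not the chapter's listing):

# A hedged sketch of timing a single kernel launch on the default stream with
# CUDA events in PyCUDA; the kernel is a placeholder of our own choosing.
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda import gpuarray
from pycuda.compiler import SourceModule

ker = SourceModule("""
__global__ void scale(float *a, float s)
{
    int i = threadIdx.x;
    a[i] *= s;
}
""")
scale = ker.get_function("scale")

a_gpu = gpuarray.to_gpu(np.float32(np.random.rand(64)))

start = drv.Event()
end = drv.Event()

start.record()                # milestone placed in the default stream
scale(a_gpu, np.float32(3.0), block=(64, 1, 1), grid=(1, 1, 1))
end.record()                  # milestone placed right after the kernel launch
end.synchronize()             # wait for the GPU to actually reach this milestone

print('Has the end event occurred yet? %s' % end.query())
print('Kernel duration: %f milliseconds.' % start.time_till(end))

Note that end.query() only returns True once the GPU has actually passed the recorded milestone, which is exactly the kind of status update described above.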

Here, we will use the same useless...

Contexts

A CUDA context is usually described as being analogous to a process in an operating system. Let's review what this means—a process is an instance of a single program running on a computer; all programs outside of the operating system kernel run in a process. Each process has its own set of instructions, variables, and allocated memory, and is, generally speaking, blind to the actions and memory of other processes. When a process ends, the operating system kernel performs a cleanup, ensuring that all memory that the process allocated has been de-allocated, and closing any files, network connections, or other resources the process has made use of. (Curious Linux users can view the processes running on their computer with the command-line top command, while Windows users can view them with the Windows Task Manager).

Similar to a process, a context is associated with a single host program that is using the GPU; the context keeps track of all the CUDA kernels that have been compiled and the device memory that has been allocated on its behalf.
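As a small illustration (an assumed usage sketch, not the chapter's code), we can inspect and synchronize the context that pycuda.autoinit has been silently creating for us:

# A minimal sketch of inspecting and synchronizing the current CUDA context;
# pycuda.autoinit creates and activates this context for us behind the scenes.
import pycuda.autoinit
import pycuda.driver as drv

ctx = pycuda.autoinit.context            # the context autoinit created
print(pycuda.autoinit.device.name())     # the GPU this context is bound to
ctx.synchronize()                        # block until all work in this context finishes

# Contexts can also be created and torn down explicitly, without autoinit:
#   drv.init()
#   dev = drv.Device(0)
#   ctx = dev.make_context()   # create and activate a new context
#   ...                        # issue GPU work here
#   ctx.pop()                  # deactivate the context when done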

Summary

We started this chapter by learning about device synchronization and the importance of synchronizing GPU operations from the host; this ensures that dependent operations wait for antecedent operations to finish before proceeding. This concept has been hidden from us so far, as PyCUDA has been handling synchronization for us automatically. We then learned about CUDA streams, which allow independent sequences of operations to execute on the GPU simultaneously without synchronizing across the entire GPU, which can give us a big performance boost. We also learned about CUDA events, which allow us to time individual CUDA kernels within a given stream and to determine whether a particular operation in a stream has occurred. Finally, we learned about contexts, which are analogous to processes in a host operating system, and how to synchronize across an entire context.

Questions

  1. In the launch parameters for the kernels in the first example, each kernel was launched over 64 threads. If we increase the number of threads to or beyond the number of cores in our GPU, how does this affect the performance of both the original version and the stream version?
  2. Consider the CUDA C example that was given at the very beginning of this chapter, which illustrated the use of cudaDeviceSynchronize. Do you think it is possible to get some level of concurrency among multiple kernels without using streams and only using cudaDeviceSynchronize?
  3. If you are a Linux user, modify the last example that was given to operate over processes rather than threads.
  4. Consider the multi-kernel_events.py program; we said it is good that there was a low standard deviation of kernel execution durations. Why would it be bad if there were a high standard deviation?
  5. We only used 10 host...