Basic GPU profiling tools
Before discussing NVIDIA profilers, we look at lightweight tools that give a quick sense of GPU utilization, execution time, and resource usage. We have already seen Python's built-in timing modules and the Scalene package in Chapter 1. These tools are useful during early development or smaller-scale experiments, where diving into the hundreds of metrics provided by a full GPU profiler is overkill.
Next, we'll discuss a few categories of basic GPU profiling tools available to Python developers and Linux users.
Time profiling using the time and timeit modules
At the most fundamental level, timing our code is the simplest form of profiling. Python's built-in time and timeit modules are widely used for this purpose. The time module provides a straightforward way to measure elapsed wall-clock time between two points in the code. This is extremely useful for quick benchmarks and comparisons between different kernel implementations...
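As a minimal sketch of the two modules in use, the example below times a CPU-only placeholder function. The names workload and time_once are hypothetical and introduced only for illustration. One assumption worth stating: GPU kernels are typically launched asynchronously, so when timing real GPU code the host should wait for the device to finish (using whatever synchronization call the GPU library provides) before reading the clock.

```python
import time
import timeit


def workload(n=100_000):
    """Hypothetical stand-in for the code under test (CPU-only here)."""
    return sum(i * i for i in range(n))


def time_once(fn, *args):
    """Return (result, elapsed seconds) for a single call to fn."""
    start = time.perf_counter()  # high-resolution wall-clock timer
    result = fn(*args)
    # With GPU code, synchronize with the device here before stopping
    # the clock; otherwise only the kernel launch is measured.
    elapsed = time.perf_counter() - start
    return result, elapsed


result, elapsed = time_once(workload)
print(f"single run: {elapsed * 1e3:.3f} ms")

# timeit repeats the call to reduce noise: 5 batches of 10 calls each.
calls_per_batch = 10
runs = timeit.repeat(workload, repeat=5, number=calls_per_batch)
best_per_call = min(runs) / calls_per_batch
print(f"best of 5 batches: {best_per_call * 1e3:.3f} ms per call")
```

Taking the minimum over the repeated batches, rather than the mean, is a common choice because slower runs usually reflect interference from other processes, not the code itself. time.perf_counter is used instead of time.time because it is monotonic and has higher resolution for short intervals.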