CUDA Profiling for HPC
On HPC systems it can be difficult to attach a profiler GUI for interactive profiling. Instead, it is best to profile remotely, capturing as much data as possible, and then view the profile information locally.
CUDA profiling tools offer timeline-style tracing of GPU activity as well as detailed kernel-level information.
Newer CUDA versions and GPU architectures can use Nsight Systems and Nsight Compute. nvprof and the Visual Profiler (nvvp) support older CUDA versions and GPU architectures, but not GPUs of Compute Capability 8.0 and newer.
Compiler settings for profiling
Compile with -lineinfo (or --generate-line-info) to include source-level profile information.
Do not profile debug builds (for example, device code compiled with -G). Their performance is not representative of an optimized release build.
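A minimal sketch of a profiling build, assuming a single CUDA source file. build_for_profiling and the DRY_RUN switch are hypothetical names used only for illustration. The point is that -lineinfo is added to an otherwise optimized build, and -G is left out.

```shell
#!/bin/sh
# Sketch: build a CUDA source file for profiling.
# -lineinfo adds source-line correlation without disabling optimization.
# -G (device debug) is deliberately NOT used: it turns off most device-code
# optimizations, so the resulting profile would not be representative.
# -O3 optimizes host code; device code is optimized by default unless -G is given.
# build_for_profiling is a hypothetical helper; DRY_RUN=1 prints the command
# instead of running it.
build_for_profiling() {
    src="$1"   # CUDA source file, e.g. myapplication.cu
    out="$2"   # output executable, e.g. myapplication
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo nvcc -O3 -lineinfo -o "$out" "$src"
    else
        nvcc -O3 -lineinfo -o "$out" "$src"
    fi
}
```

For example, build_for_profiling myapplication.cu myapplication produces the ./myapplication binary used in the commands in this section.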
Nsight Systems and Nsight Compute
Note
Requires CUDA >= 10.0 and Compute Capability > 6.0
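Trace the application timeline using the Nsight Systems CLI (nsys) on the remote system:
nsys profile -o timeline ./myapplication
Import the report into the Nsight Systems GUI (nsight-sys) on your local machine:
File > Open
- Select timeline.qdrep (newer Nsight Systems versions write timeline.nsys-rep instead)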
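On HPC systems a job may launch several profiled processes (for example one per node), and a fixed -o timeline name would make their reports collide. The sketch below assumes an nsys version that expands %h (hostname) and %p (process ID) in the -o file name. Recent versions do, but check nsys profile --help on your system. profile_timeline and DRY_RUN are hypothetical names.

```shell
#!/bin/sh
# Sketch: trace one process of a (possibly multi-process) job with
# Nsight Systems, giving each process its own report file.
# Assumes nsys (not the shell) replaces %h with the hostname and %p with
# the process ID in the -o file name.
# profile_timeline is a hypothetical helper; DRY_RUN=1 prints the command.
profile_timeline() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo nsys profile -o 'timeline_%h_%p' "$@"
    else
        nsys profile -o 'timeline_%h_%p' "$@"
    fi
}
```

Running profile_timeline ./myapplication in each process produces one report per process, named from that process's hostname and PID. Each report can be copied back and opened individually in the GUI.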
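Capture kernel-level information using the Nsight Compute CLI (nv-nsight-cu-cli) on the remote system:
nv-nsight-cu-cli -o profile ./bin/Release/atomicIncTest
Import the report into the Nsight Compute GUI (nv-nsight-cu) on your local machine, either from the command line:
nv-nsight-cu profile.nsight-cuprof-report
or by dragging profile.nsight-cuprof-report into the nv-nsight-cu window.
In newer CUDA versions these tools are named ncu and ncu-ui, and reports use the .ncu-rep extension.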
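Kernel-level collection replays kernels to gather its data, so profiling every launch in a long-running HPC application can be very slow. The Nsight Compute CLI can restrict collection with -k/--kernel-name (which kernel) and -c/--launch-count (how many matching launches). A minimal sketch follows. profile_kernel, DRY_RUN and the kernel name myKernel are hypothetical.

```shell
#!/bin/sh
# Sketch: collect Nsight Compute data for only the first N launches of a
# single kernel instead of every kernel launch in the application.
# -k filters by kernel name; -c limits how many matching launches are profiled.
# profile_kernel is a hypothetical helper; DRY_RUN=1 prints the command.
profile_kernel() {
    kernel="$1"  # kernel name to profile (hypothetical, e.g. myKernel)
    count="$2"   # number of matching launches to profile
    shift 2      # remaining arguments: the application and its arguments
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo nv-nsight-cu-cli -k "$kernel" -c "$count" -o profile "$@"
    else
        nv-nsight-cu-cli -k "$kernel" -c "$count" -o profile "$@"
    fi
}
```

For example, profile_kernel myKernel 1 ./myapplication profiles a single launch of myKernel. The resulting report is imported into the GUI in the same way as a full report.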
Visual Profiler (nvprof and nvvp)
Note
Suitable for older CUDA versions and GPU architectures (not supported on Compute Capability 8.0 and newer)
Trace the application timeline remotely using nvprof:
nvprof -o timeline.nvprof ./myapplication
Capture kernel-level metrics remotely using nvprof (this can take much longer than the timeline run, because kernels are replayed to collect the metrics):
nvprof --analysis-metrics -o analysis.nvprof ./myapplication
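In a batch job it is convenient to run both nvprof passes together. A minimal sketch follows. It relies on nvprof expanding %h to the hostname in the -o file name, and adds a caller-chosen tag so that the timeline and analysis files from the same run share a name. nvprof_both, the tag argument and DRY_RUN are hypothetical. %p is avoided because the PID differs between the two runs.

```shell
#!/bin/sh
# Sketch: run the nvprof timeline pass, then the analysis-metrics pass,
# with matching file names so the pair is easy to import together.
# nvprof replaces %h with the hostname; $tag is expanded by the shell.
# nvprof_both is a hypothetical helper; DRY_RUN=1 prints the commands.
nvprof_both() {
    tag="$1"   # caller-chosen label, e.g. a run or rank number (hypothetical)
    shift      # remaining arguments: the application and its arguments
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo nvprof -o "timeline.%h.$tag.nvprof" "$@"
        echo nvprof --analysis-metrics -o "analysis.%h.$tag.nvprof" "$@"
    else
        # Only run the slower metrics pass if the timeline pass succeeded.
        nvprof -o "timeline.%h.$tag.nvprof" "$@" &&
            nvprof --analysis-metrics -o "analysis.%h.$tag.nvprof" "$@"
    fi
}
```

With nvprof_both run1 ./myapplication, the timeline and analysis files for run1 on a given node share the same hostname and tag. This makes it straightforward to fill the Timeline and Event/Metric fields of the Visual Profiler import dialog.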
Import both files into the Visual Profiler GUI (nvvp):
File > Import
- Select Nvprof
- Select Single process
- Select timeline.nvprof for Timeline data file
- Add analysis.nvprof to Event/Metric data files