CUDA Profiling for HPC

On HPC systems it can be difficult to attach a profiling GUI for interactive profiling. Instead, profile remotely, capture as much data as possible, and view the results locally.

CUDA profiling tools offer timeline-style tracing of GPU activity, plus detailed kernel-level information.

Newer CUDA versions and GPU architectures can use Nsight Systems and Nsight Compute. Older CUDA versions and GPU architectures are covered by nvprof and the Visual Profiler, which do not support devices of compute capability 8.0 and higher.
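Which tool is available depends on the toolkit installed on the cluster. A minimal sketch for checking this from a login or compute node is shown below. The function name pick_profiler is hypothetical, and the check only tests whether the binaries are on PATH, not whether they support the installed GPU.

```shell
# pick_profiler (hypothetical helper): print the first timeline profiler found on PATH.
pick_profiler() {
    if command -v nsys >/dev/null 2>&1; then
        echo "nsys"
    elif command -v nvprof >/dev/null 2>&1; then
        echo "nvprof"
    else
        echo "none"
    fi
}

PROFILER=$(pick_profiler)
echo "Timeline profiler found: ${PROFILER}"
```

On many clusters the CUDA toolkit has to be loaded first (for example through an environment module) before either binary appears on PATH.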

Compiler settings for profiling

Compile with -lineinfo (or --generate-line-info) to include source-level profile information. Do not profile debug builds, as their performance is not representative of an optimised build.
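As an illustration of the compiler settings above, the following sketch builds an optimised binary with line information. The source file name myapplication.cu and the -O3 level are assumptions for the example; the script only prints the command if nvcc or the source file is missing.

```shell
# Optimised build with line information; -G (device debug) is deliberately absent.
NVCC_FLAGS="-O3 -lineinfo"
SRC="myapplication.cu"   # hypothetical source file
OUT="myapplication"

if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # NVCC_FLAGS is left unquoted on purpose so it splits into separate flags.
    nvcc $NVCC_FLAGS -o "$OUT" "$SRC"
else
    echo "Would run: nvcc $NVCC_FLAGS -o $OUT $SRC"
fi
```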

Nsight Systems and Nsight Compute

Note

Requires CUDA >= 10.0 and Compute Capability > 6.0

Trace the application timeline using the remote Nsight Systems CLI (nsys):

nsys profile -o timeline ./myapplication
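The basic command traces a default set of APIs. A hedged sketch of a narrower trace is shown below: it limits tracing to CUDA and NVTX and overwrites any existing report. Flag availability varies between Nsight Systems releases, so check nsys profile --help on the target system. The script only prints the command if nsys or the application is missing.

```shell
# Restrict tracing to CUDA API/kernel activity and NVTX ranges, and allow
# an existing timeline report to be overwritten.
NSYS_ARGS="profile --trace=cuda,nvtx --force-overwrite true -o timeline"
APP="./myapplication"

if command -v nsys >/dev/null 2>&1 && [ -x "$APP" ]; then
    # NSYS_ARGS is left unquoted on purpose so it splits into separate arguments.
    nsys $NSYS_ARGS "$APP"
else
    echo "Would run: nsys $NSYS_ARGS $APP"
fi
```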

Import into the local Nsight Systems GUI (nsight-sys):

  • File > Open
  • Select timeline.qdrep

Capture kernel-level information using the remote Nsight Compute CLI (nv-nsight-cu-cli):

nv-nsight-cu-cli -o profile ./myapplication

Import into the local Nsight Compute GUI (nv-nsight-cu) via:

nv-nsight-cu profile.nsight-cuprof-report

Or drag profile.nsight-cuprof-report into the nv-nsight-cu window.
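Kernel-level capture replays each kernel launch and can be slow for applications that launch many kernels. One possible way to bound the cost is sketched below, using the launch-skip and launch-count options to profile a small window of launches. The values 10 and 5 and the output name profile_window are arbitrary assumptions; the script only prints the command if the CLI or the application is missing.

```shell
# Skip the first 10 kernel launches (e.g. warm-up), then profile the next 5.
NCU_ARGS="--launch-skip 10 --launch-count 5 -o profile_window"
APP="./myapplication"

if command -v nv-nsight-cu-cli >/dev/null 2>&1 && [ -x "$APP" ]; then
    # NCU_ARGS is left unquoted on purpose so it splits into separate arguments.
    nv-nsight-cu-cli $NCU_ARGS "$APP"
else
    echo "Would run: nv-nsight-cu-cli $NCU_ARGS $APP"
fi
```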

Visual Profiler (nvprof and nvvp)

Note

Suitable for older CUDA versions and GPU architectures; not supported on devices of compute capability 8.0 and higher

Trace application timeline remotely using nvprof:

nvprof -o timeline.nvprof ./myapplication

Capture kernel-level metrics remotely using nvprof (this will take a while):

nvprof --analysis-metrics -o analysis.nvprof ./myapplication
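When the full analysis set is too slow, a smaller run that collects only selected metrics may be enough. The sketch below assumes the metrics achieved_occupancy and gld_efficiency are of interest and writes to a hypothetical output file metrics.nvprof; nvprof --query-metrics lists what the installed GPU actually supports. The script only prints the command if nvprof or the application is missing.

```shell
# Collect two named metrics instead of the full --analysis-metrics set.
METRICS="achieved_occupancy,gld_efficiency"
APP="./myapplication"

if command -v nvprof >/dev/null 2>&1 && [ -x "$APP" ]; then
    nvprof --metrics "$METRICS" -o metrics.nvprof "$APP"
else
    echo "Would run: nvprof --metrics $METRICS -o metrics.nvprof $APP"
fi
```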

Import into the Visual Profiler GUI (nvvp):

  • File > Import
  • Select Nvprof
  • Select Single process
  • Select timeline.nvprof for Timeline data file
  • Add analysis.nvprof to Event/Metric data files

Documentation