Sunday, February 10, 2013

Profiling a CUDA application 

Tools required: NVIDIA's profilers (the command-line profiler nvprof, or the Visual Profiler, computeprof)

First, identify the algorithm's 'heavy' areas, i.e. the most time-consuming routines or kernels.
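One simple way to find these areas is to time candidate kernels with CUDA events (cudaEventRecord / cudaEventElapsedTime) before doing a full profiling run. This is an assumed approach, not part of the workflow described here. The sketch below compares two kernels. scaleKernel, heavyKernel, timeKernelMs and the problem size are hypothetical stand-ins for your own code.

```cuda
// Sketch: time candidate kernels with CUDA events to spot the "heavy" one.
// scaleKernel, heavyKernel and timeKernelMs are hypothetical examples.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical light kernel: one multiply per element.
__global__ void scaleKernel(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

// Hypothetical heavy kernel: a long chain of dependent operations per element.
__global__ void heavyKernel(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = x[i];
        for (int k = 0; k < 1000; ++k) v = v * 0.999f + 0.001f;
        x[i] = v;
    }
}

// Host-side pointer to a __global__ function with this signature.
typedef void (*KernelFn)(float *, int);

// Launches `kernel` once over n elements and returns its GPU time in ms.
float timeKernelMs(KernelFn kernel, float *d_x, int n) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);              // recorded on the default stream
    kernel<<<blocks, threads>>>(d_x, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);          // wait until the kernel has finished
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main() {
    const int n = 1 << 20;
    float *d_x = nullptr;
    if (cudaMalloc(&d_x, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_x, 0, n * sizeof(float));

    // Warm-up launches so one-time setup costs are not counted.
    timeKernelMs(scaleKernel, d_x, n);
    timeKernelMs(heavyKernel, d_x, n);

    printf("scaleKernel: %.3f ms\n", timeKernelMs(scaleKernel, d_x, n));
    printf("heavyKernel: %.3f ms\n", timeKernelMs(heavyKernel, d_x, n));

    cudaFree(d_x);
    return 0;
}
```

Whichever kernel dominates the timings is a natural candidate for the start/stop profiling region in the steps that follow.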

Then prepare the code for profiling:

1. Include the profiler header:
cuda_profiler_api.h for the runtime API (or cudaProfiler.h for the driver API)

2. Add calls to start and stop profile data collection around the region you want to profile.

cudaProfilerStart() is used to start profiling 
cudaProfilerStop() is used to stop profiling

(With the CUDA driver API, the same functionality is provided by cuProfilerStart() and cuProfilerStop().)

3. When using the start and stop functions, you also need to instruct the profiling tool to disable profiling at the start of the application. For nvprof, use the --profile-from-start off flag. For the Visual Profiler, uncheck the "Start execution with profiling enabled" checkbox in the Settings View.

4. Flush profile data.
To reduce profiling overhead, the profiling tools collect and record profile information into internal buffers. These buffers are then flushed asynchronously to disk with low priority to avoid perturbing application behavior. To avoid losing profile information that has not yet been flushed, the application being profiled should call cudaDeviceReset() before exiting. Doing so forces all buffered profile information to be flushed.

5. Run the application under the profiler, select the metrics you want displayed, and analyse the application's behaviour.
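Putting steps 1, 2 and 4 together, a minimal runtime-API sketch might look like the following. The saxpy kernel, runProfiledRegion, the array size and the iteration count are hypothetical placeholders for your own heavy region. Only the profiler header, cudaProfilerStart()/cudaProfilerStop() and cudaDeviceReset() come from the steps above.

```cuda
// Sketch of a program prepared for profiling with the CUDA runtime API.
// saxpy, runProfiledRegion, sizes and iteration count are hypothetical.
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>  // step 1: declares cudaProfilerStart/Stop

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Runs the region of interest and returns the first error encountered.
cudaError_t runProfiledRegion(int n, int iterations) {
    float *x = nullptr, *y = nullptr;
    cudaError_t err = cudaMalloc(&x, n * sizeof(float));
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&y, n * sizeof(float));
    if (err != cudaSuccess) { cudaFree(x); return err; }
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaProfilerStart();                 // step 2: begin collecting profile data
    for (int it = 0; it < iterations; ++it)
        saxpy<<<blocks, threads>>>(n, 2.0f, x, y);
    err = cudaDeviceSynchronize();       // wait for all launches to finish
    cudaProfilerStop();                  // step 2: stop collecting profile data

    if (err == cudaSuccess) err = cudaGetLastError();  // catch launch errors
    cudaFree(x);
    cudaFree(y);
    return err;
}

int main() {
    cudaError_t err = runProfiledRegion(1 << 20, 10);
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    cudaDeviceReset();                   // step 4: flush buffered profile data
    return err == cudaSuccess ? 0 : 1;
}
```

Assuming the file is saved as prof_demo.cu (a hypothetical name), it could be built with nvcc prof_demo.cu -o prof_demo and profiled with nvprof --profile-from-start off ./prof_demo, so that only the loop between the start and stop calls is recorded. A driver API program would use cuProfilerStart() and cuProfilerStop() from cudaProfiler.h instead.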


