Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Measuring CPU-Side Processing Time in NVIDIA CUDA Programs Using Intel VTune

tx-y-hiraka
Novice

[Background]

I want to measure the CPU-side processing time in an NVIDIA CUDA program. Specifically, I want to measure the CPU processing time for the following operations:

  1. Data transfer from CPU to GPU using the cudaMemcpy function.
  2. Parallel computation on the GPU.
  3. Data transfer from GPU to CPU using the cudaMemcpy function.
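A minimal sketch of timing these operations from the host, independent of any profiler. The names `time_host_side`, `summarize`, and `Summary` are hypothetical helpers, not part of CUDA or Intel VTune. The sketch assumes each operation is wrapped in a callable; in a real `.cu` file the callable would contain the `cudaMemcpy` call or the kernel launch. Because kernel launches return to the host before the kernel finishes, a call such as `cudaDeviceSynchronize()` would have to go inside the timed region if the wait for the GPU should be included.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: runs `op` `iterations` times and returns the
// host-side wall-clock duration of each run in microseconds.
// In a CUDA program, `op` would wrap e.g. a cudaMemcpy call or a
// kernel launch followed by cudaDeviceSynchronize().
std::vector<double> time_host_side(const std::function<void()>& op,
                                   std::size_t iterations) {
    using clock = std::chrono::steady_clock;  // monotonic clock
    std::vector<double> samples;
    samples.reserve(iterations);
    for (std::size_t i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        op();
        const auto stop = clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    return samples;
}

// Minimum, mean, and maximum of the collected samples (microseconds).
struct Summary {
    double min_us;
    double mean_us;
    double max_us;
};

// Returns all zeros for an empty sample set.
Summary summarize(const std::vector<double>& samples) {
    if (samples.empty()) {
        return {0.0, 0.0, 0.0};
    }
    const auto mm = std::minmax_element(samples.begin(), samples.end());
    const double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
    return {*mm.first, sum / static_cast<double>(samples.size()), *mm.second};
}
```

For about 1000 iterations, one would call `time_host_side(op, 1000)` once per operation (host-to-device copy, kernel, device-to-host copy) and pass each result to `summarize`. This measures elapsed wall-clock time on the host, which is not the same as the CPU time a profiler attributes to the thread.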

For example source code, see "mult.cu" at the following link (I apologize that the article is in Japanese): https://euske.github.io/introdl/lec6/index.html#ex6-1

In reality, I want to measure the CPU time for a program that performs object detection using the NVIDIA DeepStream SDK, but to simplify the discussion, I am using a simple CUDA program as an example.

[Version Information]

Ubuntu 22.04.4 LTS

Intel VTune 2024.1 (using vtune-backend)

cuda_12.4.r12.4/compiler.34097967_0

[Questions]

  1. Can the CPU processing time for the operations mentioned in the background (1-3) be measured using Intel VTune?
  2. From my trials, I believe that the "Input and Output" analysis in Intel VTune might provide the data I am looking for. Is my understanding correct?
  3. Is there a way to measure the above operations repeatedly, for example, to obtain measurement data for about 1000 iterations, using either the GUI or the command-line interface (CLI) of Intel VTune?
  4. I have not been able to find a way to export the profile obtained from Intel VTune to a CSV file using vtune-backend. Is it possible to export to a CSV file?
  5. Is it possible to export an Intel VTune project and view the project contents on another server with Intel VTune?
  6. Can Intel VTune read the symbol table? Also, does it support the "NVIDIA CUDA Toolkit Symbol Server"? I am concerned that without a source of information for the symbol table like the "NVIDIA CUDA Toolkit Symbol Server," the symbols in the CUDA program might not be readable from Intel VTune.

5 Replies
yuzhang3_intel
Moderator

Please provide your CPU and GPU models. Thanks.

tx-y-hiraka
Novice

CPU: Intel(R) Core(TM) i7-12700F
GPU: NVIDIA GeForce RTX 3060

yuzhang3_intel
Moderator

Since your host has an Intel CPU and the device is an NVIDIA GPU, I think the nsys tool (NVIDIA Nsight Systems) can help you understand host-device data transfers, and the ncu tool (NVIDIA Nsight Compute) is useful for understanding the GPU computation. Both tools are located in the CUDA bin directory, as shown below:

xxxxxx@clx02:/usr/local/cuda-12.1/bin$ ll
total 135132
drwxr-xr-x 3 root root 4096 Nov 10 2023 ./
drwxr-xr-x 16 root root 4096 Nov 10 2023 ../
-rwxr-xr-x 1 root root 84752 Nov 10 2023 bin2c*
lrwxrwxrwx 1 root root 4 Nov 10 2023 computeprof -> nvvp*
-r-xr-xr-x 1 root root 112 Nov 10 2023 compute-sanitizer*
drwxr-xr-x 2 root root 4096 Nov 10 2023 crt/
-rwxr-xr-x 1 root root 6745528 Nov 10 2023 cudafe++*
-rwxr-xr-x 1 root root 15657712 Nov 10 2023 cuda-gdb*
-rwxr-xr-x 1 root root 807704 Nov 10 2023 cuda-gdbserver*
-rwxr-xr-x 1 root root 75928 Nov 10 2023 cu++filt*
-rwxr-xr-x 1 root root 519600 Nov 10 2023 cuobjdump*
-rwxr-xr-x 1 root root 285728 Nov 10 2023 fatbinary*
-r-xr-xr-x 1 root root 3825 Nov 10 2023 ncu*
-r-xr-xr-x 1 root root 3615 Nov 10 2023 ncu-ui*
-rwxr-xr-x 1 root root 1580 Nov 10 2023 nsight_ee_plugins_manage.sh*
-r-xr-xr-x 1 root root 82 Nov 10 2023 nsight-sys*
-r-xr-xr-x 1 root root 751 Nov 10 2023 nsys*
-r-xr-xr-x 1 root root 104 Nov 10 2023 nsys-exporter*
-r-xr-xr-x 1 root root 847 Nov 10 2023 nsys-ui*
-rwxr-xr-x 1 root root 16050568 Nov 10 2023 nvcc*
-rwxr-xr-x 1 root root 10456 Nov 10 2023 __nvcc_device_query*
-rw-r--r-- 1 root root 417 Nov 10 2023 nvcc.profile
-rwxr-xr-x 1 root root 50641688 Nov 10 2023 nvdisasm*
-rwxr-xr-x 1 root root 20808632 Nov 10 2023 nvlink*
-rwxr-xr-x 1 root root 6010176 Nov 10 2023 nvprof*
-rwxr-xr-x 1 root root 109552 Nov 10 2023 nvprune*
-rwxr-xr-x 1 root root 285 Nov 10 2023 nvvp*
-rwxr-xr-x 1 root root 20491472 Nov 10 2023 ptxas*

tx-y-hiraka
Novice

Thank you for your response. I will try using NVIDIA Nsight Systems as you suggested. To confirm: for this use case, NVIDIA Nsight Systems is a better fit than Intel VTune, correct?

yuzhang3_intel
Moderator

Intel VTune supports GPU profiling only on Intel GPU devices.
