[Background]
I want to measure the CPU-side processing time in an NVIDIA CUDA program. Specifically, I want to measure the CPU processing time for the following operations:
1. Data transfer from CPU to GPU using the cudaMemcpy function.
2. Parallel computation on the GPU.
3. Data transfer from GPU to CPU using the cudaMemcpy function.
For example source code, see "mult.cu" at the following link (the article is in Japanese, my apologies): https://euske.github.io/introdl/lec6/index.html#ex6-1
What I actually want to measure is the CPU time of a program that performs object detection with the NVIDIA DeepStream SDK, but to keep the discussion simple I am using a small CUDA program as the example.
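To make the three operations concrete, here is a minimal sketch of timing them from the host side with std::chrono, independent of any profiler. It is not taken from mult.cu: the `scale` kernel, the `time_ms` and `check` helpers, the buffer size, and the iteration count are all hypothetical and chosen for illustration only. Note that it measures wall-clock time as seen by the CPU thread (including time spent waiting for the GPU), not CPU cycles consumed.

```cuda
// timing_sketch.cu : host-side wall-clock timing of H2D copy, kernel, D2H copy.
// Build (assumed command): nvcc -O2 -o timing_sketch timing_sketch.cu
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the real GPU computation.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Run f() and return the elapsed host wall-clock time in milliseconds.
template <typename F>
static double time_ms(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main() {
    const int n = 1 << 20;          // number of floats (illustrative)
    const int iterations = 1000;    // number of repeated measurements
    const size_t bytes = n * sizeof(float);

    std::vector<float> host(n, 1.0f);
    float* dev = nullptr;
    check(cudaMalloc(&dev, bytes), "cudaMalloc");

    // One CSV row per iteration, written to stdout.
    std::printf("iteration,h2d_ms,kernel_ms,d2h_ms\n");
    for (int it = 0; it < iterations; ++it) {
        double h2d = time_ms([&] {
            check(cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice),
                  "cudaMemcpy H2D");
        });
        double kernel = time_ms([&] {
            scale<<<(n + 255) / 256, 256>>>(dev, n, 1.0f);
            check(cudaGetLastError(), "kernel launch");
            // Launches are asynchronous: wait so the interval covers the
            // whole computation, not just the launch call.
            check(cudaDeviceSynchronize(), "cudaDeviceSynchronize");
        });
        double d2h = time_ms([&] {
            check(cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost),
                  "cudaMemcpy D2H");
        });
        std::printf("%d,%.4f,%.4f,%.4f\n", it, h2d, kernel, d2h);
    }

    check(cudaFree(dev), "cudaFree");
    return 0;
}
```

Redirecting the output (for example `./timing_sketch > timings.csv`) gives one row per iteration. The first few iterations may be slower because of one-time initialization, so it can be reasonable to discard them before computing statistics.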
[Version Information]
Ubuntu 22.04.4 LTS
Intel VTune 2024.1 (using vtune-backend)
CUDA 12.4 (cuda_12.4.r12.4/compiler.34097967_0)
[Questions]
- Can the CPU processing time for the operations mentioned in the background (1-3) be measured using Intel VTune?
- From my trials, I believe that the "Input and Output" analysis in Intel VTune might be the analysis I am looking for. Is my understanding correct?
- Is there a way to measure the above operations repeatedly, for example to collect measurement data for about 1000 iterations, using either the GUI or the command-line interface (CLI) of Intel VTune?
- I have not been able to find a way to export the profile obtained from Intel VTune to a CSV file using vtune-backend. Is it possible to export to a CSV file?
- Is it possible to export an Intel VTune project and view the project contents on another server with Intel VTune?
- Can Intel VTune read the symbol table? Also, does it support the "NVIDIA CUDA Toolkit Symbol Server"? I am concerned that without a source of information for the symbol table like the "NVIDIA CUDA Toolkit Symbol Server," the symbols in the CUDA program might not be readable from Intel VTune.
Please provide the CPU and GPU type, thanks.
CPU: Intel(R) Core(TM) i7-12700F
GPU: NVIDIA GeForce RTX 3060
Since your host is an Intel CPU and your device is an NVIDIA GPU, I think the nsys tool (NVIDIA Nsight Systems) can help you analyze host-device data transfers, and the ncu tool (NVIDIA Nsight Compute) is useful for analyzing the GPU computation. Both tools are located in the CUDA bin directory shown below:
xxxxxx@clx02:/usr/local/cuda-12.1/bin$ ll
total 135132
drwxr-xr-x 3 root root 4096 Nov 10 2023 ./
drwxr-xr-x 16 root root 4096 Nov 10 2023 ../
-rwxr-xr-x 1 root root 84752 Nov 10 2023 bin2c*
lrwxrwxrwx 1 root root 4 Nov 10 2023 computeprof -> nvvp*
-r-xr-xr-x 1 root root 112 Nov 10 2023 compute-sanitizer*
drwxr-xr-x 2 root root 4096 Nov 10 2023 crt/
-rwxr-xr-x 1 root root 6745528 Nov 10 2023 cudafe++*
-rwxr-xr-x 1 root root 15657712 Nov 10 2023 cuda-gdb*
-rwxr-xr-x 1 root root 807704 Nov 10 2023 cuda-gdbserver*
-rwxr-xr-x 1 root root 75928 Nov 10 2023 cu++filt*
-rwxr-xr-x 1 root root 519600 Nov 10 2023 cuobjdump*
-rwxr-xr-x 1 root root 285728 Nov 10 2023 fatbinary*
-r-xr-xr-x 1 root root 3825 Nov 10 2023 ncu*
-r-xr-xr-x 1 root root 3615 Nov 10 2023 ncu-ui*
-rwxr-xr-x 1 root root 1580 Nov 10 2023 nsight_ee_plugins_manage.sh*
-r-xr-xr-x 1 root root 82 Nov 10 2023 nsight-sys*
-r-xr-xr-x 1 root root 751 Nov 10 2023 nsys*
-r-xr-xr-x 1 root root 104 Nov 10 2023 nsys-exporter*
-r-xr-xr-x 1 root root 847 Nov 10 2023 nsys-ui*
-rwxr-xr-x 1 root root 16050568 Nov 10 2023 nvcc*
-rwxr-xr-x 1 root root 10456 Nov 10 2023 __nvcc_device_query*
-rw-r--r-- 1 root root 417 Nov 10 2023 nvcc.profile
-rwxr-xr-x 1 root root 50641688 Nov 10 2023 nvdisasm*
-rwxr-xr-x 1 root root 20808632 Nov 10 2023 nvlink*
-rwxr-xr-x 1 root root 6010176 Nov 10 2023 nvprof*
-rwxr-xr-x 1 root root 109552 Nov 10 2023 nvprune*
-rwxr-xr-x 1 root root 285 Nov 10 2023 nvvp*
-rwxr-xr-x 1 root root 20491472 Nov 10 2023 ptxas*
Thank you for your response. I will try NVIDIA Nsight Systems as you suggested. Do you mean that, for this use case, NVIDIA Nsight Systems is a better fit than Intel VTune?
Intel VTune supports GPU profiling only for Intel GPU devices.
