<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Measuring CPU-Side Processing Time in NVIDIA CUDA Programs Using Intel VTune in Analyzers</title>
    <link>https://community.intel.com/t5/Analyzers/Measuring-CPU-Side-Processing-Time-in-NVIDIA-CUDA-Programs-Using/m-p/1604833#M24809</link>
    <description>&lt;P&gt;[Background]&lt;/P&gt;&lt;P&gt;I want to measure the CPU-side processing time in an NVIDIA CUDA program. Specifically, I want to measure the CPU processing time for the following operations:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Data transfer from CPU to GPU using the cudaMemcpy function.&lt;/LI&gt;&lt;LI&gt;Parallel computation on the GPU.&lt;/LI&gt;&lt;LI&gt;Data transfer from GPU to CPU using the cudaMemcpy function.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;For example source code, please refer to "mult.cu" at the following link (I apologize that the article is in Japanese):&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://euske.github.io/introdl/lec6/index.html#ex6-1" target="_blank"&gt;https://euske.github.io/introdl/lec6/index.html#ex6-1&lt;/A&gt;&lt;/P&gt;&lt;P&gt;In reality, I want to measure the CPU time for a program that performs object detection using the NVIDIA DeepStream SDK, but to simplify the discussion, I am using a simple CUDA program as an example.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;[Version Information]&lt;/P&gt;&lt;P&gt;Ubuntu 22.04.4 LTS&lt;/P&gt;&lt;P&gt;Intel VTune 2024.1 (using vtune-backend)&lt;/P&gt;&lt;P&gt;cuda_12.4.r12.4/compiler.34097967_0&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;[Questions]&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Can the CPU processing time for operations 1-3 listed in the background be measured using Intel VTune?&lt;/LI&gt;&lt;LI&gt;From my trials, I believe that the "Input and Output" analysis in Intel VTune might provide the measurements I am looking for. Is my understanding correct?&lt;/LI&gt;&lt;LI&gt;Is there a way to measure the above operations repeatedly, for example to collect measurement data for about 1000 iterations, using either the GUI or the command-line interface (CLI) of Intel VTune?&lt;/LI&gt;&lt;LI&gt;I have not been able to find a way to export the profile obtained from Intel VTune to a CSV file using vtune-backend. Is it possible to export to a CSV file?&lt;/LI&gt;&lt;LI&gt;Is it possible to export an Intel VTune project and view its contents on another server with Intel VTune?&lt;/LI&gt;&lt;LI&gt;Can Intel VTune read the symbol table? Also, does it support the "NVIDIA CUDA Toolkit Symbol Server"? I am concerned that without a symbol source such as the "NVIDIA CUDA Toolkit Symbol Server," the symbols in the CUDA program might not be readable from Intel VTune.&lt;/LI&gt;&lt;/OL&gt;</description>
    <pubDate>Fri, 07 Jun 2024 12:05:07 GMT</pubDate>
    <dc:creator>tx-y-hiraka</dc:creator>
    <dc:date>2024-06-07T12:05:07Z</dc:date>
    <item>
      <title>Measuring CPU-Side Processing Time in NVIDIA CUDA Programs Using Intel VTune</title>
      <link>https://community.intel.com/t5/Analyzers/Measuring-CPU-Side-Processing-Time-in-NVIDIA-CUDA-Programs-Using/m-p/1604833#M24809</link>
      <description>&lt;P&gt;[Background]&lt;/P&gt;&lt;P&gt;I want to measure the CPU-side processing time in an NVIDIA CUDA program. Specifically, I want to measure the CPU processing time for the following operations:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Data transfer from CPU to GPU using the cudaMemcpy function.&lt;/LI&gt;&lt;LI&gt;Parallel computation on the GPU.&lt;/LI&gt;&lt;LI&gt;Data transfer from GPU to CPU using the cudaMemcpy function.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;For example source code, please refer to "mult.cu" at the following link (I apologize that the article is in Japanese):&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://euske.github.io/introdl/lec6/index.html#ex6-1" target="_blank"&gt;https://euske.github.io/introdl/lec6/index.html#ex6-1&lt;/A&gt;&lt;/P&gt;&lt;P&gt;In reality, I want to measure the CPU time for a program that performs object detection using the NVIDIA DeepStream SDK, but to simplify the discussion, I am using a simple CUDA program as an example.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;[Version Information]&lt;/P&gt;&lt;P&gt;Ubuntu 22.04.4 LTS&lt;/P&gt;&lt;P&gt;Intel VTune 2024.1 (using vtune-backend)&lt;/P&gt;&lt;P&gt;cuda_12.4.r12.4/compiler.34097967_0&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;[Questions]&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Can the CPU processing time for operations 1-3 listed in the background be measured using Intel VTune?&lt;/LI&gt;&lt;LI&gt;From my trials, I believe that the "Input and Output" analysis in Intel VTune might provide the measurements I am looking for. Is my understanding correct?&lt;/LI&gt;&lt;LI&gt;Is there a way to measure the above operations repeatedly, for example to collect measurement data for about 1000 iterations, using either the GUI or the command-line interface (CLI) of Intel VTune?&lt;/LI&gt;&lt;LI&gt;I have not been able to find a way to export the profile obtained from Intel VTune to a CSV file using vtune-backend. Is it possible to export to a CSV file?&lt;/LI&gt;&lt;LI&gt;Is it possible to export an Intel VTune project and view its contents on another server with Intel VTune?&lt;/LI&gt;&lt;LI&gt;Can Intel VTune read the symbol table? Also, does it support the "NVIDIA CUDA Toolkit Symbol Server"? I am concerned that without a symbol source such as the "NVIDIA CUDA Toolkit Symbol Server," the symbols in the CUDA program might not be readable from Intel VTune.&lt;/LI&gt;&lt;/OL&gt;</description>
      <pubDate>Fri, 07 Jun 2024 12:05:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Measuring-CPU-Side-Processing-Time-in-NVIDIA-CUDA-Programs-Using/m-p/1604833#M24809</guid>
      <dc:creator>tx-y-hiraka</dc:creator>
      <dc:date>2024-06-07T12:05:07Z</dc:date>
    </item>
    <item>
      <title>Re: Measuring CPU-Side Processing Time in NVIDIA CUDA Programs Using Intel VTune</title>
      <link>https://community.intel.com/t5/Analyzers/Measuring-CPU-Side-Processing-Time-in-NVIDIA-CUDA-Programs-Using/m-p/1605361#M24813</link>
      <description>&lt;P&gt;Please provide your CPU and GPU models. Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 10 Jun 2024 14:22:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Measuring-CPU-Side-Processing-Time-in-NVIDIA-CUDA-Programs-Using/m-p/1605361#M24813</guid>
      <dc:creator>yuzhang3_intel</dc:creator>
      <dc:date>2024-06-10T14:22:36Z</dc:date>
    </item>
    <item>
      <title>Re: Measuring CPU-Side Processing Time in NVIDIA CUDA Programs Using Intel VTune</title>
      <link>https://community.intel.com/t5/Analyzers/Measuring-CPU-Side-Processing-Time-in-NVIDIA-CUDA-Programs-Using/m-p/1606125#M24835</link>
      <description>&lt;P&gt;CPU: Intel(R) Core(TM) i7-12700F&lt;BR /&gt;GPU:&amp;nbsp;NVIDIA GeForce RTX 3060&lt;/P&gt;</description>
      <pubDate>Wed, 12 Jun 2024 12:29:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Measuring-CPU-Side-Processing-Time-in-NVIDIA-CUDA-Programs-Using/m-p/1606125#M24835</guid>
      <dc:creator>tx-y-hiraka</dc:creator>
      <dc:date>2024-06-12T12:29:17Z</dc:date>
    </item>
    <item>
      <title>Re: Measuring CPU-Side Processing Time in NVIDIA CUDA Programs Using Intel VTune</title>
      <link>https://community.intel.com/t5/Analyzers/Measuring-CPU-Side-Processing-Time-in-NVIDIA-CUDA-Programs-Using/m-p/1606301#M24836</link>
      <description>&lt;P&gt;Since your host has an Intel CPU and the device is an NVIDIA GPU, I think the nsys tool (NVIDIA Nsight Systems) can help you understand host-device data transfers, and the ncu tool (NVIDIA Nsight Compute) is useful for analyzing the kernel computation. Both tools are located in the CUDA bin directory, as shown below:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;xxxxxx@clx02:/usr/local/cuda-12.1/bin$ ll&lt;BR /&gt;total 135132&lt;BR /&gt;drwxr-xr-x 3 root root 4096 Nov 10 2023 ./&lt;BR /&gt;drwxr-xr-x 16 root root 4096 Nov 10 2023 ../&lt;BR /&gt;-rwxr-xr-x 1 root root 84752 Nov 10 2023 bin2c*&lt;BR /&gt;lrwxrwxrwx 1 root root 4 Nov 10 2023 computeprof -&amp;gt; nvvp*&lt;BR /&gt;-r-xr-xr-x 1 root root 112 Nov 10 2023 compute-sanitizer*&lt;BR /&gt;drwxr-xr-x 2 root root 4096 Nov 10 2023 crt/&lt;BR /&gt;-rwxr-xr-x 1 root root 6745528 Nov 10 2023 cudafe++*&lt;BR /&gt;-rwxr-xr-x 1 root root 15657712 Nov 10 2023 cuda-gdb*&lt;BR /&gt;-rwxr-xr-x 1 root root 807704 Nov 10 2023 cuda-gdbserver*&lt;BR /&gt;-rwxr-xr-x 1 root root 75928 Nov 10 2023 cu++filt*&lt;BR /&gt;-rwxr-xr-x 1 root root 519600 Nov 10 2023 cuobjdump*&lt;BR /&gt;-rwxr-xr-x 1 root root 285728 Nov 10 2023 fatbinary*&lt;BR /&gt;-r-xr-xr-x 1 root root 3825 Nov 10 2023 &lt;STRONG&gt;ncu&lt;/STRONG&gt;*&lt;BR /&gt;-r-xr-xr-x 1 root root 3615 Nov 10 2023 ncu-ui*&lt;BR /&gt;-rwxr-xr-x 1 root root 1580 Nov 10 2023 nsight_ee_plugins_manage.sh*&lt;BR /&gt;-r-xr-xr-x 1 root root 82 Nov 10 2023 nsight-sys*&lt;BR /&gt;-r-xr-xr-x 1 root root 751 Nov 10 2023 &lt;STRONG&gt;nsys&lt;/STRONG&gt;*&lt;BR /&gt;-r-xr-xr-x 1 root root 104 Nov 10 2023 nsys-exporter*&lt;BR /&gt;-r-xr-xr-x 1 root root 847 Nov 10 2023 nsys-ui*&lt;BR /&gt;-rwxr-xr-x 1 root root 16050568 Nov 10 2023 nvcc*&lt;BR /&gt;-rwxr-xr-x 1 root root 10456 Nov 10 2023 __nvcc_device_query*&lt;BR /&gt;-rw-r--r-- 1 root root 417 Nov 10 2023 nvcc.profile&lt;BR /&gt;-rwxr-xr-x 1 root root 50641688 Nov 10 2023 nvdisasm*&lt;BR /&gt;-rwxr-xr-x 1 root root 20808632 Nov 10 2023 nvlink*&lt;BR /&gt;-rwxr-xr-x 1 root root 6010176 Nov 10 2023 nvprof*&lt;BR /&gt;-rwxr-xr-x 1 root root 109552 Nov 10 2023 nvprune*&lt;BR /&gt;-rwxr-xr-x 1 root root 285 Nov 10 2023 nvvp*&lt;BR /&gt;-rwxr-xr-x 1 root root 20491472 Nov 10 2023 ptxas*&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jun 2024 01:37:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Measuring-CPU-Side-Processing-Time-in-NVIDIA-CUDA-Programs-Using/m-p/1606301#M24836</guid>
      <dc:creator>yuzhang3_intel</dc:creator>
      <dc:date>2024-06-13T01:37:10Z</dc:date>
    </item>
    <item>
      <title>Re: Measuring CPU-Side Processing Time in NVIDIA CUDA Programs Using Intel VTune</title>
      <link>https://community.intel.com/t5/Analyzers/Measuring-CPU-Side-Processing-Time-in-NVIDIA-CUDA-Programs-Using/m-p/1606305#M24838</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Thank you for your response. I will try NVIDIA Nsight Systems as you suggested. Do you mean that, for this use case, NVIDIA Nsight Systems is a better fit than Intel VTune?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jun 2024 01:52:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Measuring-CPU-Side-Processing-Time-in-NVIDIA-CUDA-Programs-Using/m-p/1606305#M24838</guid>
      <dc:creator>tx-y-hiraka</dc:creator>
      <dc:date>2024-06-13T01:52:18Z</dc:date>
    </item>
    <item>
      <title>Re: Measuring CPU-Side Processing Time in NVIDIA CUDA Programs Using Intel VTune</title>
      <link>https://community.intel.com/t5/Analyzers/Measuring-CPU-Side-Processing-Time-in-NVIDIA-CUDA-Programs-Using/m-p/1606309#M24839</link>
      <description>&lt;P&gt;Intel VTune supports GPU profiling only on Intel GPU devices.&lt;/P&gt;</description>
      <pubDate>Thu, 13 Jun 2024 02:04:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Measuring-CPU-Side-Processing-Time-in-NVIDIA-CUDA-Programs-Using/m-p/1606309#M24839</guid>
      <dc:creator>yuzhang3_intel</dc:creator>
      <dc:date>2024-06-13T02:04:07Z</dc:date>
    </item>
  </channel>
</rss>

