<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Line-by-line time profiling with an OpenCL kernel in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Line-by-line-time-profiling-with-an-OpenCL-kernel/m-p/1119228#M5458</link>
    <description>&lt;P&gt;Hi, I am working on a project to optimize some OpenCL code. The kernel is computationally dense, and I'd like to see where the bottleneck is.&lt;/P&gt;

&lt;P&gt;I haven't installed Intel's CL libraries yet, but I am wondering whether it is possible to do line-by-line profiling of my OpenCL kernel when running on the CPU. We have profiled the code with CodeXL on an AMD GPU, but the profiler only reports abstract metrics, which are not very helpful in pinpointing the hotspots.&lt;/P&gt;

&lt;P&gt;If I run the CL code with Intel's CL backend, can I use cachegrind/kcachegrind to obtain such information? Or is there another tool I should use?&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;</description>
    <pubDate>Fri, 05 Feb 2016 18:38:50 GMT</pubDate>
    <dc:creator>QFang1</dc:creator>
    <dc:date>2016-02-05T18:38:50Z</dc:date>
    <item>
      <title>Line-by-line time profiling with an OpenCL kernel</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Line-by-line-time-profiling-with-an-OpenCL-kernel/m-p/1119228#M5458</link>
      <description>&lt;P&gt;Hi, I am working on a project to optimize some OpenCL code. The kernel is computationally dense, and I'd like to see where the bottleneck is.&lt;/P&gt;

&lt;P&gt;I haven't installed Intel's CL libraries yet, but I am wondering whether it is possible to do line-by-line profiling of my OpenCL kernel when running on the CPU. We have profiled the code with CodeXL on an AMD GPU, but the profiler only reports abstract metrics, which are not very helpful in pinpointing the hotspots.&lt;/P&gt;

&lt;P&gt;If I run the CL code with Intel's CL backend, can I use cachegrind/kcachegrind to obtain such information? Or is there another tool I should use?&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Fri, 05 Feb 2016 18:38:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Line-by-line-time-profiling-with-an-OpenCL-kernel/m-p/1119228#M5458</guid>
      <dc:creator>QFang1</dc:creator>
      <dc:date>2016-02-05T18:38:50Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Line-by-line-time-profiling-with-an-OpenCL-kernel/m-p/1119229#M5459</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I've never tried the tools you mentioned, but they could be worth a try. Also, please check out Intel(R) VTune Amplifier &lt;A href="https://software.intel.com/en-us/intel-vtune-amplifier-xe"&gt;https://software.intel.com/en-us/intel-vtune-amplifier-xe&lt;/A&gt; - you can get an evaluation version to try it out. The following link could be worth checking as well: &lt;A href="http://stackoverflow.com/questions/5132628/profiling-opencl-kernels"&gt;http://stackoverflow.com/questions/5132628/profiling-opencl-kernels&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;VTune supports OpenCL code profiling on the GPU, but not yet on the CPU.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Feb 2016 16:45:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Line-by-line-time-profiling-with-an-OpenCL-kernel/m-p/1119229#M5459</guid>
      <dc:creator>Robert_I_Intel</dc:creator>
      <dc:date>2016-02-08T16:45:16Z</dc:date>
    </item>
  </channel>
</rss>

