<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Device Build Issue in Intel® oneAPI DPC++/C++ Compiler</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1264355#M1014</link>
    <description>&lt;P&gt;My code builds and runs fine on the CPU. It also builds for the GPU, but shortly after it starts running the process is killed for using too much memory. This seems to happen on the first queue.submit call, and it no longer happens if I comment out some of the later kernels. My assumption is that the first queue.submit call causes all kernels to be compiled for the device, and that one of the kernels I commented out triggers the problem. Is my understanding correct and, if so, how can I debug it?&lt;/P&gt;</description>
    <pubDate>Mon, 15 Mar 2021 08:50:37 GMT</pubDate>
    <dc:creator>Laurence</dc:creator>
    <dc:date>2021-03-15T08:50:37Z</dc:date>
    <item>
      <title>Device Build Issue</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1264355#M1014</link>
      <description>&lt;P&gt;My code builds and runs fine on the CPU. It also builds for the GPU, but shortly after it starts running the process is killed for using too much memory. This seems to happen on the first queue.submit call, and it no longer happens if I comment out some of the later kernels. My assumption is that the first queue.submit call causes all kernels to be compiled for the device, and that one of the kernels I commented out triggers the problem. Is my understanding correct and, if so, how can I debug it?&lt;/P&gt;</description>
      <pubDate>Mon, 15 Mar 2021 08:50:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1264355#M1014</guid>
      <dc:creator>Laurence</dc:creator>
      <dc:date>2021-03-15T08:50:37Z</dc:date>
    </item>
    <item>
      <title>Re:Device Build Issue</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1264776#M1022</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The DPC++ runtime executes kernels asynchronously. If the kernels have no data dependencies between them, they can execute simultaneously. One way to synchronize them is to put a &lt;B&gt;queue.wait()&lt;/B&gt; call after every kernel submission.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;In your case, the memory is probably being used up by multiple kernels executing concurrently, since global memory is limited on the GPU. On the CPU it may work fine because more global memory is available.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;You can read more about DPC++ concepts in the book linked below:&lt;/P&gt;&lt;P&gt;&lt;A href="https://www.apress.com/us/book/9781484255735" target="_blank"&gt;https://www.apress.com/us/book/9781484255735&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Rahul&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 16 Mar 2021 12:17:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1264776#M1022</guid>
      <dc:creator>RahulV_intel</dc:creator>
      <dc:date>2021-03-16T12:17:57Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Device Build Issue</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1264793#M1028</link>
      <description>&lt;P&gt;Thanks for the reply. All of my kernel calls have queue.wait() afterwards, so they should not be executing simultaneously. Am I correct in assuming that all the kernels are built for the device on the first queue.submit call?&lt;/P&gt;
&lt;P&gt;P.S. I have read the book you suggested from cover to cover.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Mar 2021 13:13:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1264793#M1028</guid>
      <dc:creator>Laurence</dc:creator>
      <dc:date>2021-03-16T13:13:56Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Device Build Issue</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1265176#M1030</link>
      <description>&lt;P&gt;I confirmed it is a build issue by trying to build for an Nvidia V100. The process is killed while running the following command.&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;"/usr/local/cuda-10.1//bin/ptxas" -m64 -O0 -v --gpu-name sm_50 --output-file /tmp/extras-85f00b.o /tmp/extras-5d7db0.s&lt;/LI-CODE&gt;
&lt;P&gt;A custom complex number class is contained within extras. The issue is triggered by the setpolar function.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;void complex::setpolar(double mag, double phi)
{
 // phi is given in degrees; convert it to radians.
 phi = phi * M_PI / 180;
 r = sycl::cos(phi) * mag;  // real part
 i = sycl::sin(phi) * mag;  // imaginary part
}
&lt;/LI-CODE&gt;
&lt;P&gt;I could not reproduce this with a simple test case, but it occurs when I compile the full code base.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Mar 2021 14:00:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1265176#M1030</guid>
      <dc:creator>Laurence</dc:creator>
      <dc:date>2021-03-17T14:00:49Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Device Build Issue</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1266519#M1034</link>
      <description>&lt;P&gt;I managed to work around the issue by using a different implementation of the complex number class. The following commands can be used to reproduce the original error.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;git clone https://github.com/lfield/madgraph4gpu.git
cd madgraph4gpu
git checkout e27b31be
cd epoch2/sycl/gg_ttgg
source /opt/intel/oneapi/setvars.sh
cmake -B build
cmake --build build
cd SubProcesses/P1_Sigma_sm_gg_ttxgg
./check_sa.exe  1 4 1
&lt;/LI-CODE&gt;</description>
      <pubDate>Mon, 22 Mar 2021 09:56:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1266519#M1034</guid>
      <dc:creator>Laurence</dc:creator>
      <dc:date>2021-03-22T09:56:02Z</dc:date>
    </item>
    <item>
      <title>Re:Device Build Issue</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1268019#M1041</link>
      <description>&lt;P&gt;Hi Laurence,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Apologies for the late response.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&amp;gt;&amp;gt;&lt;I&gt; Am I correct in assuming that all the kernels are built for the device on the first queue.submit call?&lt;/I&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Your assumption is right. The DPC++ compiler produces an intermediate representation known as SPIR-V for JIT (just-in-time) compilation. When the first kernel is hit, the entire SPIR-V module it belongs to is compiled. If that module contains multiple kernels (as in your case), all of them are compiled at that point.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;I followed your build commands and tried running the application on a Gen9 iGPU. The original issue is reproducible. Were you able to solve this issue with the workaround that you mentioned? If so, could you please share your workaround for the benefit of other community users?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Since you are running your application on an Nvidia V100, could you let me know whether you are using the GitHub version of DPC++ that supports the CUDA backend?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Rahul&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 26 Mar 2021 05:19:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1268019#M1041</guid>
      <dc:creator>RahulV_intel</dc:creator>
      <dc:date>2021-03-26T05:19:44Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Device Build Issue</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1268147#M1042</link>
      <description>&lt;P&gt;The issue was resolved by using a &lt;A href="https://github.com/lfield/madgraph4gpu/blob/ep2_gg_ttgg/epoch2/sycl/gg_ttgg/src/extras.h" target="_self"&gt;different implementation&lt;/A&gt; of the complex number class which does not use trig functions. Investigation showed that commenting out the &lt;A href="https://github.com/lfield/madgraph4gpu/blob/e27b31be43dd926f067cd5a53fa35d41342442cf/epoch2/sycl/gg_ttgg/src/extras.cc#L44" target="_self"&gt;trig functions&lt;/A&gt; allows the code to build. The setpolar function worked in a simple test case. The cause is still unknown.&lt;/P&gt;
&lt;P&gt;I used both dpcpp on my Intel NUC and the GitHub version of DPC++, which supports the CUDA backend, on a different machine. I only built for the V100 as it was already set up for AOT compilation. The fact that the build fails in both of these situations suggests that it is a code issue rather than a compiler issue.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Mar 2021 12:01:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1268147#M1042</guid>
      <dc:creator>Laurence</dc:creator>
      <dc:date>2021-03-26T12:01:47Z</dc:date>
    </item>
    <item>
      <title>Re:Device Build Issue</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1269560#M1053</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;It looks like a code issue, as you mentioned. Since your issue is resolved, shall I go ahead and close this thread from my end?&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Rahul&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Wed, 31 Mar 2021 09:47:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1269560#M1053</guid>
      <dc:creator>RahulV_intel</dc:creator>
      <dc:date>2021-03-31T09:47:24Z</dc:date>
    </item>
    <item>
      <title>Re:Device Build Issue</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1271091#M1068</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;I have not heard back from you, so I will go ahead and close this thread from my end. Intel will no longer monitor this thread. Feel free to post a new query if you require further assistance from Intel.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Rahul&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Tue, 06 Apr 2021 04:35:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Device-Build-Issue/m-p/1271091#M1068</guid>
      <dc:creator>RahulV_intel</dc:creator>
      <dc:date>2021-04-06T04:35:44Z</dc:date>
    </item>
  </channel>
</rss>

