<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Advice on how to handle structs with huge arrays in OpenCL* for CPU</title>
    <link>https://community.intel.com/t5/OpenCL-for-CPU/Advice-on-how-to-handle-structs-with-huge-arrays/m-p/1079054#M4616</link>
    <description>&lt;P&gt;Dear all,&lt;/P&gt;

&lt;P&gt;I am currently working on a Monte Carlo code for particle transport simulations using OpenCL, and I am facing a problem with the size of some of the arguments passed to the OpenCL kernel. For example, to store data about the simulated geometry I use the following struct:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;typedef struct ALIGNED(ALIGNMENT) region_data_t {
    // reg = 0, front of geometry, reg = MXREG+1, back of geometry
    cl_float rhof[MXREG + 2];
    cl_float pcut[MXREG + 2];
    cl_float ecut[MXREG + 2];
    cl_int med[MXREG + 2];
    cl_int flags[MXREG + 2];
   
} region_data_t;&lt;/PRE&gt;

&lt;P&gt;I use the same definition on the host and on the device (I have some definitions that map the cl_* types to the "normal" ones). The problem is that MXREG, the maximum number of regions allowed in a problem, can be quite large, so I often hit the stack size limit of my OS. I can work around that with the "ulimit -s hard" command, but that is clearly not ideal.&lt;/P&gt;

&lt;P&gt;So the question is: how would you handle this kind of struct? I could dynamically allocate all the arrays and pass them to the kernel separately, but it would be nice to keep using structs in my code. I have a couple more structs like this one, and the number of kernel arguments could grow quickly. Thanks for your help!&lt;/P&gt;</description>
    <pubDate>Wed, 25 Jan 2017 13:36:09 GMT</pubDate>
    <dc:creator>Edgardo_Doerner</dc:creator>
    <dc:date>2017-01-25T13:36:09Z</dc:date>
    <item>
      <title>Advice on how to handle structs with huge arrays</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Advice-on-how-to-handle-structs-with-huge-arrays/m-p/1079054#M4616</link>
      <description>&lt;P&gt;Dear all,&lt;/P&gt;

&lt;P&gt;I am currently working on a Monte Carlo code for particle transport simulations using OpenCL, and I am facing a problem with the size of some of the arguments passed to the OpenCL kernel. For example, to store data about the simulated geometry I use the following struct:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;typedef struct ALIGNED(ALIGNMENT) region_data_t {
    // reg = 0, front of geometry, reg = MXREG+1, back of geometry
    cl_float rhof[MXREG + 2];
    cl_float pcut[MXREG + 2];
    cl_float ecut[MXREG + 2];
    cl_int med[MXREG + 2];
    cl_int flags[MXREG + 2];
   
} region_data_t;&lt;/PRE&gt;

&lt;P&gt;I use the same definition on the host and on the device (I have some definitions that map the cl_* types to the "normal" ones). The problem is that MXREG, the maximum number of regions allowed in a problem, can be quite large, so I often hit the stack size limit of my OS. I can work around that with the "ulimit -s hard" command, but that is clearly not ideal.&lt;/P&gt;

&lt;P&gt;So the question is: how would you handle this kind of struct? I could dynamically allocate all the arrays and pass them to the kernel separately, but it would be nice to keep using structs in my code. I have a couple more structs like this one, and the number of kernel arguments could grow quickly. Thanks for your help!&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jan 2017 13:36:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Advice-on-how-to-handle-structs-with-huge-arrays/m-p/1079054#M4616</guid>
      <dc:creator>Edgardo_Doerner</dc:creator>
      <dc:date>2017-01-25T13:36:09Z</dc:date>
    </item>
    <item>
      <title>I'm still checking on the</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Advice-on-how-to-handle-structs-with-huge-arrays/m-p/1079055#M4617</link>
      <description>&lt;P&gt;I'm still checking on the implications of using structs as kernel parameters. So far I have had the best experience sticking to standard types for kernel parameters, but this may just be me being conservative.&lt;/P&gt;

&lt;P&gt;Of course, the types used in your host code, kernel parameters, and kernel code do not need to match. As long as the data is contiguous, each work item can calculate offsets and convert types to get the results you want. For example, it is common for host code to hold float data while the kernel parameters are float4. You pass addresses to the work items through the kernel parameter list. How each work item calculates its offsets from those pointers, for both input and output, is up to your implementation.&lt;/P&gt;
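
<![CDATA[<P>To make the offset idea concrete, here is a minimal sketch in plain host C, assuming the five arrays from the question are packed back to back into one contiguous block. The names region_field, field_offset_bytes, regions_alloc, regions_floats and regions_ints are hypothetical and only for illustration; a kernel would apply the same arithmetic to its single __global pointer argument.</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical field order inside the packed block (not from the original code). */
enum region_field { F_RHOF, F_PCUT, F_ECUT, F_MED, F_FLAGS, F_COUNT };

/* The layout below relies on float and int having the same width. */
static_assert(sizeof(float) == sizeof(int), "float and int must have equal size");

/* Field k starts k * n elements into the block, where n = MXREG + 2. */
static size_t field_offset_bytes(enum region_field field, size_t n)
{
    return (size_t)field * n * sizeof(float);
}

/* One zero-initialized heap block holding all five arrays of n elements. */
static unsigned char *regions_alloc(size_t n)
{
    return calloc((size_t)F_COUNT * n, sizeof(float));
}

/* Typed view of a float field (rhof, pcut, ecut). */
static float *regions_floats(unsigned char *base, enum region_field field, size_t n)
{
    return (float *)(base + field_offset_bytes(field, n));
}

/* Typed view of an int field (med, flags). */
static int *regions_ints(unsigned char *base, enum region_field field, size_t n)
{
    return (int *)(base + field_offset_bytes(field, n));
}
```

<P>With this layout the kernel needs only one buffer argument plus the element count, instead of five separate array arguments.</P>]]>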

&lt;P&gt;The main concerns I know of are making sure that the host-side address of each member buffer is aligned and that you meet the other criteria for zero copy. Dynamic aligned allocation for your member buffers could help you make sure that data I/O is efficient.&lt;/P&gt;
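
<![CDATA[<P>As a rough sketch of dynamic aligned allocation, the struct from the question can live on the heap instead of the stack, which also avoids the stack limit. This assumes a C11 toolchain with aligned_alloc, a placeholder value for MXREG, and a 4096-byte alignment that I am assuming satisfies the zero-copy criteria on your device; check the driver documentation for the real requirements. HOST_ALIGN, region_data_bytes and region_data_create are hypothetical names.</P>

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MXREG 100000     /* hypothetical value; the real one comes from the simulation */
#define HOST_ALIGN 4096  /* assumed alignment requirement for zero copy */

typedef struct region_data_t {
    /* reg = 0, front of geometry, reg = MXREG+1, back of geometry */
    float rhof[MXREG + 2];
    float pcut[MXREG + 2];
    float ecut[MXREG + 2];
    int   med[MXREG + 2];
    int   flags[MXREG + 2];
} region_data_t;

/* Allocation size: sizeof(region_data_t) rounded up to a multiple of
   HOST_ALIGN, because C11 aligned_alloc expects size % alignment == 0. */
static size_t region_data_bytes(void)
{
    return (sizeof(region_data_t) + HOST_ALIGN - 1) / HOST_ALIGN * HOST_ALIGN;
}

/* Heap-allocate and zero the struct; returns NULL on failure.
   The caller releases it with free(). */
static region_data_t *region_data_create(void)
{
    size_t bytes = region_data_bytes();
    region_data_t *r = aligned_alloc(HOST_ALIGN, bytes);
    if (r != NULL)
        memset(r, 0, bytes);
    return r;
}
```

<P>The returned pointer and region_data_bytes() could then be passed to clCreateBuffer with CL_MEM_USE_HOST_PTR, so the kernel still receives a single struct pointer.</P>]]>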

</description>
      <pubDate>Mon, 30 Jan 2017 06:38:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Advice-on-how-to-handle-structs-with-huge-arrays/m-p/1079055#M4617</guid>
      <dc:creator>Jeffrey_M_Intel1</dc:creator>
      <dc:date>2017-01-30T06:38:35Z</dc:date>
    </item>
    <item>
      <title>Thanks for the advice, I</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Advice-on-how-to-handle-structs-with-huge-arrays/m-p/1079056#M4618</link>
      <description>&lt;P&gt;Thanks for the advice. I think that, at least for the huge structures, I will stick to standard types.&lt;/P&gt;

&lt;P&gt;About the zero-copy property: I have tested some of the Intel OpenCL examples (such as MultiDeviceBasic) and I have a question. How can one be sure that zero-copy behavior is enabled? And how does this technique affect execution time if I use other devices, such as AMD or Nvidia GPUs?&lt;/P&gt;

&lt;P&gt;Thanks for your help!&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2017 17:30:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Advice-on-how-to-handle-structs-with-huge-arrays/m-p/1079056#M4618</guid>
      <dc:creator>Edgardo_Doerner</dc:creator>
      <dc:date>2017-01-30T17:30:40Z</dc:date>
    </item>
    <item>
      <title>One way to check if your</title>
      <link>https://community.intel.com/t5/OpenCL-for-CPU/Advice-on-how-to-handle-structs-with-huge-arrays/m-p/1079057#M4619</link>
      <description>&lt;P&gt;One way to check whether your OpenCL buffer has the zero-copy property is to use the driver diagnostics extension.&lt;/P&gt;

&lt;P&gt;Here is sample code:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/application-performance-using-intel-opencl-driver-diagnostics-sample-users-guide" target="_blank"&gt;https://software.intel.com/en-us/articles/application-performance-using-intel-opencl-driver-diagnostics-sample-users-guide&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;And here is the extension spec:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www.khronos.org/registry/OpenCL/extensions/intel/cl_intel_driver_diagnostics.txt" target="_blank"&gt;https://www.khronos.org/registry/OpenCL/extensions/intel/cl_intel_driver_diagnostics.txt&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;In this case clCreateBuffer will provide a GOOD diagnostic, indicating that zero copy is happening.&lt;/P&gt;

&lt;P&gt;Something like this:&lt;/P&gt;

&lt;P&gt;"Performance hint: clCreateBuffer with pointer 30d5000 and size 4096 meets alignment restrictions and buffer will share the same physical memory with CPU."&lt;/P&gt;</description>
      <pubDate>Mon, 30 Jan 2017 18:25:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/OpenCL-for-CPU/Advice-on-how-to-handle-structs-with-huge-arrays/m-p/1079057#M4619</guid>
      <dc:creator>Michal_M_Intel</dc:creator>
      <dc:date>2017-01-30T18:25:51Z</dc:date>
    </item>
  </channel>
</rss>

