<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi, in Intel® oneAPI DPC++/C++ Compiler</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/USM-with-std-vector/m-p/1179827#M361</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;You are right, as far as I know usm_allocator doesn't have an overload of the `construct` method that forwards constructor arguments, so STL containers can't be constructed in shared memory this way and then used inside the kernel.&lt;/P&gt;&lt;P&gt;I'd suggest using oneDPL (oneAPI DPC++ Library), which supports Parallel STL implementations on the device.&lt;/P&gt;&lt;P&gt;Here are the oneDPL links to get started:&lt;/P&gt;&lt;P&gt;&lt;A href="https://spec.oneapi.com/versions/latest/elements/oneDPL/source/index.html"&gt;https://spec.oneapi.com/versions/latest/elements/oneDPL/source/index.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://software.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-library-guide/top.html"&gt;https://software.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-library-guide/top.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.intel.com/legacyfs/online/drupal_files/oneAPIProgrammingGuide_9.pdf"&gt;https://software.intel.com/sites/default/files/oneAPIProgrammingGuide_9.pdf &lt;/A&gt;(chapter 5)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Rahul&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 11 May 2020 10:19:33 GMT</pubDate>
    <dc:creator>RahulV_intel</dc:creator>
    <dc:date>2020-05-11T10:19:33Z</dc:date>
    <item>
      <title>USM with std::vector</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/USM-with-std-vector/m-p/1179824#M358</link>
      <description>&lt;P&gt;I modified the simple.cpp example from the onAPI_Intro.ipynb to&amp;nbsp;use usm_allocator with std::vector, so I can try USM with STL containers.&amp;nbsp; My modified version looks like this:&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;//==============================================================
// Copyright © 2020 Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include &amp;lt;CL/sycl.hpp&amp;gt;

#include &amp;lt;vector&amp;gt;

using namespace sycl;
static const int N = 16;
int main(){
    //# define queue which has default device associated for offload
    queue q;
    usm_allocator&amp;lt;int, usm::alloc::shared&amp;gt; q_alloc{q};
    
    std::cout &amp;lt;&amp;lt; "Device: " &amp;lt;&amp;lt; q.get_device().get_info&amp;lt;info::device::name&amp;gt;() &amp;lt;&amp;lt; std::endl;

    //# Unified Shared Memory Allocation enables data access on host and device
    std::vector&amp;lt;int, usm_allocator&amp;lt;int, usm::alloc::shared&amp;gt;&amp;gt; data(q_alloc);
    data.reserve(N);

    //# Initialization
    for(int i=0; i&amp;lt;N; i++) data.push_back(i);

    //# Offload parallel computation to device
    q.parallel_for(range&amp;lt;1&amp;gt;(N), [=] (id&amp;lt;1&amp;gt; i){
        data[i] *= 2;
    }).wait();

    //# Print Output
    for(int i=0; i&amp;lt;N; i++) std::cout &amp;lt;&amp;lt; data[i] &amp;lt;&amp;lt; std::endl;

    return 0;
}
&lt;/PRE&gt;

&lt;P&gt;When I try to compile the above code (Beta06 on devcloud), the compiler says:&lt;/P&gt;

&lt;PRE class="brush:plain; class-name:dark;"&gt;lab/simple.cpp:34:17: error: cannot assign to return value because function 'operator[]' returns a const value
        data[i] *= 2;
        ~~~~~~~ ^
/usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/stl_vector.h:812:7: note: function 'operator[]' which returns const-qualified type 'std::vector&amp;lt;int, cl::sycl::usm_allocator&amp;lt;int, cl::sycl::usm::alloc::shared, 0&amp;gt; &amp;gt;::const_reference' (aka 'const int &amp;amp;') declared here
      const_reference
      ^~~~~~~~~~~~~~~
lab/simple.cpp:34:9: error: kernel parameter has non-trivially copy constructible class/struct type 'std::vector&amp;lt;int, usm_allocator&amp;lt;int, usm::alloc::shared&amp;gt; &amp;gt;'
        data[i] *= 2;
        ^
2 errors generated.&lt;/PRE&gt;

&lt;P&gt;The problem here is that data is being captured by value, which is triggering a copy of the std::vector.&amp;nbsp; I also tried capturing `data` by reference, but the compiler says:&lt;/P&gt;

&lt;PRE class="brush:plain; class-name:dark;"&gt;lab/simple.cpp:30:35: error: 'std::vector&amp;lt;int, usm_allocator&amp;lt;int, usm::alloc::shared&amp;gt; &amp;gt; &amp;amp;' cannot be used as the type of a kernel parameter
    q.parallel_for(range&amp;lt;1&amp;gt;(N), [&amp;amp;data] (id&amp;lt;1&amp;gt; i) {
                                  ^
1 error generated.&lt;/PRE&gt;

&lt;P&gt;And, I see in the SYCL 1.2 spec that variables can only be captured by value not by reference.&amp;nbsp; Not allowing capture by reference makes sense when pointers are not valid across domains.&amp;nbsp; However, USM shared allocations are valid both on the host and the device.&amp;nbsp; Is there any way to do capture by reference for USM objects?&lt;/P&gt;
&lt;P&gt;If not, is there some other way to use usm_allocator with STL classes so that device code can use methods from the class on the device?&amp;nbsp; Of course, the object methods called in a kernel would have to not be virtual and not try to allocate or free memory.&lt;/P&gt;
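&lt;P&gt;For what it's worth, both diagnostics can be reproduced on the host in plain C++ without any SYCL headers. The sketch below is only an illustration under that assumption: the helper names (is_const_ref, captured_elements_are_const) are made up for this example, and the static_asserts check standard type traits, not the exact rule the dpcpp compiler applies to kernel parameters.&lt;/P&gt;
<![CDATA[

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// std::vector manages heap storage, so it has a non-trivial copy
// constructor and destructor; a raw pointer is trivially copyable.
static_assert(!std::is_trivially_copyable_v<std::vector<int>>,
              "std::vector is not trivially copyable");
static_assert(std::is_trivially_copyable_v<int *>,
              "a raw pointer is trivially copyable");

// Hypothetical helpers: overload resolution reports whether the
// argument is a const or a non-const lvalue.
inline bool is_const_ref(int &) { return false; }
inline bool is_const_ref(const int &) { return true; }

// A [=] lambda that is not marked 'mutable' has a const call operator,
// so the captured copy of 'data' is const inside the body and the
// const overload of operator[] is selected.
inline bool captured_elements_are_const() {
    std::vector<int> data{1, 2, 3};
    auto f = [=] { return is_const_ref(data[0]); };
    return f();
}
```

]]>
&lt;P&gt;Here captured_elements_are_const() returns true, which lines up with the first error: inside a non-mutable [=] lambda, operator[] on the copied vector yields a const reference.&lt;/P&gt;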
&lt;P&gt;Thanks,&lt;BR /&gt;Bill.&lt;/P&gt;</description>
      <pubDate>Thu, 07 May 2020 23:44:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/USM-with-std-vector/m-p/1179824#M358</guid>
      <dc:creator>William_D_Intel</dc:creator>
      <dc:date>2020-05-07T23:44:07Z</dc:date>
    </item>
    <item>
      <title>Hi Bill,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/USM-with-std-vector/m-p/1179825#M359</link>
      <description>&lt;P&gt;Hi Bill,&lt;/P&gt;&lt;P&gt;With usm_allocator, only the vector's element storage is allocated as alloc::shared memory. The std::vector object 'data' itself (its internal pointers and size) still lives in ordinary host memory.&lt;/P&gt;&lt;P&gt;Capturing 'data' by value therefore asks the kernel to copy the whole std::vector object, which is not trivially copyable, and the captured copy is const inside the lambda. That is why the following statement is rejected when the kernel body tries to modify 'data':&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;	    q.parallel_for(range&amp;lt;1&amp;gt;(N), [=] (id&amp;lt;1&amp;gt; i){&lt;/PRE&gt;

&lt;P&gt;As a workaround, you can capture a pointer to the vector's first element by value, using an init-capture in the lambda's capture list ( [=,ptr = &amp;amp;data[0]] ), and modify the elements through that pointer (ptr) as you like.&lt;/P&gt;

&lt;PRE class="brush:cpp; class-name:dark;"&gt;               ptr[i] *= 2;&lt;/PRE&gt;

&lt;P&gt;Also, as you pointed out, SYCL doesn't support capturing variables by reference in a parallel_for kernel, because a reference to a host object such as 'data' is generally not valid on the device.&lt;/P&gt;
&lt;P&gt;Refer to the complete code snippet below for more clarity.&lt;/P&gt;

&lt;PRE class="brush:cpp; class-name:dark;"&gt;//==============================================================
// Copyright © 2020 Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include &amp;lt;CL/sycl.hpp&amp;gt;

#include &amp;lt;vector&amp;gt;

using namespace sycl;
static const int N = 16;
int main(){
    //# define queue which has default device associated for offload
    queue q;
    usm_allocator&amp;lt;int, usm::alloc::shared&amp;gt; q_alloc{q};

    std::cout &amp;lt;&amp;lt; "Device: " &amp;lt;&amp;lt; q.get_device().get_info&amp;lt;info::device::name&amp;gt;() &amp;lt;&amp;lt; std::endl;

    //# Unified Shared Memory Allocation enables data access on host and device
    std::vector&amp;lt;int, usm_allocator&amp;lt;int, usm::alloc::shared&amp;gt;&amp;gt; data(q_alloc);
    data.reserve(N);

    //# Initialization
    for(int i=0; i&amp;lt;N; i++) data.push_back(i);

    //# Offload parallel computation to device
    q.parallel_for(range&amp;lt;1&amp;gt;(N), [=,ptr = &amp;amp;data[0]] (id&amp;lt;1&amp;gt; i){
        ptr[i] *= 2;
    }).wait();

    //# Print Output
    for(int i=0; i&amp;lt;N; i++) std::cout &amp;lt;&amp;lt; data[i] &amp;lt;&amp;lt; std::endl;

    return 0;
}
&lt;/PRE&gt;

&lt;P&gt;If you do not wish to use an init-capture in the lambda's capture list, you can add the following statement right above the parallel_for statement.&lt;/P&gt;

&lt;PRE class="brush:cpp; class-name:dark;"&gt;    int *ptr = &amp;amp;data[0];
&lt;/PRE&gt;

&lt;P&gt;This way, you can drop the additional capture (ptr = &amp;amp;data[0]) from the lambda passed to parallel_for. The default capture [=] then copies ptr by value, and since ptr points to the vector's element storage in shared memory, the kernel can dereference it.&lt;/P&gt;
&lt;P&gt;Hence the modification to the actual data vector is possible inside the kernel.&lt;/P&gt;
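&lt;P&gt;The same pointer-capture idea can be tried on the host without a device. This is a rough sketch and not the SYCL API: run_on_host is a hypothetical stand-in for q.parallel_for that simply loops over the indices, double_through_pointer is a made-up name, and a plain std::vector takes the place of the USM-backed one.&lt;/P&gt;
<![CDATA[

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side stand-in for q.parallel_for(range<1>(n), kernel):
// it calls kernel(i) for each index in order, on the host.
template <typename Kernel>
void run_on_host(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) kernel(i);
}

// Fills a vector with 0..n-1, then doubles every element through a raw
// pointer that the lambda captures by value.
inline std::vector<int> double_through_pointer(std::size_t n) {
    std::vector<int> data;
    data.reserve(n);
    for (std::size_t i = 0; i < n; ++i) data.push_back(static_cast<int>(i));

    // data.data() is the same address as &data[0] for a non-empty vector.
    int *ptr = data.data();

    // Only the pointer is copied into the closure. The pointer itself is
    // const inside the lambda, but the ints it points to are not, so the
    // elements can still be modified.
    run_on_host(n, [=](std::size_t i) { ptr[i] *= 2; });
    return data;
}
```

]]>
&lt;P&gt;For example, double_through_pointer(4) yields the elements 0, 2, 4, 6, because the writes through ptr land in the vector's own storage.&lt;/P&gt;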
&lt;P&gt;Let us know if this resolves your query.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Regards,&lt;/P&gt;
&lt;P&gt;Rahul&lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2020 09:41:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/USM-with-std-vector/m-p/1179825#M359</guid>
      <dc:creator>RahulV_intel</dc:creator>
      <dc:date>2020-05-08T09:41:56Z</dc:date>
    </item>
    <item>
      <title>Maybe vector is a bad example</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/USM-with-std-vector/m-p/1179826#M360</link>
      <description>&lt;P&gt;Maybe vector is a bad example here, because getting a simple `int *` pointer to the underlying data gives you most of the functionality of vector.  That is, vector does not give you much functionality (that is usable in a kernel) beyond the pointer to the data.&lt;/P&gt;&lt;P&gt;The reason for the question is to understand how to use USM with STL containers more generally.  For example, what if an application needed std::unordered_map instead of std::vector?  This would be useful for a table that each worker uses to look up values depending on its work_item.&lt;/P&gt;&lt;P&gt;Using sycl::usm_allocator as the allocator for a std::unordered_map would allow the host to build a std::unordered_map in host memory, and then pass it to the device, where operator[] or the `at` method could be used for doing table lookups inside a kernel.  Table updates might be OK as long as no memory allocation is required and the app controls concurrency correctly.&lt;/P&gt;&lt;P&gt;I tried using usm_allocator to allocate the vector itself, as well as making usm_allocator the allocator for the int items in the vector:&lt;/P&gt;&lt;PRE class="brush:cpp; class-name:dark;"&gt;queue q;
usm_allocator&amp;lt;int, usm::alloc::shared&amp;gt; q_alloc_int{q};
usm_allocator&amp;lt;std::vector&amp;lt;int, usm_allocator&amp;lt;int, usm::alloc::shared&amp;gt;&amp;gt;, usm::alloc::shared&amp;gt; q_alloc_vector{q};
// ...
auto data = q_alloc_vector.allocate(1);
q_alloc_vector.construct(data, q_alloc_int);&lt;/PRE&gt;&lt;P&gt;The problem is that I need to pass q_alloc_int to the std::vector constructor, but unlike std::allocator, usm_allocator does not have an overload of the `construct` method that forwards arguments to the underlying constructor (so the last line above fails to compile).  C++20 documentation says there is a std::construct_at that looks like it should work with memory allocated by any allocator, but it appears not to be implemented yet in the dpcpp compiler.  Is there another way to use STL containers in device code?&lt;/P&gt;&lt;P&gt;USM seems to be close to being able to pass pointers to shared STL objects if I can get the STL container constructor to run on the shared memory.  Then I could pass the shared memory pointer by value to the kernel in the capture.  It would probably be cleaner to pass the pointer encapsulated in a std::shared_ptr or similar smart pointer, but the compiler refuses to allow anything that is not trivially copy constructible.&lt;/P&gt;&lt;P&gt;Thanks,&lt;BR /&gt;Bill.&lt;/P&gt;</description>
      <pubDate>Fri, 08 May 2020 23:19:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/USM-with-std-vector/m-p/1179826#M360</guid>
      <dc:creator>William_D_Intel</dc:creator>
      <dc:date>2020-05-08T23:19:41Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/USM-with-std-vector/m-p/1179827#M361</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;You are right, as far as I know usm_allocator doesn't have an overload of the `construct` method that forwards constructor arguments, so STL containers can't be constructed in shared memory this way and then used inside the kernel.&lt;/P&gt;&lt;P&gt;I'd suggest using oneDPL (oneAPI DPC++ Library), which supports Parallel STL implementations on the device.&lt;/P&gt;&lt;P&gt;Here are the oneDPL links to get started:&lt;/P&gt;&lt;P&gt;&lt;A href="https://spec.oneapi.com/versions/latest/elements/oneDPL/source/index.html"&gt;https://spec.oneapi.com/versions/latest/elements/oneDPL/source/index.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://software.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-library-guide/top.html"&gt;https://software.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-library-guide/top.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.intel.com/legacyfs/online/drupal_files/oneAPIProgrammingGuide_9.pdf"&gt;https://software.intel.com/sites/default/files/oneAPIProgrammingGuide_9.pdf &lt;/A&gt;(chapter 5)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Rahul&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 11 May 2020 10:19:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/USM-with-std-vector/m-p/1179827#M361</guid>
      <dc:creator>RahulV_intel</dc:creator>
      <dc:date>2020-05-11T10:19:33Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/USM-with-std-vector/m-p/1179828#M362</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Let us know if we can close this thread.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;--Rahul&lt;/P&gt;</description>
      <pubDate>Tue, 19 May 2020 07:53:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/USM-with-std-vector/m-p/1179828#M362</guid>
      <dc:creator>RahulV_intel</dc:creator>
      <dc:date>2020-05-19T07:53:53Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/USM-with-std-vector/m-p/1179829#M363</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;As per the process, we will go ahead and close this thread. Feel free to&amp;nbsp;raise a new thread if your issue still persists.&lt;/P&gt;</description>
      <pubDate>Mon, 25 May 2020 10:59:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/USM-with-std-vector/m-p/1179829#M363</guid>
      <dc:creator>RahulV_intel</dc:creator>
      <dc:date>2020-05-25T10:59:55Z</dc:date>
    </item>
  </channel>
</rss>

