Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel ICX Compiler, Intel® DPC++ Compatibility Tool, and GDB*

USM with std::vector

William_D_Intel
Employee
2,224 Views

I modified the simple.cpp example from oneAPI_Intro.ipynb to use usm_allocator with std::vector, so I can try USM with STL containers. My modified version looks like this:

//==============================================================
// Copyright © 2020 Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include <CL/sycl.hpp>

#include <iostream>
#include <vector>

using namespace sycl;
static const int N = 16;
int main(){
    //# define queue which has default device associated for offload
    queue q;
    usm_allocator<int, usm::alloc::shared> q_alloc{q};
    
    std::cout << "Device: " << q.get_device().get_info<info::device::name>() << std::endl;

    //# Unified Shared Memory Allocation enables data access on host and device
    std::vector<int, usm_allocator<int, usm::alloc::shared>> data(q_alloc);
    data.reserve(N);

    //# Initialization
    for(int i=0; i<N; i++) data.push_back(i);

    //# Offload parallel computation to device
    q.parallel_for(range<1>(N), [=] (id<1> i){
        data[i] *= 2;
    }).wait();

    //# Print Output
    for(int i=0; i<N; i++) std::cout << data[i] << std::endl;

    return 0;
}

When I try to compile the above code (Beta06 on devcloud), the compiler says:

lab/simple.cpp:34:17: error: cannot assign to return value because function 'operator[]' returns a const value
        data[i] *= 2;
        ~~~~~~~ ^
/usr/lib/gcc/x86_64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/bits/stl_vector.h:812:7: note: function 'operator[]' which returns const-qualified type 'std::vector<int, cl::sycl::usm_allocator<int, cl::sycl::usm::alloc::shared, 0> >::const_reference' (aka 'const int &') declared here
      const_reference
      ^~~~~~~~~~~~~~~
lab/simple.cpp:34:9: error: kernel parameter has non-trivially copy constructible class/struct type 'std::vector<int, usm_allocator<int, usm::alloc::shared> >'
        data[i] *= 2;
        ^
2 errors generated.

The problem here is that data is being captured by value, which is triggering a copy of the std::vector.  I also tried capturing `data` by reference, but the compiler says:

lab/simple.cpp:30:35: error: 'std::vector<int, usm_allocator<int, usm::alloc::shared> > &' cannot be used as the type of a kernel parameter
    q.parallel_for(range<1>(N), [&data] (id<1> i) {
                                  ^
1 error generated.

And, I see in the SYCL 1.2 spec that variables can only be captured by value not by reference.  Not allowing capture by reference makes sense when pointers are not valid across domains.  However, USM shared allocations are valid both on the host and the device.  Is there any way to do capture by reference for USM objects?

If not, is there some other way to use usm_allocator with STL classes so that device code can use methods from the class on the device?  Of course, the object methods called in a kernel would have to not be virtual and not try to allocate or free memory.

Thanks,
Bill.

RahulV_intel
Moderator

Hi Bill,

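The usm_allocator only decides where the vector's elements are allocated: with usm::alloc::shared they are placed in shared USM, which both the host and the device can access. The std::vector object 'data' itself (its internal pointer and size bookkeeping) is still an ordinary object in host memory.

Since the vector object 'data' lives in host memory, the kernel below cannot use it directly: capturing it by value copies the whole vector, which is not trivially copy constructible, and inside the lambda the copy is const, so its operator[] returns a const reference that cannot be modified.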

	    q.parallel_for(range<1>(N), [=] (id<1> i){

As a workaround, capture a raw pointer to the vector's elements instead of the vector itself, using an init-capture in the lambda ( [=,ptr = &data[0]] ), and modify the elements through that pointer (ptr) as you like:

               ptr[i] *= 2;

Also, as you pointed out, SYCL doesn't support capture by reference inside a parallel_for kernel, because a reference captured on the host would refer to an object in host memory, such as the vector object 'data'.

Refer to the embedded code snippet below for more clarity.

//==============================================================
// Copyright © 2020 Intel Corporation
//
// SPDX-License-Identifier: MIT
// =============================================================
#include <CL/sycl.hpp>

#include <iostream>
#include <vector>

using namespace sycl;
static const int N = 16;
int main(){
    //# define queue which has default device associated for offload
    queue q;
    usm_allocator<int, usm::alloc::shared> q_alloc{q};

    std::cout << "Device: " << q.get_device().get_info<info::device::name>() << std::endl;

    //# Unified Shared Memory Allocation enables data access on host and device
    std::vector<int, usm_allocator<int, usm::alloc::shared>> data(q_alloc);
    data.reserve(N);

    //# Initialization
    for(int i=0; i<N; i++) data.push_back(i);

    //# Offload parallel computation to device
    q.parallel_for(range<1>(N), [=,ptr = &data[0]] (id<1> i){
        ptr[i] *= 2;
    }).wait();

    //# Print Output
    for(int i=0; i<N; i++) std::cout << data[i] << std::endl;

    return 0;
}

If you prefer not to declare the pointer inside the lambda's capture list, you can add the following statement right above the parallel_for call:

     int *ptr = &data[0];

This way, you can drop the extra init-capture (ptr = &data[0]) and keep a plain [=] capture in the parallel_for lambda. Inside parallel_for, ptr is captured by value; its value is the address of the data vector's first element in shared USM, so the copy inside the kernel points at the same elements.

Hence the kernel can modify the actual elements of the data vector.

Let us know if this resolves your query.

 

Regards,

Rahul

William_D_Intel
Employee

Maybe vector is a bad example here, because getting a simple `int *` pointer to the underlying data gives you most of the functionality of vector.  That is, vector does not give you a lot of functionality (that is usable in a kernel) compared to the pointer to the data. 

The reason for the question is to understand how to use USM with STL containers more generally.  For example, what if an application needed std::unordered_map instead of std::vector?  This would be useful for a table that each worker uses to look up values depending on its work_item.

Using sycl::usm_allocator as the allocator for a std::unordered_map would allow the host to build a std::unordered_map in shared memory and then pass it to the device, where operator[] or the `at` method could be used for table lookups inside a kernel. Table updates might also be OK, as long as no memory allocation is required and the app controls concurrency correctly.

I tried using usm_allocator to allocate the vector itself, as well as making usm_allocator be the allocator for the int items in the vector:

queue q;
usm_allocator<int, usm::alloc::shared> q_alloc_int{q};
usm_allocator<std::vector<int, usm_allocator<int, usm::alloc::shared>>, usm::alloc::shared> q_alloc_vector{q};
// ...
auto data = q_alloc_vector.allocate(1);
q_alloc_vector.construct(data, q_alloc_int);

The problem is that I need to pass q_alloc_int to the std::vector constructor, but unlike std::allocator, usm_allocator does not have an overload of the `construct` method that forwards arguments to the underlying constructor (so the last line above fails to compile). The C++20 documentation describes std::construct_at, which looks like it should work with memory allocated by any allocator, but it appears not to be implemented yet in the dpcpp compiler. Is there another way to use STL containers in device code?

USM seems close to being able to pass pointers to shared STL objects, if I can get the STL container constructor to run on the shared memory. Then I could pass the shared memory pointer by value to the kernel in the capture. It would probably be cleaner to pass the pointer encapsulated in a std::shared_ptr or a similar smart pointer, but the compiler refuses anything that is not trivially copy constructible.

Thanks,
Bill.

RahulV_intel
Moderator

Hi,

You are right: as far as I know, usm_allocator doesn't have an overload of the `construct` method that forwards constructor arguments, which makes it impractical to construct STL objects in shared memory for use inside the kernel.

I'd suggest using oneDPL (oneAPI DPC++ Library), which provides Parallel STL algorithms that can run on the device.

Here are some oneDPL links to get started:

https://spec.oneapi.com/versions/latest/elements/oneDPL/source/index.html

https://software.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-library-guide/top.html

https://software.intel.com/sites/default/files/oneAPIProgrammingGuide_9.pdf (chapter 5)

 

Regards,

Rahul

 

RahulV_intel
Moderator

Hi,

Let us know if we can close this thread.

 

--Rahul

RahulV_intel
Moderator

Hi,

As per the process, we will go ahead and close this thread. Feel free to raise a new thread if your issue still persists.
