Hi All,
I am porting my software project from CUDA to DPC++ and am struggling with the seemingly unnecessarily restrictive standard for objects that can be accessed inside a kernel.
For instance, one very simple way of adapting code built around container classes whose data is allocated in managed memory is to have a suitably scoped switch that controls the behavior of the copy constructor:
static bool shallow_copy = false; // when true, copies alias the source data
class MyContainer{
  double *v;
  size_t sz;
public:
  MyContainer(size_t n){
    v = (double*)acceleratorAllocShared(n*sizeof(double));
    sz = n;
  }
  MyContainer(const MyContainer &r){
    if(shallow_copy){
      // shallow copy: share the pointer with r
      sz = r.sz;
      v = r.v;
    }else{
      // deep copy: allocate new storage and duplicate the data
      sz = r.sz;
      v = (double*)acceleratorAllocShared(sz*sizeof(double));
      memcpy(v, r.v, sz*sizeof(double));
    }
  }
  inline double operator[](const size_t i) const{ return v[i]; }
  inline double &operator[](const size_t i){ return v[i]; }
  ~MyContainer(){
    // shallow copies do not own the data, so they must not free it
    if(!shallow_copy)
      acceleratorFreeShared(v);
  }
};
which can be used as
MyContainer c(100),d(100);
shallow_copy = true;
accelerator_for(i, 100, 1, {
  d[i] = c[i]*c[i];
});
shallow_copy = false;
This pattern works perfectly with CUDA and is much simpler to implement than creating view class instances for every object accessed within the lambda (which seems to defeat the purpose of automatic lambda captures!) just to work within the restrictions of the standard. However, it fails with DPC++:
test_container.C:83:7: error: kernel parameter has non-trivially copy constructible class/struct type 'MyContainer'
      d[i] = c[i]*c[i];
      ^
test_container.C:83:14: error: kernel parameter has non-trivially copy constructible class/struct type 'MyContainer'
      d[i] = c[i]*c[i];
             ^
I ask the devs to consider allowing users to bypass this restriction, perhaps by allowing us to tell the compiler that the class will behave as if trivially constructible.
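For readers following along, the "suitably scoped switch" described above could be wrapped in a small RAII guard so the flag is always restored when the scope ends. The following is only a host-side sketch under stated assumptions: `std::calloc`/`std::free` stand in for the shared-memory allocator, a plain lambda stands in for the kernel, and `ShallowCopyScope` and `Buffer` are hypothetical names that do not appear in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Global switch, as in the post: true means copies alias the source data.
static bool shallow_copy = false;

// Hypothetical RAII guard: enables shallow copies while it is alive and
// restores the previous setting when it goes out of scope.
class ShallowCopyScope {
  bool previous_;
public:
  ShallowCopyScope() : previous_(shallow_copy) { shallow_copy = true; }
  ~ShallowCopyScope() { shallow_copy = previous_; }
  ShallowCopyScope(const ShallowCopyScope&) = delete;
  ShallowCopyScope& operator=(const ShallowCopyScope&) = delete;
};

// Host-only stand-in for the container (assumption: std::calloc replaces
// the shared-memory allocator used in the post).
class Buffer {
  double* v;
  std::size_t sz;
  static double* allocate(std::size_t n) {
    double* p = static_cast<double*>(std::calloc(n, sizeof(double)));
    assert(p != nullptr);
    return p;
  }
public:
  explicit Buffer(std::size_t n) : v(allocate(n)), sz(n) {}
  Buffer(const Buffer& r) : v(r.v), sz(r.sz) {
    if (!shallow_copy) {  // deep copy: own a duplicate of the data
      v = allocate(sz);
      std::memcpy(v, r.v, sz * sizeof(double));
    }
  }
  Buffer& operator=(const Buffer&) = delete;
  // As in the post, objects destroyed while the flag is set do not free.
  ~Buffer() { if (!shallow_copy) std::free(v); }
  const double* data() const { return v; }
};

// The by-value capture copies `a` while the guard is active, so the
// captured copy aliases a's storage. `kernel` is declared after `scope`
// and is therefore destroyed first, while the flag is still true.
bool shallow_inside_scope() {
  Buffer a(4);
  bool aliased = false;
  {
    ShallowCopyScope scope;
    auto kernel = [a]() { return a.data(); };
    aliased = (kernel() == a.data());
  }
  return aliased && !shallow_copy;
}

// Outside any guard, the copy constructor performs a deep copy.
bool deep_outside_scope() {
  Buffer a(4);
  Buffer b(a);
  return b.data() != a.data();
}
```

The guard only helps if every shallow copy is destroyed before the guard itself, which is why the lambda is declared inside the guarded scope. This sketch does not address the DPC++ compile error; it only tidies the flag handling.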
Hi Christopher,
Thanks for reaching out to us.
To access a class instance inside a kernel, try making the class trivially copy constructible and trivially destructible. Such a class will not produce the errors you are seeing in your current use case. So instead of freeing the memory inside a destructor, you may add a member function to your class that does the same, or manage the allocation with a smart pointer kept outside the class that the kernel captures (a smart-pointer member would itself make the class non-trivially copyable).
Please refer to the below implementation of your current use-case to get insight.
#include <CL/sycl.hpp>
#include <iostream>
#include <type_traits>
// acceleratorAllocShared, acceleratorFreeShared, accelerator_for and
// theGridAccelerator are not defined in this thread; they are taken as given.
class MyContainer {
  double* v;
  size_t sz;
public:
  MyContainer(size_t n) {
    v = (double*)acceleratorAllocShared(n * sizeof(double));
    sz = n;
  }
  // No user-provided copy constructor or destructor: the class stays trivial.
  inline double& operator[](const size_t i) const{ return v[i]; }
  // Explicit release replaces the destructor.
  void free() {
    acceleratorFreeShared(v);
  }
};
int main(void) {
  cl::sycl::gpu_selector selector;
  cl::sycl::device selectedDevice{ selector };
  theGridAccelerator = new sycl::queue(selectedDevice);
  MyContainer c(100), d(100);
  for (size_t i = 0; i < 100; i++) {
    c[i] = 1;
  }
  accelerator_for(i, 100, 1, {
    d[i] = c[i] * c[i];
  });
  for(size_t i=0;i<20;i++){
    std::cout<<d[i]<<" ";
  }
  std::cout << "\n";
  std::cout << std::is_trivially_destructible<MyContainer>::value << std::endl;
  std::cout << std::is_trivially_copy_constructible<MyContainer>::value << std::endl;
  d.free();
  c.free();
  return 0;
}
$ ./a.out
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1
1
You can update it according to your requirements. Hope the provided details will help you to solve your issue.
Warm Regards,
Abhishek
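The two trait queries printed in the example above can also be turned into compile-time checks on the host, so a type that would be rejected as a kernel parameter is caught early. This is a minimal sketch in standard C++ only; `DeepCopyBox`, `RawHandle`, `passes_trivial_checks` and `self_check` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical type with a user-provided copy constructor and destructor,
// similar in shape to the container in the original question.
struct DeepCopyBox {
  double* v = nullptr;
  std::size_t sz = 0;
  DeepCopyBox() = default;
  DeepCopyBox(const DeepCopyBox& r) : v(r.v), sz(r.sz) {}  // user-provided
  ~DeepCopyBox() {}                                         // user-provided
};

// Hypothetical type that only stores a pointer and a size and leaves all
// special member functions to the compiler.
struct RawHandle {
  double* v;
  std::size_t sz;
};

// Combines the two traits queried in the example above.
template <class T>
constexpr bool passes_trivial_checks() {
  return std::is_trivially_copy_constructible<T>::value &&
         std::is_trivially_destructible<T>::value;
}

static_assert(!passes_trivial_checks<DeepCopyBox>(),
              "user-provided copy constructor/destructor are non-trivial");
static_assert(passes_trivial_checks<RawHandle>(),
              "compiler-generated special members are trivial");

// Runtime variant of the same checks, for use in a test harness.
inline void self_check() {
  assert(!passes_trivial_checks<DeepCopyBox>());
  assert(passes_trivial_checks<RawHandle>());
}
```

Note that even an empty user-provided destructor or a member-wise user-provided copy constructor makes the type non-trivial, regardless of what the function body does.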
Dear Abhishek,
Thank you for your reply. The problem with your solution is that it breaks the C++ convention that a copy constructor performs a "deep copy". I imagine most developers would be reluctant to provide code that defies this convention.
To comply with the convention in the main code body, a developer is typically forced to create a temporary "view" class for each container, which holds the underlying pointer and has a shallow copy constructor. The developer must then manually instantiate a view for every object accessed inside the device lambda, defeating the major convenience of automatic lambda capture.
In my original post I provided an alternative that uses a variable to control the behavior of the copy constructor. It works perfectly on CUDA but is disallowed in DPC++ due to the overly restrictive standard. What I am asking for is a way for the developer to tell the compiler that the class will act as if trivially copy constructible during kernel execution.
Best,
Chris
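For context, the "view" pattern described in this reply might look roughly like the following host-side sketch. It is not code from the thread: `OwningContainer`, `ContainerView` and `host_for` are hypothetical names, `std::calloc` stands in for the shared-memory allocator, and `host_for` is a plain loop standing in for a kernel launch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical non-owning view: pointer plus size, compiler-generated copy.
class ContainerView {
  double* v;
  std::size_t sz;
public:
  ContainerView(double* p, std::size_t n) : v(p), sz(n) {}
  double& operator[](std::size_t i) const { return v[i]; }
  std::size_t size() const { return sz; }
};
static_assert(std::is_trivially_copyable<ContainerView>::value,
              "the view must be safe to copy bitwise");

// Hypothetical owning container with a conventional deep-copy constructor.
class OwningContainer {
  double* v;
  std::size_t sz;
  static double* allocate(std::size_t n) {
    double* p = static_cast<double*>(std::calloc(n, sizeof(double)));
    assert(p != nullptr);
    return p;
  }
public:
  explicit OwningContainer(std::size_t n) : v(allocate(n)), sz(n) {}
  OwningContainer(const OwningContainer& r) : v(allocate(r.sz)), sz(r.sz) {
    std::memcpy(v, r.v, sz * sizeof(double));
  }
  OwningContainer& operator=(const OwningContainer&) = delete;
  ~OwningContainer() { std::free(v); }
  double& operator[](std::size_t i) { return v[i]; }
  const double& operator[](std::size_t i) const { return v[i]; }
  ContainerView view() const { return ContainerView(v, sz); }
};

// Stand-in for a kernel launch: calls the functor once per index on the host.
template <class F>
void host_for(std::size_t n, F f) {
  for (std::size_t i = 0; i < n; ++i) f(i);
}

// The manual step the reply objects to: one view per captured container.
double square_through_views(std::size_t n) {
  OwningContainer c(n), d(n);
  for (std::size_t i = 0; i < n; ++i) c[i] = 2.0;
  ContainerView cv = c.view();
  ContainerView dv = d.view();
  host_for(n, [cv, dv](std::size_t i) { dv[i] = cv[i] * cv[i]; });
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) sum += d[i];
  return sum;
}
```

The two `view()` lines before the launch are exactly the per-object boilerplate being criticized: capturing `c` and `d` directly would invoke the deep-copy constructor instead.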
Hi Christopher,
The SYCL 2020 specification allows implementations to let users explicitly declare certain class types as device copyable, as long as the type has at least one eligible copy constructor, move constructor, copy assignment operator, or move assignment operator. More details can be found at https://www.khronos.org/registry/SYCL/specs/sycl-2020/html/sycl-2020.html#sec::device.copyable
The DPC++ compiler does not support this feature at this time, but we plan to implement it in a future release. The status of support for SYCL 2020 features can be tracked at https://software.intel.com/content/www/us/en/develop/articles/sycl-2020-features-dpc-language-oneapi-c.html
Thank you for the update, I look forward to testing out this feature once it becomes available. The interesting points in the standard are:
- Type T has at least one eligible copy constructor, move constructor, copy assignment operator, or move assignment operator;
- Each eligible copy constructor, move constructor, copy assignment operator, and move assignment operator is public;
- When doing an inter-device transfer of an object of type T, the effect of each eligible copy constructor, move constructor, copy assignment operator, and move assignment operator is the same as a bitwise copy of the object;
What I want to know is how "eligible" is defined here. In my example I have a constructor and destructor whose behavior changes between a shallow and deep copy depending on an external variable. Will such a constructor be considered "eligible"?
"Eligible" essentially means that the special member function is callable: roughly, it is not deleted and its constraints, if any, are satisfied. The exact definition is at http://eel.is/c++draft/special#6
A user is allowed to specialize is_device_copyable for their type if the copy and move operations have the same effect as a bitwise copy. In your example, the shallow copy is the same as a bitwise copy. If it is guaranteed that the shallow copy is always the one used for device copies, then it is legal to use is_device_copyable to allow this type to be device copied.
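As an illustration of the opt-in described above, here is a sketch that compiles without a SYCL toolchain. It uses a hypothetical local trait, `sketch::is_device_copyable`, that only mirrors the shape of the trait in the SYCL 2020 specification; `FlagContainer`, `NotOptedIn`, `accepted_as_kernel_argument` and `check_opt_in` are also illustrative names. With a conforming compiler, the specialization would be written against the real trait, as shown in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch {
// Local stand-in (assumption): by default only trivially copyable types
// are treated as device copyable.
template <class T>
struct is_device_copyable : std::is_trivially_copyable<T> {};
}  // namespace sketch

// Hypothetical container whose user-provided copy constructor copies
// member-wise, which has the same effect as a bitwise copy here.
struct FlagContainer {
  double* v;
  std::size_t sz;
  FlagContainer(double* p, std::size_t n) : v(p), sz(n) {}
  FlagContainer(const FlagContainer& r) : v(r.v), sz(r.sz) {}
};

// A second non-trivially copyable type that is deliberately not opted in.
struct NotOptedIn {
  NotOptedIn() {}
  NotOptedIn(const NotOptedIn&) {}
};

namespace sketch {
// User opt-in: a promise that copying FlagContainer acts like a bitwise copy.
// With a SYCL 2020 implementation this would instead read:
//   template <> struct sycl::is_device_copyable<FlagContainer> : std::true_type {};
template <>
struct is_device_copyable<FlagContainer> : std::true_type {};
}  // namespace sketch

// Models the check a compiler would apply to a kernel argument type.
template <class T>
constexpr bool accepted_as_kernel_argument() {
  return sketch::is_device_copyable<T>::value;
}

static_assert(!std::is_trivially_copyable<FlagContainer>::value,
              "the user-provided copy constructor makes the type non-trivial");
static_assert(accepted_as_kernel_argument<FlagContainer>(),
              "the explicit specialization opts the type in");

// Runtime variant of the same checks, for use in a test harness.
inline void check_opt_in() {
  assert(accepted_as_kernel_argument<FlagContainer>());
  assert(!accepted_as_kernel_argument<NotOptedIn>());
}
```

The specialization is a promise by the user rather than something the compiler verifies, so for a flag-controlled container like the one in the question it would be the user's responsibility to ensure the shallow path is the one taken whenever the object is copied to the device.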
Hi Christopher,
Hope the above information addressed your concerns. Is it okay to now close this thread?