- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi everyone !!
I have the following problem:
I have a CUDA code that uses texture, for example:
texture<unsigned char, 2, cudaReadModeElementType> text;
unsigned char a = tex2D(text, cx + lx, cy);
So when I use DPCT, I have this output:
/*
DPCT1059:12: SYCL only supports 4-channel image format. Adjust the code.
*/
dpct::image_wrapper<unsigned char, 2> text;
SO, I changed the declaration with this:
dpct::image_wrapper<sycl::uchar4, 2> text_wrp;
And after launch kernel:
dpct::image_accessor_ext<sycl::uchar4, 2> text
My problem is that I don't know how to create an equivalent when reading. I have this:
sycl::uchar4 a = text.read(cx + lx, cy);
But I don't know how to get the same unsigned char from CUDA in my DPC code. Do I have to modify the indexes on the read ? I'm really lost.
Thank you so much in advance !!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your patience while we were working on this question. We found a bug in DPC++ Compatibility Tool and it took more time to investigate it and find a workaround than we expected.
The issue is with the following line:
--------------------------------------------------------------------------textureD.attach(d, width, height, pitch, channel);
--------------------------------------------------------------------------
You need to replace it with:
--------------------------------------------------------------------------
textureD.attach(d, width * sizeof(int), height, pitch, channel);
--------------------------------------------------------------------------
We expect that the fix for this issue will be included in the next release.
Since I'm not able to attach files, I have to post the code here:
--------------------------------------------------------------------------
#include <CL/sycl.hpp>
#include <dpct/dpct.hpp>
#include <stdio.h>
dpct::image_wrapper<cl::sycl::uint4, 2> textureD;
void kernel2(int *d, int *dOutput, dpct::image_accessor_ext<cl::sycl::uint4, 2> acc, int width, int height){
for(int row = 0; row < height; row++) {
for(int col = 0; col < width; col++){
cl::sycl::uint4 tex_data;
tex_data = acc.read(col, row) ;
dOutput[row * width + col] = tex_data[0] ;
}
}
}
int main(){
dpct::device_ext &dev_ct1 = dpct::get_current_device();
sycl::queue &q_ct1 = dev_ct1.default_queue();
int *h;
int width = 10;
int height = 10;
int size = width * height;
h = (int *)sycl::malloc_host(size* sizeof(int), q_ct1);
int i = 0;
for(int row = 0; row < height; row++) {
for(int col = 0; col < width; col++){
h[row * width + col] = i;
i++;
}
}
int *d;
size_t pitch;
d = (int *)dpct::dpct_malloc(pitch, width * sizeof(int), height);
dpct::dpct_memcpy(d, pitch, h, width * sizeof(int), width * sizeof(int),
height, dpct::host_to_device);
dpct::image_channel channel = dpct::image_channel::create<int>();
//textureD.attach(d, width, height, pitch, channel);
textureD.attach(d, width * sizeof(int), height, pitch, channel); // <- this line contains a workaround for DPCT bug. Fix for this issues is expected to be fixed in the next release so please, try to remove '* sizeof(int)' when the release will be available.
int *hOutput;
hOutput = (int *) sycl::malloc_host(size* sizeof(int), q_ct1);
int *dOutput;
dOutput = (int *)sycl::malloc_device(size* sizeof(int), q_ct1);
q_ct1.submit([&](sycl::handler &cgh) {
auto textureD_acc = textureD.get_access(cgh);
auto textureD_smpl = textureD.get_sampler();
auto out = sycl::stream(1024, 768, cgh);
cgh.single_task<class dpct_single_kernel>([=] {
kernel2(d, dOutput,
dpct::image_accessor_ext<cl::sycl::uint4, 2>(textureD_smpl, textureD_acc),
width, height);
});
}).wait();
q_ct1.memcpy(hOutput, dOutput, size * sizeof(int)).wait();
for(int row = 0; row < height; row++){
for(int col = 0; col < width; col++){
printf("%d ", h[row * width + col]);
}
printf("\n");
}
printf("\n");
for(int row = 0; row < height; row++){
for(int col = 0; col < width; col++){
printf("%d ", hOutput[row * width + col]);
}
printf("\n");
}
getchar();
sycl::free(h, q_ct1);
sycl::free(d, q_ct1);
sycl::free(dOutput, q_ct1);
sycl::free(hOutput, q_ct1);
return 0;
}
--------------------------------------------------------------------------
We will no longer respond to this thread.
If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
Thanks,
Alina
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Thanks for reaching out to us.
Could you please provide us a sample reproducer of both CUDA, DPCPP so that we can work on it from our end.
And also please do let us know your environment details
OS & version
compiler version.
Thanks & Regards
Noorjahan.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @NoorjahanSk_Intel Thank you so much for reply.
Here I attach two files: the original CUDA code and the original DPCT output. Could you help me to finish the migration ?
Thank you again !
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@NoorjahanSk_Intel Here I have a code that compiles, but I get a wrong output (is not the same as CUDA output).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
We are working on it. We will get back to you soon.
Thanks & Regards
Noorjahan
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much ! I will be waiting .
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your patience!
Does the solution in DPC++ Developer Guide work for you?
The difference is that DevGuide recommends using single_task instead of parallel ND range.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Alina_S_Intel !
No, but maybe I'm doing something wrong. I added an example with the native CUDA code and the oneAPI code migrated. Can you modify the oneAPI code to get the same result as CUDA, please ?
Thank you so much.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Alina_S_Intel Here you have the example attached again ! And if you like, here you have de CUDA code that works. I want to migrate this code.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your patience. Unfortunately, I am not able to help you with the full migration because the CUDA code is incorrect.
1) The CUDA code gives 9 starting the second row:
0 1 2 3 4 5 6 7 8 9
9 9 9 9 9 9 9 9 9 9
9 9 9 9 9 9 9 9 9 9
9 9 9 9 9 9 9 9 9 9
9 9 9 9 9 9 9 9 9 9
9 9 9 9 9 9 9 9 9 9
9 9 9 9 9 9 9 9 9 9
9 9 9 9 9 9 9 9 9 9
9 9 9 9 9 9 9 9 9 9
9 9 9 9 9 9 9 9 9 9
while initial array is :
0 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92 93 94 95 96 97 98 99
It should return the same array.
2) Speaking about migrated code. You need to change :
sycl::malloc_host<int><int>(size, q_ct1);
to
(int *)sycl::malloc_host(size* sizeof(int), q_ct1);
for each SYCL malloc function. It helps you to avoid the following errors:
------------------------------------------------------------------------------------
$ dpcpp test.dp.cpp
test.dp.cpp:56:39: error: expected '(' for function-style cast or type construction
h = sycl::malloc_host<int><int>(size, q_ct1);
~~~^
test.dp.cpp:56:41: warning: expression result unused [-Wunused-value]
h = sycl::malloc_host<int><int>(size, q_ct1);
^~~~
test.dp.cpp:84:20: error: no matching function for call to 'dpct_malloc'
d = (int *)dpct::dpct_malloc<int>(pitch, width * sizeof(int), height);
^~~~~~~~~~~~~~~~~~~~~~
/nfs/pdx/disks/cts2/tools/oneapi/2021.3.0/dpcpp-ct/2021.3.0/include/dpct/memory.hpp:576:21: note: candidate function template not viable: requires at most 2 arguments, but 3 were provided
static inline void *dpct_malloc(T num_bytes,
^
test.dp.cpp:106:45: error: expected '(' for function-style cast or type construction
hOutput = sycl::malloc_host<int><int>(size, q_ct1);
~~~^
test.dp.cpp:106:47: warning: expression result unused [-Wunused-value]
hOutput = sycl::malloc_host<int><int>(size, q_ct1);
^~~~
test.dp.cpp:110:47: error: expected '(' for function-style cast or type construction
dOutput = sycl::malloc_device<int><int>(size, q_ct1);
~~~^
test.dp.cpp:110:49: warning: expression result unused [-Wunused-value]
dOutput = sycl::malloc_device<int><int>(size, q_ct1);
------------------------------------------------------------------------------------
3) When you are done with errors, you can to go back to DPCT1059:2.
If you try to run the application 'as is', you'll see the following error:
------------------------------------------
./a.out
die: piMemImageCreate: unsupported image format layout
libc++abi: terminating
Aborted (core dumped)
------------------------------------------
It happens because our default LevelZero backend does not support this feature yet. Please, use SYCL_BE=PI_OPENCL (deprecated variable) or SYCL_DEVICE_FILTER=opencl:gpu (the replacement for SYCL_BE and SYCL_DEVICE_TYPE) to run your application.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you !
Yes, I know that the CUDA code returns all 9, but I expected to get the same result in oneAPI, even if the CUDA code doesn't make sense. Was an example code.
So I can't do anything to solve the 4-Channel problem? We cant modify the 1-Channel to transform it to 4-Channel and getting the same CUDA result (with the same 9)?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for your patience while we were working on this question. We found a bug in DPC++ Compatibility Tool and it took more time to investigate it and find a workaround than we expected.
The issue is with the following line:
--------------------------------------------------------------------------textureD.attach(d, width, height, pitch, channel);
--------------------------------------------------------------------------
You need to replace it with:
--------------------------------------------------------------------------
textureD.attach(d, width * sizeof(int), height, pitch, channel);
--------------------------------------------------------------------------
We expect that the fix for this issue will be included in the next release.
Since I'm not able to attach files, I have to post the code here:
--------------------------------------------------------------------------
#include <CL/sycl.hpp>
#include <dpct/dpct.hpp>
#include <stdio.h>
dpct::image_wrapper<cl::sycl::uint4, 2> textureD;
void kernel2(int *d, int *dOutput, dpct::image_accessor_ext<cl::sycl::uint4, 2> acc, int width, int height){
for(int row = 0; row < height; row++) {
for(int col = 0; col < width; col++){
cl::sycl::uint4 tex_data;
tex_data = acc.read(col, row) ;
dOutput[row * width + col] = tex_data[0] ;
}
}
}
int main(){
dpct::device_ext &dev_ct1 = dpct::get_current_device();
sycl::queue &q_ct1 = dev_ct1.default_queue();
int *h;
int width = 10;
int height = 10;
int size = width * height;
h = (int *)sycl::malloc_host(size* sizeof(int), q_ct1);
int i = 0;
for(int row = 0; row < height; row++) {
for(int col = 0; col < width; col++){
h[row * width + col] = i;
i++;
}
}
int *d;
size_t pitch;
d = (int *)dpct::dpct_malloc(pitch, width * sizeof(int), height);
dpct::dpct_memcpy(d, pitch, h, width * sizeof(int), width * sizeof(int),
height, dpct::host_to_device);
dpct::image_channel channel = dpct::image_channel::create<int>();
//textureD.attach(d, width, height, pitch, channel);
textureD.attach(d, width * sizeof(int), height, pitch, channel); // <- this line contains a workaround for DPCT bug. Fix for this issues is expected to be fixed in the next release so please, try to remove '* sizeof(int)' when the release will be available.
int *hOutput;
hOutput = (int *) sycl::malloc_host(size* sizeof(int), q_ct1);
int *dOutput;
dOutput = (int *)sycl::malloc_device(size* sizeof(int), q_ct1);
q_ct1.submit([&](sycl::handler &cgh) {
auto textureD_acc = textureD.get_access(cgh);
auto textureD_smpl = textureD.get_sampler();
auto out = sycl::stream(1024, 768, cgh);
cgh.single_task<class dpct_single_kernel>([=] {
kernel2(d, dOutput,
dpct::image_accessor_ext<cl::sycl::uint4, 2>(textureD_smpl, textureD_acc),
width, height);
});
}).wait();
q_ct1.memcpy(hOutput, dOutput, size * sizeof(int)).wait();
for(int row = 0; row < height; row++){
for(int col = 0; col < width; col++){
printf("%d ", h[row * width + col]);
}
printf("\n");
}
printf("\n");
for(int row = 0; row < height; row++){
for(int col = 0; col < width; col++){
printf("%d ", hOutput[row * width + col]);
}
printf("\n");
}
getchar();
sycl::free(h, q_ct1);
sycl::free(d, q_ct1);
sycl::free(dOutput, q_ct1);
sycl::free(hOutput, q_ct1);
return 0;
}
--------------------------------------------------------------------------
We will no longer respond to this thread.
If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.
Thanks,
Alina
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Alina_S_Intel Hi, thank you for reply. Sorry, but you are using
dpct::image_channel channel = dpct::image_channel::create<int>();
But 1-Channel doesn't work for me, I have the runtime error:
die: piMemImageCreate: unsupported image format layout
libc++abi: terminating
Aborted
However, if I change int to int4, it doesn't work either
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
3) When you are done with errors, you can to go back to DPCT1059:2.
If you try to run the application 'as is', you'll see the following error:
------------------------------------------
./a.out
die: piMemImageCreate: unsupported image format layout
libc++abi: terminating
Aborted (core dumped)
------------------------------------------
It happens because our default LevelZero backend does not support this feature yet. Please, use SYCL_BE=PI_OPENCL (deprecated variable) or SYCL_DEVICE_FILTER=opencl:gpu (the replacement for SYCL_BE and SYCL_DEVICE_TYPE) to run your application.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much! You are helping me a lot.
I've just created a new topic with a working CUDA code that I want to migrate, but I can't. Maybe you find the problem there. Thank you so much again.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sorry, using SYCL_DEVICE_FILTER=opencl:gpu I get CL_DEVICE_NOT_FOUND. I'm using a NVIDIA GeForce RTX 2070 gpu. Do you know why ?

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page