Migrating to SYCL
One-stop forum for getting assistance migrating your existing code to SYCL
39 Discussions

Intel oneAPI DPCT can't convert from CUDA 1-Channel texture to DPCT 4-Channel image_wrapper

ManuelCostanzo22
Beginner
3,989 Views

Hi everyone !!

I have the following problem:

I have a CUDA code that uses texture, for example:

 

 

texture<unsigned char, 2, cudaReadModeElementType> text;
unsigned char a = tex2D(text, cx + lx, cy);

 

 

So when I use DPCT, I have this output:

 

/*
DPCT1059:12: SYCL only supports 4-channel image format. Adjust the code.
*/
dpct::image_wrapper<unsigned char, 2> text;

 

SO, I changed the declaration with this:

 

 

dpct::image_wrapper<sycl::uchar4, 2> text_wrp;

And after launch kernel:
dpct::image_accessor_ext<sycl::uchar4, 2> text

 

 

My problem is that I don't know how to create an equivalent when reading. I have this:

 

sycl::uchar4 a = text.read(cx + lx, cy);

 

But I don't know how to get the same unsigned char from CUDA in my DPC code. Do I have to modify the indexes on the read ? I'm really lost.

Thank you so much in advance !!

0 Kudos
1 Solution
Alina_S_Intel
Employee
3,392 Views

Thank you for your patience while we were working on this question. We found a bug in DPC++ Compatibility Tool and it took more time to investigate it and find a workaround than we expected.


The issue is with the following line:

--------------------------------------------------------------------------textureD.attach(d, width, height, pitch, channel);

--------------------------------------------------------------------------


You need to replace it with:

--------------------------------------------------------------------------

textureD.attach(d, width * sizeof(int), height, pitch, channel);

--------------------------------------------------------------------------


We expect that the fix for this issue will be included in the next release.



Since I'm not able to attach files, I have to post the code here:

--------------------------------------------------------------------------

#include <CL/sycl.hpp>

#include <dpct/dpct.hpp>


#include <stdio.h>


dpct::image_wrapper<cl::sycl::uint4, 2> textureD;


void kernel2(int *d, int *dOutput, dpct::image_accessor_ext<cl::sycl::uint4, 2> acc, int width, int height){

for(int row = 0; row < height; row++) {

for(int col = 0; col < width; col++){

cl::sycl::uint4 tex_data;

tex_data = acc.read(col, row) ;

dOutput[row * width + col] = tex_data[0] ; 

}

}

}



int main(){

  dpct::device_ext &dev_ct1 = dpct::get_current_device();

  sycl::queue &q_ct1 = dev_ct1.default_queue();


int *h;

int width = 10;

int height = 10;

int size = width * height;


h = (int *)sycl::malloc_host(size* sizeof(int), q_ct1);


int i = 0;


for(int row = 0; row < height; row++) {

for(int col = 0; col < width; col++){

h[row * width + col] = i;

i++;

}

}


int *d;


size_t pitch;


d = (int *)dpct::dpct_malloc(pitch, width * sizeof(int), height);

dpct::dpct_memcpy(d, pitch, h, width * sizeof(int), width * sizeof(int),

 height, dpct::host_to_device);


dpct::image_channel channel = dpct::image_channel::create<int>();

//textureD.attach(d, width, height, pitch, channel);

textureD.attach(d, width * sizeof(int), height, pitch, channel); // <- this line contains a workaround for DPCT bug. Fix for this issues is expected to be fixed in the next release so please, try to remove '* sizeof(int)' when the release will be available.

 

int *hOutput;

hOutput = (int *) sycl::malloc_host(size* sizeof(int), q_ct1);


int *dOutput;


dOutput = (int *)sycl::malloc_device(size* sizeof(int), q_ct1);

q_ct1.submit([&](sycl::handler &cgh) {

auto textureD_acc = textureD.get_access(cgh);


auto textureD_smpl = textureD.get_sampler();

auto out = sycl::stream(1024, 768, cgh);


cgh.single_task<class dpct_single_kernel>([=] {

 kernel2(d, dOutput, 

dpct::image_accessor_ext<cl::sycl::uint4, 2>(textureD_smpl, textureD_acc),

width, height);

});

}).wait();


q_ct1.memcpy(hOutput, dOutput, size * sizeof(int)).wait();


for(int row = 0; row < height; row++){

for(int col = 0; col < width; col++){

printf("%d ", h[row * width + col]);

}

printf("\n");

}


printf("\n");


for(int row = 0; row < height; row++){

for(int col = 0; col < width; col++){

printf("%d ", hOutput[row * width + col]);

}

printf("\n");

}


getchar();

sycl::free(h, q_ct1);

sycl::free(d, q_ct1);

sycl::free(dOutput, q_ct1);

sycl::free(hOutput, q_ct1);


return 0;


}

--------------------------------------------------------------------------


We will no longer respond to this thread. 

If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

Thanks,

Alina


View solution in original post

0 Kudos
16 Replies
NoorjahanSk_Intel
Moderator
3,951 Views

Hi,

Thanks for reaching out to us.

Could you please provide us a sample reproducer of both CUDA, DPCPP so that we can work on it from our end.

And also please do let us know your environment details

OS & version

compiler version.

 

Thanks & Regards

Noorjahan.

 

0 Kudos
ManuelCostanzo22
Beginner
3,932 Views

Hi @NoorjahanSk_Intel  Thank you so much for reply.

 

Here I attach two files: the original CUDA code and the original DPCT output. Could you help me to finish the migration ? 

 

Thank you again !

 

 

0 Kudos
ManuelCostanzo22
Beginner
3,915 Views

@NoorjahanSk_Intel  Here I have a code that compiles, but I get a wrong output (is not the same as CUDA output).

 

 

 

0 Kudos
NoorjahanSk_Intel
Moderator
3,834 Views

Hi,

We are working on it. We will get back to you soon.


Thanks & Regards

Noorjahan


0 Kudos
ManuelCostanzo22
Beginner
3,826 Views
Hi !

Thank you so much ! I will be waiting .
0 Kudos
Alina_S_Intel
Employee
3,757 Views

Thanks for your patience!


Does the solution in DPC++ Developer Guide work for you?

https://software.intel.com/content/www/us/en/develop/documentation/intel-dpcpp-compatibility-tool-user-guide/top/diagnostics-reference/dpct1059.html


The difference is that DevGuide recommends using single_task instead of parallel ND range.


0 Kudos
ManuelCostanzo22
Beginner
3,752 Views

Hi @Alina_S_Intel !

 

No, but maybe I'm doing something wrong. I added an example with the native CUDA code and the oneAPI code migrated. Can you modify the oneAPI code to get the same result as CUDA, please ? 

 

Thank you so much.

0 Kudos
ManuelCostanzo22
Beginner
3,750 Views

@Alina_S_Intel Here you have the example attached again ! And if you like, here you have de CUDA code that works. I want to migrate this code.

 

 

 

 

0 Kudos
ManuelCostanzo22
Beginner
3,685 Views

@NoorjahanSk_Intel @Alina_S_Intel  Hi, sorry, any update on this? Thank you.

0 Kudos
Alina_S_Intel
Employee
3,634 Views

Thank you for your patience. Unfortunately, I am not able to help you with the full migration because the CUDA code is incorrect.


1) The CUDA code gives 9 starting the second row:


0 1 2 3 4 5 6 7 8 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9


while initial array is :


0 1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19

20 21 22 23 24 25 26 27 28 29

30 31 32 33 34 35 36 37 38 39

40 41 42 43 44 45 46 47 48 49

50 51 52 53 54 55 56 57 58 59

60 61 62 63 64 65 66 67 68 69

70 71 72 73 74 75 76 77 78 79

80 81 82 83 84 85 86 87 88 89

90 91 92 93 94 95 96 97 98 99


It should return the same array.


2) Speaking about migrated code. You need to change :


sycl::malloc_host<int><int>(size, q_ct1);


to


(int *)sycl::malloc_host(size* sizeof(int), q_ct1);


for each SYCL malloc function. It helps you to avoid the following errors:

------------------------------------------------------------------------------------

$ dpcpp test.dp.cpp

test.dp.cpp:56:39: error: expected '(' for function-style cast or type construction

    h = sycl::malloc_host<int><int>(size, q_ct1);

                  ~~~^

test.dp.cpp:56:41: warning: expression result unused [-Wunused-value]

    h = sycl::malloc_host<int><int>(size, q_ct1);

                    ^~~~

test.dp.cpp:84:20: error: no matching function for call to 'dpct_malloc'

    d = (int *)dpct::dpct_malloc<int>(pitch, width * sizeof(int), height);

          ^~~~~~~~~~~~~~~~~~~~~~

/nfs/pdx/disks/cts2/tools/oneapi/2021.3.0/dpcpp-ct/2021.3.0/include/dpct/memory.hpp:576:21: note: candidate function template not viable: requires at most 2 arguments, but 3 were provided

static inline void *dpct_malloc(T num_bytes,

          ^

test.dp.cpp:106:45: error: expected '(' for function-style cast or type construction

    hOutput = sycl::malloc_host<int><int>(size, q_ct1);

                     ~~~^

test.dp.cpp:106:47: warning: expression result unused [-Wunused-value]

    hOutput = sycl::malloc_host<int><int>(size, q_ct1);

                       ^~~~

test.dp.cpp:110:47: error: expected '(' for function-style cast or type construction

    dOutput = sycl::malloc_device<int><int>(size, q_ct1);

                      ~~~^

test.dp.cpp:110:49: warning: expression result unused [-Wunused-value]

    dOutput = sycl::malloc_device<int><int>(size, q_ct1);

------------------------------------------------------------------------------------


3) When you are done with errors, you can to go back to DPCT1059:2.

If you try to run the application 'as is', you'll see the following error:

------------------------------------------

./a.out

die: piMemImageCreate: unsupported image format layout


libc++abi: terminating

Aborted (core dumped)

------------------------------------------


It happens because our default LevelZero backend does not support this feature yet. Please, use SYCL_BE=PI_OPENCL (deprecated variable) or SYCL_DEVICE_FILTER=opencl:gpu (the replacement for SYCL_BE and SYCL_DEVICE_TYPE) to run your application.


0 Kudos
ManuelCostanzo22
Beginner
3,605 Views

Thank you !

 

Yes, I know that the CUDA code returns all 9, but I expected to get the same result in oneAPI, even if the CUDA code doesn't make sense. Was an example code.

 

So I can't do anything to solve the 4-Channel problem? We cant modify the 1-Channel to transform it to 4-Channel and getting the same CUDA result (with the same 9)?

0 Kudos
Alina_S_Intel
Employee
3,393 Views

Thank you for your patience while we were working on this question. We found a bug in DPC++ Compatibility Tool and it took more time to investigate it and find a workaround than we expected.


The issue is with the following line:

--------------------------------------------------------------------------textureD.attach(d, width, height, pitch, channel);

--------------------------------------------------------------------------


You need to replace it with:

--------------------------------------------------------------------------

textureD.attach(d, width * sizeof(int), height, pitch, channel);

--------------------------------------------------------------------------


We expect that the fix for this issue will be included in the next release.



Since I'm not able to attach files, I have to post the code here:

--------------------------------------------------------------------------

#include <CL/sycl.hpp>

#include <dpct/dpct.hpp>


#include <stdio.h>


dpct::image_wrapper<cl::sycl::uint4, 2> textureD;


void kernel2(int *d, int *dOutput, dpct::image_accessor_ext<cl::sycl::uint4, 2> acc, int width, int height){

for(int row = 0; row < height; row++) {

for(int col = 0; col < width; col++){

cl::sycl::uint4 tex_data;

tex_data = acc.read(col, row) ;

dOutput[row * width + col] = tex_data[0] ; 

}

}

}



int main(){

  dpct::device_ext &dev_ct1 = dpct::get_current_device();

  sycl::queue &q_ct1 = dev_ct1.default_queue();


int *h;

int width = 10;

int height = 10;

int size = width * height;


h = (int *)sycl::malloc_host(size* sizeof(int), q_ct1);


int i = 0;


for(int row = 0; row < height; row++) {

for(int col = 0; col < width; col++){

h[row * width + col] = i;

i++;

}

}


int *d;


size_t pitch;


d = (int *)dpct::dpct_malloc(pitch, width * sizeof(int), height);

dpct::dpct_memcpy(d, pitch, h, width * sizeof(int), width * sizeof(int),

 height, dpct::host_to_device);


dpct::image_channel channel = dpct::image_channel::create<int>();

//textureD.attach(d, width, height, pitch, channel);

textureD.attach(d, width * sizeof(int), height, pitch, channel); // <- this line contains a workaround for DPCT bug. Fix for this issues is expected to be fixed in the next release so please, try to remove '* sizeof(int)' when the release will be available.

 

int *hOutput;

hOutput = (int *) sycl::malloc_host(size* sizeof(int), q_ct1);


int *dOutput;


dOutput = (int *)sycl::malloc_device(size* sizeof(int), q_ct1);

q_ct1.submit([&](sycl::handler &cgh) {

auto textureD_acc = textureD.get_access(cgh);


auto textureD_smpl = textureD.get_sampler();

auto out = sycl::stream(1024, 768, cgh);


cgh.single_task<class dpct_single_kernel>([=] {

 kernel2(d, dOutput, 

dpct::image_accessor_ext<cl::sycl::uint4, 2>(textureD_smpl, textureD_acc),

width, height);

});

}).wait();


q_ct1.memcpy(hOutput, dOutput, size * sizeof(int)).wait();


for(int row = 0; row < height; row++){

for(int col = 0; col < width; col++){

printf("%d ", h[row * width + col]);

}

printf("\n");

}


printf("\n");


for(int row = 0; row < height; row++){

for(int col = 0; col < width; col++){

printf("%d ", hOutput[row * width + col]);

}

printf("\n");

}


getchar();

sycl::free(h, q_ct1);

sycl::free(d, q_ct1);

sycl::free(dOutput, q_ct1);

sycl::free(hOutput, q_ct1);


return 0;


}

--------------------------------------------------------------------------


We will no longer respond to this thread. 

If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

Thanks,

Alina


0 Kudos
ManuelCostanzo22
Beginner
3,348 Views

@Alina_S_Intel Hi, thank you for reply. Sorry, but you are using 

 

dpct::image_channel channel = dpct::image_channel::create<int>();

 

But 1-Channel doesn't work for me, I have the runtime error:

 

die: piMemImageCreate: unsupported image format layout

libc++abi: terminating
Aborted

 

However, if I change int to int4, it doesn't work either

 

0 Kudos
Alina_S_Intel
Employee
3,346 Views

3) When you are done with errors, you can to go back to DPCT1059:2.

If you try to run the application 'as is', you'll see the following error:

------------------------------------------

./a.out

die: piMemImageCreate: unsupported image format layout


libc++abi: terminating

Aborted (core dumped)

------------------------------------------


It happens because our default LevelZero backend does not support this feature yet. Please, use SYCL_BE=PI_OPENCL (deprecated variable) or SYCL_DEVICE_FILTER=opencl:gpu (the replacement for SYCL_BE and SYCL_DEVICE_TYPE) to run your application.



0 Kudos
ManuelCostanzo22
Beginner
3,333 Views

Thank you so much! You are helping me a lot. 

 

I've just created a new topic with a working CUDA code that I want to migrate, but I can't. Maybe you find the problem there. Thank you so much again.

0 Kudos
ManuelCostanzo22
Beginner
3,294 Views

Sorry, using SYCL_DEVICE_FILTER=opencl:gpu  I get CL_DEVICE_NOT_FOUND. I'm using a NVIDIA GeForce RTX 2070 gpu. Do you know why ?

0 Kudos
Reply