ManuelCostanzo22
Beginner
501 Views

Intel oneAPI DPCT can't convert from CUDA 1-Channel texture to DPCT 4-Channel image_wrapper

Hi everyone!

I have the following problem.

My CUDA code uses a texture, for example:

 

 

texture<unsigned char, 2, cudaReadModeElementType> text;
unsigned char a = tex2D(text, cx + lx, cy);

 

 

When I run DPCT on it, I get this output:

 

/*
DPCT1059:12: SYCL only supports 4-channel image format. Adjust the code.
*/
dpct::image_wrapper<unsigned char, 2> text;

 

So I changed the declaration to this:

 

 

dpct::image_wrapper<sycl::uchar4, 2> text_wrp;

And, after launching the kernel:
dpct::image_accessor_ext<sycl::uchar4, 2> text

 

 

My problem is that I don't know how to write the equivalent read. I have this:

 

sycl::uchar4 a = text.read(cx + lx, cy);

 

But I don't know how to get the same unsigned char value as in CUDA in my DPC++ code. Do I have to modify the indexes in the read? I'm really lost.

Thank you so much in advance!

11 Replies
NoorjahanSk_Intel
Moderator
464 Views

Hi,

Thanks for reaching out to us.

Could you please provide a sample reproducer for both the CUDA and DPC++ versions so that we can investigate on our end?

Please also let us know your environment details:

OS and version

Compiler version

 

Thanks & Regards

Noorjahan.

 

ManuelCostanzo22
Beginner
445 Views

Hi @NoorjahanSk_Intel, thank you so much for your reply.

I've attached two files: the original CUDA code and the unmodified DPCT output. Could you help me finish the migration?

 

Thank you again !

 

 

ManuelCostanzo22
Beginner
428 Views

@NoorjahanSk_Intel Here is a version that compiles, but its output is wrong (it does not match the CUDA output).

 

 

 

NoorjahanSk_Intel
Moderator
346 Views

Hi,

We are working on it. We will get back to you soon.


Thanks & Regards

Noorjahan


ManuelCostanzo22
Beginner
338 Views

Hi!

Thank you so much! I'll be waiting.
Alina_S_Intel
Employee
269 Views

Thanks for your patience!


Does the solution in DPC++ Developer Guide work for you?

https://software.intel.com/content/www/us/en/develop/documentation/intel-dpcpp-compatibility-tool-us...


The difference is that the Developer Guide recommends using single_task instead of a parallel ND-range kernel.


ManuelCostanzo22
Beginner
264 Views

Hi @Alina_S_Intel !

 

No, but maybe I'm doing something wrong. I've added an example with the native CUDA code and the migrated oneAPI code. Could you modify the oneAPI code so that it gives the same result as CUDA, please?

 

Thank you so much.

ManuelCostanzo22
Beginner
262 Views

@Alina_S_Intel Here is the example attached again. And, if you like, here is the CUDA code that works. This is the code I want to migrate.

 

 

 

 

ManuelCostanzo22
Beginner
197 Views

@NoorjahanSk_Intel @Alina_S_Intel Hi, sorry, is there any update on this? Thank you.

Alina_S_Intel
Employee
146 Views

Thank you for your patience. Unfortunately, I am not able to help you with the full migration because the CUDA code is incorrect.


1) The CUDA code outputs 9 for every element from the second row onward:


0 1 2 3 4 5 6 7 8 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9

9 9 9 9 9 9 9 9 9 9


while the initial array is:


0 1 2 3 4 5 6 7 8 9

10 11 12 13 14 15 16 17 18 19

20 21 22 23 24 25 26 27 28 29

30 31 32 33 34 35 36 37 38 39

40 41 42 43 44 45 46 47 48 49

50 51 52 53 54 55 56 57 58 59

60 61 62 63 64 65 66 67 68 69

70 71 72 73 74 75 76 77 78 79

80 81 82 83 84 85 86 87 88 89

90 91 92 93 94 95 96 97 98 99


It should return the same array.


2) Regarding the migrated code, you need to change:


sycl::malloc_host<int><int>(size, q_ct1);


to


(int *)sycl::malloc_host(size* sizeof(int), q_ct1);


for each SYCL malloc function. This avoids the following errors:

------------------------------------------------------------------------------------

$ dpcpp test.dp.cpp

test.dp.cpp:56:39: error: expected '(' for function-style cast or type construction

    h = sycl::malloc_host<int><int>(size, q_ct1);

                  ~~~^

test.dp.cpp:56:41: warning: expression result unused [-Wunused-value]

    h = sycl::malloc_host<int><int>(size, q_ct1);

                    ^~~~

test.dp.cpp:84:20: error: no matching function for call to 'dpct_malloc'

    d = (int *)dpct::dpct_malloc<int>(pitch, width * sizeof(int), height);

          ^~~~~~~~~~~~~~~~~~~~~~

/nfs/pdx/disks/cts2/tools/oneapi/2021.3.0/dpcpp-ct/2021.3.0/include/dpct/memory.hpp:576:21: note: candidate function template not viable: requires at most 2 arguments, but 3 were provided

static inline void *dpct_malloc(T num_bytes,

          ^

test.dp.cpp:106:45: error: expected '(' for function-style cast or type construction

    hOutput = sycl::malloc_host<int><int>(size, q_ct1);

                     ~~~^

test.dp.cpp:106:47: warning: expression result unused [-Wunused-value]

    hOutput = sycl::malloc_host<int><int>(size, q_ct1);

                       ^~~~

test.dp.cpp:110:47: error: expected '(' for function-style cast or type construction

    dOutput = sycl::malloc_device<int><int>(size, q_ct1);

                      ~~~^

test.dp.cpp:110:49: warning: expression result unused [-Wunused-value]

    dOutput = sycl::malloc_device<int><int>(size, q_ct1);

------------------------------------------------------------------------------------


3) Once the errors are fixed, you can go back to DPCT1059:2.

If you try to run the application as is, you'll see the following error:

------------------------------------------

./a.out

die: piMemImageCreate: unsupported image format layout


libc++abi: terminating

Aborted (core dumped)

------------------------------------------


This happens because our default Level Zero backend does not support this feature yet. Please use SYCL_BE=PI_OPENCL (deprecated variable) or SYCL_DEVICE_FILTER=opencl:gpu (the replacement for SYCL_BE and SYCL_DEVICE_TYPE) to run your application.


ManuelCostanzo22
Beginner
117 Views

Thank you!

Yes, I know that the CUDA code returns all 9s, but I expected to get the same result in oneAPI, even if the CUDA code doesn't make sense. It was just example code.

So is there nothing I can do to solve the 4-channel problem? Can't we convert the 1-channel data to 4-channel and get the same result as CUDA (with the same 9s)?
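
For reference, here is a minimal sketch of the padding idea raised in the question above. It is plain C++ with no SYCL dependency, so it only models the data layout. The names `Texel4`, `pad_to_four_channels` and `read_texel` are hypothetical and are not part of DPCT or SYCL. The assumption is that each original 1-channel value is stored in the first channel of a 4-channel texel, so the read coordinates stay the same and only the element type changes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 4-channel texel such as sycl::uchar4.
using Texel4 = std::array<std::uint8_t, 4>;

// Expand a 1-channel image into 4-channel texels.
// The original value goes into channel 0; channels 1 to 3 are zero padding.
std::vector<Texel4> pad_to_four_channels(const std::vector<std::uint8_t>& src) {
    std::vector<Texel4> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = Texel4{src[i], 0, 0, 0};
    }
    return dst;
}

// Emulates an unfiltered read at integer coordinates (x, y) from a
// row-major image. The coordinates are the same ones the 1-channel
// version would use; no index scaling is applied.
Texel4 read_texel(const std::vector<Texel4>& img, std::size_t width,
                  std::size_t x, std::size_t y) {
    assert(x < width && y * width + x < img.size());
    return img[y * width + x];
}
```

With this layout, `read_texel(img, width, x, y)[0]` gives back the original 1-channel value at `(x, y)`. In the migrated code, the corresponding step would presumably be to take the first component of the `sycl::uchar4` returned by `text.read(cx + lx, cy)`, for example with `.x()`. This is an assumption that has not been checked against the attached reproducer. The host data bound to the image would also need the same expansion, since each texel then occupies 4 bytes instead of 1.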
