Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and GDB*

Problems with oneAPI programs using the ESIMD extension

tanzl_ustc
Beginner

Hello, Intel engineers.

I am a senior student learning to use oneAPI with the ESIMD extension for my graduation project.

I have run into some very confusing problems and wonder whether they are related to the oneAPI compiler. I list them one by one below.

First, I want to clarify my code running environment:

The OS version: Linux user-X299-UD4-Pro 5.4.48-xe-max #1 SMP Sun Dec 12 17:04:51 CST 2021 x86_64 x86_64 x86_64 GNU/Linux


IDE version: I am using Visual Studio Code version 1.63.2


oneAPI toolkit version: l_BaseKit_p_2022.1.1.119


In the attachment, I included a code file called "gemm_genx_dpcpp_finalver.cpp". It implements matrix multiplication in parallel. If the result is correct, the program prints "all is done"; otherwise, it prints the first incorrect result and stops.
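For reference, the result check described above could look roughly like this. It is a minimal sketch, not the code from the attachment: the helper name first_mismatch and the relative tolerance are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: compares the kernel output `got` with a reference `ref`.
// Prints "all is done" and returns -1 if every element matches within a
// tolerance relative to max(1, |ref|); otherwise prints the first mismatch
// and returns its index.
long first_mismatch(const std::vector<float>& got, const std::vector<float>& ref,
                    float tol = 1e-3f) {
  assert(got.size() == ref.size());
  for (std::size_t x = 0; x < got.size(); ++x) {
    const float limit = tol * std::fmax(1.0f, std::fabs(ref[x]));
    if (std::fabs(got[x] - ref[x]) > limit) {
      std::printf("mismatch at %zu: got %f, expected %f\n", x, got[x], ref[x]);
      return static_cast<long>(x);  // stop at the first wrong element
    }
  }
  std::printf("all is done\n");
  return -1;
}
```

A relative tolerance is used because float sums computed in a different order on the GPU and CPU rarely match bit for bit.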

[Screenshot: gemm_genx_dpcpp_finalver.cpp]


As you can see, lines 21-29 of the code contain a series of macro definitions that specify the matrix size and the loop counts used during the computation.

In the code, each thread computes the product for a small sub-matrix whose size is defined by III, JJJ, and KKK. II, JJ, and KK give the size of each dimension of a single work-group, and I, J, and K give the number of work-groups in each dimension of the nd-range.
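To make the three-level decomposition concrete, here is a hypothetical CPU sketch of how it can map onto loops. It assumes row-major storage and matrix sizes M = I*II*III, N = J*JJ*JJJ, P = K*KK*KKK; the formulas in the attached file may differ, and the constant values below are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical values standing in for the macros in lines 21-29 of the attachment.
namespace cfg {
constexpr int I = 2, J = 3, K = 1;         // work-groups per dimension of the nd-range
constexpr int II = 2, JJ = 2, KK = 1;      // threads per dimension of one work-group
constexpr int III = 4, JJJ = 8, KKK = 16;  // sub-matrix handled by one thread
// Assumed sizes: C (M x N) = A (M x P) * B (P x N), all row-major.
constexpr int M = I * II * III;
constexpr int N = J * JJ * JJJ;
constexpr int P = K * KK * KKK;
}  // namespace cfg

// Accumulates A * B into C (C must start zeroed), visiting work-groups,
// then threads within a group, then the elements of each thread's tile.
void tiled_gemm(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C) {
  using namespace cfg;
  assert(A.size() == static_cast<std::size_t>(M * P));
  assert(B.size() == static_cast<std::size_t>(P * N));
  assert(C.size() == static_cast<std::size_t>(M * N));
  for (int gi = 0; gi < I; ++gi)
    for (int gj = 0; gj < J; ++gj)
      for (int gk = 0; gk < K; ++gk)            // work-group id
        for (int li = 0; li < II; ++li)
          for (int lj = 0; lj < JJ; ++lj)
            for (int lk = 0; lk < KK; ++lk) {   // thread id within the group
              const int i0 = (gi * II + li) * III;  // first row of the tile
              const int j0 = (gj * JJ + lj) * JJJ;  // first column of the tile
              const int k0 = (gk * KK + lk) * KKK;  // first reduction index
              for (int i = 0; i < III; ++i)
                for (int j = 0; j < JJJ; ++j) {
                  float acc = 0.0f;
                  for (int k = 0; k < KKK; ++k)
                    acc += A[(i0 + i) * P + (k0 + k)] * B[(k0 + k) * N + (j0 + j)];
                  // If K * KK > 1, several threads add into the same C element;
                  // on a GPU that would need atomics or a separate reduction step.
                  C[(i0 + i) * N + (j0 + j)] += acc;
                }
            }
}
```

With K = KK = 1, as in the failing case (K J I = 1 7 7), each element of C is written by exactly one thread.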

I compiled it with the command: dpcpp gemm_genx_dpcpp_finalver.cpp

However, the result of the code is puzzling:

1. If line 109, the statement "sycl::ext::oneapi::experimental::printf(fmt_1)", is deleted, the code, which otherwise runs correctly, produces an error. Yet this is only a simple output statement and should have no effect on the computation.

[Screenshot: error output after deleting line 109]

2. The code runs correctly for any values of KKK, JJJ, III, KK, JJ, and II, but it fails when K, J, I are large (for example 1, 7, 7). So far we have confirmed that the result computed by a single thread is correct, and that the result of a single work-group is also correct. The results are also correct when the nd-range contains only a few work-groups. However, once K, J, I grow, that is, once the number of work-groups in the nd-range increases, errors occur.

To make debugging easier, I rewrote the code as a CPU version without other changes, replacing only the ESIMD parts with std::vector. Confusingly, the CPU version runs perfectly. I have also attached it (file name: gemm_genx_cpu.cpp) in the zip file; it can also be compiled with dpcpp.

So I think gemm_genx_dpcpp_finalver.cpp should have no logical problems, and I wonder whether the problem lies in the compiler. Could you help me find the cause of these errors?
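One way to test the "no logical problems" claim independently of the compiler is to check on the CPU that the tile origins computed from the group and thread ids cover C exactly once, for small and large I, J alike. This sketch assumes K = KK = 1 and a row-major C; covers_exactly_once is a hypothetical helper, and the origin formula is passed in as a lambda so the kernel's own index expressions can be plugged in.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Counts how many (work-group, thread) pairs write each element of C when the
// tile origin (first row, first column) is given by origin(gi, gj, li, lj).
// Returns true only if every element is written exactly once.
template <class Origin>
bool covers_exactly_once(int I, int J, int II, int JJ, int III, int JJJ,
                         Origin origin) {
  assert(I > 0 && J > 0 && II > 0 && JJ > 0 && III > 0 && JJJ > 0);
  const int M = I * II * III, N = J * JJ * JJJ;
  std::vector<int> writes(static_cast<std::size_t>(M) * N, 0);
  for (int gi = 0; gi < I; ++gi)
    for (int gj = 0; gj < J; ++gj)
      for (int li = 0; li < II; ++li)
        for (int lj = 0; lj < JJ; ++lj) {
          const std::pair<int, int> o = origin(gi, gj, li, lj);
          for (int i = 0; i < III; ++i)
            for (int j = 0; j < JJJ; ++j) {
              const int r = o.first + i, c = o.second + j;
              if (r < 0 || r >= M || c < 0 || c >= N) return false;  // tile leaves C
              ++writes[static_cast<std::size_t>(r) * N + c];
            }
        }
  for (int w : writes)
    if (w != 1) return false;  // 0 = element never written, >1 = overlap
  return true;
}
```

As an illustration with a made-up bug (not taken from the attachment): a row origin of (gi + li) * III passes when there is one work-group in that dimension but fails once I > 1, because the group id is not scaled by II.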

Also, could you check whether there is any update on the problem I raised last time? The link is https://community.intel.com/t5/Intel-oneAPI-Data-Parallel-C/How-can-I-debug-kernel-code-of-DPC-with-...

Thank you

HemanthCH_Intel
Moderator

Hi,

Thank you for posting in Intel Communities.

>>"First, I want to clarify my code running environment:"

Could you please mention the Linux distribution you are using?

Please refer to the link below for the system requirements of the oneAPI Base Toolkit:

https://www.intel.com/content/www/us/en/developer/articles/system-requirements/intel-oneapi-base-too...

We are able to reproduce your issue at our end. We are looking into it and will get back to you soon.

>>"Could you help me see if there is any update feedback of the problem I raised last time"

We have reported this issue to the relevant development team. They are looking into it.

Thanks & Regards,

Hemanth.

tanzl_ustc
Beginner

Hi,

The Linux distribution I am using is Ubuntu 20.04.1.

[Screenshot: Ubuntu version information]

Looking forward to your reply~

Thanks & regards

tanzl_ustc

HemanthCH_Intel
Moderator

Hi,

Your environment is supported by the Intel oneAPI Base Toolkit.

We have observed the following in your code:

1) Instead of using the macro JJ, you have used the constant value 8 for block_load (lines 94 and 106 in the screenshot). Does this still work if the number of work-items in the nd-range changes?

2) For the select function (lines 95, 102, 106, and 107 in the screenshot), you have used select<8,1>. Is <8,1> fixed, or can a macro be used here? If so, which macro should it be?

[Screenshot: lines 94-107 of gemm_genx_dpcpp_finalver.cpp]
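As a sketch of the two suggestions above, here is a plain C++ stand-in for a strided select<Size, Stride> whose size comes from the macro instead of a literal 8. This is not the ESIMD API: select_copy is a hypothetical helper on std::array, and the value given to JJ is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

#define JJ 8  // hypothetical value; stands in for the macro the reviewer refers to

// Plain-C++ stand-in for a strided select<Size, Stride>(offset): copies Size
// elements of v starting at offset, stepping by Stride.
template <std::size_t Size, std::size_t Stride, std::size_t N>
std::array<float, Size> select_copy(const std::array<float, N>& v, std::size_t offset) {
  static_assert(Size >= 1 && Stride >= 1, "empty or zero-stride selection");
  assert(offset + (Size - 1) * Stride < N);  // selection must stay inside v
  std::array<float, Size> out{};
  for (std::size_t x = 0; x < Size; ++x) out[x] = v[offset + x * Stride];
  return out;
}

// Writing select_copy<JJ, 1>(...) rather than select_copy<8, 1>(...) keeps every
// selection in step with the macro when the configuration changes.
```

Because Size is a template argument, a macro used there is checked at compile time, while a hard-coded 8 silently stays 8.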

Since changing the I, J, K values changes the number of work-items in the nd-range, try modifying the code to use the maximum work-group size supported by your system. To find that maximum, run the command below and refer to the screenshot.

clinfo

[Screenshot: clinfo output showing the maximum work-group size]
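Once clinfo has reported the limits, a local range can be checked before launching. This is a hedged sketch: DeviceLimits and local_range_ok are hypothetical names, and the fields mirror clinfo's "Max work group size" and "Max work item sizes" entries. Inside a SYCL program, the total limit can also be read with dev.get_info<sycl::info::device::max_work_group_size>().

```cpp
#include <array>
#include <cstddef>

// Limits copied by hand from clinfo output (hypothetical struct).
struct DeviceLimits {
  std::size_t max_work_group_size;                 // "Max work group size"
  std::array<std::size_t, 3> max_work_item_sizes;  // "Max work item sizes"
};

// Returns true if a local (work-group) range is legal for the device: each
// dimension within its own limit and the product within the total limit.
// These limits bound the work-group shape, not the number of work-groups.
bool local_range_ok(const std::array<std::size_t, 3>& local, const DeviceLimits& lim) {
  std::size_t total = 1;
  for (std::size_t d = 0; d < 3; ++d) {
    if (local[d] == 0 || local[d] > lim.max_work_item_sizes[d]) return false;
    total *= local[d];
  }
  return total <= lim.max_work_group_size;
}
```

For example, with limits of {256, {256, 256, 256}}, a local range of {1, 8, 8} passes, while {1, 32, 16} (512 work-items) does not.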

Thanks & Regards,

Hemanth

HemanthCH_Intel
Moderator

Hi,


We haven't heard back from you. Could you please provide an update on your issue?


Thanks & Regards,

Hemanth.


HemanthCH_Intel
Moderator

Hi,


We haven't heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.


Thanks & Regards,

Hemanth.

