Hello, Intel engineers.
I am a senior undergraduate student learning to use oneAPI with the ESIMD extension for my graduation project.
I have encountered some very confusing problems and wonder whether they are related to the oneAPI compiler. I list them one by one below.
First, I want to clarify my code running environment:
OS version: Linux user-X299-UD4-Pro 5.4.48-xe-max #1 SMP Sun Dec 12 17:04:51 CST 2021 x86_64 x86_64 x86_64 GNU/Linux
IDE version: Visual Studio Code 1.63.2
oneAPI toolkit version: l_BaseKit_p_2022.1.1.119
I have attached a code file called "gemm_genx_dpcpp_finalver.cpp". This code implements matrix multiplication using parallel computation. If the result is correct, the program outputs "all is done". If the result is incorrect, it outputs the first incorrect result and stops.
As you can see, a series of macro definitions in lines 21-29 of the code specify the matrix size and the number of loop iterations during the computation.
In the code, each thread multiplies a small matrix whose size is defined by III, JJJ, and KKK. II, JJ, and KK give the size of each dimension of a single work-group, and I, J, and K give the number of work-groups in the nd-range.
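The relationship between the three levels of macros can be sketched in plain C++. This is only an illustration under an assumption: that the total extent of each dimension is the product of the three levels (work-groups × work-items per group × per-thread tile). The struct and function names below are hypothetical and do not come from the attached file.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical container for the nine macros described in the post.
struct Tiling {
    std::size_t III, JJJ, KKK; // size of the small matrix handled by one thread
    std::size_t II, JJ, KK;    // work-items per work-group, per dimension
    std::size_t I, J, K;       // work-groups in the nd-range, per dimension
};

struct Extents {
    std::size_t total_i, total_j, total_k;
};

// Assumption: each total extent is the product of the three levels.
constexpr Extents total_extents(const Tiling& t) {
    return {t.I * t.II * t.III, t.J * t.JJ * t.JJJ, t.K * t.KK * t.KKK};
}

// Number of work-items in one work-group.
constexpr std::size_t work_group_size(const Tiling& t) {
    return t.II * t.JJ * t.KK;
}

// Number of work-groups in the whole nd-range.
constexpr std::size_t work_group_count(const Tiling& t) {
    return t.I * t.J * t.K;
}
```

With K, J, I set to 1, 7, 7 as in the failing case, work_group_count returns 49 regardless of the other six values, which is the quantity the problem appears to depend on.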
I compiled it with the command: dpcpp gemm_genx_dpcpp_finalver.cpp
However, the result of the code is puzzling:
1. If line 109, the statement "sycl::ext::oneapi::experimental::printf(fmt_1)", is deleted, the code that otherwise runs normally produces an error. But this is only a simple output statement, which should have no effect on the rest of the code.
2. The code runs for any values of KKK, JJJ, III, KK, JJ, and II, but it does not execute correctly when the values of K, J, and I are large (such as 1, 7, 7). What we can confirm so far is that the result computed by a single thread is correct, and the result of a single work-group is also correct. The program behaves normally when the nd-range contains a small number of work-groups. However, once K, J, and I become larger, that is, once the number of work-groups in the nd-range increases, errors occur.
To make debugging easier, I ported the code unchanged to a CPU version, replacing only the parts involving ESIMD with std::vector. Confusingly, the CPU version runs perfectly. I have also attached this code (file name: gemm_genx_cpu.cpp) in the zip file. You can compile it with dpcpp as well.
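For readers without the attachment, a CPU-side check of this kind might look like the sketch below. It is not the code from gemm_genx_cpu.cpp: the function names, the row-major layout, and the tolerance parameter are assumptions made for illustration. It computes a reference product with std::vector and reports the index of the first incorrect element, mirroring the "first incorrect result" behaviour described earlier.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference C = A * B on the CPU. All matrices are row-major (assumed layout):
// A is m x k, B is k x n, and the returned C is m x n.
std::vector<float> gemm_reference(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  std::size_t m, std::size_t n, std::size_t k) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
    return c;
}

// Returns the index of the first element that differs from the reference by
// more than tol (NaN counts as a mismatch), or -1 if everything matches.
std::ptrdiff_t first_mismatch(const std::vector<float>& got,
                              const std::vector<float>& expected, float tol) {
    const std::size_t common = got.size() < expected.size() ? got.size() : expected.size();
    for (std::size_t idx = 0; idx < common; ++idx) {
        if (!(std::fabs(got[idx] - expected[idx]) <= tol))
            return static_cast<std::ptrdiff_t>(idx);
    }
    // A length difference is reported at the first missing element.
    if (got.size() != expected.size())
        return static_cast<std::ptrdiff_t>(common);
    return -1;
}
```

A caller would print "all is done" when first_mismatch returns -1, and otherwise print the offending element and stop. Comparing the GPU output against such a reference for growing values of I, J, and K can help narrow down which work-groups produce wrong results.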
So I think gemm_genx_dpcpp_finalver.cpp should have no logic errors, and I wonder whether the problem lies in the compiler. Could you help me find the cause of these errors?
In addition, could you check whether there is any update on the problem I raised last time? The link is https://community.intel.com/t5/Intel-oneAPI-Data-Parallel-C/How-can-I-debug-kernel-code-of-DPC-with-ESIMD-extension/m-p/1352124#M2069
Thank you.
Hi,
Thank you for posting in Intel Communities.
>>"First, I want to clarify my code running environment:"
Could you please mention the Linux distribution you are using?
Please refer to the below link for system requirements of oneAPI Base Toolkit:
We are able to reproduce your issue at our end. We are looking into the issue and will get back to you soon.
>>"Could you help me see if there is any update feedback of the problem I raised last time"
We have reported this issue to the concerned development team. They are looking into your issue.
Thanks & Regards,
Hemanth.
Hi,
The Linux distribution I am using is Ubuntu 20.04.1.
Looking forward to your reply~
Thanks & regards
tanzl_ustc
Hi,
Your environment supports Intel oneAPI Base Toolkit.
We have observed in your code that:
1) Instead of using the macro JJ, you have used the constant value 8 for block_load (line numbers 94 and 106 in the screenshot). Does it work even if we change the number of work-items in the nd-range?
2) For the select function (line numbers 95, 102, 106, and 107 in the screenshot), we can see that you have used select<8,1>. Is <8,1> fixed, or can we use a macro here? If so, which macro should it be?
Since changes in the I, J, and K values are reflected in the number of work-items in the nd-range, try changing the code to use the maximum work-group size supported by your system. To see the maximum work-group size supported by your system, use the command below and refer to the screenshot below.
clinfo
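Once the maximum work-group size is known (from clinfo, or in SYCL from device.get_info<sycl::info::device::max_work_group_size>()), a launch configuration can be sanity-checked before submitting the kernel. The helper below is a hypothetical plain C++ sketch, not part of the attached code. It checks two conditions: the local range must contain no more work-items than the device limit, and the global range must be an exact multiple of the local range in every dimension, as SYCL requires for an nd_range.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: returns true when the local range fits within
// max_work_group_size and evenly divides the global range in each dimension.
bool nd_range_is_valid(const std::array<std::size_t, 3>& global,
                       const std::array<std::size_t, 3>& local,
                       std::size_t max_work_group_size) {
    std::size_t items_per_group = 1;
    for (std::size_t d = 0; d < 3; ++d) {
        // A zero-sized or non-dividing local dimension is rejected.
        if (local[d] == 0 || global[d] % local[d] != 0)
            return false;
        items_per_group *= local[d];
    }
    return items_per_group <= max_work_group_size;
}
```

Under the assumption that the global range is I*II, J*JJ, K*KK and the local range is II, JJ, KK, the divisibility condition holds by construction, so the size limit is the check most likely to matter here.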
Thanks & Regards,
Hemanth
Hi,
We haven't heard back from you. Could you please provide an update on your issue?
Thanks & Regards,
Hemanth.
Hi,
We haven't heard back from you. This thread will no longer be monitored by Intel. If you need further assistance, please post a new question.
Thanks & Regards,
Hemanth.