JiniusT
Beginner

oneAPI GPU quicksort performance issue

Hi all:

I tried the SYCL GPU sort code from this URL:

https://techdecoded.intel.io/resources/gpu-quicksort/#gs.bi6fkf

 

I built it with the oneAPI 2020.2 release, but the results show a huge performance gap between the oneAPI DPC++ compiler and OpenCL 1.2.

 

Test hardware: i7-11700K, sorting a 512x512 array.

OpenCL 1.2 takes 4-5 ms to sort this array, but oneAPI SYCL takes 6-7 ms. That's roughly 40% overhead...

 

I strongly suspect that the oneAPI DPC++ compiler has a performance issue, because different software stacks targeting the same GPU should NOT have such a big performance gap.

 

The source code is at the link above; it's an official Intel sample.

 

Can anybody explain why this happens, and how to make SYCL perform on par with OpenCL 1.2?

 

Thanks in advance.

VidyalathaB_Intel
Moderator

Hi,


Thanks for reaching out to us.

>>build under oneapi 2020.2 release

Could you please try the DPC++ compiler from the latest oneAPI release (2021.3.0) and check whether the issue still persists?

Below is the link to download the latest oneAPI Base Toolkit (the DPC++ compiler is included in the Base Toolkit):

https://software.intel.com/content/www/us/en/develop/tools/oneapi/base-toolkit/download.html


Regards,

Vidya.





jimdempseyatthecove
Black Belt

Please state whether this is the first (or only) sort timing, or a second or later one. Note that the first run includes JIT compilation and resource allocation (including GPU memory allocation).

Jim Dempsey
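Jim's distinction between first and later runs can be made concrete with a small harness that runs the workload a few times untimed and then reports the median of the timed runs. The following is a minimal standard C++ sketch; `median_ms` is a hypothetical helper, and the warm-up and iteration counts are left to the caller rather than taken from the Intel sample.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Calls `run` `warmups` times without timing it, then times `iters` further
// calls and returns the median wall-clock time in milliseconds (the upper
// median when `iters` is even). The untimed calls absorb one-time costs such
// as JIT compilation and first-use memory allocation.
double median_ms(const std::function<void()>& run, int warmups, int iters) {
    assert(iters > 0 && "need at least one timed iteration");
    for (int i = 0; i < warmups; ++i) {
        run();  // warm-up, not timed
    }
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(iters));
    for (int i = 0; i < iters; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        // For a GPU workload, `run` must block until the device is done
        // (e.g. by calling wait() on the SYCL queue); otherwise only the
        // kernel submission is measured.
        run();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    const auto mid = samples.begin() + iters / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

Wrapping the same sort (including any host/device copies) in `median_ms` for both the OpenCL and the SYCL build would give steady-state numbers that exclude JIT and first-allocation costs, so any remaining gap reflects per-run overhead such as buffer transfers.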

JiniusT
Beginner

The original Intel GPU sort demo already accounts for this: the oneAPI JIT compilation and the OpenCL kernel precompilation are both excluded from the benchmark timings.

VidyalathaB_Intel
Moderator

Hi,


We are looking into this issue. We will get back to you soon.


Regards,

Vidya.


JiniusT
Beginner

I finally had some time to test the issue with the latest oneAPI 2021.3 toolkit.

The performance result is the same: DPC++ is still much slower than OpenCL 1.2.

After digging deeper, I think it's a memory copy issue:

1.) With the OpenCL 1.2 stack on the Intel i915 driver, buffers shared between the CPU and the iGPU are zero-copy, so there is no copy penalty.

2.) In the DPC++ stack, SYCL buffers somehow don't get the zero-copy path. Even though the OpenCL and SYCL code are almost functionally equivalent, DPC++ with SYCL buffers pays the cost of moving memory between the CPU and GPU. I don't know the real reason yet, as I don't have deep knowledge of the stack.

3.) If I switch from SYCL buffers to DPC++ USM (Unified Shared Memory), performance is much better, but it still can't match the OpenCL 1.2 i915 stack.
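Point 1 matches Intel's OpenCL guidance for integrated graphics: a buffer created with CL_MEM_USE_HOST_PTR can be zero-copy when the host allocation is aligned to 4096 bytes and its size is a multiple of 64 bytes. A minimal standard C++17 sketch of such an allocation follows. `alloc_page_aligned`, `is_page_aligned` and `AlignedArray` are hypothetical helpers, and whether the DPC++ runtime avoids the copy for a `sycl::buffer` constructed over this memory is an assumption to verify with a profiler, not something established in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>
#include <type_traits>

// Intel's OpenCL guidance for zero-copy CL_MEM_USE_HOST_PTR buffers on its
// integrated GPUs: 4096-byte-aligned host memory, size a multiple of 64 bytes.
constexpr std::size_t kPageAlign = 4096;

inline bool is_page_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kPageAlign == 0;
}

struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};

template <typename T>
using AlignedArray = std::unique_ptr<T[], FreeDeleter>;

// Allocates room for `count` elements of T on a 4096-byte boundary. The byte
// size is rounded up to a multiple of 4096, as std::aligned_alloc requires;
// that also satisfies the 64-byte size rule. (MSVC has no std::aligned_alloc;
// _aligned_malloc/_aligned_free are the Windows counterparts.)
template <typename T>
AlignedArray<T> alloc_page_aligned(std::size_t count) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "elements are left uninitialized, so keep T trivially copyable");
    const std::size_t bytes = count * sizeof(T);
    std::size_t rounded = (bytes + kPageAlign - 1) / kPageAlign * kPageAlign;
    if (rounded == 0) rounded = kPageAlign;
    void* p = std::aligned_alloc(kPageAlign, rounded);
    if (p == nullptr) throw std::bad_alloc();
    assert(is_page_aligned(p));
    return AlignedArray<T>(static_cast<T*>(p));
}

// Hypothetical use with SYCL (not compiled here), n = 512 * 512:
//   auto host = alloc_page_aligned<int>(n);
//   sycl::buffer<int, 1> buf(host.get(), sycl::range<1>(n));
```

Comparing the buffer version built on such an allocation against the USM version would help separate alignment-related copies from other runtime overhead.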

Please help with this issue, because it's critical for the oneAPI stack: if there is a huge performance gap between OpenCL 1.2 and oneAPI, developers lose the motivation to migrate to the new API stack, and SORTING is important for almost everything.

BTW, the reason I'm looking for a oneAPI-based GPU sort here is simply that one isn't available in oneAPI. There is no CUDA Thrust-like framework for oneAPI yet, CUB migration is still a dream, and TBB doesn't support sorting on the GPU.

I'd really appreciate it if anybody could shed some light on how to sort on the GPU with oneAPI (maybe there is a decent solution somewhere already).

VidyalathaB_Intel
Moderator

Hi,

Could you please provide a sample reproducer for both the OpenCL and SYCL (USM and buffer model) versions, along with the steps you followed to obtain these results, so that we can investigate from our end?

Also, please provide the following details:

1. Output of:

           sycl-ls

           clinfo

2. Hardware details

3. Are you using the OpenCL runtime or Level Zero as the backend?

You can also use the sorting algorithms from oneDPL. Please refer to the link below for more details.

https://software.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-library-guide/top/ex...

 

Regards,

Vidya.
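Whichever GPU sort ends up being used (the sample's quicksort or oneDPL's `oneapi::dpl::sort` with a device execution policy), it helps to check its output against a host reference before comparing timings. The following is a minimal host-side C++ sketch, assuming 32-bit unsigned keys; `make_input` and `matches_host_sort` are hypothetical helpers, not part of the Intel sample or oneDPL. Copying the device results back into a `std::vector` is left out, since that step differs between the buffer and USM versions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Reproducible random keys; n = 512 * 512 would match the array size in the
// original post.
std::vector<std::uint32_t> make_input(std::size_t n, std::uint32_t seed) {
    std::mt19937 gen(seed);
    std::vector<std::uint32_t> keys(n);
    for (auto& k : keys) {
        k = static_cast<std::uint32_t>(gen());
    }
    return keys;
}

// True if `result` (e.g. data copied back from the device) is exactly the
// sorted permutation of `input`. `input` is taken by value so the caller's
// copy stays unsorted.
bool matches_host_sort(std::vector<std::uint32_t> input,
                       const std::vector<std::uint32_t>& result) {
    if (!std::is_sorted(result.begin(), result.end())) {
        return false;  // cheap early-out before the full comparison
    }
    std::sort(input.begin(), input.end());
    return input == result;
}
```

Because the check compares against a full host sort, it catches both ordering errors and lost or duplicated elements, which an `is_sorted` test alone would miss.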

 

VidyalathaB_Intel
Moderator

Hi,

Reminder:

Could you please provide the above-mentioned details so that we can work on it from our end?

Regards,

Vidya.



VidyalathaB_Intel
Moderator

Hi,


As we have not heard back from you, we are closing this case for now. Please post a new question if you need any additional information from Intel.


Regards,

Vidya.

