Serious performance and incompatibility issues with 1.5

David_E_7 · ‎10-06-2011

Thank you for continuing development on the OpenCL SDK and working toward a better debugging experience.

We have been using SDK 1.1 with good success. Unfortunately, with 1.5 we are encountering several "show-stopper" difficulties.

1) Most notably, across approximately 10 kernels we have written, 7 of them are slower by factors of 300% to 800%, making performance uncompetitive with other implementations (such as AMD's) and not usable. Suggestions made in other threads that the workgroup size be increased are not easily accomplished.

2) Second, we find some incompatibilities in the compiler, where CL source code that was formerly accepted by 1.1 (and also by AMD's compiler) is now rejected. For example, the line shown below (appearing in the body of a function) is rejected by the compiler with the errors shown.

enum{ MWC64XVEC4_A = 4294883355U };

:286:6: error: expected an identifier or template-id after '::'

:286:6: error: expected identifier or '('

Your prompt attention to these issues would be highly appreciated.

Boaz_O_Intel · ‎10-06-2011

Hi,

Would it be possible to share reproducible kernels that we can use in order to analyze the performance regressions? What workgroup sizes are you using (is it 1?)

We will also take a look at the compilation failure you have encountered. Is it enough to place the enum in a kernel in order to reproduce the failure?

Thanks,
Boaz

David_E_7 · ‎10-07-2011

Boaz,

Thank you for the response.

1) Regarding the compilation issue, I've provided below a one-line kernel which demonstrates the problem. This code compiles with the prior version of the Intel SDK and several other tested compilers, but not with 1.5.

2) Regarding kernels to demonstrate performance regression, I've included one below. As an example, when run over a matrix of floats (#define T float) with 100,000 rows and 1877 columns using a 1-dimensional range (one work-item for each column), the runtime has regressed from 629ms to 2624ms. Granted, this kernel is not highly optimized, but the issue is that it performs well on all tested OpenCL implementations, including AMD (CPU), AMD (Radeon 5800) and Intel 1.3, but not on Intel 1.5.

3) Additionally, the issue mentioned in a recent post about the Intel OpenCL code no longer running on Windows Server 2008 is also problematic for us.

Thank you,

David

__kernel void test()
{
    enum{ MWC64XVEC4_A = 4294883355U };
}

__kernel void meanZeroCol(__global T* baseData, int numRows, int numCols)
{
    unsigned int colIndex = get_global_id(0);

    // Compute current average value for this column
    T colAdj = 0;
    for (int rowNum = 0; rowNum
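A note on the workgroup-size question: 1877, the column count quoted above, is a prime number. Under OpenCL 1.x the global size must be a multiple of the local size, so the only workgroup sizes that evenly divide a 1-dimensional range of 1877 are 1 and 1877. Whether that is what caused the regression here is not confirmed in the thread. A common way around the constraint is to round the global size up to a multiple of the chosen local size and have the kernel skip the padding work-items. The sketch below shows only the host-side arithmetic in plain C; `round_up_global` and the local size of 64 used in the note after it are hypothetical choices, not taken from the original post.

```c
#include <stddef.h>

/* Hypothetical helper: round global_size up to the next multiple of
   local_size. Assumes local_size > 0. */
static size_t round_up_global(size_t global_size, size_t local_size)
{
    size_t groups = (global_size + local_size - 1) / local_size;
    return groups * local_size;
}
```

With 1877 columns and a local size of 64, `round_up_global(1877, 64)` gives 1920. The kernel would then need a guard along the lines of `if (colIndex >= numCols) return;` so that the 43 extra work-items do nothing.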

David_E_7 · ‎10-10-2011

Boaz,

Some more positive information to report....

1) It turns out the only compiler error we ultimately encountered was the one noted above, which we easily worked around.

2) The number of kernels running slowly turned out to be relatively few, and we successfully tweaked the workgroup settings in most or possibly all cases to recover good performance. I believe our issues are similar to the performance regressions noted earlier in the forum, which you have apparently also recognized.

3) Thus only one serious issue remains for us: the incompatibility with Windows Server 2008, which you also note will be fixed in some future release.

Thank you,
David
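The thread does not say which workaround was used for the enum error. One plausible option, assuming the anonymous function-scope enum is what the 1.5 compiler trips over, is to define the constant as a preprocessor macro instead. The sketch is plain C so it can be checked on the host; the same `#define` line is also valid in OpenCL C source. The helper `mwc_scale` is hypothetical and exists only to show that the constant keeps its full unsigned 32-bit value.

```c
#include <stdint.h>

/* Same value as the enum constant in the post, written as an unsigned
   literal so it is not squeezed into a signed int. */
#define MWC64XVEC4_A 4294883355U

/* Hypothetical helper: widen to 64 bits before multiplying so the
   product of two 32-bit values cannot overflow. */
static uint64_t mwc_scale(uint32_t x)
{
    return (uint64_t)MWC64XVEC4_A * x;
}
```

Because the macro is substituted textually before compilation, it avoids declaring any enum type inside the kernel body, at the cost of losing the scoping that the enum provided.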