Solved: Performance regressions, ICX vs. ICL (Classic)

AndrewC · 07-13-2022

We are seeing a 20-25% drop in runtime performance on our benchmarks with ICX (latest 2022 release) compared with ICL (Classic 19.2). This is a heavily floating-point-intensive CAE code base running on Intel workstations, and we use MKL extensively, so the regression is surprising. We will not move to ICX until we get at least comparable runtimes.

It seems the optimizations ICL performs at /O2 are (still) ahead of those in Clang. Are there any tips for getting better runtime performance out of Clang?


VarshaS_Intel · 07-14-2022

Hi,

Thanks for posting in Intel Communities.

Could you please provide us with the OS details and a sample reproducer, along with the steps to reproduce your issue?

Also, could you please confirm whether you are using the latest oneAPI Toolkit (2022.2)?

Thanks & Regards,

Varsha

AndrewC · 07-15-2022

I am using the latest kit (2022.2). I can't share a simple benchmark; these results come from rebuilding a large C++ code base. It is floating-point intensive, multi-threaded (using OpenMP), and uses MKL for vector and matrix operations. Building the identical code base with the latest ICL and the latest ICX, we see performance regressions of 0-20% with ICX.
This is not surprising to me. ICL was developed over many, many years with one goal in mind (I have been using it since version 6.0): high-performance computing. Clang was developed as an open-source, flexible, extensible, 'standards-supporting' C++ compiler framework; HPC was not its focus.

Mentzer__Stuart · 07-14-2022

I can echo Andrew's observation: I see a similar performance drop in an OpenMP modeling application in the ICX build on Windows.

I second the vote for not reducing ICC/ICL support until ICX reaches performance parity!

AndrewC · 07-18-2022

The problem may be how MKL_DIRECT_CALL is handled. We use small matrices in some places, and MKL_DIRECT_CALL is a big win there. It appears MKL_DIRECT_CALL is disabled for ICX because __INTEL_COMPILER is not defined.

AndrewC · 07-19-2022

I found this page very helpful:

https://www.intel.com/content/www/us/en/developer/articles/guide/porting-guide-for-icc-users-to-dpcpp-or-icx.html

In particular, it covers using -fiopenmp and setting a processor target (e.g. AVX) to enable optimizations.
With these changes:

Hacked mkl_direct.h to allow its use with the ICX compiler
-fiopenmp
Set AVX instruction-set optimizations

I was able to get performance comparable to or better than ICL.

VarshaS_Intel · 07-20-2022

Hi,

Glad to know that your issue is resolved, and thanks for sharing the solution with us. If you need any additional information, please post a new question, as this thread will no longer be monitored by Intel.

Thanks & Regards,

Varsha