We are seeing a 20-25% drop in runtime performance on our benchmarks using ICX (2022 latest) vs ICL (Classic 19.2). This is a floating-point-intensive CAE code base running on Intel workstations. We use MKL extensively, so the performance regression is surprising. We will not be moving to ICX until we can get at least comparable runtimes.
It seems the optimizations made by ICL at /O2 are still ahead of those in the Clang-based compiler. Are there any tips for getting better runtime performance out of Clang?
Hi,
Thanks for posting in Intel Communities.
Could you please provide us with the OS details and sample reproducer code, along with the steps to reproduce your issue?
Also, could you please confirm whether you are using the latest oneAPI Toolkit (2022.2)?
Thanks & Regards,
Varsha
I am using the latest kit (2022.2). I can't share a simple benchmark code; these results come from rebuilding a large C++ code base. It is floating-point intensive, multi-threaded (using OpenMP), and uses MKL for vector and matrix operations. Compiling the identical code base with the latest ICL and the latest ICX, we see performance regressions of 0-20% with ICX.
This is not surprising to me. ICL was developed over many, many years (I have been using it since version 6.0) with one goal in mind: high-performance computing. Clang was developed as an open-source, flexible, extensible, standards-supporting C++ compiler framework; HPC was not its focus.
I can echo Andrew's observation: I see a similar performance drop in an OpenMP modeling application in the ICX build on Windows.
I second the vote for not reducing ICC/ICL support until ICX reaches performance parity!
It seems the problem could be the handling of MKL_DIRECT_CALL. We use small matrices in some places and MKL_DIRECT_CALL is a big win. It appears MKL_DIRECT_CALL is disabled for ICX as __INTEL_COMPILER is not defined.
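To check which code path a header guarded by `__INTEL_COMPILER` would take, a small probe like the one below can help. It is only a sketch: the function names are hypothetical, and it relies on ICX defining `__INTEL_LLVM_COMPILER` (and `__clang__`) rather than `__INTEL_COMPILER`, which matches the behavior described in this post.

```cpp
// Report which compiler family is building this translation unit.
// ICX (LLVM-based) defines __INTEL_LLVM_COMPILER, while the classic ICC/ICL
// compilers define __INTEL_COMPILER. ICX also defines __clang__, so the
// Intel-specific checks have to come before the generic Clang check.
const char* compiler_family() {
#if defined(__INTEL_LLVM_COMPILER)
    return "Intel ICX (LLVM-based)";
#elif defined(__INTEL_COMPILER)
    return "Intel ICC/ICL (classic)";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#elif defined(_MSC_VER)
    return "MSVC";
#else
    return "unknown";
#endif
}

// True only when a header that tests just __INTEL_COMPILER would take its
// Intel-specific branch for this build.
bool classic_intel_macro_defined() {
#if defined(__INTEL_COMPILER)
    return true;
#else
    return false;
#endif
}
```

Printing `compiler_family()` and `classic_intel_macro_defined()` from a test program built with each compiler should show whether the ICX build falls outside the `__INTEL_COMPILER` guard.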
I found this page very helpful:
https://www.intel.com/content/www/us/en/developer/articles/guide/porting-guide-for-icc-users-to-dpcpp-or-icx.html
The key points for me were using -fiopenmp and setting a processor target (e.g. AVX) to get optimizations.
With these changes:
- Hack mkl_direct.h to allow its use with the ICX compiler
- Use -fiopenmp
- Set AVX instruction set optimizations
I was able to get performance comparable to or better than ICL.
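One way to confirm that the OpenMP and AVX settings actually reached the compiler is to check the predefined macros in the build. The sketch below is illustrative only: the names `kAvxEnabled`, `kOpenMpVersion`, and `describe_build` are hypothetical, and the exact spelling of the target option (for example `-xAVX` on Linux or `/QxAVX` on Windows) is an assumption, since the post only says "AVX". It also assumes that an OpenMP-enabled build defines `_OPENMP` and an AVX-targeted build defines `__AVX__`, as Clang-based compilers normally do.

```cpp
#include <string>

// __AVX__ is normally predefined when the build targets AVX or newer.
#if defined(__AVX__)
constexpr bool kAvxEnabled = true;
#else
constexpr bool kAvxEnabled = false;
#endif

// _OPENMP expands to the supported OpenMP version date (yyyymm) when OpenMP
// is enabled; 0 is used here to mean "OpenMP not enabled".
#if defined(_OPENMP)
constexpr long kOpenMpVersion = _OPENMP;
#else
constexpr long kOpenMpVersion = 0;
#endif

// Build a one-line summary suitable for logging at program start-up.
std::string describe_build() {
    std::string s = "AVX: ";
    s += kAvxEnabled ? "on" : "off";
    s += ", OpenMP: ";
    s += (kOpenMpVersion != 0) ? std::to_string(kOpenMpVersion) : std::string("off");
    return s;
}
```

Logging `describe_build()` once from both the ICL and ICX builds makes it easy to see whether the two binaries were compiled with equivalent settings before comparing their runtimes.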
Hi,
Glad to know that your issue is resolved. Thanks for sharing the solution with us. If you need any additional information, please post a new question as this thread will no longer be monitored by Intel.
Thanks & Regards,
Varsha