Intel® oneAPI DPC++/C++ Compiler

Intel icx does not scale the code well on Windows

newcfd
Beginner

The same code is compiled on Linux and Windows. The run times for different thread counts are as follows.

on Windows
Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 73.31324 (min); OpenMP timer: 73.31224 (min); CPU time: 73.31223 (min) 50 threads
Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 75.19106 (min); OpenMP timer: 75.18994 (min); CPU time: 75.18993 (min) 50 threads
Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 79.41946 (min); OpenMP timer: 79.41827 (min); CPU time: 79.41827 (min) 56 threads
Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 96.83948 (min); OpenMP timer: 96.83786 (min); CPU time: 96.83787 (min) 70 threads
Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 127.79664 (min); OpenMP timer: 127.79473 (min); CPU time: 127.79473 (min) 100 threads


on Linux
Successful completion Step 1867 11.8242 years 14059 iterations; real duration: 51.32146 (min); OpenMP timer: 51.32146 (min); CPU time: 2833.64239 (min) 56 threads
Successful completion Step 1867 11.8242 years 14059 iterations; real duration: 32.59633 (min); OpenMP timer: 32.59633 (min); CPU time: 2993.59505 (min) 96 threads

OpenMP settings

    _putenv_s("GOMP_CPU_AFFINITY", "");
    _putenv_s("OMP_DYNAMIC", "false");
    _putenv_s("OMP_MAX_ACTIVE_LEVELS", "1");
    _putenv_s("OMP_WAIT_POLICY", "ACTIVE");
    _putenv_s("OMP_PROC_BIND", "false");

Compiler flags

    /MP /GS /Qiopenmp /GA /W3 /Gy /Zc:wchar_t /Qipo /Zc:forScope /std:c17 /Oi /MD /std:c++20 /Qxhost /Qftz

The CPU is the same on both Linux and Windows: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz 2.59 GHz (2 processors).

Why does the code built on Windows scale so poorly and run so much slower than on Linux?

The icx on Windows is the latest.

The icx on Linux:  Intel(R) oneAPI DPC++/C++ Compiler 2025.0.0 (2025.0.0.20241008)

5 Replies
Sravani_K_Intel
Moderator

On Linux, CPU time ≈ 55× wall time (expected for 56 threads doing real work). On Windows, CPU time ≈ wall time, meaning the process is effectively running on ~1 thread's worth of work, regardless of how many threads are spawned.
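The ratio quoted above can be checked directly from the reported timings. The following is a minimal sketch, not part of the original program: the function names are hypothetical, and it assumes the reported "CPU time" is process CPU time summed over all threads. If the program obtained it from the C `clock()` function, note that on Windows `clock()` reports elapsed wall-clock time, in which case a ratio near 1 would say nothing about parallelism.

```cpp
#include <cassert>

// Average number of busy threads implied by a run: total CPU time divided by
// wall-clock time. Both arguments must use the same unit (minutes here).
// Hypothetical helper, named for illustration only.
double effective_threads(double cpu_minutes, double wall_minutes) {
    assert(wall_minutes > 0.0);
    return cpu_minutes / wall_minutes;
}

// Fraction of the requested threads that were busy on average
// (1.0 would mean every thread was fully occupied for the whole run).
double parallel_efficiency(double cpu_minutes, double wall_minutes, int threads) {
    assert(threads > 0);
    return effective_threads(cpu_minutes, wall_minutes) / threads;
}
```

Applied to the figures in this thread, `effective_threads` gives about 55.2 for the 56-thread Linux run, about 91.8 for the 96-thread Linux run, and about 1.0 for every Windows run.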

GOMP_* variables are for GCC's libgomp. Intel's runtime (libiomp5) reads the standard OMP_* variables plus its own KMP_* extensions. GOMP_CPU_AFFINITY is silently ignored, leaving thread placement to the OS scheduler. Try setting KMP_AFFINITY as described at https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2025-2/thread-affinity-interface.html to see if that helps.
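As a rough sketch of that suggestion, the variable can be set from code in the same style as the `_putenv_s` calls in the question. The helper names below are hypothetical, and the value `verbose,granularity=fine,compact` is only an illustrative choice (an assumption, not a tuned recommendation); `verbose` asks the runtime to print the bindings it actually applies. The call has to run before the OpenMP runtime initializes, so setting the variable in the shell before launching the program is the safer route.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Set one environment variable for the current process, overwriting any
// existing value. Returns true on success. Hypothetical helper.
bool set_env(const char* name, const char* value) {
    assert(name != nullptr && value != nullptr);
#ifdef _WIN32
    return _putenv_s(name, value) == 0;   // Microsoft CRT
#else
    return setenv(name, value, 1) == 0;   // POSIX
#endif
}

// Read a variable back; returns an empty string when it is unset.
std::string get_env(const char* name) {
    const char* v = std::getenv(name);
    return v ? std::string(v) : std::string();
}

// Illustrative affinity request for the Intel OpenMP runtime.
// Assumption: "compact" placement suits the workload; "scatter" is the
// other common choice and may behave differently on a two-socket machine.
bool configure_intel_openmp_affinity() {
    return set_env("KMP_AFFINITY", "verbose,granularity=fine,compact");
}
```

Calling `configure_intel_openmp_affinity()` as the first statement of `main`, before any OpenMP construct or API call, and then inspecting the verbose output shows whether the runtime honored the request.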

 

newcfd
Beginner

It is a big project, so I cannot share test code.

CPU: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz 2.59 GHz

Can you please tell me which flags are appropriate for solving large sparse linear systems? You must have tested some cases. Does Intel have any benchmark cases?

newcfd
Beginner

Thanks for your reply. Good to know. I tried the settings you suggested, but they did not help.

Another case on Linux

ICX
Successful completion Step 3434 10.0000 years 21630 iterations; real duration: 267.52013 (min); OpenMP timer: 267.52013 (min); CPU time: 25280.19389 (min) threads 96

GCC
Successful completion Step 3312 10.0000 years 20857 iterations; real duration: 72.17944 (min); OpenMP timer: 72.17944 (min); CPU time: 6761.04930 (min) Threads 96

 

The GCC build is more than three times faster (about 72 min versus about 268 min). Which settings or flags would make a numerical code built with icx run about as fast as the GCC build? I am not even asking for it to be faster.

newcfd
Beginner

I cannot believe the Intel compiler is so inferior to GCC on an Intel CPU. I do not even need any special settings for GCC. The Intel compiler has so many settings, yet none of them makes the code faster.

Intel guys: what are the secrets of the Intel compiler?

Sravani_K_Intel
Moderator

Could you please share a sample of code that demonstrates this issue so we can help troubleshoot?
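A reproducer does not have to be the real solver. As one possible starting point, here is a minimal sketch of a sparse matrix-vector product in CSR form with an OpenMP loop and a wall-clock timer. Everything in it is hypothetical: the `CsrMatrix` type, the tridiagonal test matrix, and the function names are stand-ins and are not taken from the poster's project, whose kernels may look quite different.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for the real data: a matrix in compressed sparse row form.
struct CsrMatrix {
    std::vector<std::size_t> row_ptr;  // size rows + 1
    std::vector<std::size_t> col;      // column index of each nonzero
    std::vector<double> val;           // value of each nonzero
    std::size_t rows() const { return row_ptr.size() - 1; }
};

// Build the n x n tridiagonal matrix with 2 on the diagonal and -1 beside it.
CsrMatrix make_tridiagonal(std::size_t n) {
    assert(n >= 2);
    CsrMatrix a;
    a.row_ptr.push_back(0);
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0)     { a.col.push_back(i - 1); a.val.push_back(-1.0); }
        a.col.push_back(i); a.val.push_back(2.0);
        if (i + 1 < n) { a.col.push_back(i + 1); a.val.push_back(-1.0); }
        a.row_ptr.push_back(a.col.size());
    }
    return a;
}

// y = A * x. Rows are independent, so the outer loop needs no synchronization.
// Without OpenMP enabled the pragma is ignored and the loop runs serially.
void spmv(const CsrMatrix& a, const std::vector<double>& x, std::vector<double>& y) {
    const long long n = static_cast<long long>(a.rows());
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k)
            sum += a.val[k] * x[a.col[k]];
        y[i] = sum;
    }
}

// Wall-clock seconds spent in `repeats` consecutive products with x = all ones.
double time_spmv(const CsrMatrix& a, int repeats) {
    std::vector<double> x(a.rows(), 1.0), y(a.rows(), 0.0);
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) spmv(a, x, y);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Threads the OpenMP runtime would use for the next parallel region
// (1 when the code was compiled without OpenMP).
int threads_available() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Timing `time_spmv` on a matrix with several million rows at a few `OMP_NUM_THREADS` values, on both operating systems and with both compilers, would show whether the scaling difference appears in a kernel this simple or only in the full application.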
