The same code is compiled on Linux and Windows. The run times for different thread counts are as follows.
On Windows:
Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 73.31324 (min); OpenMP timer: 73.31224 (min); CPU time: 73.31223 (min) 50 threads
Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 75.19106 (min); OpenMP timer: 75.18994 (min); CPU time: 75.18993 (min) 50 threads
Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 79.41946 (min); OpenMP timer: 79.41827 (min); CPU time: 79.41827 (min) 56 threads
Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 96.83948 (min); OpenMP timer: 96.83786 (min); CPU time: 96.83787 (min) 70 threads
Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 127.79664 (min); OpenMP timer: 127.79473 (min); CPU time: 127.79473 (min) 100 threads
On Linux:
Successful completion Step 1867 11.8242 years 14059 iterations; real duration: 51.32146 (min); OpenMP timer: 51.32146 (min); CPU time: 2833.64239 (min) 56 threads
Successful completion Step 1867 11.8242 years 14059 iterations; real duration: 32.59633 (min); OpenMP timer: 32.59633 (min); CPU time: 2993.59505 (min) 96 threads
OpenMP settings:
_putenv_s("GOMP_CPU_AFFINITY", ""); _putenv_s("OMP_DYNAMIC", "false"); _putenv_s("OMP_MAX_ACTIVE_LEVELS", "1");
_putenv_s("OMP_WAIT_POLICY", "ACTIVE"); _putenv_s("OMP_PROC_BIND", "false");
Compiler flags:
/MP /GS /Qiopenmp /GA /W3 /Gy /Zc:wchar_t /Qipo /Zc:forScope /std:c17 /Oi /MD /std:c++20 /Qxhost /Qftz
The CPU is the same on both Linux and Windows: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz 2.59 GHz (2 processors).
Why does the code built on Windows scale poorly and run much slower than on Linux?
The icx on Windows is the latest version.
The icx on Linux: Intel(R) oneAPI DPC++/C++ Compiler 2025.0.0 (2025.0.0.20241008)
On Linux, CPU time ≈ 55× wall time (expected for 56 threads doing real work). On Windows, CPU time ≈ wall time, meaning the process is effectively running on ~1 thread's worth of work, regardless of how many threads are spawned.
GOMP_* variables are for GCC's libgomp. Intel's runtime (libiomp5) uses KMP_* variables. GOMP_CPU_AFFINITY is silently ignored, leaving thread placement to the OS scheduler. Try setting KMP_AFFINITY as described at https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2025-2/thread-affinity-interface.html to see if that helps.
It is a big project, so I cannot share the test code.
CPU: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz 2.59 GHz
Can you please tell me which flags are appropriate for solving large sparse linear systems? You must have tested some cases. Does Intel have any benchmark cases?
Thanks for your reply. Good to know. I tried the settings you suggested, but they did not help.
Another case on Linux:
ICX
Successful completion Step 3434 10.0000 years 21630 iterations; real duration: 267.52013 (min); OpenMP timer: 267.52013 (min); CPU time: 25280.19389 (min) threads 96
GCC
Successful completion Step 3312 10.0000 years 20857 iterations; real duration: 72.17944 (min); OpenMP timer: 72.17944 (min); CPU time: 6761.04930 (min) Threads 96
The GCC build is about 3.7 times faster (72.2 min versus 267.5 min). Which settings or flags can make a numerical code built with ICX run about as fast as the GCC build? I am not even asking for it to be faster.
I cannot believe the Intel compiler is so inferior to GCC on an Intel CPU. I do not even need any special settings for GCC. The Intel compiler has so many settings, but none of them make the code faster.
Intel guys: what are the secrets of the Intel compiler?
Could you please share a sample of code that demonstrates this issue so we can help troubleshoot?