<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Intel icx does not scale the code well on Windows in Intel® oneAPI DPC++/C++ Compiler</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Intel-icx-does-not-scale-the-code-well-on-Windows/m-p/1746342#M4744</link>
    <description>&lt;P&gt;&lt;SPAN&gt;It is a big project, so I cannot share test code.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;CPU: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz 2.59 GHz&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Can you please tell me which flags are appropriate for solving a large sparse linear system? You must have tested some cases. Does Intel have any benchmark cases?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 30 Apr 2026 11:54:53 GMT</pubDate>
    <dc:creator>newcfd</dc:creator>
    <dc:date>2026-04-30T11:54:53Z</dc:date>
    <item>
      <title>Intel icx does not scale the code well on Windows</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Intel-icx-does-not-scale-the-code-well-on-Windows/m-p/1743921#M4727</link>
      <description>&lt;P&gt;The same code is compiled on Linux and Windows. The running times for different thread counts are as follows.&lt;/P&gt;&lt;P&gt;On Windows:&lt;BR /&gt;Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 73.31324 (min); OpenMP timer: 73.31224 (min); CPU time: 73.31223 (min) 50 threads&lt;BR /&gt;Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 75.19106 (min); OpenMP timer: 75.18994 (min); CPU time: 75.18993 (min) 50 threads&lt;BR /&gt;Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 79.41946 (min); OpenMP timer: 79.41827 (min); CPU time: 79.41827 (min) 56 threads&lt;BR /&gt;Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 96.83948 (min); OpenMP timer: 96.83786 (min); CPU time: 96.83787 (min) 70 threads&lt;BR /&gt;Successful completion Step 1892 11.8242 years 14414 iterations; real duration: 127.79664 (min); OpenMP timer: 127.79473 (min); CPU time: 127.79473 (min) 100 threads&lt;/P&gt;&lt;P&gt;On Linux:&lt;BR /&gt;Successful completion Step 1867 11.8242 years 14059 iterations; real duration: 51.32146 (min); OpenMP timer: 51.32146 (min); CPU time: 2833.64239 (min) 56 threads&lt;BR /&gt;Successful completion Step 1867 11.8242 years 14059 iterations; real duration: 32.59633 (min); OpenMP timer: 32.59633 (min); CPU time: 2993.59505 (min) 96 threads&lt;/P&gt;&lt;P&gt;OpenMP settings:&lt;/P&gt;&lt;PRE&gt;    _putenv_s(&lt;SPAN&gt;"GOMP_CPU_AFFINITY"&lt;/SPAN&gt;, &lt;SPAN&gt;""&lt;/SPAN&gt;); 
    _putenv_s(&lt;SPAN&gt;"OMP_DYNAMIC"&lt;/SPAN&gt;, &lt;SPAN&gt;"false"&lt;/SPAN&gt;);  
    _putenv_s(&lt;SPAN&gt;"OMP_MAX_ACTIVE_LEVELS"&lt;/SPAN&gt;, &lt;SPAN&gt;"1"&lt;/SPAN&gt;);&lt;BR /&gt;
    _putenv_s(&lt;SPAN&gt;"OMP_WAIT_POLICY"&lt;/SPAN&gt;, &lt;SPAN&gt;"ACTIVE"&lt;/SPAN&gt;);
    _putenv_s(&lt;SPAN&gt;"OMP_PROC_BIND"&lt;/SPAN&gt;, &lt;SPAN&gt;"false"&lt;/SPAN&gt;); &lt;/PRE&gt;&lt;P&gt;/MP /GS /Qiopenmp /GA /W3 /Gy /Zc:wchar_t&amp;nbsp; /Qipo /Zc:forScope /std:c17 /Oi /MD /std:c++20&amp;nbsp;/Qxhost /Qftz&amp;nbsp; &amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The Intel CPU is same for both Linux and Windows:&amp;nbsp;&amp;nbsp;Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz 2.59 GHz (2 processors)&amp;nbsp;.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Why is the code built on Windows not scaled well and much slower than on Linux?&lt;/P&gt;&lt;P&gt;The icx on Windows is the latest.&lt;/P&gt;&lt;P&gt;The icx on Linux:&amp;nbsp; Intel(R) oneAPI DPC++/C++ Compiler 2025.0.0 (2025.0.0.20241008)&lt;/P&gt;</description>
      <pubDate>Fri, 10 Apr 2026 12:46:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Intel-icx-does-not-scale-the-code-well-on-Windows/m-p/1743921#M4727</guid>
      <dc:creator>newcfd</dc:creator>
      <dc:date>2026-04-10T12:46:17Z</dc:date>
    </item>
    <item>
      <title>Re: Intel icx does not scale the code well on Windows</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Intel-icx-does-not-scale-the-code-well-on-Windows/m-p/1744243#M4732</link>
      <description>&lt;P&gt;&lt;SPAN&gt;On Linux, CPU time ≈ 55× wall time (expected for 56 threads doing real work). On Windows, CPU time ≈ wall time, meaning the process is effectively running on ~1 thread's worth of work, regardless of how many threads are spawned.&lt;/SPAN&gt;&lt;/P&gt;
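&lt;P&gt;The ratio described above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original reply; it uses the real-duration and CPU-time figures posted earlier in this thread, and the helper name busy_threads is only illustrative.&lt;/P&gt;
&lt;PRE&gt;
```python
# Wall-clock and CPU minutes copied from the timing lines posted in this thread.
runs = [
    ("Windows", 50, 73.31324, 73.31223),
    ("Windows", 100, 127.79664, 127.79473),
    ("Linux", 56, 51.32146, 2833.64239),
    ("Linux", 96, 32.59633, 2993.59505),
]


def busy_threads(wall_min, cpu_min):
    """Average number of threads kept busy: CPU time divided by wall time."""
    return cpu_min / wall_min


for system, threads, wall_min, cpu_min in runs:
    ratio = busy_threads(wall_min, cpu_min)
    print(f"{system:8s} {threads:4d} threads: CPU/wall = {ratio:6.2f}")
```
&lt;/PRE&gt;
&lt;P&gt;With these inputs the Linux runs come out at about 55.2 and 91.8 busy threads, while both Windows runs come out at 1.00. One caveat when reading the Windows figure: if the CPU time is taken from clock(), the Microsoft C runtime reports elapsed wall-clock time from that function rather than processor time, so the ratio may say more about the timer than about thread activity.&lt;/P&gt;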
&lt;P&gt;GOMP_*&amp;nbsp;variables are for GCC's&amp;nbsp;libgomp. Intel's runtime (libiomp5) uses&amp;nbsp;KMP_*&amp;nbsp;variables. GOMP_CPU_AFFINITY is silently ignored, leaving thread placement to the OS scheduler. Try setting KMP_AFFINITY as described at &lt;A href="https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2025-2/thread-affinity-interface.html" target="_blank"&gt;https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2025-2/thread-affinity-interface.html&lt;/A&gt;&amp;nbsp;to see if that helps.&lt;/P&gt;
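&lt;P&gt;One way to try KMP_AFFINITY is to set it before the process starts, rather than from inside the program. The following is a minimal Python launcher sketch under stated assumptions: the solver command is hypothetical, the function names are illustrative, and the affinity value shown is only a starting point of the kind described in the linked documentation, not a tuned recommendation.&lt;/P&gt;
&lt;PRE&gt;
```python
import os
import subprocess


def intel_openmp_env(num_threads, base_env=None):
    """Return a copy of the environment with Intel OpenMP runtime settings added."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(num_threads)
    # "verbose" asks the runtime to print the binding it chose at startup.
    # The granularity and placement values are illustrative assumptions.
    env["KMP_AFFINITY"] = "verbose,granularity=fine,compact"
    return env


def run_solver(command, num_threads):
    """Start the (hypothetical) solver with the variables already in place."""
    return subprocess.run(command, env=intel_openmp_env(num_threads), check=False)
```
&lt;/PRE&gt;
&lt;P&gt;For example, run_solver(["solver.exe", "case.dat"], 56), where both arguments are placeholders. Setting the variables in the launcher matters because the OpenMP specification allows an implementation to ignore changes made to environment variables after the program has started, so values set with _putenv_s at run time are not guaranteed to be picked up. The verbose modifier gives a direct check of whether the affinity setting took effect.&lt;/P&gt;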
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 13 Apr 2026 22:18:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Intel-icx-does-not-scale-the-code-well-on-Windows/m-p/1744243#M4732</guid>
      <dc:creator>Sravani_K_Intel</dc:creator>
      <dc:date>2026-04-13T22:18:07Z</dc:date>
    </item>
    <item>
      <title>Re: Intel icx does not scale the code well on Windows</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Intel-icx-does-not-scale-the-code-well-on-Windows/m-p/1745302#M4737</link>
      <description>&lt;P&gt;Thanks for your reply. Good to know. I tried the settings you suggested, but they did not help.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is another case, on Linux:&lt;/P&gt;&lt;P&gt;ICX&lt;BR /&gt;Successful completion Step 3434 10.0000 years 21630 iterations; real duration: 267.52013 (min); OpenMP timer: 267.52013 (min); CPU time: 25280.19389 (min) threads 96&lt;/P&gt;&lt;P&gt;GCC&lt;BR /&gt;Successful completion Step 3312 10.0000 years 20857 iterations; real duration: 72.17944 (min); OpenMP timer: 72.17944 (min); CPU time: 6761.04930 (min) Threads 96&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The GCC build is about 3.7 times faster. Which settings or flags can make a numerical code run about as fast as the GCC build? We are not even asking for it to be faster.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2026 04:04:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Intel-icx-does-not-scale-the-code-well-on-Windows/m-p/1745302#M4737</guid>
      <dc:creator>newcfd</dc:creator>
      <dc:date>2026-04-22T04:04:15Z</dc:date>
    </item>
    <item>
      <title>Re: Intel icx does not scale the code well on Windows</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Intel-icx-does-not-scale-the-code-well-on-Windows/m-p/1745303#M4738</link>
      <description>&lt;P&gt;I cannot believe the Intel compiler is so inferior to GCC on an Intel CPU. I do not even need any special settings for GCC. The Intel compiler has so many settings, but none of them makes the code faster.&lt;/P&gt;&lt;P&gt;Intel guys: what are the secrets of the Intel compiler?&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2026 04:20:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Intel-icx-does-not-scale-the-code-well-on-Windows/m-p/1745303#M4738</guid>
      <dc:creator>newcfd</dc:creator>
      <dc:date>2026-04-22T04:20:49Z</dc:date>
    </item>
    <item>
      <title>Re: Intel icx does not scale the code well on Windows</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Intel-icx-does-not-scale-the-code-well-on-Windows/m-p/1746103#M4743</link>
      <description>&lt;P&gt;Could you please share a sample of code that demonstrates this issue so we can help troubleshoot?&lt;/P&gt;</description>
      <pubDate>Tue, 28 Apr 2026 18:53:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Intel-icx-does-not-scale-the-code-well-on-Windows/m-p/1746103#M4743</guid>
      <dc:creator>Sravani_K_Intel</dc:creator>
      <dc:date>2026-04-28T18:53:18Z</dc:date>
    </item>
    <item>
      <title>Re: Intel icx does not scale the code well on Windows</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Intel-icx-does-not-scale-the-code-well-on-Windows/m-p/1746342#M4744</link>
      <description>&lt;P&gt;&lt;SPAN&gt;It is a big project, so I cannot share test code.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;CPU: Intel(R) Xeon(R) Gold 6348 CPU @ 2.60GHz 2.59 GHz&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Can you please tell me which flags are appropriate for solving a large sparse linear system? You must have tested some cases. Does Intel have any benchmark cases?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Apr 2026 11:54:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Intel-icx-does-not-scale-the-code-well-on-Windows/m-p/1746342#M4744</guid>
      <dc:creator>newcfd</dc:creator>
      <dc:date>2026-04-30T11:54:53Z</dc:date>
    </item>
  </channel>
</rss>

