<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Can't get optimized linpack to run on all threads on the sy in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-get-optimized-linpack-to-run-on-all-threads-on-the-system/m-p/847894#M6374</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/404199"&gt;Ying Hu (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;Hello Steve, &lt;BR /&gt;&lt;BR /&gt;The KMP_AFFINITY variable can help bind threads to CPU cores. &lt;BR /&gt;&lt;BR /&gt;Have you tried setting the OMP_NUM_THREADS variable? For example:&lt;BR /&gt;export OMP_NUM_THREADS=16&lt;BR /&gt;&lt;BR /&gt;As far as I know, by default the current MKL version spawns threads on only half of the logical processors on a Hyper-Threading-enabled system, because enabling HT may not improve performance and sometimes even hurts it. &lt;BR /&gt;&lt;BR /&gt;Here is the relevant explanation from the MKL User's Guide for your reference:&lt;BR /&gt;The use of Hyper-Threading Technology: &lt;BR /&gt;Hyper-Threading Technology (HT Technology) is especially effective when each thread is performing different types of operations and when there are under-utilized resources on the processor. &lt;BR /&gt;&lt;BR /&gt;However, Intel MKL fits neither of these criteria because the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance by disabling HT Technology.&lt;BR /&gt;&lt;BR /&gt;By default, MKL creates as many threads as there are physical cores. I suspect that is why you see only 8 threads running.&lt;BR /&gt;Best Regards,&lt;BR /&gt;Ying&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Ying,&lt;BR /&gt;&lt;BR /&gt;Thank you for your reply. I had already tried the OMP_NUM_THREADS variable and could never get more than 8 threads. What made me think I could get more was the output, which reports 16 CPUs / 16 threads; I think that should be fixed. I now accept that the program will run only one thread per core, especially if performance would be worse with more threads.&lt;BR /&gt;&lt;BR /&gt;Best Regards, &lt;BR /&gt;Steve</description>
    <pubDate>Wed, 29 Jul 2009 17:37:03 GMT</pubDate>
    <dc:creator>panic4az</dc:creator>
    <dc:date>2009-07-29T17:37:03Z</dc:date>
    <item>
      <title>Can't get optimized linpack to run on all threads on the system.</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-get-optimized-linpack-to-run-on-all-threads-on-the-system/m-p/847892#M6372</link>
      <description>I'm trying to run the optimized linpack on a system with Hyper-Threading enabled. No matter which KMP_AFFINITY options I choose, only half of the threads run. Here is the latest run:&lt;BR /&gt;&lt;BR /&gt;[kirk] (uid) linpack&amp;gt; numactl --hardware&lt;BR /&gt;available: 2 nodes (0-1)&lt;BR /&gt;node 0 cpus: 0 2 4 6 8 10 12 14&lt;BR /&gt;node 0 size: 12279 MB&lt;BR /&gt;node 0 free: 11031 MB&lt;BR /&gt;node 1 cpus: 1 3 5 7 9 11 13 15&lt;BR /&gt;node 1 size: 12288 MB&lt;BR /&gt;node 1 free: 11907 MB&lt;BR /&gt;node distances:&lt;BR /&gt;node 0 1&lt;BR /&gt;0: 10 20&lt;BR /&gt;1: 20 10&lt;BR /&gt;[kirk] (uid) linpack&amp;gt; ./runme_xeon64&lt;BR /&gt;This is a SAMPLE run script. Change it to reflect the correct number&lt;BR /&gt;of CPUs/threads, problem input files, etc..&lt;BR /&gt;Thu Jul 23 17:04:41 MST 2009&lt;BR /&gt;OMP: Warning #190: Bad message catalog "libiomp5.cat": Version "2" found, version "1" expected.&lt;BR /&gt;OMP: Hint: Check NLSPATH environment variable, its value is "/opt/intel/Compiler/11.0/083/mkl/lib/64/locale/%l_%t/%N:/opt/intel/mkl/10.1.2.024/lib/em64t/locale/%l_%t/%N:/opt/intel/Compiler/11.0/083/lib/intel64/locale/%l_%t/%N:/opt/intel/Compiler/11.0/083/ipp/em64t/lib/locale/%l_%t/%N:/opt/intel/Compiler/11.0/083/mkl/lib/em64t/locale/%l_%t/%N:/opt/intel/Compiler/11.0/083/idb/intel64/locale/%l_%t/%N:/opt/intel/Compiler/11.0/083/lib/intel64/locale/%l_%t/%N:/opt/intel/Compiler/11.0/083/ipp/em64t/lib/locale/%l_%t/%N:/opt/intel/Compiler/11.0/083/mkl/lib/em64t/locale/%l_%t/%N:/opt/intel/Compiler/11.0/083/idb/intel64/locale/%l_%t/%N".&lt;BR /&gt;OMP: Info #3: Default messages will be used.&lt;BR /&gt;OMP: Info #157: KMP_AFFINITY: Affinity capable, using global cpuid instr info&lt;BR /&gt;OMP: Info #162: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}&lt;BR /&gt;OMP: Info #164: KMP_AFFINITY: 16 available OS procs&lt;BR /&gt;OMP: Info #165: KMP_AFFINITY: Uniform topology&lt;BR 
/&gt;OMP: Info #167: KMP_AFFINITY: 2 packages x 4 cores/pkg x 2 threads/core (8 total cores)&lt;BR /&gt;OMP: Info #168: KMP_AFFINITY: OS proc to physical thread map ([] =&amp;gt; level not in map):&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 8 maps to package 0 core 0 thread 1&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 10 maps to package 0 core 1 thread 1&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 12 maps to package 0 core 2 thread 1&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 14 maps to package 0 core 3 thread 1&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 1 maps to package 1 core 0 thread 0&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 9 maps to package 1 core 0 thread 1&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 3 maps to package 1 core 1 thread 0&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 11 maps to package 1 core 1 thread 1&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 5 maps to package 1 core 2 thread 0&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 13 maps to package 1 core 2 thread 1&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 7 maps to package 1 core 3 thread 0&lt;BR /&gt;OMP: Info #178: KMP_AFFINITY: OS proc 15 maps to package 1 core 3 thread 1&lt;BR /&gt;OMP: Info #155: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}&lt;BR /&gt;OMP: Info #155: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1}&lt;BR /&gt;OMP: Info #155: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2}&lt;BR /&gt;OMP: Info #155: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3}&lt;BR /&gt;OMP: Info #155: KMP_AFFINITY: Internal thread 4 bound to OS proc set {4}&lt;BR /&gt;OMP: 
Info #155: KMP_AFFINITY: Internal thread 5 bound to OS proc set {5}&lt;BR /&gt;OMP: Info #155: KMP_AFFINITY: Internal thread 6 bound to OS proc set {6}&lt;BR /&gt;OMP: Info #155: KMP_AFFINITY: Internal thread 7 bound to OS proc set {7}&lt;BR /&gt;Done: Thu Jul 23 17:12:03 MST 2009&lt;BR /&gt;[kirk] (uid) linpack&amp;gt; cat lin_xeon64.txt&lt;BR /&gt;Thu Jul 23 17:04:41 MST 2009&lt;BR /&gt;Intel LINPACK data&lt;BR /&gt;&lt;BR /&gt;Current date/time: Thu Jul 23 17:04:41 2009&lt;BR /&gt;&lt;BR /&gt;CPU frequency: 2.666 GHz&lt;BR /&gt;Number of CPUs: 16&lt;BR /&gt;Number of threads: 16&lt;BR /&gt;&lt;BR /&gt;Parameters are set to:&lt;BR /&gt;&lt;BR /&gt;Number of tests : 1&lt;BR /&gt;Number of equations to solve (problem size) : 35000&lt;BR /&gt;Leading dimension of array : 45000&lt;BR /&gt;Number of trials to run : 1&lt;BR /&gt;Data alignment value (in Kbytes) : 1&lt;BR /&gt;&lt;BR /&gt;Maximum memory requested that can be used = 12600901024, at the size = 35000&lt;BR /&gt;&lt;BR /&gt;============= Timing linear equation system solver =================&lt;BR /&gt;&lt;BR /&gt;Size LDA Align. Time(s) GFlops Residual Residual(norm)&lt;BR /&gt;35000 45000 1 366.288 78.0419 1.073967e-09 3.117562e-02&lt;BR /&gt;&lt;BR /&gt;Performance Summary (GFlops)&lt;BR /&gt;&lt;BR /&gt;Size LDA Align. Average Maximal&lt;BR /&gt;35000 45000 1 78.0419 78.0419&lt;BR /&gt;&lt;BR /&gt;End of tests&lt;BR /&gt;&lt;BR /&gt;Thu Jul 23 17:12:03 MST 2009&lt;BR /&gt;[kirk] (uid) linpack&amp;gt;&lt;BR /&gt;cat runme_xeon64&lt;BR /&gt;#!/bin/bash&lt;BR /&gt;#&lt;BR /&gt;export KMP_AFFINITY=nowarnings,verbose,granularity=fine,proclist=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],explicit&lt;BR /&gt;&lt;BR /&gt;echo "This is a SAMPLE run script. 
Change it to reflect the correct number"&lt;BR /&gt;echo "of CPUs/threads, problem input files, etc.."&lt;BR /&gt;&lt;BR /&gt;date&lt;BR /&gt;date &amp;gt; lin_xeon64.txt&lt;BR /&gt;./xlinpack_xeon64 lininput_xeon64 &amp;gt;&amp;gt; lin_xeon64.txt&lt;BR /&gt;date &amp;gt;&amp;gt; lin_xeon64.txt&lt;BR /&gt;echo -n "Done: "&lt;BR /&gt;date&lt;BR /&gt;&lt;BR /&gt;This is the latest version, but I have also tried granularity=fine,compact and several other options. I expected to be able to get a thread running on each of the 16 logical processors. The output even reports 16 CPUs and 16 threads, yet only 8 run. Could the system be misconfigured? Any help would be appreciated.&lt;BR /&gt;&lt;BR /&gt;Regards, Steve.&lt;BR /&gt;</description>
      <pubDate>Fri, 24 Jul 2009 01:06:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-get-optimized-linpack-to-run-on-all-threads-on-the-system/m-p/847892#M6372</guid>
      <dc:creator>panic4az</dc:creator>
      <dc:date>2009-07-24T01:06:22Z</dc:date>
    </item>
    <item>
      <title>Re: Can't get optimized linpack to run on all threads on the sy</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-get-optimized-linpack-to-run-on-all-threads-on-the-system/m-p/847893#M6373</link>
      <description>&lt;BR /&gt;Hello Steve, &lt;BR /&gt;&lt;BR /&gt;The KMP_AFFINITY variable can help bind threads to CPU cores. &lt;BR /&gt;&lt;BR /&gt;Have you tried setting the OMP_NUM_THREADS variable? For example:&lt;BR /&gt;export OMP_NUM_THREADS=16&lt;BR /&gt;&lt;BR /&gt;As far as I know, by default the current MKL version spawns threads on only half of the logical processors on a Hyper-Threading-enabled system, because enabling HT may not improve performance and sometimes even hurts it. &lt;BR /&gt;&lt;BR /&gt;Here is the relevant explanation from the MKL User's Guide for your reference:&lt;BR /&gt;The use of Hyper-Threading Technology: &lt;BR /&gt;Hyper-Threading Technology (HT Technology) is especially effective when each thread is performing different types of operations and when there are under-utilized resources on the processor. &lt;BR /&gt;&lt;BR /&gt;However, Intel MKL fits neither of these criteria because the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance by disabling HT Technology.&lt;BR /&gt;&lt;BR /&gt;By default, MKL creates as many threads as there are physical cores. I suspect that is why you see only 8 threads running.&lt;BR /&gt;Best Regards,&lt;BR /&gt;Ying</description>
      <pubDate>Tue, 28 Jul 2009 08:13:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-get-optimized-linpack-to-run-on-all-threads-on-the-system/m-p/847893#M6373</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2009-07-28T08:13:16Z</dc:date>
    </item>
    <item>
      <title>Re: Can't get optimized linpack to run on all threads on the sy</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-get-optimized-linpack-to-run-on-all-threads-on-the-system/m-p/847894#M6374</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/404199"&gt;Ying Hu (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;Hello Steve, &lt;BR /&gt;&lt;BR /&gt;The KMP_AFFINITY variable can help bind threads to CPU cores. &lt;BR /&gt;&lt;BR /&gt;Have you tried setting the OMP_NUM_THREADS variable? For example:&lt;BR /&gt;export OMP_NUM_THREADS=16&lt;BR /&gt;&lt;BR /&gt;As far as I know, by default the current MKL version spawns threads on only half of the logical processors on a Hyper-Threading-enabled system, because enabling HT may not improve performance and sometimes even hurts it. &lt;BR /&gt;&lt;BR /&gt;Here is the relevant explanation from the MKL User's Guide for your reference:&lt;BR /&gt;The use of Hyper-Threading Technology: &lt;BR /&gt;Hyper-Threading Technology (HT Technology) is especially effective when each thread is performing different types of operations and when there are under-utilized resources on the processor. &lt;BR /&gt;&lt;BR /&gt;However, Intel MKL fits neither of these criteria because the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance by disabling HT Technology.&lt;BR /&gt;&lt;BR /&gt;By default, MKL creates as many threads as there are physical cores. I suspect that is why you see only 8 threads running.&lt;BR /&gt;Best Regards,&lt;BR /&gt;Ying&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Ying,&lt;BR /&gt;&lt;BR /&gt;Thank you for your reply. I had already tried the OMP_NUM_THREADS variable and could never get more than 8 threads. What made me think I could get more was the output, which reports 16 CPUs / 16 threads; I think that should be fixed. I now accept that the program will run only one thread per core, especially if performance would be worse with more threads.&lt;BR /&gt;&lt;BR /&gt;Best Regards, &lt;BR /&gt;Steve</description>
      <pubDate>Wed, 29 Jul 2009 17:37:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-get-optimized-linpack-to-run-on-all-threads-on-the-system/m-p/847894#M6374</guid>
      <dc:creator>panic4az</dc:creator>
      <dc:date>2009-07-29T17:37:03Z</dc:date>
    </item>
    <item>
      <title>Re: Can't get optimized linpack to run on all threads on the sy</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-get-optimized-linpack-to-run-on-all-threads-on-the-system/m-p/847895#M6375</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;BR /&gt;Steve, &lt;BR /&gt;&lt;BR /&gt;Try exporting both MKL_DYNAMIC=FALSE and OMP_NUM_THREADS=16 in the environment. &lt;BR /&gt;&lt;BR /&gt;Currently, MKL detects the number of physical cores and limits threading to that number to avoid over-threading (with Hyper-Threading, that is only half of the logical processors). &lt;BR /&gt;To change this behavior, set the following two environment variables:&lt;BR /&gt;&lt;BR /&gt;export MKL_DYNAMIC=FALSE&lt;BR /&gt;export MKL_NUM_THREADS=16 (or OMP_NUM_THREADS=16)&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Chao &lt;BR /&gt;&lt;/DIV&gt;
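&lt;P&gt;For reference, here is a minimal sketch of how Chao's two settings could be folded into a copy of the runme_xeon64 script from the original post. Assumptions: a Linux shell where getconf _NPROCESSORS_ONLN reports the number of online logical processors (16 on the 2 packages x 4 cores x 2 threads machine above), used here instead of a hard-coded 16; the KMP_AFFINITY value reuses the granularity=fine,compact setting Steve already tried. The variable name logical_procs is illustrative only, and the script has not been checked against this particular MKL release.&lt;/P&gt;
&lt;PRE&gt;
```shell
#!/bin/bash
# Sketch of a modified runme_xeon64 (assumption: run from the LINPACK directory;
# not verified against MKL 10.1 specifically).

# Logical processors the OS reports online (16 on the 2 x 4 x 2 system in this thread).
logical_procs=$(getconf _NPROCESSORS_ONLN)

# Keep MKL from trimming the thread count down to the number of physical cores.
export MKL_DYNAMIC=FALSE
# Ask for one thread per logical processor; OMP_NUM_THREADS is set as well.
export MKL_NUM_THREADS="$logical_procs"
export OMP_NUM_THREADS="$logical_procs"
# Pin each thread to a single logical processor, filling both hardware threads
# of a core before moving on, and print the resulting bindings.
export KMP_AFFINITY=verbose,granularity=fine,compact

echo "MKL_DYNAMIC=$MKL_DYNAMIC MKL_NUM_THREADS=$MKL_NUM_THREADS"

if [ -x ./xlinpack_xeon64 ]; then
    date > lin_xeon64.txt
    ./xlinpack_xeon64 lininput_xeon64 >> lin_xeon64.txt
    date >> lin_xeon64.txt
else
    echo "xlinpack_xeon64 not found in $(pwd); environment set but benchmark not run"
fi
```
&lt;/PRE&gt;
&lt;P&gt;If the settings take effect, the verbose KMP_AFFINITY output should list internal threads 0 through 15 as bound, rather than stopping at thread 7 as in the run above. Given Ying's note about Hyper-Threading, it is worth comparing the resulting GFlops figure against the 8-thread result before keeping this configuration.&lt;/P&gt;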
&lt;BR /&gt;</description>
      <pubDate>Mon, 03 Aug 2009 07:46:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-get-optimized-linpack-to-run-on-all-threads-on-the-system/m-p/847895#M6375</guid>
      <dc:creator>Chao_Y_Intel</dc:creator>
      <dc:date>2009-08-03T07:46:43Z</dc:date>
    </item>
    <item>
      <title>Always run Linpack without</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-get-optimized-linpack-to-run-on-all-threads-on-the-system/m-p/847896#M6376</link>
      <description>&lt;P&gt;Always run Linpack with Hyper-Threading disabled so that it uses all of the available threads. Linpack is not meant to be run with Hyper-Threading on.&lt;/P&gt;</description>
      <pubDate>Thu, 09 Mar 2017 19:33:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Can-t-get-optimized-linpack-to-run-on-all-threads-on-the-system/m-p/847896#M6376</guid>
      <dc:creator>Adrian_C_</dc:creator>
      <dc:date>2017-03-09T19:33:24Z</dc:date>
    </item>
  </channel>
</rss>

