<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Significant Overhead if threaded MKL is called from OpenMP parallel region in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032245#M20206</link>
    <description>&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Hello,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;My aim is to diagonalize square matrices of different sizes d x d in parallel. To this end I wrote a for loop. In each iteration, aligned memory (whose size depends on the dimension d) is allocated with mkl_malloc(). The matrix is filled, and dsyev is then called to determine the optimal workspace size. Next I allocate the (aligned) workspace with mkl_malloc(), call dsyev again to diagonalize the matrix, and free the memory used for the workspace and the matrix with mkl_free().&lt;/P&gt;

&lt;P&gt;Since the diagonalizations are independent of each other, I want to run them in parallel using OpenMP. Therefore I used the OpenMP pragma #pragma omp parallel for with proper scheduling. The memory for each diagonalization is not accessed by different threads.&lt;/P&gt;

&lt;P&gt;I run the code with OMP_NESTED=true, MKL_DYNAMIC=false, OMP_DYNAMIC=false. If I set OMP_NUM_THREADS=1 and MKL_NUM_THREADS=4,8,16, no significant overhead (%sys in the Linux top command) is observed. If I set OMP_NUM_THREADS=4 and MKL_NUM_THREADS=1, i.e. call the sequential version of MKL dsyev, no significant overhead is observed either, and roughly the same performance is achieved as in the opposite case where MKL_NUM_THREADS=4 and OMP_NUM_THREADS=1.&lt;/P&gt;

&lt;P&gt;BUT if I now try to exploit my OpenMP parallelization with, for example, OMP_NUM_THREADS=2,4 and MKL_NUM_THREADS=4, I get a huge slowdown. Up to 30% of the processor capacity is spent in system calls (kernel), and the more OpenMP threads I use, the greater the slowdown. I tried different scheduling techniques to balance the load as well as I can, but the overhead persists regardless of the scheduling.&lt;/P&gt;

&lt;P&gt;Are the frequent calls to mkl_malloc() and mkl_free() from different threads the reason for this? If so, I could allocate the maximum memory needed as one big block before entering the parallel region. Unfortunately, the MKL routines have their own memory management to tune their performance. Is it likely that the internal memory management of threaded MKL dsyev also causes such a large overhead? Are there any other possible reasons for this slowdown?&lt;/P&gt;

&lt;P&gt;Best regards,&lt;/P&gt;

&lt;P&gt;Felix Kaiser&lt;/P&gt;</description>
    <pubDate>Sat, 18 Apr 2015 09:42:55 GMT</pubDate>
    <dc:creator>Felix__K_</dc:creator>
    <dc:date>2015-04-18T09:42:55Z</dc:date>
    <item>
      <title>Significant Overhead if threaded MKL is called from OpenMP parallel region</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032245#M20206</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Hello,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;My aim is to diagonalize square matrices of different sizes d x d in parallel. To this end I wrote a for loop. In each iteration, aligned memory (whose size depends on the dimension d) is allocated with mkl_malloc(). The matrix is filled, and dsyev is then called to determine the optimal workspace size. Next I allocate the (aligned) workspace with mkl_malloc(), call dsyev again to diagonalize the matrix, and free the memory used for the workspace and the matrix with mkl_free().&lt;/P&gt;

&lt;P&gt;Since the diagonalizations are independent of each other, I want to run them in parallel using OpenMP. Therefore I used the OpenMP pragma #pragma omp parallel for with proper scheduling. The memory for each diagonalization is not accessed by different threads.&lt;/P&gt;

&lt;P&gt;I run the code with OMP_NESTED=true, MKL_DYNAMIC=false, OMP_DYNAMIC=false. If I set OMP_NUM_THREADS=1 and MKL_NUM_THREADS=4,8,16, no significant overhead (%sys in the Linux top command) is observed. If I set OMP_NUM_THREADS=4 and MKL_NUM_THREADS=1, i.e. call the sequential version of MKL dsyev, no significant overhead is observed either, and roughly the same performance is achieved as in the opposite case where MKL_NUM_THREADS=4 and OMP_NUM_THREADS=1.&lt;/P&gt;

&lt;P&gt;BUT if I now try to exploit my OpenMP parallelization with, for example, OMP_NUM_THREADS=2,4 and MKL_NUM_THREADS=4, I get a huge slowdown. Up to 30% of the processor capacity is spent in system calls (kernel), and the more OpenMP threads I use, the greater the slowdown. I tried different scheduling techniques to balance the load as well as I can, but the overhead persists regardless of the scheduling.&lt;/P&gt;

&lt;P&gt;Are the frequent calls to mkl_malloc() and mkl_free() from different threads the reason for this? If so, I could allocate the maximum memory needed as one big block before entering the parallel region. Unfortunately, the MKL routines have their own memory management to tune their performance. Is it likely that the internal memory management of threaded MKL dsyev also causes such a large overhead? Are there any other possible reasons for this slowdown?&lt;/P&gt;

&lt;P&gt;Best regards,&lt;/P&gt;

&lt;P&gt;Felix Kaiser&lt;/P&gt;</description>
      <pubDate>Sat, 18 Apr 2015 09:42:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032245#M20206</guid>
      <dc:creator>Felix__K_</dc:creator>
      <dc:date>2015-04-18T09:42:55Z</dc:date>
    </item>
    <item>
      <title>**** UPDATE ****</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032246#M20207</link>
      <description>&lt;P&gt;**** UPDATE ****&lt;/P&gt;

&lt;P&gt;I've moved all calls to mkl_malloc() and mkl_free() outside the parallel region and set the MKL_DISABLE_FAST_MM environment variable. This did not help. With OMP_NUM_THREADS=2, the overhead also grows as the number of MKL threads is increased (MKL_NUM_THREADS=2,4,8).&lt;/P&gt;

&lt;P&gt;Best regards,&lt;/P&gt;

&lt;P&gt;Felix Kaiser&lt;/P&gt;</description>
      <pubDate>Sat, 18 Apr 2015 12:53:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032246#M20207</guid>
      <dc:creator>Felix__K_</dc:creator>
      <dc:date>2015-04-18T12:53:15Z</dc:date>
    </item>
    <item>
      <title>Hi Felix,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032247#M20208</link>
      <description>&lt;P&gt;Hi Felix,&lt;/P&gt;

&lt;P&gt;Which OpenMP compiler are you using? How many physical cores does your machine have? When you set OMP_NUM_THREADS=2 and increase the number of MKL threads to MKL_NUM_THREADS=2,4,8, etc., do the threads oversubscribe the physical cores?&lt;/P&gt;

&lt;P&gt;As I understand it, you run your own OpenMP threads on the "for" loop and call an MKL function that may use MKL threading internally, and you hope the nested threading will help, right? But there is a precondition: you must have enough physical cores. Otherwise, nested threading doesn't help.&lt;/P&gt;

&lt;P&gt;I dug up some discussions of MKL nested threading issues for your reference.&lt;/P&gt;

&lt;P&gt;For example, see the MKL User's Guide and &lt;A href="https://software.intel.com/en-us/articles/parallelism-in-the-intel-math-kernel-library/" target="_blank"&gt;https://software.intel.com/en-us/articles/parallelism-in-the-intel-math-kernel-library/&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;The recommendation there is that Intel MKL should run on a single thread when called from a threaded region of an application, to avoid over-subscription of system resources.&lt;/P&gt;

&lt;P&gt;In most cases, MKL threading is not needed, namely &lt;STRONG&gt;when&lt;/STRONG&gt; you believe the threads of your application already utilize all physical cores of the system, or when MKL threading would lead to oversubscription.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;Enable MKL threading only in some cases&lt;/STRONG&gt;: use it when you are sure that there are enough resources (physical cores) for MKL threading in addition to your own threads. Choose N (the number of MKL threads) carefully if you also run your own threads.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/recommended-settings-for-calling-intelr-mkl-routines-from-multi-threaded-applications" target="_blank"&gt;https://software.intel.com/en-us/articles/recommended-settings-for-calling-intelr-mkl-routines-from-multi-threaded-applications&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;There are also some discussions in:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-using-intel-mkl-with-threaded-applications"&gt;https://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-using-intel-mkl-with-threaded-applications&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/using-threaded-intel-mkl-in-multi-thread-application"&gt;https://software.intel.com/en-us/articles/using-threaded-intel-mkl-in-multi-thread-application&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2015 06:06:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032247#M20208</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2015-04-20T06:06:21Z</dc:date>
    </item>
    <item>
      <title>Hello Ying,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032248#M20209</link>
      <description>&lt;P&gt;Hello Ying,&lt;/P&gt;

&lt;P&gt;I'm using Intel C++ Composer (if this answers your question). Typing icpc in the terminal prints: icpc (ICC) 11.1 20100414. I fully agree with you; I had already read all the literature you posted, and yes, you understand me correctly. With OMP_NUM_THREADS=2 and MKL_NUM_THREADS=2,4,8 I would need at most 16 physical cores. I ensured this in my tests by explicitly setting KMP_AFFINITY='proclist=[{list of IDs for 16 physical cores}],explicit'. For my problem sizes, the threaded MKL routines dgemm and dsyev give the best performance with 8 threads. Hence I would like to call dsyev (using 8 cores) from a parallel for loop (parallelized with my own OpenMP threads, i.e. 2, 4 or even more) to get a large speedup, provided proper load balancing is ensured. However, even when I make sure that enough physical cores are available and set the recommended environment variables, the code slows down significantly, which I don't understand.&lt;/P&gt;

&lt;P&gt;Best regards,&lt;/P&gt;

&lt;P&gt;Felix&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2015 08:01:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032248#M20209</guid>
      <dc:creator>Felix__K_</dc:creator>
      <dc:date>2015-04-20T08:01:48Z</dc:date>
    </item>
    <item>
      <title>Hi Felix,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032249#M20210</link>
      <description>&lt;P&gt;Hi Felix,&lt;/P&gt;

&lt;P&gt;Is it the same for dgemm, or just for dsyev?&lt;/P&gt;

&lt;P&gt;When you run the application, do you have a tool such as VTune that shows how many OpenMP threads are running on the system?&lt;/P&gt;

&lt;P&gt;One thing comes to mind that may be related: dsyev is a LAPACK function that calls BLAS functions internally, so the internal nested parallelism may be causing problems.&lt;/P&gt;

&lt;P&gt;How about trying&lt;/P&gt;

&lt;P&gt;OMP_NESTED=false while keeping OMP_NUM_THREADS=2 and MKL_NUM_THREADS=2,4,8?&lt;/P&gt;

&lt;P&gt;or export MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_ALL=1, MKL_DOMAIN_BLAS=N" with N=2,4,8, etc.&lt;/P&gt;

&lt;P&gt;and let me know what results you get.&lt;/P&gt;

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2015 10:02:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032249#M20210</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2015-04-20T10:02:36Z</dc:date>
    </item>
    <item>
      <title>Hello Ying,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032250#M20211</link>
      <description>&lt;P&gt;Hello Ying,&lt;/P&gt;

&lt;P&gt;Until now I have checked only dsyev. I got the following results:&lt;/P&gt;

&lt;P&gt;Setting 1: OMP_NESTED=false, OMP_NUM_THREADS=2, MKL_NUM_THREADS=2,4,8: No overhead observed. Only 2 threads are running (the OpenMP ones).&lt;/P&gt;

&lt;P&gt;Setting 2: OMP_NESTED=true, MKL_DOMAIN_ALL=1, MKL_DOMAIN_BLAS=2,4,8, OMP_NUM_THREADS=2: No overhead observed, but again only the 2 OpenMP threads are used. Why does setting MKL_DOMAIN_BLAS=2,4,8 have no impact?&lt;/P&gt;

&lt;P&gt;Setting 3: OMP_NESTED=true, OMP_NUM_THREADS=1, MKL_NUM_THREADS=2,3,4: runs with 4,6,16 threads. No overhead introduced. It seems that each MKL thread spawns MKL_NUM_THREADS threads of its own. The number of threads actually created by the internal nested regions of MKL functions can be restricted with OMP_THREAD_LIMIT.&lt;/P&gt;

&lt;P&gt;Finally I found out that the problem was the internal nested parallelism of the MKL functions. With 16 usable cores, OMP_NUM_THREADS=2, OMP_NESTED=true and e.g. MKL_NUM_THREADS=4, one gets 2*4*4 = 32 threads due to the nested parallelism inside dsyev. Hence two threads run on each core, which causes overhead. Even worse, with OMP_NUM_THREADS=4 one gets 4*4*4 = 64 threads, so 4 threads run on each core, which causes significant overhead.&lt;/P&gt;
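
&lt;P&gt;The thread-count arithmetic above can be checked with a small Python sketch. This is only a back-of-the-envelope model that assumes every nesting level forks a full team; it does not call OpenMP or MKL, and the function names total_threads and threads_per_core are made up for illustration.&lt;/P&gt;

&lt;PRE&gt;
```python
from math import prod


def total_threads(team_sizes):
    """Worst-case thread count when every nesting level forks a full team."""
    return prod(team_sizes)


def threads_per_core(team_sizes, physical_cores):
    """Average number of software threads competing for each physical core."""
    return total_threads(team_sizes) / physical_cores


# Levels (outermost first): own OpenMP loop, dsyev team, nested team inside dsyev.
for omp_threads in (2, 4):
    levels = [omp_threads, 4, 4]
    print(levels, total_threads(levels), threads_per_core(levels, 16))
```
&lt;/PRE&gt;

&lt;P&gt;With 16 physical cores, any setup whose result from threads_per_core is above 1 is oversubscribed in this model.&lt;/P&gt;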

&lt;P&gt;One possible workaround is to set OMP_THREAD_LIMIT=16. The other is to restrict the number of active levels of nested parallelism to 2 by calling omp_set_max_active_levels(2) when MKL dsyev is called from a simple (non-nested) OpenMP region of my program. Calling omp_set_max_active_levels(2) with OMP_NUM_THREADS=4, MKL_NUM_THREADS=4 and OMP_NESTED=true gives the desired result: 16 threads run on 16 physical cores, and no overhead (besides scheduling) is generated.&lt;/P&gt;
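
&lt;P&gt;The effect of the two workarounds on the thread count can be sketched in Python as well. Again this is a simplified model under stated assumptions, not the OpenMP runtime: levels deeper than the active-level cap are assumed to run with one thread each, and the thread limit is treated as a plain upper bound on the total. The function name threads_created and its parameters are hypothetical.&lt;/P&gt;

&lt;PRE&gt;
```python
from math import prod


def threads_created(team_sizes, max_active_levels=None, thread_limit=None):
    """Model the thread count under the two workarounds described above.

    team_sizes: requested team size per nesting level (outermost first).
    max_active_levels: levels deeper than this are modelled as single-threaded.
    thread_limit: upper bound on the total, in the spirit of OMP_THREAD_LIMIT.
    """
    if max_active_levels is not None:
        team_sizes = team_sizes[:max_active_levels]
    total = prod(team_sizes)
    if thread_limit is not None:
        total = min(total, thread_limit)
    return total


levels = [4, 4, 4]  # OMP_NUM_THREADS=4, MKL_NUM_THREADS=4, nested team in dsyev
print(threads_created(levels))                       # no restriction
print(threads_created(levels, max_active_levels=2))  # active levels capped at 2
print(threads_created(levels, thread_limit=16))      # total capped at 16
```
&lt;/PRE&gt;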

&lt;P&gt;My final question: if I just set OMP_NESTED=true in a serial program, how do the MKL functions that use nested parallelism perform compared with those that don't? Is there a significant performance difference?&lt;/P&gt;

&lt;P&gt;Best regards,&lt;/P&gt;

&lt;P&gt;Felix&lt;/P&gt;</description>
      <pubDate>Mon, 20 Apr 2015 15:47:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032250#M20211</guid>
      <dc:creator>Felix__K_</dc:creator>
      <dc:date>2015-04-20T15:47:25Z</dc:date>
    </item>
    <item>
      <title>Hello Felix,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032251#M20212</link>
      <description>&lt;P&gt;Hello Felix,&lt;/P&gt;

&lt;P&gt;MKL BLAS and LAPACK run better when deeper nested threading is disabled, so that MKL spawns only one level of threads.&lt;BR /&gt;
	So the recommended approach is to call omp_set_max_active_levels(2), which lets MKL spawn only the first level of nested threads.&lt;/P&gt;

&lt;P&gt;Best regards,&lt;BR /&gt;
	Alexander&lt;/P&gt;</description>
      <pubDate>Tue, 21 Apr 2015 05:45:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032251#M20212</guid>
      <dc:creator>Alexander_K_Intel3</dc:creator>
      <dc:date>2015-04-21T05:45:55Z</dc:date>
    </item>
    <item>
      <title>Hello Alexander, Hello Ying,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032252#M20213</link>
      <description>&lt;P&gt;Hello Alexander, Hello Ying,&lt;/P&gt;

&lt;P&gt;Thanks a lot for your help! I finally got the desired parallelization. I wonder why the recommended settings&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/recommended-settings-for-calling-intelr-mkl-routines-from-multi-threaded-applications" target="_blank"&gt;https://software.intel.com/en-us/articles/recommended-settings-for-calling-intelr-mkl-routines-from-multi-threaded-applications&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;do not work. With MKL_DYNAMIC=true, only my own OpenMP threads are used and MKL runs in sequential mode. With MKL_DYNAMIC=false, the nested threading of dsyev again creates more threads than there are physical cores. Setting OMP_DYNAMIC=true does not prevent this. Did I miss something?&lt;/P&gt;

&lt;P&gt;I finally did some (very rough) benchmarking on a test problem. Running dsyev with 8 cores without my own OpenMP parallelization gives the best performance. I expected a speedup from calling dsyev with 8 cores from 2 of my own OpenMP threads. Unfortunately, this is slower. What could be the reason for that?&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;In the numerically expensive steps, about 127 diagonalizations have to be performed, and the dimensions of the real symmetric matrices to be diagonalized range from 1 to about 5500.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Best regards,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Felix Kaiser&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2015 13:00:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032252#M20213</guid>
      <dc:creator>Felix__K_</dc:creator>
      <dc:date>2015-04-22T13:00:54Z</dc:date>
    </item>
    <item>
      <title>Hi Felix,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032253#M20214</link>
      <description>&lt;P&gt;Hi Felix,&lt;/P&gt;

&lt;P&gt;Thank you very much for the investigation. You are right, we need to remove or modify the statement about OMP_DYNAMIC in that article.&lt;/P&gt;

&lt;P&gt;Inside a parallel region, only MKL_DYNAMIC controls MKL's threads.&lt;/P&gt;

&lt;P&gt;Here is the description from the MKL User's Guide:&lt;/P&gt;

&lt;P&gt;The MKL_DYNAMIC environment variable enables Intel MKL to dynamically change the number of threads.&lt;BR /&gt;
	The default value of MKL_DYNAMIC is TRUE, &lt;STRONG&gt;regardless of OMP_DYNAMIC&lt;/STRONG&gt;, whose default value may be FALSE.&lt;BR /&gt;
	When MKL_DYNAMIC is TRUE, Intel MKL tries to use what it considers the best number of threads, up to the&lt;BR /&gt;
	maximum number you specify.&lt;/P&gt;

&lt;P&gt;So when MKL_DYNAMIC=true, MKL can detect that it is inside a parallel region and adjust its number of threads. Only your own OpenMP threads are used and MKL runs in sequential mode because MKL found itself inside an Intel OpenMP parallel region and chose to run sequentially to avoid oversubscribing system resources. OMP_DYNAMIC=true or false cannot control the MKL threads.&lt;/P&gt;

&lt;P&gt;Regarding the performance of "running dsyev with 8 cores without my own OpenMP parallelization" versus calling dsyev with 8 cores from 2 (own) OpenMP threads: what do the timings look like? And could you tell me the processor type?&lt;/P&gt;

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&amp;nbsp;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 24 Apr 2015 01:44:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032253#M20214</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2015-04-24T01:44:28Z</dc:date>
    </item>
    <item>
      <title>I modify the article.  Thanks</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032254#M20215</link>
      <description>&lt;P&gt;I have modified the article. Thanks for the questions.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/recommended-settings-for-calling-intelr-mkl-routines-from-multi-threaded-applications"&gt;https://software.intel.com/en-us/articles/recommended-settings-for-calling-intelr-mkl-routines-from-multi-threaded-applications&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 27 Apr 2015 01:25:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032254#M20215</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2015-04-27T01:25:59Z</dc:date>
    </item>
    <item>
      <title>Hello Ying,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032255#M20216</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Hello Ying,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Sorry for the delay, but the benchmarks took some time. Here are my results. As mentioned before, I diagonalized 127 real symmetric matrices whose dimensions range from 1 to 5718, so my loop has 127 iterations. The time needed to diagonalize all these matrices with MKL's DSYEV was measured (to the nearest second) for different setups (OMP_NUM_THREADS x MKL_NUM_THREADS | scheduling):&lt;/P&gt;

&lt;P&gt;(1x8 | none): 8min 24sec, (2x8 | dynamic,1): 1h 26min 15sec, (2x8 | static,1): 1h 22min 26sec, (2x8 | guided): 1h 25min 52sec&lt;/P&gt;

&lt;P&gt;(1x4 | none): 12min 25sec, (2x4 | dynamic,1): 8min 6sec, (2x4 | static,1): 8min 5sec, (4x4 | guided): 22min 30sec&lt;/P&gt;

&lt;P&gt;So if I use 8 cores within the MKL function, the execution time grows by a factor of 10 (!) when running with two (own) OpenMP threads instead of a single thread, i.e. calling DSYEV from a sequential region. This significant loss of performance seems to be independent of the scheduling type.&lt;/P&gt;

&lt;P&gt;With DSYEV on 2x4 threads, the performance is comparable to the 1x8 case, and this speedup does not depend significantly on the scheduling either. Finally, if I try to gain even more speedup by calling 4-core DSYEV from 4 (own) OpenMP threads, I once again get a huge slowdown, this time by a factor of 2 compared to the run from a sequential region (1x4).&lt;/P&gt;
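
&lt;P&gt;For reference, the slowdown factors quoted above can be recomputed from the listed wall-clock times with a few lines of Python. The timings are copied from the list above; the helper names seconds and ratio are only illustrative.&lt;/P&gt;

&lt;PRE&gt;
```python
def seconds(hours=0, minutes=0, secs=0):
    """Convert a wall-clock time to seconds."""
    return hours * 3600 + minutes * 60 + secs


# Keys are "OMP_NUM_THREADS x MKL_NUM_THREADS | scheduling", values in seconds.
timings = {
    "1x8 none": seconds(minutes=8, secs=24),
    "2x8 dynamic,1": seconds(hours=1, minutes=26, secs=15),
    "2x8 static,1": seconds(hours=1, minutes=22, secs=26),
    "2x8 guided": seconds(hours=1, minutes=25, secs=52),
    "1x4 none": seconds(minutes=12, secs=25),
    "2x4 dynamic,1": seconds(minutes=8, secs=6),
    "2x4 static,1": seconds(minutes=8, secs=5),
    "4x4 guided": seconds(minutes=22, secs=30),
}


def ratio(run, baseline):
    """Run time relative to the baseline; values above 1 mean a slowdown."""
    return timings[run] / timings[baseline]


for run in ("2x8 dynamic,1", "2x8 static,1", "2x8 guided"):
    print(run, "vs 1x8:", round(ratio(run, "1x8 none"), 1))
for run in ("2x4 dynamic,1", "2x4 static,1", "4x4 guided"):
    print(run, "vs 1x4:", round(ratio(run, "1x4 none"), 2))
```
&lt;/PRE&gt;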

&lt;P&gt;The results were checked. All runs give the same results (max absolute difference is 10^{-13}). These calculations were performed on 16 Intel Xeon X5550 CPU's @ 2.67GHz.&lt;/P&gt;

&lt;P&gt;Any idea what could be the reason for this ?&lt;/P&gt;

&lt;P&gt;Best regards,&lt;/P&gt;

&lt;P&gt;Felix&lt;/P&gt;</description>
      <pubDate>Mon, 27 Apr 2015 18:06:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032255#M20216</guid>
      <dc:creator>Felix__K_</dc:creator>
      <dc:date>2015-04-27T18:06:21Z</dc:date>
    </item>
    <item>
      <title>Hi Felix,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032256#M20217</link>
      <description>&lt;P&gt;Hi Felix,&lt;/P&gt;

&lt;P&gt;One more check. You mentioned 16 Intel Xeon X5550 CPUs @ 2.67GHz.&lt;/P&gt;

&lt;P&gt;A Xeon X5550 CPU has 4 hardware cores and 8 logical threads when Hyper-Threading (HT) is on.&lt;/P&gt;

&lt;P&gt;Do you mean 4 CPUs or 2 CPUs in the system, and is HT on or off?&lt;/P&gt;

&lt;P&gt;Judging from your test results, the best is (2x4 | static,1): 8min 5sec. I guess you may have 2 CPUs with 8 hardware cores in total and HT on. In that case only 8 threads are effective, and with 16 threads (2x8) all threads compete for hardware resources, which hurts performance badly.&lt;/P&gt;

&lt;P&gt;Regarding Hyper-Threading and overhead, the &lt;EM&gt;Intel® Math Kernel Library 11.3 User's Guide&lt;/EM&gt; provides some explanation:&lt;/P&gt;

&lt;H1 class="topictitle1"&gt;Using Intel® Hyper-Threading Technology&lt;/H1&gt;

&lt;P id="GUID-6C02DF26-DEA6-4F93-AD3C-A5FB3FBAA7E0"&gt;Intel® Hyper-Threading Technology (Intel® HT Technology) is especially effective when each thread performs &lt;STRONG&gt;different types of operations&lt;/STRONG&gt; and when there are under-utilized resources on the processor. &lt;STRONG&gt;However, Intel MKL fits neither of these criteria &lt;/STRONG&gt;because the threaded portions of the library execute at high efficiencies using most of the available resources and perform identical operations on each thread. You may obtain higher performance by &lt;STRONG&gt;disabling Intel HT Technology&lt;/STRONG&gt;.&lt;/P&gt;

&lt;P id="GUID-C9D12023-A62A-4E98-8731-BD7B4ABD321B"&gt;Intel Optimized LINPACK Benchmark is threaded to effectively use multiple processors. So, in multi-processor systems, best performance will be obtained with the Intel® Hyper-Threading Technology turned off, which ensures that the operating system assigns threads to physical processors only.&lt;/P&gt;

&lt;DIV id="GUID-EA857851-7702-43A2-8B01-CD247517A556"&gt;
	&lt;P id="P_CF_12857569597140"&gt;Best Regards,&lt;/P&gt;

	&lt;P&gt;Ying&lt;/P&gt;

&lt;/DIV&gt;</description>
      <pubDate>Tue, 28 Apr 2015 03:31:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032256#M20217</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2015-04-28T03:31:00Z</dc:date>
    </item>
    <item>
      <title>Hello Ying,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032257#M20218</link>
      <description>&lt;P&gt;Hello Ying,&lt;/P&gt;

&lt;P&gt;Sorry for the error in my information. The code runs on 4 Intel Xeon X5550 CPUs with 4 cores each. /proc/cpuinfo reports 4 cores and 4 siblings for each Intel Xeon processor, i.e. hyper-threading is off.&lt;/P&gt;

&lt;P&gt;Best regards,&lt;/P&gt;

&lt;P&gt;Felix&lt;/P&gt;</description>
      <pubDate>Tue, 28 Apr 2015 07:45:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032257#M20218</guid>
      <dc:creator>Felix__K_</dc:creator>
      <dc:date>2015-04-28T07:45:09Z</dc:date>
    </item>
    <item>
      <title>Hi Felix, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032258#M20219</link>
      <description>&lt;P&gt;Hi Felix,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Thanks for the hardware information. When you run the test, could you please try&lt;/P&gt;

&lt;P&gt;&amp;gt; export KMP_AFFINITY=verbose&lt;/P&gt;

&lt;P&gt;&amp;gt; your exe (for example, with the &lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;4x4&lt;/SPAN&gt; configuration)&lt;/P&gt;

&lt;P&gt;and copy the output here? Alternatively, you could provide your exe so that we can try it here.&lt;/P&gt;
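
&lt;P&gt;A minimal sketch of such a run, assuming a bash-like shell. The thread counts are just the 4x4 example, and ./exe is a placeholder name for the test binary, not something taken from your setup:&lt;/P&gt;

```shell
# Sketch only: ./exe is a placeholder name for the test binary.
export OMP_NUM_THREADS=4      # outer OpenMP threads (the "4x4" example)
export MKL_NUM_THREADS=4      # MKL threads per outer thread
export KMP_AFFINITY=verbose   # ask the Intel OpenMP runtime to print affinity/topology messages at startup
echo "Running ${OMP_NUM_THREADS}x${MKL_NUM_THREADS} with KMP_AFFINITY=${KMP_AFFINITY}"
# ./exe                       # uncomment once the binary is in place
```

&lt;P&gt;The lines starting with "OMP: Info" that the runtime prints at startup are the part worth copying.&lt;/P&gt;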

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2015 01:50:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032258#M20219</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2015-05-05T01:50:12Z</dc:date>
    </item>
    <item>
      <title> </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032259#M20220</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Hello Ying,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;Using &amp;gt; export KMP_AFFINITY='proclist=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],explicit,verbose' (with MKL_NUM_THREADS=4, OMP_NUM_THREADS=4, OMP_NESTED=true, MKL_DYNAMIC=false and OMP_ACTIVE_LEVELS=2) and then running &amp;gt; ./exe produces the following output:&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Warning #2: Cannot open message catalog "libiomp5.cat":&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: System error #2: No such file or directory&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Hint: Check NLSPATH environment variable, its value is "/opt/intel/mkl/10.2.3.029/lib/em64t/locale/%l_%t/%N:/opt/intel/Compiler/11.1/072/lib/intel64/locale/%l_%t/%N:/opt/intel/Compiler/11.1/072/ipp/em64t/lib/locale/%l_%t/%N:/opt/intel/Compiler/11.1/072/mkl/lib/em64t/locale/%l_%t/%N:/opt/intel/Compiler/11.1/072/idb/intel64/locale/%l_%t/%N".&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Hint: Check LANG environment variable, its value is "de_DE.UTF-8".&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #3: Default messages are used.&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #149: KMP_AFFINITY: Affinity capable, using global cpuid instr info&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #156: KMP_AFFINITY: 128 available OS procs&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #157: KMP_AFFINITY: Uniform topology&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #159: KMP_AFFINITY: 32 packages x 4 cores/pkg x 1 threads/core (128 total cores)&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #160: KMP_AFFINITY: OS proc to physical thread map ([] =&amp;gt; level not in map):&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 1 maps to package 0 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 2 maps to package 0 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 3 maps to package 0 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 4 maps to package 1 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 5 maps to package 1 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 6 maps to package 1 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 7 maps to package 1 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 8 maps to package 2 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 9 maps to package 2 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 10 maps to package 2 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 11 maps to package 2 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 12 maps to package 3 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 13 maps to package 3 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 14 maps to package 3 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 15 maps to package 3 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 16 maps to package 4 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 17 maps to package 4 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 18 maps to package 4 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 19 maps to package 4 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 20 maps to package 5 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 21 maps to package 5 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 22 maps to package 5 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 23 maps to package 5 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 24 maps to package 6 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 25 maps to package 6 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 26 maps to package 6 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 27 maps to package 6 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 28 maps to package 7 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 29 maps to package 7 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 30 maps to package 7 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 31 maps to package 7 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 32 maps to package 8 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 33 maps to package 8 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 34 maps to package 8 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 35 maps to package 8 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 36 maps to package 9 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 37 maps to package 9 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 38 maps to package 9 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 39 maps to package 9 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 40 maps to package 10 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 41 maps to package 10 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 42 maps to package 10 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 43 maps to package 10 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 44 maps to package 11 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 45 maps to package 11 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 46 maps to package 11 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 47 maps to package 11 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 48 maps to package 12 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 49 maps to package 12 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 50 maps to package 12 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 51 maps to package 12 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 52 maps to package 13 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 53 maps to package 13 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 54 maps to package 13 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 55 maps to package 13 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 56 maps to package 14 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 57 maps to package 14 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 58 maps to package 14 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 59 maps to package 14 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 60 maps to package 15 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 61 maps to package 15 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 62 maps to package 15 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 63 maps to package 15 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 64 maps to package 16 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 65 maps to package 16 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 66 maps to package 16 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 67 maps to package 16 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 68 maps to package 17 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 69 maps to package 17 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 70 maps to package 17 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 71 maps to package 17 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 72 maps to package 18 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 73 maps to package 18 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 74 maps to package 18 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 75 maps to package 18 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 76 maps to package 19 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 77 maps to package 19 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 78 maps to package 19 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 79 maps to package 19 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 80 maps to package 20 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 81 maps to package 20 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 82 maps to package 20 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 83 maps to package 20 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 84 maps to package 21 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 85 maps to package 21 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 86 maps to package 21 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 87 maps to package 21 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 88 maps to package 22 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 89 maps to package 22 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 90 maps to package 22 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 91 maps to package 22 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 92 maps to package 23 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 93 maps to package 23 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 94 maps to package 23 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 95 maps to package 23 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 96 maps to package 24 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 97 maps to package 24 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 98 maps to package 24 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 99 maps to package 24 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 100 maps to package 25 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 101 maps to package 25 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 102 maps to package 25 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 103 maps to package 25 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 104 maps to package 26 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 105 maps to package 26 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 106 maps to package 26 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 107 maps to package 26 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 108 maps to package 27 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 109 maps to package 27 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 110 maps to package 27 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 111 maps to package 27 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 112 maps to package 28 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 113 maps to package 28 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 114 maps to package 28 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 115 maps to package 28 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 116 maps to package 29 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 117 maps to package 29 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 118 maps to package 29 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 119 maps to package 29 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 120 maps to package 30 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 121 maps to package 30 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 122 maps to package 30 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 123 maps to package 30 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 124 maps to package 31 core 0 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 125 maps to package 31 core 1 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 126 maps to package 31 core 2 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #168: KMP_AFFINITY: OS proc 127 maps to package 31 core 3 [thread 0]&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 3 bound to OS proc set {3}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 2 bound to OS proc set {2}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 1 bound to OS proc set {1}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 9 bound to OS proc set {9}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 8 bound to OS proc set {8}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 4 bound to OS proc set {4}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 6 bound to OS proc set {6}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 7 bound to OS proc set {7}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 5 bound to OS proc set {5}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 11 bound to OS proc set {11}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 12 bound to OS proc set {12}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 10 bound to OS proc set {10}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 14 bound to OS proc set {14}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 13 bound to OS proc set {13}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;OMP: Info #147: KMP_AFFINITY: Internal thread 15 bound to OS proc set {15}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;I could provide the source code. But the diagonalization is just a part of a larger code. So I would like to create a minimal example and then post its source code. Unfortunately I don't know how the symmetric matrices have to be set up in order to be well suited for benchmark tests. Are there any advantageous properties (besides, that they have to be symmetric and diagonalizable) I should be aware of ?&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;Best regards,&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 12px; line-height: normal; font-family: Monaco;"&gt;Felix&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2015 06:48:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032259#M20220</guid>
      <dc:creator>Felix__K_</dc:creator>
      <dc:date>2015-05-05T06:48:40Z</dc:date>
    </item>
    <item>
      <title>Hi Felix, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032260#M20221</link>
      <description>&lt;P&gt;Hi Felix,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;Thanks a lot. So you have a much larger Xeon CPU cluster than we have here :). So the problem does not seem to be HT. When you run with&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;export KMP_AFFINITY='proclist=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],explicit,verbose' (MKL_NUM_THREADS=2, OMP_NUM_THREADS=8, OMP_NESTED=true, MKL_DYNAMIC=false and OMP_ACTIVE_LEVELS=2) and get the timings&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;(2x8| dynamic,1): 1h 26min 15sec, (2x8| static,1): 1h 22min 26sec, (2x8| guided): 1h 25min 52sec&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;what does the CPU activity look like? Are the first 16 CPUs active?&lt;/P&gt;

&lt;P&gt;Another factor: I noticed you are using&amp;nbsp;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;/opt/intel/mkl/10.2.3.029, while the latest version is MKL 11.2.3. Is it possible to try the new version?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;There are no special requirements for the&amp;nbsp;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;symmetric matrices. You could write out 10 of them from your larger code and test with MKL_NUM_THREADS=2 and OMP_NUM_THREADS=8. That should be OK.&lt;/SPAN&gt;&lt;/P&gt;
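
&lt;P&gt;A minimal sketch of that test configuration as shell exports, assuming a bash-like shell. The values are the ones quoted in this thread. OMP_MAX_ACTIVE_LEVELS is the variable name defined by the OpenMP specification (the thread writes OMP_ACTIVE_LEVELS), so treat that line as an assumption about what was intended.&lt;/P&gt;

```shell
# Outer OpenMP level: 8 threads, one per independent diagonalization.
export OMP_NUM_THREADS=8
# Inner level: 2 MKL threads inside each dsyev call.
export MKL_NUM_THREADS=2
# Allow a nested parallel region and keep MKL from reducing its thread count.
export OMP_NESTED=true
export OMP_DYNAMIC=false
export MKL_DYNAMIC=false
# Assumption: standard OpenMP name for the setting written as OMP_ACTIVE_LEVELS in the thread.
export OMP_MAX_ACTIVE_LEVELS=2
# Pin threads to the first 16 OS procs and print the bindings at startup.
export KMP_AFFINITY='proclist=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15],explicit,verbose'
```

&lt;P&gt;With these values the product of the two thread counts is 16, which matches the 16 OS procs in the proclist.&lt;/P&gt;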

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 06 May 2015 07:32:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032260#M20221</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2015-05-06T07:32:06Z</dc:date>
    </item>
    <item>
      <title>Hello Ying,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032261#M20222</link>
      <description>&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Hello Ying,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;yes, the first 16 are active, and one thread runs on each of the 16 cores. As additional information, the operating system for the tests I posted before was Red Hat 5.8 (Tikanga) with icpc (ICC) 11.1. The user activity is roughly &amp;gt;95% (linux top). I tried OMP_NUM_THREADS=1,2 and MKL_NUM_THREADS=4,8 (guided scheduling) on a different system, where the code runs on 2 Intel Xeon E5-2680 @ 2.7GHz with 8 cores per processor. I get the following results (this time the operating system was SUSE Linux Server 11 patch level 3, with MKL version 11.2 and compiler version icpc (ICC) 15.0.2):&lt;/P&gt;

&lt;P&gt;(1x8|none): 3min 49sec , (2x8|guided): 2min 52sec, (4x4|guided): 2min 29sec&lt;/P&gt;

&lt;P&gt;I also tested the minimal example (compiled with icpc -O3 -Wall example.cpp -o example $MKL_INC $MKL_LIB -openmp -openmp_report2):&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(120, 73, 42);"&gt;#include &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;&amp;lt;omp.h&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(120, 73, 42);"&gt;#include &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;&amp;lt;mkl.h&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(120, 73, 42);"&gt;#include &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;&amp;lt;vector&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(120, 73, 42);"&gt;#include &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;&amp;lt;cmath&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(209, 47, 27);"&gt;&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #78492a"&gt;#include &lt;/SPAN&gt;&amp;lt;iostream&amp;gt;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(120, 73, 42);"&gt;#include &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;&amp;lt;ctime&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(120, 73, 42);"&gt;#include &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;&amp;lt;sstream&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(120, 73, 42);"&gt;#include &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;&amp;lt;iomanip&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(120, 73, 42);"&gt;#include &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;&amp;lt;string&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(120, 73, 42);"&gt;#include &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;&amp;lt;cstring&amp;gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(187, 44, 162);"&gt;class&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt; A&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;{&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(187, 44, 162);"&gt;&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;public&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;:&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;int&lt;/SPAN&gt; Dim;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt; *Memory;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt; *Matrix;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt; *EigenValues;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; A() : Dim(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;1&lt;/SPAN&gt;), Memory(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;NULL&lt;/SPAN&gt;), Matrix(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;NULL&lt;/SPAN&gt;), EigenValues(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;NULL&lt;/SPAN&gt;) {};&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; A(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;const&lt;/SPAN&gt; A &amp;amp;other) : Dim(other.Dim)&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Memory = (&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt;*)mkl_malloc(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;sizeof&lt;/SPAN&gt;(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt;)*(Dim * Dim + Dim), &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;64&lt;/SPAN&gt;);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;if&lt;/SPAN&gt;(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;NULL&lt;/SPAN&gt; != other.Memory) std::copy(other.Memory, other.Memory + Dim * Dim + Dim, Memory);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Matrix = Memory;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; EigenValues = Memory + Dim * Dim;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; ~A()&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;if&lt;/SPAN&gt;(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;NULL&lt;/SPAN&gt; != Memory)&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; mkl_free(Memory);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Matrix = &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;NULL&lt;/SPAN&gt;;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; EigenValues = &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;NULL&lt;/SPAN&gt;;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Memory = &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;NULL&lt;/SPAN&gt;;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;void&lt;/SPAN&gt; Create(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;int&lt;/SPAN&gt; Dim)&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;this&lt;/SPAN&gt;-&amp;gt;Dim = Dim;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Memory = (&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt;*)mkl_malloc(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;sizeof&lt;/SPAN&gt;(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt;)*(Dim * Dim + Dim), &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;64&lt;/SPAN&gt;);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; std::memset(Memory,&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;0&lt;/SPAN&gt;,&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;sizeof&lt;/SPAN&gt;(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt;)*(Dim * Dim + Dim));&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Matrix = Memory;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; EigenValues = Memory + Dim * Dim;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;for&lt;/SPAN&gt;(std::size_t i = &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;0&lt;/SPAN&gt;; i &amp;lt; Dim; ++i)&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;for&lt;/SPAN&gt;(std::size_t j = i+&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;1&lt;/SPAN&gt;; j &amp;lt; Dim; ++j)&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Memory[i+j*Dim] = sin((&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt;)i) * j;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;int&lt;/SPAN&gt; Diagonalize()&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; {&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;char&lt;/SPAN&gt; Job = &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;'V'&lt;/SPAN&gt;;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;char&lt;/SPAN&gt; UpLo = &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;'L'&lt;/SPAN&gt;;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;int&lt;/SPAN&gt; DimWorkspace = -&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;1&lt;/SPAN&gt;;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;int&lt;/SPAN&gt; Info = &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;0&lt;/SPAN&gt;;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt; WorkspaceQuery;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; DSYEV(&amp;amp;Job, &amp;amp;UpLo, &amp;amp;Dim, Matrix, &amp;amp;Dim, EigenValues, &amp;amp;WorkspaceQuery, &amp;amp;DimWorkspace, &amp;amp;Info);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; DimWorkspace = (&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;int&lt;/SPAN&gt;)WorkspaceQuery;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt; *Workspace = (&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt;*)mkl_malloc(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;sizeof&lt;/SPAN&gt;(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt;) * DimWorkspace, &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;64&lt;/SPAN&gt;);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; std::memset(Workspace, &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;0&lt;/SPAN&gt;, &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;sizeof&lt;/SPAN&gt;(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;double&lt;/SPAN&gt;) * DimWorkspace);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; DSYEV(&amp;amp;Job, &amp;amp;UpLo, &amp;amp;Dim, Matrix, &amp;amp;Dim, EigenValues, Workspace, &amp;amp;DimWorkspace, &amp;amp;Info);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; mkl_free(Workspace);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;return&lt;/SPAN&gt; Info;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;};&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;int&lt;/SPAN&gt; main()&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;{&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; omp_set_max_active_levels(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;2&lt;/SPAN&gt;);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;int&lt;/SPAN&gt; Info = &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;0&lt;/SPAN&gt;;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;int&lt;/SPAN&gt; N = &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;30&lt;/SPAN&gt;;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(39, 42, 216);"&gt;&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;&amp;nbsp; &amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;int&lt;/SPAN&gt;&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt; D[&lt;/SPAN&gt;30&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;] = {&lt;/SPAN&gt;100&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;100&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;200&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;200&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;700&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;700&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;1000&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;1000&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;1050&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;1050&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;1500&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;1500&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;1750&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;1800&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;1800&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, 
&lt;/SPAN&gt;1800&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;2000&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;2000&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;2500&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;2500&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;2700&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;2700&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;2950&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;4000&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;4000&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;5500&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;5500&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;5500&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;5500&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;, &lt;/SPAN&gt;5750&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #000000"&gt;};&lt;/SPAN&gt;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; std::vector&amp;lt;A&amp;gt; H(N);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;time_t start;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;time(&amp;amp;start);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::cout &amp;lt;&amp;lt; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;"Time at Start: "&lt;/SPAN&gt; &amp;lt;&amp;lt; ctime(&amp;amp;start) &amp;lt;&amp;lt; std::endl;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; std::stringstream *Buffers = &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;new&lt;/SPAN&gt; std::stringstream[N];&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp; &amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; color: rgb(120, 73, 42);"&gt;&amp;nbsp; &amp;nbsp; #pragma omp parallel for shared(H) schedule(static,&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;1&lt;/SPAN&gt;)&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;for&lt;/SPAN&gt;(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;int&lt;/SPAN&gt; i = &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;0&lt;/SPAN&gt;; i &amp;lt; N; ++i)&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; {&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;int&lt;/SPAN&gt; ID = omp_get_thread_num();&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; H[i].Create(D[i]);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Buffers[i] &amp;lt;&amp;lt; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;"Diagonalize matrix with dimension "&lt;/SPAN&gt; &amp;lt;&amp;lt; D[i] &amp;lt;&amp;lt; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;". Thread "&lt;/SPAN&gt; &amp;lt;&amp;lt; ID &amp;lt;&amp;lt; std::endl;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Info = H[i].Diagonalize();&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;if&lt;/SPAN&gt;(Info != &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;0&lt;/SPAN&gt;) std::cout &amp;lt;&amp;lt; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;"Full diagonalization of matrix with dimension "&lt;/SPAN&gt; &amp;lt;&amp;lt; D[i] &amp;lt;&amp;lt; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;" failed."&lt;/SPAN&gt; &amp;lt;&amp;lt; std::endl;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; }&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;for&lt;/SPAN&gt;(&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;int&lt;/SPAN&gt; i = &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;0&lt;/SPAN&gt;; i &amp;lt; N; ++i) std::cout &amp;lt;&amp;lt; Buffers[i].str() &amp;lt;&amp;lt; std::endl;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;delete&lt;/SPAN&gt;[] Buffers;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; time_t end;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;time(&amp;amp;end);&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;std::cout &amp;lt;&amp;lt; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #d12f1b"&gt;"Time at End: "&lt;/SPAN&gt; &amp;lt;&amp;lt; ctime(&amp;amp;end) &amp;lt;&amp;lt; std::endl;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo; min-height: 13px;"&gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp; &amp;nbsp; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #bb2ca2"&gt;return&lt;/SPAN&gt; &lt;SPAN style="font-variant-ligatures: no-common-ligatures; color: #272ad8"&gt;0&lt;/SPAN&gt;;&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;}&lt;/P&gt;

&lt;P style="margin-bottom: 0px; font-size: 11px; line-height: normal; font-family: Menlo;"&gt;&amp;nbsp;&lt;/P&gt;

&lt;P&gt;On both systems I got the same qualitative result. The source of the problem could be either the underlying operating system or the MKL library itself. To investigate further, I will try the newest MKL version on the Red Hat system.&lt;/P&gt;

&lt;P&gt;Best regards,&lt;/P&gt;

&lt;P&gt;Felix&lt;/P&gt;</description>
      <pubDate>Sun, 10 May 2015 07:56:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032261#M20222</guid>
      <dc:creator>Felix__K_</dc:creator>
      <dc:date>2015-05-10T07:56:26Z</dc:date>
    </item>
    <item>
      <title>Hi Felix,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032262#M20223</link>
      <description>Hi Felix,
is there anything wrong with these results?
(1x8|none): 3min 49sec , (2x8|guided): 2min 52sec, (4x4|guided): 2min 29sec

I tested on a machine with 2 Intel Xeon E5-2680 @ 2.7GHz, 8 cores per processor:
 cat /etc/issue
Red Hat Enterprise Linux Server release 6.3 (Santiago)
source /opt/intel/composer_xe_2015.2.164/bin/compilervars.sh intel64
icpc -O3 -Wall example_dsyev.cpp -o example -mkl -openmp -openmp_report2

The results look fine: 2x8 and 4x4 are faster than 1x16. Comparing the static, dynamic, and guided schedules, there are load-imbalance issues (100, 5000), so guided and dynamic give better results.
(1x16|none): 1m0.370s
(1x8|none): 1m33.253s, (1x8|dynamic): 1m37.697s
(2x8|guided,2): 0m50.530s, (2x8|static,1): 1m2.764s, (2x8|dynamic,1): 0m54.719s
(4x4|guided,2): 0m49.588s, (4x4|static,1): 0m55.362s, (4x4|dynamic,1): 0m54.256s

#pragma omp parallel for shared(H) private(Info) //schedule(dynamic,1), schedule(guided,2)

 KMP_AFFINITY="verbose,compact"
 OMP_ACTIVE_LEVELS="2"
 OMP_NESTED="true"
 MKL_DYNAMIC="false"

[yhu5@snb04 MKL_forum]$ export MKL_NUM_THREADS=4
[yhu5@snb04 MKL_forum]$ export OMP_NUM_THREADS=4
[yhu5@snb04 MKL_forum]$ echo $MKL_NUM_THREADS
4
[yhu5@snb04 MKL_forum]$ echo $OMP_NUM_THREADS
4
[yhu5@snb04 MKL_forum]$ time ./example


real    0m55.362s
user    11m53.768s
sys     0m7.580s
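
As a rough reading of the timing above (a sketch for illustration, not part of the measured run): (user + sys) / real gives the average number of busy cores, and sys / (user + sys) gives the share of CPU time spent in the kernel. The helper name cpu_summary is hypothetical; the three arguments are the real, user and sys values from the 4x4 run above, converted to seconds.

```shell
#!/bin/sh
# cpu_summary REAL USER SYS  (all in seconds; hypothetical helper name)
cpu_summary() {
    awk -v r="$1" -v u="$2" -v s="$3" 'BEGIN {
        # average number of cores kept busy over the wall-clock time
        printf "average busy cores: %.1f\n", (u + s) / r
        # fraction of all CPU time that was spent in the kernel
        printf "sys share of CPU time: %.2f%%\n", 100 * s / (u + s)
    }'
}

# real 0m55.362s, user 11m53.768s (= 713.768 s), sys 0m7.580s
cpu_summary 55.362 713.768 7.580
```

For this run that comes to about 13 busy cores out of 16 and roughly 1% system time, so no significant kernel overhead shows up here.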

Best Regards,
Ying</description>
      <pubDate>Mon, 11 May 2015 03:45:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032262#M20223</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2015-05-11T03:45:17Z</dc:date>
    </item>
    <item>
      <title>Hello Ying,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032263#M20224</link>
      <description>&lt;P&gt;Hello Ying,&lt;/P&gt;

&lt;P&gt;to clarify this, the results:&amp;nbsp;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;(1x8|none): 3min 49sec , (2x8|guided): 2min 52sec, (4x4|guided): 2min 29sec were not obtained by running example.cpp. They are obtained by running the original (larger) code on a different machine (Suse).&amp;nbsp;These results are correct.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;I tested the example.cpp on the SUSE machine (see above)&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;(1x8|none): 1m 3sec, (1x16|none): 50sec, (2x8|guided): 53sec, (4x4|guided):52sec&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;as well as on the Red Hat machine (see above)&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;(1x8|none): 2m 48sec, (1x16|none): 3m 1sec, (2x8|guided):4m 42sec, (4x4|guided):3m 31sec&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Did I misunderstand you?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Best regards,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Felix&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 11 May 2015 10:25:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032263#M20224</guid>
      <dc:creator>Felix__K_</dc:creator>
      <dc:date>2015-05-11T10:25:41Z</dc:date>
    </item>
    <item>
      <title>Hi Felix,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032264#M20225</link>
      <description>&lt;P&gt;Hi Felix,&lt;/P&gt;

&lt;P&gt;Let's summarize: there are two test platforms, Red Hat and SUSE, and two test codes, the large problem (127 iterations) and example.cpp (30 iterations).&lt;/P&gt;

&lt;P&gt;Case A: On the Red Hat machine (32 packages x 4 cores/pkg x 1 threads/core (128 total cores)), with MKL 10.2.3.029 and 4 Intel Xeon X5550 CPUs with 4 cores each&lt;/P&gt;

&lt;P&gt;large problem:&lt;/P&gt;

&lt;P&gt;(1x8 | none): 8min 24sec, (2x8| dynamic,1): 1h 26min 15sec, (2x8| static,1): 1h 22min 26sec , (2x8| guided): 1h 25min 52sec&lt;/P&gt;

&lt;P&gt;(1x4| none): 12min 25sec, (2x4| dynamic, 1): 8min 6sec, (2x4| static, 1): 8min 5sec, (4x4| guided): 22min 30sec&lt;/P&gt;

&lt;P&gt;example.cpp:&lt;/P&gt;

&lt;P&gt;(1x8|none): 2m 48sec, (1x16|none): 3m 1sec, (2x8|guided):4m 42sec, (4x4|guided):3m 31sec&lt;/P&gt;

&lt;P&gt;Case B: On the SUSE machine (2 Intel Xeon E5-2680 @ 2.7GHz with 8 cores per processor, SUSE Linux Server 11 patch level 3, MKL version 11.2, compiler version icpc (ICC) 15.0.2)&lt;/P&gt;

&lt;P&gt;large problem: (1x8|none): 3min 49sec , (2x8|guided): 2min 52sec, (4x4|guided): 2min 29sec&lt;/P&gt;

&lt;P&gt;example.cpp: (1x8|none): 1m 3sec, (1x16|none): 50sec, (2x8|guided): 53sec, (4x4|guided):52sec&lt;/P&gt;

&lt;P&gt;So the problem is on the Red Hat machine, while all results (both the large problem and example.cpp) on the SUSE machine are as expected, right?&lt;/P&gt;

&lt;P&gt;If yes, please upgrade the MKL version on the Red Hat machine.&lt;/P&gt;

&lt;P&gt;Another small issue on the Red Hat machine: you mentioned 4 Intel Xeon X5550 CPUs with 4 cores each, but the KMP_AFFINITY output shows 32 packages x 4 cores/pkg x 1 threads/core (128 total cores). So are the 4 Intel Xeon X5550 CPUs part of the 32 packages x 4 cores/pkg?&lt;/P&gt;

&lt;P&gt;You may also need "compact" to keep the OpenMP threads from migrating between cores: run export KMP_AFFINITY='verbose,compact' and see whether anything changes.&lt;/P&gt;

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&lt;/P&gt;</description>
      <pubDate>Tue, 12 May 2015 02:13:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Significant-Overhead-if-threaded-MKL-is-called-from-OpenMP/m-p/1032264#M20225</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2015-05-12T02:13:45Z</dc:date>
    </item>
  </channel>
</rss>

