<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Executing two calls to a LAPACK routine in parallel in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Executing-two-calls-to-a-LAPACK-routine-in-parallel/m-p/834848#M5979</link>
    <description>&lt;DIV id="_mcePaste"&gt;Hello,&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;I'd like to execute two calls to a LAPACK routine (for example SVD) in parallel using openMP directives. I'd like both those calls to be threaded, i.e. if in total I have 16 cores, both calls should use 8 cores apiece. Can anyone suggest a way of doing that? I tried three approaches:&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV id="_mcePaste"&gt;1) Nested pragmas:&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;omp_set_nested(1);&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;#pragma omp parallel num_threads(2)&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;{&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  if (omp_get_thread_num() == 0){&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  #pragma omp parallel num_threads(8)&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  {&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;   //SVD of matrix 1&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  }&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; }else if(omp_get_thread_num() == 1){&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  #pragma omp parallel num_threads(8)&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  {&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;   //SVD of matrix 2&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  }&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; }&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;}&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;This starts 8 separate single-threaded SVD computations concurrently, on each matrix.&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV id="_mcePaste"&gt;2) Using mkl_set_num_threads:&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;mkl_set_num_threads(2);&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;if (omp_get_thread_num() == 0){&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; mkl_set_num_threads(8);&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; 
//SVD of matrix 1&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;}else if(omp_get_thread_num() == 1){&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; mkl_set_num_threads(8);&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; //SVD of matrix 2&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;}&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;This computes the two SVDs serially.&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV id="_mcePaste"&gt;3) Using a pragma for the two threads that start the SVDs, and then using mkl_set_num_threads:&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;#pragma omp parallel num_threads(2)&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;{&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; if (omp_get_thread_num() == 0){&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  mkl_set_num_threads(8);&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  //SVD of matrix 1&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; }else if(omp_get_thread_num() == 1){&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  mkl_set_num_threads(8);&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  //SVD of matrix 2&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; }&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;}&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;The mkl_set_num_threads() call is ignored and both SVDs are single-threaded.&lt;/DIV&gt;</description>
    <pubDate>Tue, 06 Sep 2011 21:57:13 GMT</pubDate>
    <dc:creator>catalogue126</dc:creator>
    <dc:date>2011-09-06T21:57:13Z</dc:date>
    <item>
      <title>Executing two calls to a LAPACK routine in parallel</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Executing-two-calls-to-a-LAPACK-routine-in-parallel/m-p/834848#M5979</link>
      <description>&lt;DIV id="_mcePaste"&gt;Hello,&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;I'd like to execute two calls to a LAPACK routine (for example SVD) in parallel using openMP directives. I'd like both those calls to be threaded, i.e. if in total I have 16 cores, both calls should use 8 cores apiece. Can anyone suggest a way of doing that? I tried three approaches:&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV id="_mcePaste"&gt;1) Nested pragmas:&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;omp_set_nested(1);&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;#pragma omp parallel num_threads(2)&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;{&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  if (omp_get_thread_num() == 0){&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  #pragma omp parallel num_threads(8)&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  {&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;   //SVD of matrix 1&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  }&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; }else if(omp_get_thread_num() == 1){&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  #pragma omp parallel num_threads(8)&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  {&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;   //SVD of matrix 2&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  }&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; }&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;}&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;This starts 8 separate single-threaded SVD computations concurrently, on each matrix.&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV id="_mcePaste"&gt;2) Using mkl_set_num_threads:&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;mkl_set_num_threads(2);&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;if (omp_get_thread_num() == 0){&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; mkl_set_num_threads(8);&lt;/DIV&gt;&lt;DIV 
id="_mcePaste"&gt; //SVD of matrix 1&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;}else if(omp_get_thread_num() == 1){&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; mkl_set_num_threads(8);&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; //SVD of matrix 2&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;}&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;This computes the two SVDs serially.&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV id="_mcePaste"&gt;3) Using a pragma for the two threads that start the SVDs, and then using mkl_set_num_threads:&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;#pragma omp parallel num_threads(2)&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;{&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; if (omp_get_thread_num() == 0){&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  mkl_set_num_threads(8);&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  //SVD of matrix 1&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; }else if(omp_get_thread_num() == 1){&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  mkl_set_num_threads(8);&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;  //SVD of matrix 2&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt; }&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;}&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;&lt;/DIV&gt;&lt;DIV id="_mcePaste"&gt;The mkl_set_num_threads() call is ignored and both SVDs are single-threaded.&lt;/DIV&gt;</description>
      <pubDate>Tue, 06 Sep 2011 21:57:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Executing-two-calls-to-a-LAPACK-routine-in-parallel/m-p/834848#M5979</guid>
      <dc:creator>catalogue126</dc:creator>
      <dc:date>2011-09-06T21:57:13Z</dc:date>
    </item>
    <item>
      <title>Executing two calls to a LAPACK routine in parallel</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Executing-two-calls-to-a-LAPACK-routine-in-parallel/m-p/834849#M5980</link>
      <description>Hello,&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;You also need to call &lt;A href="http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/mkl/refman/support/functn_mkl_set_dynamic.html"&gt;mkl_set_dynamic&lt;/A&gt;(false); before the OpenMP region.&lt;/DIV&gt;&lt;DIV&gt;By default the value is true, which means that MKL is allowed to dynamically change the number of threads set by mkl_set_num_threads() if that seems reasonable. In the example, MKL detects that higher-level threading is in use. MKL cannot detect all the details of the higher-level threading, so it defers to it and runs in sequential mode, assuming that the higher-level threading will refine the behavior with the help of mkl_set_dynamic() and mkl_set_num_threads().&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;More details can be found in the &lt;A href="http://software.intel.com/en-us/articles/intel-math-kernel-library-documentation/"&gt;Intel MKL User's Guides&lt;/A&gt; under Managing Performance and Memory -&amp;gt; Using Parallelism of the Intel Math Kernel Library -&amp;gt; Using Additional Threading Control.&lt;/DIV&gt;&lt;DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;With best regards,&lt;/DIV&gt;&lt;DIV&gt;Alexander&lt;/DIV&gt;</description>
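      <content:encoded><![CDATA[
The advice in this reply could be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not code from the thread: HAVE_MKL is a hypothetical build flag, and lapack_job, threads_per_call, request_mkl_threads, run_two_calls and fake_svd are hypothetical names introduced for illustration. Only mkl_set_dynamic() and mkl_set_num_threads() come from the discussion. Depending on the OpenMP runtime, nested parallelism may also have to be enabled (as with omp_set_nested(1) in the first attempt of the question); that is an assumption, not something the reply states.

```c
#include <assert.h>
#include <stddef.h>

#ifdef HAVE_MKL   /* hypothetical build flag: define it when building against Intel MKL */
#include <mkl.h>
#endif

/* Hypothetical callback type wrapping one LAPACK call (e.g. the SVD of one matrix). */
typedef void (*lapack_job)(void *matrix);

/* Even share of the cores for each concurrent call, never less than 1. */
int threads_per_call(int total_cores, int concurrent_calls)
{
    if (concurrent_calls < 1 || total_cores < concurrent_calls)
        return 1;
    return total_cores / concurrent_calls;
}

/* Request n threads for subsequent MKL calls; a no-op when built without MKL. */
void request_mkl_threads(int n)
{
#ifdef HAVE_MKL
    mkl_set_num_threads(n);
#else
    (void)n;
#endif
}

/* Run the same job on two matrices, one outer OpenMP thread per matrix. */
void run_two_calls(lapack_job job, void *matrix1, void *matrix2, int total_cores)
{
    assert(job != NULL);
    int inner = threads_per_call(total_cores, 2);

#ifdef HAVE_MKL
    mkl_set_dynamic(0);   /* called before the OpenMP region, as the reply advises */
#endif

    /* Without OpenMP support the pragmas are ignored and the two
       sections simply run one after the other. */
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            request_mkl_threads(inner);
            job(matrix1);
        }
        #pragma omp section
        {
            request_mkl_threads(inner);
            job(matrix2);
        }
    }
}

/* Stand-in for a real SVD call: it only records that it ran. */
void fake_svd(void *done_flag)
{
    *(int *)done_flag = 1;
}
```

With 16 cores, run_two_calls(fake_svd, &a, &b, 16) asks for 8 MKL threads in each of the two sections. In real use, fake_svd would be replaced by a wrapper around the actual LAPACK routine.
]]></content:encoded>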
      <pubDate>Wed, 07 Sep 2011 08:30:22 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Executing-two-calls-to-a-LAPACK-routine-in-parallel/m-p/834849#M5980</guid>
      <dc:creator>Alexander_K_Intel3</dc:creator>
      <dc:date>2011-09-07T08:30:22Z</dc:date>
    </item>
  </channel>
</rss>

