<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi  in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168208#M28357</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;/P&gt;

&lt;P&gt;What are the sizes of m, n, and p, and how do you set KMP_AFFINITY for this operation?&lt;/P&gt;

&lt;P&gt;Could you please set MKL_VERBOSE=1 and KMP_AFFINITY=compact,&lt;/P&gt;

&lt;P&gt;or simply export MKL_VERBOSE=1, run your.exe, and observe the result?&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;You can also submit your question through our official support channel: &lt;A href="https://supporttickets.intel.com/"&gt;&lt;SPAN style="font-weight: 700;"&gt;&lt;I&gt;Online Service Center&lt;/I&gt;&lt;/SPAN&gt;&lt;I&gt; - &lt;/I&gt;&lt;SPAN style="font-weight: 700;"&gt;&lt;I&gt;Intel&lt;/I&gt;&lt;/SPAN&gt;&lt;I&gt; Support&lt;/I&gt;&lt;/A&gt;&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&lt;I&gt;&lt;I&gt;Best regards,&lt;/I&gt;&lt;/I&gt;&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&lt;I&gt;&lt;I&gt;Ying&amp;nbsp;&lt;/I&gt;&lt;/I&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 07 Dec 2017 01:15:31 GMT</pubDate>
    <dc:creator>Ying_H_Intel</dc:creator>
    <dc:date>2017-12-07T01:15:31Z</dc:date>
    <item>
      <title>Parallel cblass-dgemm functions</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168207#M28356</link>
      <description>&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px; margin-top: 0px; margin-bottom: 0px;"&gt;&lt;I&gt;&amp;nbsp; Hi,&amp;nbsp;&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px; margin-top: 0px; margin-bottom: 0px;"&gt;&lt;I&gt;&amp;nbsp;We want to run two MKL cblas_dgemm calls in parallel on a KNL (Knights Landing) platform, each on its own disjoint set of cores. Since the total number of threads on our KNL is 64, we would like the first call to run on 32 cores and the second call to run concurrently on another set of 32 cores, disjoint from the first. Our current code looks roughly like this:&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px; margin-top: 0px; margin-bottom: 0px;"&gt;&lt;I&gt;&amp;nbsp;...&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px; margin-top: 0px; margin-bottom: 0px;"&gt;&lt;I&gt;&amp;nbsp;omp_set_num_threads(64);&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px; margin-top: 0px; margin-bottom: 0px;"&gt;&lt;I&gt;&amp;nbsp;....&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px;"&gt;&lt;I&gt;&amp;nbsp;#pragma omp parallel num_threads(2)&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px;"&gt;&lt;I&gt;&amp;nbsp;{&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px;"&gt;&lt;I&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; if (omp_get_thread_num() == 0){&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px;"&gt;&lt;I&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; omp_set_num_threads(32);&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px;"&gt;&lt;I&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,m, n, p, 1, A, p, B, n, 0, C1, n);&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px;"&gt;&lt;I&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; }else{&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px;"&gt;&lt;I&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; omp_set_num_threads(32);&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px;"&gt;&lt;I&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,m, n, p, 1, A, p, B, n, 0, C2, n);&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px;"&gt;&lt;I&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px;"&gt;&lt;I&gt;&amp;nbsp;}&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px;"&gt;&lt;I&gt;&amp;nbsp;....&lt;/I&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px;"&gt;&lt;I&gt;&amp;nbsp;The problem is that running these two calls one after the other takes less time than running them in parallel. Could you please help us figure out what is wrong with this code and how to make the two cblas_dgemm calls run in parallel?&lt;/I&gt;&lt;/DIV&gt;


&lt;DIV style="color: rgb(0, 0, 0); font-family: Calibri, Helvetica, sans-serif; font-size: 16px;"&gt;&lt;I&gt;&amp;nbsp;Thank you very much&lt;/I&gt;&lt;/DIV&gt;

</description>
      <pubDate>Tue, 05 Dec 2017 02:04:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168207#M28356</guid>
      <dc:creator>Gheibi__Sanaz</dc:creator>
      <dc:date>2017-12-05T02:04:46Z</dc:date>
    </item>
    <item>
      <title>Hi </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168208#M28357</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;/P&gt;

&lt;P&gt;What are the sizes of m, n, and p, and how do you set KMP_AFFINITY for this operation?&lt;/P&gt;

&lt;P&gt;Could you please set MKL_VERBOSE=1 and KMP_AFFINITY=compact,&lt;/P&gt;

&lt;P&gt;or simply export MKL_VERBOSE=1, run your.exe, and observe the result?&lt;/P&gt;
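&lt;P&gt;The simplest route is to export the variables in the shell before running your.exe. If they have to be set from inside the program, a minimal C sketch could look like the following. request_mkl_diagnostics is a hypothetical helper. It assumes a POSIX system for setenv(). It also assumes that MKL and the OpenMP runtime read MKL_VERBOSE and KMP_AFFINITY only when they initialize, so it has to run before the first parallel region or MKL call.&lt;/P&gt;
<![CDATA[
```c
#define _POSIX_C_SOURCE 200112L /* expose setenv() and unsetenv() in strict ISO C modes */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: ask for MKL verbose output and compact thread
 * affinity from inside the program. Assumes both runtimes read these
 * variables once, when they initialize, so call it first thing in
 * main(), before any OpenMP parallel region or MKL call.
 * overwrite = 0 keeps any value that was already exported in the shell. */
int request_mkl_diagnostics(void)
{
    if (setenv("MKL_VERBOSE", "1", 0) != 0)
        return -1;
    if (setenv("KMP_AFFINITY", "compact", 0) != 0)
        return -1;
    assert(getenv("MKL_VERBOSE") != NULL);
    return 0; /* 0 on success, -1 if setenv() failed */
}
```
]]>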

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;You can also submit your question through our official support channel: &lt;A href="https://supporttickets.intel.com/"&gt;&lt;SPAN style="font-weight: 700;"&gt;&lt;I&gt;Online Service Center&lt;/I&gt;&lt;/SPAN&gt;&lt;I&gt; - &lt;/I&gt;&lt;SPAN style="font-weight: 700;"&gt;&lt;I&gt;Intel&lt;/I&gt;&lt;/SPAN&gt;&lt;I&gt; Support&lt;/I&gt;&lt;/A&gt;&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&lt;I&gt;&lt;I&gt;Best regards,&lt;/I&gt;&lt;/I&gt;&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&lt;I&gt;&lt;I&gt;Ying&amp;nbsp;&lt;/I&gt;&lt;/I&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 07 Dec 2017 01:15:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168208#M28357</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2017-12-07T01:15:31Z</dc:date>
    </item>
    <item>
      <title>You should be able to set</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168209#M28358</link>
<description>You should be able to set both the number of threads and the core affinity with KMP_HW_SUBSET, choosing non-overlapping tile groups. As Ying suggested, setting MKL_VERBOSE should help with diagnosis. You probably need to launch the two jobs from a script so that neither sees the other's KMP_HW_SUBSET offset and the resulting hardware-thread assignment. The suggestion of KMP_AFFINITY=compact seems more applicable to KNC (Knights Corner), where you might use 4 threads per core.
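The following minimal C sketch illustrates how the cores could be split into non-overlapping groups for the two jobs. It uses a hypothetical partition_cores helper and only does the index arithmetic. It assumes core indices are numbered contiguously. The resulting first/count pairs still have to be passed to whatever affinity mechanism is used (KMP_HW_SUBSET, KMP_AFFINITY, or OMP_PLACES).
<![CDATA[
```c
#include <assert.h>

/* A contiguous, disjoint range of core indices assigned to one job. */
typedef struct {
    int first; /* index of the first core in the range */
    int count; /* number of cores in the range */
} core_range;

/* Hypothetical helper: split ncores cores into njobs disjoint, contiguous
 * ranges and return the range of job `job` (0-based). When ncores is not a
 * multiple of njobs, the first (ncores % njobs) jobs get one extra core. */
core_range partition_cores(int ncores, int njobs, int job)
{
    assert(njobs > 0 && ncores >= njobs);
    assert(job >= 0 && job < njobs);
    int base = ncores / njobs;
    int extra = ncores % njobs;
    core_range r;
    r.count = base + (job < extra ? 1 : 0);
    r.first = job * base + (job < extra ? job : extra);
    return r;
}
```
]]>
On KNL, pairs of cores share a tile. One way to keep the groups tile-aligned is to partition the 32 tiles of a 64-core part instead (partition_cores(32, 2, job)) and give tile t the cores 2t and 2t+1. That tile-to-core mapping is an assumption to check against the actual topology.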
Failing to affinitize to distinct tile sets might be expected to yield poor performance.</description>
      <pubDate>Thu, 07 Dec 2017 03:29:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168209#M28358</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2017-12-07T03:29:00Z</dc:date>
    </item>
    <item>
      <title>Thank you very much Ying and</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168210#M28359</link>
      <description>&lt;P&gt;Thank you very much Ying and Tim for your help.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;In our case, m, n, and p are all set to 100. We ran a test with MKL_VERBOSE=1 and KMP_AFFINITY=compact, and here are the results we got:&lt;/P&gt;


&lt;P&gt;&lt;EM&gt;&lt;SPAN style="font-size: 13.008px;"&gt;For the parallel version:&amp;nbsp;&lt;/SPAN&gt;&lt;/EM&gt;&lt;/P&gt;


&lt;P&gt;&lt;SPAN style="color: rgb(33, 33, 33); font-family: Menlo; font-size: 11px;"&gt;MKL_VERBOSE Intel(R) MKL 2018.0 Update 1 Product build 20171007 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) for Intel(R) Many Integrated Core Architecture (Intel(R) MIC Architecture) enabled processors, Lnx 1.30GHz lp64 intel_thread NMICDev:0&lt;/SPAN&gt;&lt;/P&gt;

&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; font-size: 15px; margin: 0px;"&gt;&lt;FONT face="Menlo" size="1"&gt;&lt;SPAN style="font-size: 11px;"&gt;MKL_VERBOSE DGEMM(N,N,100,100,100,0x7f43897fd890,0x1291a00,100,0x127e100,100,0x7f43897fd898,0x12b8b80,100) 81.96ms CNR:OFF Dyn:0 FastMM:1&amp;nbsp;&amp;nbsp;&lt;STRONG&gt;TID:1&lt;/STRONG&gt;&amp;nbsp; &amp;nbsp;NThr:32 WDiv:HOST:+0.000&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; font-size: 15px; margin: 0px;"&gt;&lt;FONT face="Menlo" size="1"&gt;&lt;SPAN style="font-size: 11px;"&gt;MKL_VERBOSE DGEMM(N,N,100,100,100,0x7f43897fd890,0x1291a00,100,0x127e100,100,0x7f43897fd898,0x12b8b80,100) 81.96ms CNR:OFF Dyn:0 FastMM:1&amp;nbsp;&amp;nbsp;&lt;STRONG&gt;TID:1&amp;nbsp;&lt;/STRONG&gt;&amp;nbsp; NThr:32 WDiv:HOST:+0.000&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; font-size: 15px; margin: 0px;"&gt;&lt;FONT face="Menlo" size="1"&gt;&lt;SPAN style="font-size: 11px;"&gt;&lt;STRONG&gt;time elapsed is 0.174498&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; font-size: 15px; margin: 0px;"&gt;&amp;nbsp;&lt;/DIV&gt;

&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; font-size: 15px; margin: 0px;"&gt;&lt;EM&gt;&lt;SPAN style="font-size: 13.008px;"&gt;For the serial version:&lt;/SPAN&gt;&lt;/EM&gt;&lt;/DIV&gt;

&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; font-size: 15px; margin: 0px;"&gt;&lt;SPAN style="font-size: 13.008px;"&gt;&amp;nbsp;&lt;/SPAN&gt;

	&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; font-size: 15px; margin: 0px;"&gt;&lt;FONT face="Menlo" size="1"&gt;&lt;SPAN style="font-size: 11px;"&gt;MKL_VERBOSE Intel(R) MKL 2018.0 Update 1 Product build 20171007 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 512 (Intel(R) AVX-512) for Intel(R) Many Integrated Core Architecture (Intel(R) MIC Architecture) enabled processors, Lnx 1.30GHz lp64 intel_thread NMICDev:0&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;

	&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; font-size: 15px; margin: 0px;"&gt;&lt;FONT face="Menlo" size="1"&gt;&lt;SPAN style="font-size: 11px;"&gt;MKL_VERBOSE DGEMM(N,N,100,100,100,0x7ffdb6480090,0x80ca00,100,0x7f9100,100,0x7ffdb6480098,0x8202c0,100) 132.72ms CNR:OFF Dyn:0 FastMM:1&amp;nbsp;&amp;nbsp;&lt;STRONG&gt;TID:0&lt;/STRONG&gt;&amp;nbsp; NThr:64 WDiv:HOST:+0.000&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;

	&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; font-size: 15px; margin: 0px;"&gt;&lt;FONT face="Menlo" size="1"&gt;&lt;SPAN style="font-size: 11px;"&gt;MKL_VERBOSE DGEMM(N,N,100,100,100,0x7ffdb6480090,0x80ca00,100,0x7f9100,100,0x7ffdb6480098,0x833b80,100) 140.48us CNR:OFF Dyn:0 FastMM:1&amp;nbsp;&amp;nbsp;&lt;STRONG&gt;TID:0&lt;/STRONG&gt;&amp;nbsp; &amp;nbsp;NThr:64 WDiv:HOST:+0.000&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;

	&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; font-size: 15px; margin: 0px;"&gt;&lt;FONT face="Menlo" size="1"&gt;&lt;SPAN style="font-size: 11px;"&gt;&lt;STRONG&gt;time elapsed is 0.134626&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/DIV&gt;

	&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; font-size: 15px; margin: 0px;"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;

&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; margin: 0px;"&gt;As you can see, the elapsed time is larger in the parallel case than in the serial case.&lt;/DIV&gt;
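&lt;P&gt;The MKL_VERBOSE timings are easier to compare when converted to achieved throughput. Below is a minimal C sketch. It assumes the usual operation count of 2 * m * n * k floating-point operations for DGEMM, which is 2 million for m = n = p = 100. dgemm_gflops and print_reported_rates are illustrative names, and the times are the ones reported above.&lt;/P&gt;
<![CDATA[
```c
#include <assert.h>
#include <stdio.h>

/* Achieved rate of one DGEMM call in GFLOP/s, assuming the usual count of
 * 2*m*n*k floating-point operations (one multiply and one add per term). */
double dgemm_gflops(long m, long n, long k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * (double)m * (double)n * (double)k / seconds / 1e9;
}

/* Rates for the MKL_VERBOSE timings reported in this post (m = n = p = 100). */
void print_reported_rates(void)
{
    printf("parallel call     (81.96 ms):  %.4f GFLOP/s\n",
           dgemm_gflops(100, 100, 100, 81.96e-3));
    printf("serial, 1st call (132.72 ms):  %.4f GFLOP/s\n",
           dgemm_gflops(100, 100, 100, 132.72e-3));
    printf("serial, 2nd call (140.48 us):  %.2f GFLOP/s\n",
           dgemm_gflops(100, 100, 100, 140.48e-6));
}
```
]]>
&lt;P&gt;By this arithmetic, the second serial call reaches roughly 14 GFLOP/s, while the first serial call and both parallel calls stay below 0.03 GFLOP/s. That pattern suggests, without proving it, that the measured times are dominated by one-time startup costs such as creating thread pools and initializing MKL, rather than by the multiplications themselves. Timing repeated calls after a warm-up call would give a fairer comparison.&lt;/P&gt;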

&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; margin: 0px;"&gt;Another confusing issue is that in the parallel case, both &lt;STRONG&gt;TID&lt;/STRONG&gt; values are &lt;STRONG&gt;1&lt;/STRONG&gt;. This is not what we intended. As the code in our original post shows, we create two threads with "&lt;I style="font-size: 12px;"&gt;#pragma omp parallel num_threads(2)&lt;/I&gt;", each of which was meant to spawn a further 32 threads. We expected two different &lt;STRONG&gt;TID&lt;/STRONG&gt;s in the parallel case. We don't know what is going wrong here, and we would really appreciate your help.&lt;/DIV&gt;
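&lt;P&gt;The following plain OpenMP sketch (no MKL) shows why nesting matters here. Calling omp_set_num_threads(32) inside the num_threads(2) region only sets the request for regions that thread opens later. Unless more than one level of parallelism may be active, such an inner region runs on a single thread. A threaded library built on OpenMP, which the intel_thread entry in the verbose header suggests, is presumably subject to the same limit. The hypothetical inner_team_size function measures the inner team size with one and with two active levels. It is an illustration only and does not by itself explain the TID values printed by MKL_VERBOSE.&lt;/P&gt;
<![CDATA[
```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the size of the inner team seen by outer thread 0 when a
 * 2-thread outer region opens an inner region that asks for `inner`
 * threads. With max_levels = 1 only the outer region may be active, so
 * the inner region is serialized and has 1 thread; with max_levels = 2
 * the inner region can get up to `inner` threads. */
int inner_team_size(int max_levels, int inner)
{
    assert(max_levels > 0 && inner > 0);
    int size = 1;
#ifdef _OPENMP
    omp_set_max_active_levels(max_levels);
    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(inner)
        {
            /* Only thread 0 of outer thread 0's inner team writes,
             * so there is a single writer to `size`. */
            if (omp_get_ancestor_thread_num(1) == 0 && omp_get_thread_num() == 0)
                size = omp_get_num_threads();
        }
    }
#else
    (void)max_levels; /* without OpenMP everything runs on 1 thread */
    (void)inner;
#endif
    return size;
}
```
]]>
&lt;P&gt;Setting OMP_NESTED=true or OMP_MAX_ACTIVE_LEVELS=2 in the environment should have a similar effect to the omp_set_max_active_levels(2) call.&lt;/P&gt;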


&lt;DIV style="color: rgb(33, 33, 33); font-family: wf_segoe-ui_normal, &amp;quot;Segoe UI&amp;quot;, &amp;quot;Segoe WP&amp;quot;, Tahoma, Arial, sans-serif, serif, EmojiFont; margin: 0px;"&gt;Thank you very much&lt;/DIV&gt;

</description>
      <pubDate>Sat, 09 Dec 2017 02:27:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168210#M28359</guid>
      <dc:creator>Gheibi__Sanaz</dc:creator>
      <dc:date>2017-12-09T02:27:06Z</dc:date>
    </item>
    <item>
      <title>If your intent is to use</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168211#M28360</link>
<description>If your intent is to use nested OpenMP parallelism, you must enable nesting (OMP_NESTED) and set OMP_NUM_THREADS=2,32. With the diagnostics enabled, you may be able to see whether MKL's default affinity spreads the threads correctly across tiles.</description>
      <pubDate>Sat, 09 Dec 2017 13:17:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168211#M28360</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2017-12-09T13:17:16Z</dc:date>
    </item>
    <item>
      <title>Thank you very much Tim. It</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168212#M28361</link>
      <description>&lt;P&gt;Thank you very much Tim. It will help us a lot. Thanks again !&lt;/P&gt;</description>
      <pubDate>Sun, 10 Dec 2017 16:50:38 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168212#M28361</guid>
      <dc:creator>Gheibi__Sanaz</dc:creator>
      <dc:date>2017-12-10T16:50:38Z</dc:date>
    </item>
    <item>
      <title>Hi again, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168213#M28362</link>
      <description>&lt;P&gt;Hi again,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;We have one more question, and we would really appreciate your help:&lt;/P&gt;

&lt;P&gt;How can we find out which threads are executing a particular cblas_dgemm call? If we knew that, we could place those threads close to each other using &lt;STRONG&gt;proc_list&lt;/STRONG&gt; with &lt;STRONG&gt;KMP_AFFINITY&lt;/STRONG&gt;.&lt;/P&gt;
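&lt;P&gt;One possible way to see, and steer, which hardware threads serve each call uses standard OpenMP 4.0 binding instead of a KMP_AFFINITY proc_list. Bind the outer team with proc_bind(spread) and each inner team with proc_bind(close). Then have every thread report omp_get_ancestor_thread_num(1), which identifies its outer thread and therefore which cblas_dgemm call it serves, together with its CPU from sched_getcpu(). This sketch assumes Linux with glibc, a place list such as OMP_PLACES=cores, and a hypothetical report_thread_placement helper. It does not show how MKL assigns its own internal threads.&lt;/P&gt;
<![CDATA[
```c
#define _GNU_SOURCE /* for sched_getcpu() on Linux/glibc */
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Prints which CPU each thread runs on, tagged with its outer thread
 * (i.e. which of the two calls it would serve) and its inner thread id.
 * proc_bind(spread) places the 2 outer threads far apart in the place
 * list; proc_bind(close) packs each inner team next to its outer thread.
 * Returns the number of threads that reported. */
int report_thread_placement(int inner)
{
    assert(inner > 0);
    int reported = 0;
#ifdef _OPENMP
    omp_set_max_active_levels(2); /* let the inner teams be active */
    #pragma omp parallel num_threads(2) proc_bind(spread)
    #pragma omp parallel num_threads(inner) proc_bind(close)
    {
        int outer_id = omp_get_ancestor_thread_num(1);
        int inner_id = omp_get_thread_num();
        int cpu = sched_getcpu();
        #pragma omp critical
        {
            printf("call %d  inner thread %2d  on CPU %d\n",
                   outer_id, inner_id, cpu);
            reported++;
        }
    }
#else
    (void)inner; /* serial build: a single thread serves call 0 */
    printf("call 0  inner thread  0  on CPU %d\n", sched_getcpu());
    reported = 1;
#endif
    return reported;
}
```
]]>
&lt;P&gt;Calling report_thread_placement(32) with OMP_PLACES=cores set should list the CPUs used by each group. Overlapping groups would indicate that the binding is not doing what was intended.&lt;/P&gt;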

&lt;P&gt;Thank you very much&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2017 16:10:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168213#M28362</guid>
      <dc:creator>Gheibi__Sanaz</dc:creator>
      <dc:date>2017-12-11T16:10:39Z</dc:date>
    </item>
    <item>
      <title>Hi Gheibi,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168214#M28363</link>
      <description>&lt;P&gt;Hi Gheibi,&lt;/P&gt;

&lt;P&gt;Manually, you could find out which threads are executing a particular cblas_dgemm call and place them close to each other using &lt;STRONG&gt;proc_list&lt;/STRONG&gt; with &lt;STRONG&gt;KMP_AFFINITY&lt;/STRONG&gt;, but doing so involves many technical details. If you want to try it, bind the OpenMP threads used by cblas_dgemm to a proc_list through KMP_AFFINITY.&lt;/P&gt;

&lt;P&gt;MKL threading is based on OpenMP, so you can control it as described in the MKL developer guide: &lt;A href="https://software.intel.com/en-us/node/528550"&gt;https://software.intel.com/en-us/node/528550&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;or in the Intel compiler documentation: &lt;A href="https://software.intel.com/en-us/cpp-compiler-18.0-developer-guide-and-reference-thread-affinity-interface-linux-and-windows#LOW_LEVEL_AFFINITY_API"&gt;https://software.intel.com/en-us/cpp-compiler-18.0-developer-guide-and-reference-thread-affinity-interface-linux-and-windows#LOW_LEVEL_AFFINITY_API&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/node/528546#92D6DAD0-A858-4824-9A90-AC2AD2A9C2E1"&gt;https://software.intel.com/en-us/node/528546#92D6DAD0-A858-4824-9A90-AC2AD2A9C2E1&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;and in these related discussions:&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/articles/using-threaded-intel-mkl-in-multi-thread-application"&gt;https://software.intel.com/en-us/articles/using-threaded-intel-mkl-in-multi-thread-application&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;&lt;A href="https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/283564"&gt;https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/283564&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;In principle, however, we don't recommend that approach.&lt;/P&gt;

&lt;P&gt;Regarding performance: as your test shows, when the same dgemm function is called from multiple threads, letting MKL use its own internal threading may perform better than a thread-affinity scheme of your own design.&lt;/P&gt;
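&lt;P&gt;For reference, here is a minimal sketch of the nested structure discussed in this thread: 2 outer threads, each with its own inner team. It can be timed against two back-to-back calls that each use all threads. It assumes OpenMP 3.0 or later for omp_set_max_active_levels, and it uses a naive, hypothetical gemm_standin in place of cblas_dgemm so that it runs without MKL. It also distributes the two products over however many outer threads the runtime grants. With the if (omp_get_thread_num() == 0) ... else pattern, the second product would never be computed if only one outer thread were granted.&lt;/P&gt;
<![CDATA[
```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Naive row-major C = A * B (A is m x p, B is p x n), parallelized over
 * rows with an inner OpenMP team. Hypothetical stand-in for
 * cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, p,
 *             1.0, A, p, B, n, 0.0, C, n), so the sketch runs without MKL. */
static void gemm_standin(int m, int n, int p, const double *A,
                         const double *B, double *C, int nthreads)
{
    #pragma omp parallel for num_threads(nthreads)
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < p; ++k)
                sum += A[i * p + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Computes C1 = A * B and C2 = A * B concurrently, each product on its own
 * inner team of `inner` threads. Outer thread tid takes jobs tid, tid + nthr,
 * ... so both products are computed even with fewer than 2 outer threads. */
void two_products(int m, int n, int p, const double *A, const double *B,
                  double *C1, double *C2, int inner)
{
    assert(inner > 0);
    double *out[2] = { C1, C2 };
#ifdef _OPENMP
    omp_set_max_active_levels(2); /* allow the inner teams to be active */
#endif
    #pragma omp parallel num_threads(2)
    {
        int tid = 0, nthr = 1;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        nthr = omp_get_num_threads();
#endif
        for (int job = tid; job < 2; job += nthr)
            gemm_standin(m, n, p, A, B, out[job], inner);
    }
}
```
]]>
&lt;P&gt;With MKL, the gemm_standin call would become cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, p, 1.0, A, p, B, n, 0.0, out[job], n). The inner thread count would be requested through omp_set_num_threads() or mkl_set_num_threads_local(). For 100 x 100 matrices, measure both variants after a warm-up call, since the first call also pays the startup cost.&lt;/P&gt;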

&lt;P&gt;Best Regards,&lt;/P&gt;

&lt;P&gt;Ying&lt;/P&gt;</description>
      <pubDate>Thu, 14 Dec 2017 07:34:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Parallel-cblass-dgemm-functions/m-p/1168214#M28363</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2017-12-14T07:34:59Z</dc:date>
    </item>
  </channel>
</rss>

