<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi, baizq,  in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-memory-problem/m-p/1119465#M24877</link>
    <description>&lt;P&gt;Hi, baizq,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Which LAPACK function are you calling?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;1) Threads, and the speed being lowered to about 1/3 for each of the jobs&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Generally speaking, the link line -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread will invoke MKL's internal OpenMP threading. This means that ./a.out, on a small workstation equipped with four 16-core AMD Opteron 6376 processors and without any core affinity setting, will run with 64 threads.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Could you please try export KMP_AFFINITY=verbose and let us know the output for a single job and for 4 jobs, respectively?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;(My guess is that when you run 4 jobs simultaneously, each job may start 64 threads and oversubscribe the cores, so the speed was lowered to about 1/3 for each of the jobs. But that depends on the answer to the second question.)&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;2) You mentioned, "&lt;/SPAN&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;when we submitted four of this same program (each requiring one core, totally four cores) simultaneously."&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Could you please describe the details, such as how you bind one core to each a.out? As I understand it, you may want to run four jobs on four processors (16 cores each)?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;There is some discussion about CPU usage in &lt;A href="https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/605789" target="_blank"&gt;https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/605789&lt;/A&gt;.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Best Regards,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Ying&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 14 Feb 2016 01:59:41 GMT</pubDate>
    <dc:creator>Ying_H_Intel</dc:creator>
    <dc:date>2016-02-14T01:59:41Z</dc:date>
    <item>
      <title>LAPACK memory problem</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-memory-problem/m-p/1119462#M24874</link>
      <description>&lt;P&gt;&lt;SPAN style="font-family: &amp;quot;Times New Roman&amp;quot;,serif; font-size: 12pt; mso-fareast-font-family: &amp;quot;Times New Roman&amp;quot;; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA;"&gt;&lt;FONT color="#000000"&gt;Hi all, &lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-family: &amp;quot;Times New Roman&amp;quot;,serif; font-size: 12pt; mso-fareast-font-family: &amp;quot;Times New Roman&amp;quot;; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA;"&gt;&lt;FONT color="#000000"&gt;We have a small workstation equipped with four 16-core AMD Opteron 6376 processors running at 2.3 GHz, for a total of 64 cores, and 256 GB of memory. While testing the Intel MKL package, we ran into a problem: when we submitted a single job (requiring one core) which was compiled with ifort and calls MKL LAPACK, it ran much faster than a similar program compiled with gfortran and calling the open-source LAPACK. However, when we submitted four of this same program (each requiring one core, totally four cores) simultaneously, the speed was lowered to about 1/3 for each of the jobs. The jobs compiled with gfortran and calling the open-source LAPACK did not have this problem.&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-family: &amp;quot;Times New Roman&amp;quot;,serif; font-size: 12pt; mso-fareast-font-family: &amp;quot;Times New Roman&amp;quot;; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA;"&gt;&lt;FONT color="#000000"&gt;I heard from others that this may be due to some memory consumption problem. Could anyone tell me what exactly the problem is? Thanks in advance.&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-family: &amp;quot;Times New Roman&amp;quot;,serif; font-size: 12pt; mso-fareast-font-family: &amp;quot;Times New Roman&amp;quot;; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA;"&gt;&lt;FONT color="#000000"&gt;baizq&lt;/FONT&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 06 Feb 2016 00:31:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-memory-problem/m-p/1119462#M24874</guid>
      <dc:creator>Zhaoqiang_B_</dc:creator>
      <dc:date>2016-02-06T00:31:21Z</dc:date>
    </item>
    <item>
      <title>If you run multiple MKL jobs</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-memory-problem/m-p/1119463#M24875</link>
      <description>&lt;P&gt;If you run multiple MKL jobs simultaneously, with each one set to use all the "cores" (possibly meaning all the supported hardware thread contexts), you will surely run into issues with cache.&amp;nbsp; Consider running each one in a separate shell, setting a number of threads appropriate to a single CPU, with the affinity set, e.g. by OMP_PROC_BIND, to the thread context numbers of a single CPU.&amp;nbsp; Unless you have an appropriate resource manager, this means the submitters of each task will need to agree on which CPU each one uses.&lt;/P&gt;

&lt;P&gt;According to my limited knowledge of current-style AMD CPUs, you might want 16 threads per CPU if running single precision, or 8 if double.&lt;/P&gt;

&lt;P&gt;You didn't say whether you run Linux or Windows, nor whether you are comparing single-thread and multiple-thread cases.&amp;nbsp; In Linux, there are more options to accomplish this, such as submitting the tasks under taskset.&amp;nbsp; Typical Linux distributions of LAPACK are not well optimized even according to the capabilities of current gfortran (as well as being single threaded), so it would be rather embarrassing if you can't get better performance by appropriate use of Intel software capabilities.&lt;/P&gt;
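
&lt;P&gt;As a rough illustration of the per-CPU pinning described above, here is a minimal shell sketch. It assumes that each of the four CPUs owns 16 consecutive logical CPU numbers (0-15, 16-31, and so on), which may not match the real numbering and should be checked with lscpu first. The helper name cpu_range is hypothetical, and the 16 threads per job follow the single-precision suggestion above (use 8 for double). The script only prints the launch lines instead of running them.&lt;/P&gt;

&lt;PRE&gt;
```shell
#!/bin/sh
# Hypothetical helper: print the logical CPU range for CPU (socket) number $1,
# assuming $2 consecutive logical CPUs per socket (default 16).
# The real numbering may differ; verify it with lscpu before relying on this.
cpu_range() {
    socket=$1
    per_socket=${2:-16}
    first=$((socket * per_socket))
    last=$((first + per_socket - 1))
    echo "${first}-${last}"
}

# Print (do not run) one launch line per job, each confined to one CPU.
for socket in 0 1 2 3; do
    echo "OMP_NUM_THREADS=16 taskset -c $(cpu_range "$socket") ./a.out"
done
```
&lt;/PRE&gt;

&lt;P&gt;Each printed line can then be started in its own shell, so that every job gets a thread count and a CPU list that do not overlap with the others.&lt;/P&gt;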

&lt;P&gt;If you have such huge problems that each needs more than 64GB, you would expect running them simultaneously to be inefficient.&lt;/P&gt;</description>
      <pubDate>Sat, 06 Feb 2016 03:26:39 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-memory-problem/m-p/1119463#M24875</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2016-02-06T03:26:39Z</dc:date>
    </item>
    <item>
      <title>Hi Tim,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-memory-problem/m-p/1119464#M24876</link>
      <description>&lt;P&gt;Hi Tim,&lt;/P&gt;

&lt;P&gt;Thank you for your reply.&lt;/P&gt;

&lt;P&gt;As for your question, I am running the job on Linux (CentOS 6.5). It is not quite clear to me how to make single/multiple thread cases. This is the way we compile/link our source code:&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="color: black; font-family: &amp;quot;Tahoma&amp;quot;,sans-serif; font-size: 10pt; mso-fareast-font-family: SimSun; mso-fareast-theme-font: minor-fareast; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA;"&gt;ifort test.f90 -I/opt/intel/mkl/include/ -L/opt/intel/mkl/lib/intel64_lin -L -static-intel -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="color: black; font-family: &amp;quot;Tahoma&amp;quot;,sans-serif; font-size: 10pt; mso-fareast-font-family: SimSun; mso-fareast-theme-font: minor-fareast; mso-ansi-language: EN-US; mso-fareast-language: ZH-CN; mso-bidi-language: AR-SA;"&gt;Here is the information about our CPUs. The number of threads is 16. &lt;/SPAN&gt;&lt;A href="http://www.cpu-world.com/CPUs/Bulldozer/AMD-Opteron%206376%20-%20OS6376WKTGGHK.html"&gt;http://www.cpu-world.com/CPUs/Bulldozer/AMD-Opteron%206376%20-%20OS6376WKTGGHK.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;I followed your suggestion and exported the environment variable OMP_PROC_BIND as 8, but it did not fix the problem: the time consumed is the same. I also got the warning message "&lt;SPAN style="color: rgb(31, 73, 125); font-family: &amp;quot;Calibri&amp;quot;,sans-serif; font-size: 11pt;"&gt;OMP: Warning #42: OMP_PROC_BIND: "8" is an invalid value; ignored&lt;/SPAN&gt;".&lt;/P&gt;

&lt;P&gt;Could you please help me look into the problem, or suggest some resources from which I can learn about multithreading in the MKL package? Thank you in advance.&lt;/P&gt;

&lt;P&gt;baizq&lt;/P&gt;</description>
      <pubDate>Tue, 09 Feb 2016 18:53:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-memory-problem/m-p/1119464#M24876</guid>
      <dc:creator>Zhaoqiang_B_</dc:creator>
      <dc:date>2016-02-09T18:53:00Z</dc:date>
    </item>
    <item>
      <title>Hi, baizq, </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-memory-problem/m-p/1119465#M24877</link>
      <description>&lt;P&gt;Hi, baizq,&amp;nbsp;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Which LAPACK function are you calling?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;1) Threads, and the speed being lowered to about 1/3 for each of the jobs&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Generally speaking, the link line -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -lmkl_lapack95_lp64 -liomp5 -lpthread will invoke MKL's internal OpenMP threading. This means that ./a.out, on a small workstation equipped with four 16-core AMD Opteron 6376 processors and without any core affinity setting, will run with 64 threads.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;Could you please try export KMP_AFFINITY=verbose and let us know the output for a single job and for 4 jobs, respectively?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;(My guess is that when you run 4 jobs simultaneously, each job may start 64 threads and oversubscribe the cores, so the speed was lowered to about 1/3 for each of the jobs. But that depends on the answer to the second question.)&lt;/SPAN&gt;&lt;/P&gt;
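
&lt;P&gt;If each job is really meant to use only one core, one way to test the guess above is to cap the thread count per job while keeping the verbose affinity report. The following is only a sketch under that assumption: the wrapper name run_single_core_job is hypothetical, while KMP_AFFINITY, MKL_NUM_THREADS and OMP_NUM_THREADS are the usual Intel OpenMP and MKL environment variables.&lt;/P&gt;

&lt;PRE&gt;
```shell
#!/bin/sh
# Hypothetical wrapper: run the given command with MKL and OpenMP limited to
# one thread, and with the Intel OpenMP runtime asked to report its binding.
run_single_core_job() {
    env KMP_AFFINITY=verbose MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 "$@"
}

# Example use (not executed here): run_single_core_job ./a.out
```
&lt;/PRE&gt;

&lt;P&gt;Note that OMP_PROC_BIND expects a binding policy such as true or false, not a thread count, which is consistent with the "invalid value" warning reported earlier in the thread. The thread count itself is set through OMP_NUM_THREADS or MKL_NUM_THREADS.&lt;/P&gt;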

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;2) You mentioned, "&lt;/SPAN&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;when we submitted four of this same program (each requiring one core, totally four cores) simultaneously."&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Could you please describe the details, such as how you bind one core to each a.out? As I understand it, you may want to run four jobs on four processors (16 cores each)?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;There is some discussion about CPU usage in &lt;A href="https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/605789" target="_blank"&gt;https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/605789&lt;/A&gt;.&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Best Regards,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;&lt;SPAN style="font-size: 12px; line-height: 18px;"&gt;Ying&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 14 Feb 2016 01:59:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/LAPACK-memory-problem/m-p/1119465#M24877</guid>
      <dc:creator>Ying_H_Intel</dc:creator>
      <dc:date>2016-02-14T01:59:41Z</dc:date>
    </item>
  </channel>
</rss>

