<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sleeping Threads in MKL in Intel® Moderncode for Parallel Architectures</title>
    <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987941#M5916</link>
    <description>Hi Georg,&lt;BR /&gt;The -parallel option should not be necessary to use MKL nor should it be necessary to execute a parallel region before calling an MKL function. This could be an MKL bug but I'm not able to reproduce it locally. Please submit this issue to &lt;A href="http://developer.intel.com/support/performancetools/libraries/mkl/linux/index.htm"&gt;Intel Premier Support&lt;/A&gt;. The MKL experts can probably explain what's happening.&lt;BR /&gt;&lt;BR /&gt;What error message is given about stack limits? You shouldn't have to adjust the KMP_STACKSIZE environment variable because MKL functions should not overflow the thread stacks.&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;Henry</description>
    <pubDate>Thu, 10 Jul 2003 02:03:43 GMT</pubDate>
    <dc:creator>Henry_G_Intel</dc:creator>
    <dc:date>2003-07-10T02:03:43Z</dc:date>
    <item>
      <title>Sleeping Threads in MKL</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987936#M5911</link>
      <description>In a simple test program I have measured the performance of MKL 6.0 DGEMM on a dual Xeon (2.66 GHz, 533 MHz FSB) for different matrix sizes.&lt;BR /&gt;When OMP_NUM_THREADS is greater than 1, the program stalls: the threads go to sleep and do no more work. The matrix size at which this happens differs from run to run with the same binary.&lt;BR /&gt;Has anybody else seen this effect? Any ideas?&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Georg.</description>
      <pubDate>Mon, 30 Jun 2003 21:50:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987936#M5911</guid>
      <dc:creator>schorscherl</dc:creator>
      <dc:date>2003-06-30T21:50:15Z</dc:date>
    </item>
    <item>
      <title>Re: Sleeping Threads in MKL</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987937#M5912</link>
      <description>Hi Georg,&lt;BR /&gt;I have some ideas about what might be happening but I need a little more information about your program. DGEMM scales very well for large matrices. Are you seeing any parallel speedup when DGEMM is executed with two threads? Is OMP_NUM_THREADS greater than the number of CPUs? Do the threads start sleeping as the matrix sizes get smaller? Is DGEMM called inside an OpenMP parallel region?&lt;BR /&gt;&lt;BR /&gt;Henry</description>
      <pubDate>Tue, 01 Jul 2003 00:13:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987937#M5912</guid>
      <dc:creator>Henry_G_Intel</dc:creator>
      <dc:date>2003-07-01T00:13:05Z</dc:date>
    </item>
    <item>
      <title>Re: Sleeping Threads in MKL</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987938#M5913</link>
      <description>Hi Henry,&lt;BR /&gt;&lt;BR /&gt;This is my simple test program:&lt;BR /&gt;&lt;PRE&gt;
      do i=10,200
         jend=(dble(1000)**3+1.d0)/(5*(dble(i)**3))+1
         st=MPI_WTIME()

         do j=1,jend
            call dgemm('N','N',i,i,i,1.d0,a,i,b,i,0.d0,c,i)
         enddo
         st=MPI_WTIME()-st
         write (*,*) i,(jend*2.d0*dble(i)*dble(i)*dble(i))/st
      enddo
&lt;/PRE&gt;&lt;BR /&gt;I see some parallel speedup, although with matrices as small as these the speedup is moderate. (I wrote this to investigate MKL performance for a larger application that tends to call DGEMM with rather small matrices.)&lt;BR /&gt;OMP_NUM_THREADS was set to 2. The program works fine up to i=40 or so and then hangs, but not always at the same i: sometimes it gets as high as 100, another time i=50 is the limit. As you can see, the code does not use any OpenMP itself (I have left out the variable declarations etc.). MPI_WTIME() is there for convenience; any other timing mechanism would do.&lt;BR /&gt;&lt;BR /&gt;We have also seen this effect in "real" OpenMP applications compiled with the Intel compilers, on IA32 as well as IA64 systems. Starting with MKL 6, though, it became very pronounced.&lt;BR /&gt;&lt;BR /&gt;Kind regards,&lt;BR /&gt;Georg.</description>
      <pubDate>Tue, 01 Jul 2003 16:21:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987938#M5913</guid>
      <dc:creator>schorscherl</dc:creator>
      <dc:date>2003-07-01T16:21:12Z</dc:date>
    </item>
    <item>
      <title>Re: Sleeping Threads in MKL</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987939#M5914</link>
      <description>Hi Georg,&lt;BR /&gt;I compiled the following program with the Intel Fortran compiler 7.1 and ran it on a dual-processor Windows 2000 Pro workstation with OMP_NUM_THREADS set to 1 and to 2:&lt;BR /&gt;&lt;PRE&gt;
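      program mklomp
      ! Times repeated DGEMM calls for matrix orders 10..200.
      ! i, j and jend are integers by implicit typing.
 
      ! The contents of a and b are never set; only timing matters.
      double precision a(200,200), b(200,200), c(200,200)
 
      integer start, finish, rate
      real seconds
 
      ! Clock ticks per second, used to convert counts to seconds.
      call system_clock (COUNT_RATE = rate)
 
      do i = 10, 200
         ! Repeat count chosen so each order does about 4e8 flops.
         jend = (dble(1000)**3 + 1.d0) / (5 * (dble(i)**3)) + 1
 
         call system_clock (COUNT = start)
 
         ! Leading dimension i: each array is used as a packed
         ! i-by-i matrix stored at the start of its storage.
         do j = 1, jend
            call dgemm('N', 'N', i, i, i, 1.d0, a, i, b, i, 0.d0, c, i)
         enddo
 
         call system_clock (COUNT = finish)
         seconds = float (finish - start) / float (rate)
 
         ! Order, repeat count, elapsed seconds, flops per second.
         write(*,*) i, jend, seconds,
     +        (jend * 2.d0 * dble(i) * dble(i) * dble(i)) / seconds
      enddo
      end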
&lt;/PRE&gt;&lt;BR /&gt;The program did not hang and showed reasonable parallel speedup going from one to two threads.&lt;BR /&gt;&lt;BR /&gt;Please check that my test program is an accurate representation of yours. What operating system are you using?&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;Henry</description>
      <pubDate>Fri, 04 Jul 2003 04:56:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987939#M5914</guid>
      <dc:creator>Henry_G_Intel</dc:creator>
      <dc:date>2003-07-04T04:56:41Z</dc:date>
    </item>
    <item>
      <title>Re: Sleeping Threads in MKL</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987940#M5915</link>
      <description>Hi Henry,&lt;BR /&gt;&lt;BR /&gt;I'm using Linux (Debian, Red Hat, SuSE; it happens on all of them, with different compiler and libc versions).&lt;BR /&gt;&lt;BR /&gt;I have compiled your program with ifc 7.1 and linked it against MKL 6.0:&lt;BR /&gt;&lt;PRE&gt;
ifc -parallel -static momptest.f -L/opt/intel/mkl/lib/32 -lmkl_ia32
&lt;/PRE&gt;&lt;BR /&gt;With OMP_NUM_THREADS=2 it sometimes hangs after some iterations, as described.&lt;BR /&gt;&lt;BR /&gt;A side note: I had to insert something like a=0 before the main loop so that the compiler generates an (auto-)parallel region. I have observed that this is necessary: if the program enters MKL (DGEMM) without having executed at least one parallel region first, I get runtime errors about stack size problems (the shell limit is 4 GBytes!), reproducibly at i=17:&lt;BR /&gt;&lt;PRE&gt;
          ...
          16       48829  0.2635000       1518053739.65626
Unable to set worker thread stacksize to 4194304
Perhaps try reducing KMP_STACKSIZE or increasing your shell stack limit.
&lt;/PRE&gt;&lt;BR /&gt;Setting KMP_STACKSIZE to any value doesn't help. But maybe I'm doing something seriously wrong here...&lt;BR /&gt;&lt;BR /&gt;Kind regards,&lt;BR /&gt;Georg.</description>
      <pubDate>Tue, 08 Jul 2003 21:57:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987940#M5915</guid>
      <dc:creator>schorscherl</dc:creator>
      <dc:date>2003-07-08T21:57:54Z</dc:date>
    </item>
    <item>
      <title>Re: Sleeping Threads in MKL</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987941#M5916</link>
      <description>Hi Georg,&lt;BR /&gt;The -parallel option should not be necessary to use MKL nor should it be necessary to execute a parallel region before calling an MKL function. This could be an MKL bug but I'm not able to reproduce it locally. Please submit this issue to &lt;A href="http://developer.intel.com/support/performancetools/libraries/mkl/linux/index.htm"&gt;Intel Premier Support&lt;/A&gt;. The MKL experts can probably explain what's happening.&lt;BR /&gt;&lt;BR /&gt;What error message is given about stack limits? You shouldn't have to adjust the KMP_STACKSIZE environment variable because MKL functions should not overflow the thread stacks.&lt;BR /&gt;&lt;BR /&gt;Best regards,&lt;BR /&gt;Henry</description>
      <pubDate>Thu, 10 Jul 2003 02:03:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987941#M5916</guid>
      <dc:creator>Henry_G_Intel</dc:creator>
      <dc:date>2003-07-10T02:03:43Z</dc:date>
    </item>
    <item>
      <title>Re: Sleeping Threads in MKL</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987942#M5917</link>
      <description>Hi Henry,&lt;BR /&gt;&lt;BR /&gt;OK, so this time I've done it by the book. Here is my shell log:&lt;BR /&gt;&lt;PRE&gt;
~/loopkernels &amp;gt; ifc momptest.f -L/opt/intel/mkl/lib/32 -lmkl_ia32 -lguide -lpthread
   program MKLOMP

29 Lines Compiled
~/loopkernels &amp;gt; ./a.out
          10      200001  0.4974000       804185788.957131
          11      150263  0.4789000       835247688.594273
          12      115741  0.3382000       1182734750.32458
          13       91034  0.3794000       1054305166.88130
          14       72887  0.4212000       949676754.895770
          15       59260  0.3890000       1028290492.21332
          16       48829  0.2761000       1448776363.54996
OMP abort: Unable to set worker thread stack size to 4195328 bytes
Try reducing KMP_STACKSIZE or increasing the shell stack limit.

Abort
&lt;/PRE&gt;&lt;BR /&gt;No KMP_STACKSIZE was set here, and OMP_NUM_THREADS was 2. There is no problem with OMP_NUM_THREADS=1.&lt;BR /&gt;&lt;BR /&gt;As I said, my shell stack limit is 4 GBytes. If I add -parallel to the compiler command, the stack size problem goes away because of the additional parallel region in the initialization loop(s). If I prevent those loops from being parallelized, the stack size problem reappears.&lt;BR /&gt;&lt;BR /&gt;I think I will now submit both issues (stack size and sleeping threads) to Premier Support. Thank you nevertheless for your help.&lt;BR /&gt;&lt;BR /&gt;Kind regards,&lt;BR /&gt;Georg.</description>
      <pubDate>Thu, 10 Jul 2003 15:28:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987942#M5917</guid>
      <dc:creator>schorscherl</dc:creator>
      <dc:date>2003-07-10T15:28:24Z</dc:date>
    </item>
    <item>
      <title>Re: Sleeping Threads in MKL</title>
      <link>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987943#M5918</link>
      <description>Hi Georg,&lt;BR /&gt;When the MKL team gives you a solution to this problem, please post it here.&lt;BR /&gt;&lt;BR /&gt;Thanks,&lt;BR /&gt;Henry</description>
      <pubDate>Thu, 10 Jul 2003 20:28:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Moderncode-for-Parallel/Sleeping-Threads-in-MKL/m-p/987943#M5918</guid>
      <dc:creator>Henry_G_Intel</dc:creator>
      <dc:date>2003-07-10T20:28:52Z</dc:date>
    </item>
  </channel>
</rss>

