<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic From Linux in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143385#M26474</link>
    <description>&lt;P&gt;From Linux&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;MKL_VERBOSE DGEMM(N,N,696,85,696,0x7ff9fa764008,0x7ff9e0206b80,696,0x7ffa200805c0,696,0x7ff9fa764000,0x7ffa1867c400,696) 1.01ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE ZGEMM3M(N,N,85,85,696,0x7ff9fa764150,0x7ff9d6aea0c0,85,0x7ff9e3872300,696,0x7ff9fa764160,0x7ffa28a643c0,85) 671.57us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE ZLANGE(1,85,85,0x7ffa013a3780,85,0x22c54c0) 44.34us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8
MKL_VERBOSE ZGETRF(85,85,0x7ffa013a3780,85,0x7ffa2023be80,0) 268.39us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8
MKL_VERBOSE ZGECON(1,85,0x7ffa013a3780,85,0x7fff7c91f950,0x7fff7c91fb80,0x22c6930,0x22c54c0,0) 137.77us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8
MKL_VERBOSE ZGETRI(85,0x7ff9e3ac67c0,85,0x7ffa2023be80,0x7fff7c91e1c8,-1,0) 2.08us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8
line 94: 184309 Segmentation fault     
&lt;/PRE&gt;

</description>
    <pubDate>Wed, 20 Jun 2018 16:51:45 GMT</pubDate>
    <dc:creator>AndrewC</dc:creator>
    <dc:date>2018-06-20T16:51:45Z</dc:date>
    <item>
      <title>MKL 2018 Update 3 (Windows) zgetri crash</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143381#M26470</link>
      <description>&lt;P&gt;MKL 2018 Update 3 has broken zgetri.&lt;/P&gt;

&lt;P&gt;Our QA process now crashes in a call to zgetri. Reverting to the MKL 2018 Update 2 DLLs resolves the issue. The crash does not happen on every call to zgetri.&lt;/P&gt;

&lt;P&gt;I will also point out that Intel Inspector reports data races in zgetri (and has done so for a long time...).&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;Not Flagged	&amp;gt;	14996	0	Main Thread	Main Thread	libiomp5md.dll!__kmp_task_team_wait
 	 	 	 	 	 	[External Code]
 	 	 	 	 	 	libiomp5md.dll!__kmp_task_team_wait(kmp_info * this_thr, kmp_team * team, void * itt_sync_obj, int wait) Line 401
 	 	 	 	 	 	libiomp5md.dll!__kmp_join_barrier(int gtid) Line 2037
 	 	 	 	 	 	libiomp5md.dll!__kmp_join_call(ident * loc, int gtid, fork_context_e fork_context, int exit_teams) Line 7493
 	 	 	 	 	 	libiomp5md.dll!__kmpc_fork_call(ident * loc, int argc, void(*)(int *, int *) microtask) Line 372
 	 	 	 	 	 	mkl_intel_thread.dll!000007feb0d84b70()
 	 	 	 	 	 	mkl_core.dll!000007feacbc61fe()
&lt;/PRE&gt;

</description>
      <pubDate>Tue, 19 Jun 2018 15:25:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143381#M26470</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-06-19T15:25:48Z</dc:date>
    </item>
    <item>
      <title>I can reproduce the same</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143382#M26471</link>
      <description>&lt;P&gt;I can reproduce the same issue on Linux:&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;(gdb) bt
#0  0x00007fffe457b288 in __kmp_execute_tasks_64 () from libiomp5.so
#1  0x00007fffe450d38d in _INTERNAL_25_______src_kmp_barrier_cpp_71f3cf03::__kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) () from /opt/ESI/VAOne2018/libiomp5.so
#2  0x00007fffe450e7a2 in __kmp_fork_barrier(int, int) () from libiomp5.so
#3  0x00007fffe454fb23 in __kmp_launch_thread () from libiomp5.so
#4  0x00007fffe4589c30 in _INTERNAL_26_______src_z_Linux_util_cpp_ea62c7c0::__kmp_launch_worker(void*) ()
   from /opt/ESI/VAOne2018/libiomp5.so
#5  0x00007fffe4291064 in start_thread (arg=0x7fffcb23ea00) at pthread_create.c:309
#6  0x00007fffe0b6262d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
&lt;/PRE&gt;

</description>
      <pubDate>Tue, 19 Jun 2018 16:39:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143382#M26471</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-06-19T16:39:16Z</dc:date>
    </item>
    <item>
      <title>I see no issues with my</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143383#M26472</link>
      <description>&lt;P&gt;I see no issues with my internal tests. Could you share a reproducer or the input parameters with us? You can also set MKL_VERBOSE=1 and share the output.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jun 2018 04:09:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143383#M26472</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-06-20T04:09:17Z</dc:date>
    </item>
    <item>
      <title>I will try and get more info</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143384#M26473</link>
      <description>&lt;P&gt;I will try to get more information using MKL_VERBOSE.&lt;/P&gt;

&lt;P&gt;To reiterate the steps:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;We run thousands of QA tests.&lt;/LI&gt;
	&lt;LI&gt;Updating from 2018 Update 2 to Update 3 resulted in a few failures (crashes). Reverting only the MKL DLLs solved the issue.&lt;/LI&gt;
	&lt;LI&gt;The failures are in a call to zgetri.&lt;/LI&gt;
	&lt;LI&gt;The failures happen on both Windows and Linux (this is usually a helpful hint for debugging, as it eliminates platform issues).&lt;/LI&gt;
	&lt;LI&gt;The failures are sensitive to the number of threads: I had to set OMP_NUM_THREADS=8 on Linux (the same as on Windows) to reproduce them.&lt;/LI&gt;
	&lt;LI&gt;With OMP_NUM_THREADS=1, there are no failures.&lt;/LI&gt;
&lt;/UL&gt;
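
&lt;P&gt;The failures are intermittent and depend on the thread count, so a scripted comparison can help. Below is a minimal Python sketch, not part of the original QA setup. It reruns a test command several times for each OMP_NUM_THREADS value and records the exit codes. The test command (for example a hypothetical ./qa_test), the function name, and the repeat count are assumptions. On POSIX, a negative exit code -N means the process was killed by signal N (-11 for SIGSEGV).&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;
```python
import os
import subprocess


def sweep_thread_counts(cmd, counts, repeats=5):
    """Run cmd several times for each OMP_NUM_THREADS value in counts.

    Returns {thread_count: [exit codes]}. On POSIX, an exit code of -N
    means the child was killed by signal N (e.g. -11 for SIGSEGV).
    """
    results = {}
    for n in counts:
        # Copy the current environment and override only OMP_NUM_THREADS.
        env = dict(os.environ, OMP_NUM_THREADS=str(n))
        results[n] = [subprocess.run(cmd, env=env).returncode
                      for _ in range(repeats)]
    return results


# Usage (./qa_test is a hypothetical test driver):
# print(sweep_thread_counts(["./qa_test"], [1, 2, 4, 8]))
```
&lt;/PRE&gt;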

&lt;P&gt;A code review of any changes to zgetri from Update 2 to Update 3 might be worthwhile.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Jun 2018 16:10:31 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143384#M26473</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-06-20T16:10:31Z</dc:date>
    </item>
    <item>
      <title>From Linux</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143385#M26474</link>
      <description>&lt;P&gt;From Linux&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;MKL_VERBOSE DGEMM(N,N,696,85,696,0x7ff9fa764008,0x7ff9e0206b80,696,0x7ffa200805c0,696,0x7ff9fa764000,0x7ffa1867c400,696) 1.01ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE ZGEMM3M(N,N,85,85,696,0x7ff9fa764150,0x7ff9d6aea0c0,85,0x7ff9e3872300,696,0x7ff9fa764160,0x7ffa28a643c0,85) 671.57us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:4
MKL_VERBOSE ZLANGE(1,85,85,0x7ffa013a3780,85,0x22c54c0) 44.34us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8
MKL_VERBOSE ZGETRF(85,85,0x7ffa013a3780,85,0x7ffa2023be80,0) 268.39us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8
MKL_VERBOSE ZGECON(1,85,0x7ffa013a3780,85,0x7fff7c91f950,0x7fff7c91fb80,0x22c6930,0x22c54c0,0) 137.77us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8
MKL_VERBOSE ZGETRI(85,0x7ff9e3ac67c0,85,0x7ffa2023be80,0x7fff7c91e1c8,-1,0) 2.08us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8
line 94: 184309 Segmentation fault     
&lt;/PRE&gt;
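
&lt;P&gt;The MKL_VERBOSE lines above can also be inspected programmatically. The minimal Python sketch below extracts the routine name, arguments, elapsed time and thread count from each line. Its regular expression is an assumption based only on the line layout shown here, not on a documented format. Read the arguments in the standard LAPACK order: ZGETRF is (m, n, a, lda, ipiv, info) and ZGETRI is (n, a, lda, ipiv, work, lwork, info). Under that reading, the last MKL_VERBOSE line before the segmentation fault is a ZGETRI workspace query (lwork = -1), and it reuses the ipiv buffer that was passed to ZGETRF.&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;
```python
import re

# Matches lines such as:
# MKL_VERBOSE ZGETRI(85,0x...,85,0x...,0x...,-1,0) 2.08us CNR:OFF ... NThr:8
VERBOSE_RE = re.compile(r"MKL_VERBOSE (\w+)\(([^)]*)\) (\S+) .*NThr:(\d+)")


def parse_verbose(lines):
    """Yield (routine, args, elapsed, nthreads) for each MKL_VERBOSE call line."""
    for line in lines:
        m = VERBOSE_RE.search(line)
        if m:
            routine, args, elapsed, nthr = m.groups()
            yield routine, args.split(","), elapsed, int(nthr)


log = """\
MKL_VERBOSE ZGETRF(85,85,0x7ffa013a3780,85,0x7ffa2023be80,0) 268.39us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8
MKL_VERBOSE ZGETRI(85,0x7ff9e3ac67c0,85,0x7ffa2023be80,0x7fff7c91e1c8,-1,0) 2.08us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8
"""

for routine, args, elapsed, nthr in parse_verbose(log.splitlines()):
    if routine == "ZGETRI":
        # LAPACK argument order for zgetri: n, a, lda, ipiv, work, lwork, info
        print(routine, "lwork =", args[5], "ipiv =", args[3], "threads =", nthr)
```
&lt;/PRE&gt;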

</description>
      <pubDate>Wed, 20 Jun 2018 16:51:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143385#M26474</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-06-20T16:51:45Z</dc:date>
    </item>
    <item>
      <title> </title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143386#M26475</link>
      <description>&lt;DIV&gt;Hi Andrew,&lt;/DIV&gt;

&lt;DIV&gt;I don't see the problem with this routine on my side; the output is below.&lt;/DIV&gt;

&lt;DIV&gt;MKL 2018 Update 3, 8 threads, zgetri, AVX-based systems&lt;/DIV&gt;

&lt;PRE class="brush:plain;"&gt;u780583_zgetri_ESI]$ ./a.out
MKL_VERBOSE Intel(R) MKL 2018.0 Update 3 Product build 20180406 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors, Lnx 2.80GHz lp64 intel_thread
MKL_VERBOSE ZGETRF(85,85,0x2261680,85,0x225c010,0) 5.64ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8
... getrf passed with info...0
MKL_VERBOSE ZGETRI(85,0x2261680,85,0x225c010,0x225c170,85,0) 6.64ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:8
&lt;/PRE&gt;

&lt;DIV&gt;Could you share the input matrix with us?&lt;/DIV&gt;</description>
      <pubDate>Thu, 21 Jun 2018 11:14:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143386#M26475</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-06-21T11:14:48Z</dc:date>
    </item>
    <item>
      <title>Hi Gennady,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143387#M26476</link>
      <description>&lt;P&gt;Hi Gennady,&lt;/P&gt;

&lt;P&gt;I left out some important information: ZGETRI is being called from multiple boost::threads. If I use only one boost::thread (but leave MKL_NUM_THREADS=8), the problem goes away. So there is some kind of data race, either inside zgetri or, of course, possibly in my own code. I will run Intel Inspector XE to see if I can locate any issues.&lt;/P&gt;
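
&lt;P&gt;One way to test whether the failures depend on concurrent callers, without reducing MKL's own threading, is to serialize only the LAPACK calls while keeping several caller threads. Below is a minimal sketch of that idea in Python. In C++, the equivalent would be a std::mutex held around the zgetrf/zgetri calls. The decorator name and the single process-wide lock are assumptions made for illustration. If the crashes disappear with the lock in place, concurrent calls (or buffers shared between them) are implicated. This is a diagnostic, not a fix.&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;
```python
import functools
import threading

# One process-wide lock shared by every wrapped routine (an assumption of this sketch).
_lapack_lock = threading.Lock()


def serialized(fn):
    """Diagnostic decorator: let only one caller thread run fn at a time."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        with _lapack_lock:
            return fn(*args, **kwargs)
    return wrapper


# Usage (invert_in_place is a hypothetical wrapper around zgetrf + zgetri):
# @serialized
# def invert_in_place(a, ipiv): ...
```
&lt;/PRE&gt;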


&lt;P&gt;Andrew&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jun 2018 15:41:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143387#M26476</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-06-21T15:41:34Z</dc:date>
    </item>
    <item>
      <title>To close out this issue. I</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143388#M26477</link>
      <description>&lt;P&gt;To close out this issue: I was compiling my code with -axAVX. On both Linux and Windows this seems not to be 100% robust, and it caused problems. When I removed the -ax option, all my issues were resolved.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jul 2018 18:50:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143388#M26477</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-07-11T18:50:21Z</dc:date>
    </item>
    <item>
      <title>More on this issue. It came</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143389#M26478</link>
      <description>&lt;P&gt;More on this issue: it came back to bite me. While trying to generate a test case, I found that the pivots array generated by zgetrf occasionally contained garbage entries (large negative numbers). This, of course, causes ZGETRI to crash. It's quite odd, as it happens only occasionally, even with the same input data.&lt;/P&gt;</description>
      <pubDate>Fri, 27 Jul 2018 23:12:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143389#M26478</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-07-27T23:12:36Z</dc:date>
    </item>
    <item>
      <title>The problem with zgetri</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143390#M26479</link>
      <description>&lt;P&gt;The problem with zgetri continues. The latest issue is a "bogus" value returned when lwork=-1 (the workspace-size query): I am getting obvious garbage in work[0].&lt;/P&gt;
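
&lt;P&gt;In the LAPACK convention, a call with lwork = -1 is a workspace query. The routine only calculates the optimal size of the work array and returns it in the real part of work[0]. For zgetri, the documented minimum is max(1, n). Below is a minimal Python sketch of a sanity check that could be applied to that value before allocating. The function name is hypothetical.&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;
```python
import math


def checked_lwork(work0, n):
    """Return the workspace size from a zgetri query, or raise ValueError.

    work0 is work[0] after the lwork = -1 call; LAPACK stores the optimal
    size in its real part, and zgetri requires lwork to be at least max(1, n).
    """
    size = work0.real
    # Reject NaN/inf and anything smaller than the documented minimum.
    if not math.isfinite(size) or max(1, n) > size:
        raise ValueError(f"implausible workspace size {work0!r} for n={n}")
    return int(size)


# A NaN result like the one reported in this thread is rejected:
# checked_lwork(complex(float("nan"), 3.458459520889e-323), 7)  raises ValueError
```
&lt;/PRE&gt;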

&lt;P&gt;I have attached a sample program for a 7x7 matrix. The sample does not fail on this problem, but I get failures in my own code.&lt;/P&gt;

&lt;P&gt;When called from a parallel region, zgetri fails Intel Inspector with two data races. See the attached JPEG.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 19:04:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143390#M26479</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-08-10T19:04:04Z</dc:date>
    </item>
    <item>
      <title>Just to add some detail, when</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143391#M26480</link>
      <description>&lt;P&gt;To add some detail: when zgetri is called in my threaded application code with exactly the same matrix as in the example, it "sometimes" returns garbage numbers in "work", as shown below. If it were working properly, the first entry, cast to int, would be 7.&lt;/P&gt;

&lt;P&gt;I am wondering how this can possibly happen. When passed lwork = -1, zgetri should only compute the required work array size. I would assume that for a small "n" such as 7 this is a very simple calculation that should be thread-safe.&lt;/P&gt;

&lt;P&gt;work={-nan,3.458459520889e-323#DEN}&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 19:53:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143391#M26480</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-08-10T19:53:33Z</dc:date>
    </item>
    <item>
      <title>Hi Andrew,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143392#M26481</link>
      <description>&lt;P&gt;Hi Andrew,&lt;/P&gt;

&lt;P&gt;Thanks for your report. I tried the reproducer you provided with MKL 2018 Update 3, but it did not fail.&lt;/P&gt;

&lt;P&gt;Could you please provide more information about the last issue you reported (the incorrect work[0] value):&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;MKL_VERBOSE output&lt;/LI&gt;
	&lt;LI&gt;Platform details: OS, hardware&lt;/LI&gt;
	&lt;LI&gt;Environment: number of threads&lt;/LI&gt;
	&lt;LI&gt;Link line&lt;/LI&gt;
	&lt;LI&gt;Compiler version&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;Any additional diagnostics on your side would also be very helpful: on Linux this might be valgrind output, or additional printing (e.g., is the garbage actually written to work[0], or was it simply there before the call?). A built executable that demonstrates the problem would also help.&lt;/P&gt;
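
&lt;P&gt;One way to answer the "was it written, or was it already there?" question is to pre-fill work[0] with an easily recognizable sentinel before the workspace query and compare afterwards. Below is a minimal Python sketch. Here query stands for a hypothetical wrapper that performs the lwork = -1 call on the given buffer, and the sentinel value is arbitrary.&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;
```python
# Arbitrary, finite, easy-to-spot value; a NaN sentinel would not compare equal to itself.
SENTINEL = complex(-12345.0, 0.0)


def query_wrote_work0(query, work):
    """Pre-fill work[0], run query(work), and report whether work[0] changed.

    query is a hypothetical callable that performs the lwork = -1 call
    and is expected to store the optimal size in work[0].
    """
    work[0] = SENTINEL
    query(work)
    return work[0] != SENTINEL
```
&lt;/PRE&gt;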

&lt;P&gt;I was also able to get the same Intel Inspector output with the two data races you mentioned, but I believe both are unrelated to this problem. Both involve different threads writing the same pointer/integer value to the same location. That is definitely a data race, but on most systems such writes are atomic and shouldn't cause any issues. In any case, we will fix it in upcoming releases.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Aug 2018 22:10:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143392#M26481</guid>
      <dc:creator>Eugene_C_Intel1</dc:creator>
      <dc:date>2018-08-10T22:10:00Z</dc:date>
    </item>
    <item>
      <title>Hi Eugene,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143393#M26482</link>
      <description>&lt;P&gt;Hi Eugene,&lt;/P&gt;

&lt;P&gt;Thanks for your reply. I spent some more time looking at this. The example I sent is probably not going to be a reproducer; I went off on the wrong track with that one, except for one thing: using Intel Inspector on Windows, I get an R/W data race and a W/W data race inside zgetri in that reproducer. This is very frustrating: the data races reported inside MKL, and in particular in zgetri/zgetrf, make me want to throw my hands in the air when I am trying to diagnose multithreading problems in my own code.&lt;/P&gt;

&lt;P&gt;From my point of view, here is the summary:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;After updating from 2018 Update 2 to 2018 Update 3, our QA process fails and/or crashes inside zgetrf/zgetri on both Windows and Linux.&lt;/LI&gt;
	&lt;LI&gt;When ONLY the MKL DLLs are reverted to Update 2, the problems go away.&lt;/LI&gt;
	&lt;LI&gt;The crashes/failures are somewhat random and do not happen when we use only one thread to call MKL.&lt;/LI&gt;
	&lt;LI&gt;Post-mortem examination of the data shows that the "pivots" array contains a garbage entry (a large negative number), which of course results in the crash.&lt;/LI&gt;
&lt;/UL&gt;
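
&lt;P&gt;The pivot indices returned by zgetrf follow the Fortran convention. For a square n-by-n matrix, every ipiv entry is a 1-based row index from 1 to n. A cheap check of ipiv between the zgetrf and zgetri calls can therefore catch garbage entries, such as large negative numbers, before they cause a crash. Below is a minimal Python sketch. The function name is hypothetical.&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;
```python
def bad_pivots(ipiv, n):
    """Return (position, value) pairs for ipiv entries outside 1..n.

    ipiv holds the 1-based row indices written by zgetrf for a square
    n-by-n matrix; position is the 0-based index into the ipiv array.
    An empty list means every pivot is in range.
    """
    valid = range(1, n + 1)
    return [(i, p) for i, p in enumerate(ipiv) if p not in valid]
```
&lt;/PRE&gt;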

&lt;P&gt;I suppose I am asking that someone review the changes to zgetrf/zgetri between Update 2 and Update 3. I am using MKL_DIRECT_CALL (which is aimed at small matrices), but the matrices in which I see the crash are 384x384.&lt;/P&gt;

&lt;P&gt;Andrew&lt;/P&gt;</description>
      <pubDate>Mon, 13 Aug 2018 15:15:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-2018-Update-3-Windows-zgetri-crash/m-p/1143393#M26482</guid>
      <dc:creator>AndrewC</dc:creator>
      <dc:date>2018-08-13T15:15:28Z</dc:date>
    </item>
  </channel>
</rss>

