Re: BUG: Race condition in Intel MKL Update 3 matrix multiplication.

Tetrakist · ‎10-01-2020

Note: Upon further research, this erroneous behavior is documented:

From https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=921207 :

On Sun, Feb 03, 2019 at 12:07:20PM +0000, Mo Zhou wrote:
> It turns out that the incorrect matrix product is a result of
> gomp + iomp library clash: octave is linked against the GNU OMP,
> while libmkl-rt.so invokes Intel(LLVM) OMP by default.

I got in touch with MKL team and they confirmed that the iomp+gomp
mixture is actually a very common error among users. They plan to change
the loading mechanism of libmkl-rt for the 2020 production line, to
avoid iomp+gomp clash (sounds like yet another magic).

So let's keep this bug open for both MKL and Octave for a while,
in case any other user came across similar errors. Maybe this
bug will be fixed in the late 2019 (they released MKL 2019 in late
2018).

Clearly, this has not been fixed.
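To check whether a given R session is actually in the gomp + iomp situation described in that report, one option is to look at which shared objects the process has mapped. The following is only a minimal sketch: it assumes Linux (it reads /proc/self/maps), it assumes the runtimes show up under the file name prefixes "libgomp" and "libiomp5", and the helper name loaded_omp is made up for this example.

```r
# Hypothetical helper: report which OpenMP runtimes are mapped into this R process.
# Assumes Linux; returns an empty character vector if /proc/self/maps is unavailable.
loaded_omp <- function(maps_path = "/proc/self/maps") {
  if (!file.exists(maps_path)) {
    return(character(0))
  }
  maps <- readLines(maps_path, warn = FALSE)
  # Assumed library name prefixes for GNU OpenMP and Intel OpenMP
  runtimes <- c(gomp = "libgomp", iomp = "libiomp5")
  present <- vapply(
    runtimes,
    function(p) any(grepl(p, maps, fixed = TRUE)),
    logical(1)
  )
  names(runtimes)[present]
}

# Trigger a BLAS call first, since a runtime may only be loaded on first use
invisible(matrix(1, 2, 2) %*% matrix(1, 2, 2))
print(loaded_omp())
```

If both "gomp" and "iomp" are printed, the session has both runtimes loaded, which would be consistent with the clash described above, although it does not by itself prove that the clash is the cause.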

=====

In R 4.0.2 on Debian 10 running the Intel(R) Math Kernel Library 2020 Update 3 for Linux, the following R code produces inconsistent results:

==Code==

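n <- 10000
# Four identical columns, each running from 0 to 1
d <- as.matrix(data.frame(
  var.a = seq(0, 1, length.out = n),
  var.b = seq(0, 1, length.out = n),
  var.c = seq(0, 1, length.out = n),
  var.d = seq(0, 1, length.out = n)
))

# The same product is computed ten times, so every printed value should match
for (i in 1:10) {
  print(max(d %*% c(1, 1, 1, 1)))
}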

==OUTPUT==

[1] 8
[1] 8
[1] 4
[1] 12
[1] 8
[1] 4
[1] 8
[1] 12
[1] 8
[1] 14.38524

=====

Removing the Intel MKL package results in consistent output. Alternatively, setting n <- 1000 also produces consistent output. With n <- 1000, only one core runs the code; with n <- 10000, all cores run it. The behavior is also present when the Debian 10 package "libmkl-rt" version 2020.1.217-2 is installed.
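For anyone trying to reproduce this, it may be easier to compare the BLAS-backed product against a reference rather than eyeballing the printed maxima. The sketch below assumes that rowSums() does not go through the BLAS (to my understanding it is implemented in R's own C code), so it can serve as an independent reference; the variable names and the 1e-9 tolerance are arbitrary choices for this example.

```r
n <- 10000
col <- seq(0, 1, length.out = n)
d <- cbind(var.a = col, var.b = col, var.c = col, var.d = col)
ones <- rep(1, ncol(d))

# Reference row sums, assumed to be computed without the BLAS
expected <- rowSums(d)

# Repeat the BLAS-backed product and record the largest deviation each time
results <- vapply(
  1:10,
  function(i) max(abs(drop(d %*% ones) - expected)),
  numeric(1)
)

print(results)
# TRUE when every repetition agrees with the reference up to rounding
all_consistent <- all(results < 1e-9)
print(all_consistent)
```

On an affected setup I would expect some of the deviations to be far above rounding error and all_consistent to be FALSE; on a working BLAS every entry should be at or near zero.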

==Misc. Info==

Debian 10 R package:

r-base-core 4.0.2-1

Intel MKL package (64-bit):

intel-mkl-core-rt-2020.3-279

/proc/cpuinfo:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz
stepping : 7
microcode : 0x710
cpu MHz : 1197.151
cache size : 15360 KB
physical id : 0
siblings : 6
core id : 0
cpu cores : 6
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 5786.30
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

Khang_N_Intel · ‎05-25-2021

Hi Tetrakist,

It has been fixed!

The latest version of oneMKL is 2021.2.

I linked R 4.0.5 to oneMKL 2021.2 and your code works as expected!

[1] 16
[1] 16
[1] 16
[1] 16
[1] 16
[1] 16
[1] 16
[1] 16
[1] 16
[1] 16

Best,

Khang
