MKL DFT module not using AVX2 backend when the processor does support it

Jason_R · 05-10-2017

I have an application that uses MKL to perform DFTs. I'm using the latest release of MKL on CentOS 7. For a version reference, the MKL shared libraries are installed to /opt/intel/compilers_and_libraries_2017.2.174/linux/mkl/lib/intel64/.

While profiling my application, I noticed that MKL is dispatching to the libmkl_avx.so backend library instead of libmkl_avx2.so, which makes the DFTs slower than they should be. The host processor (a Haswell Xeon) does support AVX2 and FMA, however. Here is a snippet from /proc/cpuinfo:

processor    : 0
vendor_id    : GenuineIntel
cpu family    : 6
model        : 63
model name    : Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
stepping    : 2
microcode    : 0x36
cpu MHz        : 2599.968
cache size    : 20480 KB
physical id    : 0
siblings    : 16
core id        : 0
cpu cores    : 8
apicid        : 0
initial apicid    : 0
fpu        : yes
fpu_exception    : yes
cpuid level    : 15
wp        : yes
flags        : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp 
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf 
eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr 
pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx 
f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority 
ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc 
cqm_occup_llc
bogomips    : 4800.23
clflush size    : 64
cache_alignment    : 64
address sizes    : 46 bits physical, 48 bits virtual
power management:
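
For anyone who wants to repeat this check without reading the flags by eye, here is a minimal Python sketch. It assumes the usual Linux /proc/cpuinfo layout, where all flags sit on a single "flags" line. The helper names cpu_flags and supports_avx2_fma are made up for illustration and are not part of MKL.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found


def supports_avx2_fma(cpuinfo_text):
    """True if the OS reports both the avx2 and fma flags."""
    flags = cpu_flags(cpuinfo_text)
    return "avx2" in flags and "fma" in flags


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX2 and FMA reported:", supports_avx2_fma(f.read()))
    except OSError:
        print("/proc/cpuinfo is not readable on this system")
```

This only shows what the kernel reports. It says nothing about which backend MKL actually selects at run time.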

The runtime CPU feature detection doesn't seem to be working properly for this processor. I tried to fool it by moving libmkl_avx.so aside and putting a symlink to libmkl_avx2.so in its place (so even if MKL decided that it should load the AVX library, it would get the AVX2 library via the symlink). After doing that, I received the following warning:

Intel MKL WARNING: Library libmkl_avx.so (MKL type 5) is not suitable for this processor (MKL type 4).

This again suggests that MKL believes that the host processor can't use the AVX2 library for some reason. Is this a known issue, and if so, is there a workaround?

Jason_R · 05-10-2017

In debugging this problem, I tried calling mkl_get_version() to see what it reported for my processor. It yields:

Major version:           2017
Minor version:           0
Update version:          2
Product status:          Product
Build:                   20170126
Platform:                Intel(R) 64 architecture
Processor optimization:  Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors

So it appears that MKL is detecting the AVX2 capability, but it's still dispatching to the AVX library instead. I should note that in this application, I'm doing 1-D real-to-complex DFTs.
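
One way to confirm which backend library is actually mapped into a running process, without a profiler, is to look at /proc/<pid>/maps. The sketch below is only an illustration: the helper name loaded_mkl_libs is made up, and it assumes the dynamically linked MKL libraries, whose file names start with libmkl_.

```python
import re

# Matches MKL shared-library file names such as libmkl_avx2.so or libmkl_core.so.
_MKL_LIB = re.compile(r"libmkl_[A-Za-z0-9_]+\.so")


def loaded_mkl_libs(maps_text):
    """Return the sorted, unique MKL library names found in /proc/<pid>/maps text."""
    names = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode pathname (pathname may be absent)
        fields = line.split(None, 5)
        if len(fields) == 6:
            match = _MKL_LIB.search(fields[5])
            if match:
                names.add(match.group(0))
    return sorted(names)


if __name__ == "__main__":
    try:
        # /proc/self/maps describes this Python process; for another process,
        # read /proc/<pid>/maps with that process's pid instead.
        with open("/proc/self/maps") as f:
            print(loaded_mkl_libs(f.read()))
    except OSError:
        print("/proc/self/maps is not readable on this system")
```

As far as I understand, the CPU-specific backend is loaded on demand, so the maps should be read after the application has executed at least one DFT.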

Jing_Xu · 05-10-2017

Could you try the instruction-set-specific dispatching settings described at https://software.intel.com/en-us/mkl-linux-developer-guide-instruction-set-specific-dispatching-on-intel-architectures and see whether they help?

Jason_R · 05-10-2017

I set the environment variable MKL_ENABLE_INSTRUCTIONS to AVX2 and it started using the AVX2 library. I then unset the environment variable and it kept using AVX2 as expected. I also verified that setting MKL_ENABLE_INSTRUCTIONS to AVX caused it to only use the AVX library. So it appears to be working properly now.
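
To make this kind of A/B test repeatable, the variable can be set for a single child process instead of the whole shell, so a leftover setting from an earlier experiment can't affect the result. A minimal sketch follows. The helper names and the ./my_dft_app command are placeholders, and only the AVX and AVX2 values tried above are shown.

```python
import os
import subprocess


def env_with_mkl_isa(isa, base_env=None):
    """Copy an environment and set (or, if isa is None, remove) MKL_ENABLE_INSTRUCTIONS."""
    env = dict(os.environ if base_env is None else base_env)
    if isa is None:
        env.pop("MKL_ENABLE_INSTRUCTIONS", None)  # let MKL auto-detect
    else:
        env["MKL_ENABLE_INSTRUCTIONS"] = isa
    return env


def run_with_mkl_isa(cmd, isa):
    """Run cmd (a list of arguments) with the requested setting; return its exit code."""
    return subprocess.run(cmd, env=env_with_mkl_isa(isa)).returncode


# Example usage (./my_dft_app is a hypothetical benchmark binary):
#   run_with_mkl_isa(["./my_dft_app"], "AVX2")
#   run_with_mkl_isa(["./my_dft_app"], "AVX")
#   run_with_mkl_isa(["./my_dft_app"], None)   # variable unset
```

Because each run gets its own copy of the environment, the "unset" case is guaranteed to really be unset.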

I was experimenting with that environment variable earlier today, but I thought I was still seeing the AVX-only behavior even with the variable unset. I must have been doing something wrong. Thanks.