I have an application that uses MKL to perform DFTs. I'm using the latest release of MKL on CentOS 7. For a version reference, the MKL shared libraries are installed to /opt/intel/compilers_and_libraries_2017.2.174/linux/mkl/lib/intel64/.
I noticed while profiling my application that MKL is dispatching to the libmkl_avx.so backend library instead of libmkl_avx2.so, which makes the transforms slower than they should be. The host processor (a Haswell Xeon) does support AVX2 and FMA. Here is a snippet from /proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
stepping : 2
microcode : 0x36
cpu MHz : 2599.968
cache size : 20480 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 15
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc
bogomips : 4800.23
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
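For anyone who wants to make the same check without reading the flags line by eye, here is a minimal sketch in Python. The function names cpu_flags and missing_flags are made up for this example, and it only looks at what the kernel reports in /proc/cpuinfo, not at what MKL itself detects.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        # Each line looks like "key : value"; split on the first colon only.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("avx2", "fma")):
    """Return the wanted flags that the kernel does not list for the CPU."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]
```

Passing the contents of /proc/cpuinfo to missing_flags should give an empty list on a machine like the one above, since both avx2 and fma appear in its flags line.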
The runtime CPU feature detection doesn't seem to be working properly for this processor. I tried to fool it by renaming libmkl_avx.so and symlinking it to libmkl_avx2.so instead (so even if MKL detected that it should load the AVX library, it would instead get the AVX2 library via the symlink). After doing that, I received the following error message:
Intel MKL WARNING: Library libmkl_avx.so (MKL type 5) is not suitable for this processor (MKL type 4).
This again suggests that MKL believes that the host processor can't use the AVX2 library for some reason. Is this a known issue, and if so, is there a workaround?
In debugging this problem, I tried calling mkl_get_version() to see what it reported for my processor. It yields:
Major version: 2017
Minor version: 0
Update version: 2
Product status: Product
Build: 20170126
Platform: Intel(R) 64 architecture
Processor optimization: Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
So it appears that MKL is detecting the AVX2 capability, but it's still dispatching to the AVX library instead. I should note that in this application, I'm doing 1-D real-to-complex DFTs.
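One way to confirm which backend actually got loaded, without a profiler, is to look at the shared objects mapped into the running process. The sketch below is only an illustration: loaded_mkl_libs is a hypothetical helper name, and it assumes the application links MKL dynamically so that the backend shows up as its own libmkl_*.so entry in /proc/<pid>/maps.

```python
import re

# Matches names such as libmkl_core.so, libmkl_avx.so or libmkl_avx2.so,
# with an optional numeric version suffix.
_MKL_LIB = re.compile(r"libmkl_[A-Za-z0-9_]+\.so[.\d]*")


def loaded_mkl_libs(maps_text):
    """Return the sorted, de-duplicated MKL library names in a maps listing."""
    return sorted(set(_MKL_LIB.findall(maps_text)))
```

Feeding it the text of /proc/<pid>/maps for the running application, after the first DFT call has executed, should show whether libmkl_avx.so or libmkl_avx2.so is mapped.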
I set the environment variable MKL_ENABLE_INSTRUCTIONS to AVX2 and it started using the AVX2 library. I then unset the environment variable and it kept using AVX2 as expected. I also verified that setting MKL_ENABLE_INSTRUCTIONS to AVX caused it to only use the AVX library. So it appears to be working properly now.
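To make that comparison repeatable, the variable can be set per run rather than in the shell, so a stale value cannot leak between experiments. This is a rough sketch under my own naming: env_with_isa and run_with_isa are hypothetical helpers, and ./my_app stands in for the real application binary. AVX and AVX2 are the values tried in this thread.

```python
import os
import subprocess


def env_with_isa(isa, base_env=None):
    """Return a copy of the environment with MKL_ENABLE_INSTRUCTIONS set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_ENABLE_INSTRUCTIONS"] = isa
    return env


def run_with_isa(cmd, isa):
    """Run cmd (a list of arguments) as a child process with the given value."""
    return subprocess.run(cmd, env=env_with_isa(isa), check=False)
```

For example, run_with_isa(["./my_app"], "AVX") followed by run_with_isa(["./my_app"], "AVX2") would launch the same binary twice with different settings, leaving the parent shell's environment untouched.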
I was experimenting with that environment variable earlier today, but I thought I was still seeing the AVX-only behavior even with the variable unset. I must have been doing something wrong. Thanks.