Zhu__Deqi
Beginner
517 Views

illegal instruction error with newer AMD CPU

One of the illegal instructions comes from the function mkl_blas_def_dgemm_kernel_bdz(), which is in libmkl_core.a.

Need help!

0 Kudos
6 Replies
Ruqiu_C_Intel
Employee
517 Views

Hi Deqi,

Thanks for your question!

Could you tell us which MKL version you are using? Please also send us a reproducer so that we can test and verify the issue on our side.

Best Regards,

Ruqiu

Gennady_F_Intel
Moderator
517 Views

In addition to what Ruqiu asked in the previous comment, we also need to know whether the problem happens only on a specific CPU type. If so, please give us that information as well.

Zhu__Deqi
Beginner
517 Views

Thanks for your reply. I will try to collect /proc/cpuinfo from the customer side soon.

MKL version is 10.0.3.020

------------------------

Below is the stack trace captured when SIGILL was caught. The faulting function appears to be mkl_blas_def_dgemm_kernel_bdz, and the instruction at 0x17970c2 seems to be "vfmaddpd %ymm0,%ymm12,%ymm13,%ymm0".

/lib64/libc.so.6(+0x363f0)[0x2ab7d36b33f0]
/RedHawk_Linux64e6_V2020R1.1/bin/asimplus_static(mkl_blas_def_dgemm_kernel_bdz+0xd2)[0x17970c2]
./RedHawk_Linux64e6_V2020R1.1/bin/asimplus_static(mkl_blas_def_xdgemm_bdz+0x565)[0x1795425]
./RedHawk_Linux64e6_V2020R1.1/bin/asimplus_static(mkl_blas_def_xdgemm+0xff7)[0xe6aee7]
./RedHawk_Linux64e6_V2020R1.1/bin/asimplus_static(DGEMM+0x12c)[0xa3b21c]
./RedHawk_Linux64e6_V2020R1.1/bin/asimplus_static[0x8ea299]
./RedHawk_Linux64e6_V2020R1.1/bin/asimplus_static[0x8ee8c6]
./RedHawk_Linux64e6_V2020R1.1/bin/asimplus_static[0x8ef220]
./RedHawk_Linux64e6_V2020R1.1/bin/asimplus_static[0x8f45eb]
./RedHawk_Linux64e6_V2020R1.1/bin/asimplus_static[0x8f5e42]
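
For anyone who wants to pull the symbol, offset, and address out of frames like the ones above, here is a minimal Python sketch. It assumes the lines follow the usual glibc backtrace_symbols() layout, module(symbol+0xoffset)[0xaddress] or module[0xaddress]; FRAME_RE, parse_frame, and FRAMES are illustrative names, not part of MKL or any tool.

```python
import re

# Assumed layout (glibc backtrace_symbols style):
#   module(symbol+0xoffset)[0xaddress]   or   module[0xaddress]
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Split one backtrace line into module, symbol, offset, and address."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    offset = m.group("offset")
    return {
        "module": m.group("module"),
        "symbol": m.group("symbol") or None,  # None for stripped frames
        "offset": int(offset, 16) if offset else None,
        "address": int(m.group("address"), 16),
    }


# Two frames copied from the trace above.
FRAMES = [
    "/RedHawk_Linux64e6_V2020R1.1/bin/asimplus_static(mkl_blas_def_dgemm_kernel_bdz+0xd2)[0x17970c2]",
    "./RedHawk_Linux64e6_V2020R1.1/bin/asimplus_static[0x8ea299]",
]

for text in FRAMES:
    frame = parse_frame(text)
    name = frame["symbol"] or "<no symbol>"
    print(f"{name} at {frame['address']:#x}")
```

Subtracting the offset from the address gives the entry point of the function, which can help when locating the faulting instruction in a disassembly of the static binary.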

Zhu__Deqi
Beginner
517 Views

Here is the CPU info:

 

processor : 0
vendor_id : AuthenticAMD
cpu family  : 23
model   : 49
model name  : AMD EPYC 7702P 64-Core Processor
stepping  : 0
microcode : 0x8301025
cpu MHz   : 2000.000
cache size  : 512 KB
physical id : 0
siblings  : 128
core id   : 0
cpu cores : 64
apicid    : 0
initial apicid  : 0
fpu   : yes
fpu_exception : yes
cpuid level : 16
wp    : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc art rep_good nopl xtopology nonstop_tsc extd_apicid aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 cpb cat_l3 cdp_l3 hw_pstate retpoline_amd ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip overflow_recov succor smca
bogomips  : 3992.84
TLB size  : 3072 4K pages
clflush size  : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
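
The four-operand vfmaddpd form quoted earlier is an FMA4 encoding, which Linux reports as the fma4 flag. The flags line above lists fma (the three-operand FMA3 form) but not fma4, which would be consistent with the SIGILL. Below is a minimal Python sketch for checking this on a given machine; cpu_flags and SAMPLE are illustrative names, and SAMPLE is only an abbreviated excerpt of the flags posted above.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


# Abbreviated excerpt of the flags line posted above (not the full list).
SAMPLE = "flags   : fpu sse sse2 ssse3 fma cx16 sse4_1 sse4_2 avx f16c sse4a avx2 bmi2"

flags = cpu_flags(SAMPLE)
for name in ("avx", "avx2", "fma", "fma4"):
    print(f"{name}: {'yes' if name in flags else 'no'}")
```

On the actual machine, the same function can be given the contents of /proc/cpuinfo instead of SAMPLE.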
 

Gennady_F_Intel
Moderator
517 Views

>> MKL version is 10.0.3.020

This version was released more than 8 years ago and is therefore no longer supported.

You could try the latest MKL 2019 release and let us know the result.

Gennady_F_Intel
Moderator
517 Views

Here is the link where you can download the latest MKL 2019 bits for free: https://software.intel.com/en-us/mkl/choose-download
