Intel® oneAPI Math Kernel Library

Runtime error when using MKL l_mkl_2018.0.128

Sujay_K_Intel1
Employee

I am getting a runtime error (a segmentation fault, see the trace below) while running a deep learning model with Caffe on a Xeon machine running Ubuntu 16.04. The compiler reports its version as: gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4.

I1030 15:56:03.653736 115861 caffe.cpp:361] Performing Backward
*** Aborted at 1509404163 (unix time) try "date -d @1509404163" if you are using GNU date ***
PC: @     0x7f1c75d85c40 mkl_blas_avx2_xscopy
*** SIGSEGV (@0xc35fb0) received by PID 115861 (TID 0x7f1c716cbd80) from PID 12804016; stack trace: ***
    @     0x7f1c90d2ecb0 (unknown)
    @     0x7f1c75d85c40 mkl_blas_avx2_xscopy
    @     0x7f1c79702fc5 mkl_blas_scopy
    @     0x7f1c7b2c4ac3 __kmp_invoke_microtask
    @     0x7f1c7b293257 __kmp_invoke_task_func
    @     0x7f1c7b2928d5 __kmp_launch_thread
    @     0x7f1c7b2c4fa4 _INTERNAL_26_______src_z_Linux_util_cpp_16f8393c::__kmp_launch_worker()
    @     0x7f1c8e914184 start_thread
    @     0x7f1c90df237d (unknown)
    @                0x0 (unknown)

Is there a minimum required gcc version? Any ideas on what is going wrong?

3 Replies
Zhen_Z_Intel
Employee

Hi Sujay,

Could you please provide details about your CPU and your Caffe version (is it Intel Caffe?). Please also run with export MKL_VERBOSE=1 set and paste the output here. Thanks.

Best regards,
Fiona

Sujay_K_Intel1
Employee

CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz

Custom Caffe: https://github.com/onalbach/caffe-deep-shading

I have changed the BLAS library to MKL in the config file and provided the necessary include and library paths. Is there anything more that needs to be done? Here is the MKL_VERBOSE output:

MKL_VERBOSE Intel(R) MKL 2018.0 Product build 20170720 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.00GHz lp64 intel_thread NMICDev:0
MKL_VERBOSE DSCAL(64,0x7fff87de4620,0x1b8bf80,1) 3.41ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SGEMM(N,N,65536,8,50,0x7fff87de4ae8,0x7fc01e382010,65536,0x1b96dc0,50,0x7fff87de4af0,0x7fc01f003010,65536) 26.74ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SGEMM(N,N,65536,8,1,0x7fff87de4af8,0x7fc03c2d5010,65536,0x1b8cdc0,1,0x7fff87de4b00,0x7fc01f003010,65536) 485.24us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SGEMM(N,N,65536,1,200,0x7fff87de4ae8,0x7fbfbcdff010,65536,0x1b79c80,200,0x7fff87de4af0,0x7fc01c039010,65536) 2.22ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SGEMM(N,N,65536,1,1,0x7fff87de4af8,0x7fc03c294010,65536,0x1b7a1f0,1,0x7fff87de4b00,0x7fc01c039010,65536) 815.90us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SCOPY(1024,0x7fc00c02d010,1,0x7fc00c02d010,1) 20.30us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SSCAL(1024,0x7fff87de4aa8,0x7fc00c02d010,1) 14.06us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SCOPY(1024,0x1b9a280,1,0x7fc00c02d010,1) 773ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SSCAL(1024,0x7fff87de4aa8,0x7fc00c02d010,1) 389ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SDOT(1024,0x1bf7d00,1,0x1bf8d10,1) 14.50us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SDOT(1,0x1b2aa00,1,0x1b87c70,1) 905ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000

Let me know if you need more information.

Zhen_Z_Intel
Employee

Hi Sujay,

According to the MKL verbose output you provided, the MKL routines are already being called successfully. I suspect the problem occurs where Caffe calls the scopy function with invalid arguments or an invalid pointer. Please check whether your Caffe program passes any invalid or wrong arguments to the Caffe APIs, or use GDB to find the Caffe call stack that leads to this runtime error.

Best regards,
Fiona
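
As a rough illustration of the advice above about invalid arguments reaching scopy, here is a minimal C++ sketch of an argument check that could be placed in front of a unit-stride float copy while debugging. The names check_copy_args and checked_copy are hypothetical and not part of Caffe or MKL, and std::memmove stands in for the real BLAS call so that the sketch compiles without MKL. It only catches obviously bad arguments (negative count, null or misaligned pointers). A dangling pointer or a buffer that is too short will still pass these checks, so GDB or a memory checker is still needed for those cases.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper (not a Caffe or MKL function): returns an empty string
// if the arguments look plausible for copying n floats from x to y with
// stride 1, otherwise a short description of the first problem found.
std::string check_copy_args(long n, const float* x, const float* y) {
    if (n < 0) return "negative element count";
    if (n == 0) return "";  // nothing would be read or written
    if (x == nullptr) return "null source pointer";
    if (y == nullptr) return "null destination pointer";
    const std::uintptr_t xa = reinterpret_cast<std::uintptr_t>(x);
    const std::uintptr_t ya = reinterpret_cast<std::uintptr_t>(y);
    if (xa % alignof(float) != 0 || ya % alignof(float) != 0) {
        return "pointer not aligned for float";
    }
    return "";
}

// Hypothetical wrapper: validates the arguments, then copies.
// std::memmove is an assumption standing in for the BLAS copy; it also
// tolerates x == y, which appears in the verbose log above.
bool checked_copy(long n, const float* x, float* y) {
    if (!check_copy_args(n, x, y).empty()) return false;
    if (n > 0) {
        std::memmove(y, x, static_cast<std::size_t>(n) * sizeof(float));
    }
    return true;
}
```

While debugging, one could log the message returned by check_copy_args just before each copy call and compare the pointers and counts with the SCOPY lines in the MKL_VERBOSE output.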
