
Runtime error when using MKL l_mkl_2018.0.128

I am hitting a runtime error (a segmentation fault inside MKL's scopy) while running a deep learning model on a Xeon machine with Ubuntu 16.04 and gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4.

I1030 15:56:03.653736 115861 caffe.cpp:361] Performing Backward
*** Aborted at 1509404163 (unix time) try "date -d @1509404163" if you are using GNU date ***
PC: @     0x7f1c75d85c40 mkl_blas_avx2_xscopy
*** SIGSEGV (@0xc35fb0) received by PID 115861 (TID 0x7f1c716cbd80) from PID 12804016; stack trace: ***
    @     0x7f1c90d2ecb0 (unknown)
    @     0x7f1c75d85c40 mkl_blas_avx2_xscopy
    @     0x7f1c79702fc5 mkl_blas_scopy
    @     0x7f1c7b2c4ac3 __kmp_invoke_microtask
    @     0x7f1c7b293257 __kmp_invoke_task_func
    @     0x7f1c7b2928d5 __kmp_launch_thread
    @     0x7f1c7b2c4fa4 _INTERNAL_26_______src_z_Linux_util_cpp_16f8393c::__kmp_launch_worker()
    @     0x7f1c8e914184 start_thread
    @     0x7f1c90df237d (unknown)
    @                0x0 (unknown)

 

Is there a minimum required gcc version? Any ideas on what is going wrong?

 

 

3 Replies

Hi Sujay,

Could you please provide detailed information about your CPU and your Caffe version (is it Intel Caffe?). Please also run export MKL_VERBOSE=1 before starting the program, then paste the output here. Thanks.

Best regards,
Fiona
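
For readers who want to capture the verbose output without changing their shell environment, here is a minimal Python sketch. It assumes Python 3.7 or later; the helper name run_with_mkl_verbose is made up for illustration, and the demo command is only a placeholder that echoes the variable, not a real Caffe invocation.

```python
import os
import subprocess
import sys


def run_with_mkl_verbose(cmd):
    """Run cmd with MKL_VERBOSE=1 set only for the child process."""
    # Copy the current environment and add the variable requested above.
    env = dict(os.environ, MKL_VERBOSE="1")
    # Capture stdout and stderr as text so the log can be saved or pasted.
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Placeholder command: replace it with the actual program to diagnose.
    demo = [sys.executable, "-c", "import os; print(os.environ['MKL_VERBOSE'])"]
    print(run_with_mkl_verbose(demo).stdout.strip())
```

The variable is set only for the child process, so other programs started from the same shell are not affected.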


CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz

Custom Caffe: https://github.com/onalbach/caffe-deep-shading

I have switched the BLAS library to MKL in the config file and provided the necessary include/library paths. Is there anything more that needs to be done?

MKL_VERBOSE Intel(R) MKL 2018.0 Product build 20170720 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 3.00GHz lp64 intel_thread NMICDev:0
MKL_VERBOSE DSCAL(64,0x7fff87de4620,0x1b8bf80,1) 3.41ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SGEMM(N,N,65536,8,50,0x7fff87de4ae8,0x7fc01e382010,65536,0x1b96dc0,50,0x7fff87de4af0,0x7fc01f003010,65536) 26.74ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SGEMM(N,N,65536,8,1,0x7fff87de4af8,0x7fc03c2d5010,65536,0x1b8cdc0,1,0x7fff87de4b00,0x7fc01f003010,65536) 485.24us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SGEMM(N,N,65536,1,200,0x7fff87de4ae8,0x7fbfbcdff010,65536,0x1b79c80,200,0x7fff87de4af0,0x7fc01c039010,65536) 2.22ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SGEMM(N,N,65536,1,1,0x7fff87de4af8,0x7fc03c294010,65536,0x1b7a1f0,1,0x7fff87de4b00,0x7fc01c039010,65536) 815.90us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SCOPY(1024,0x7fc00c02d010,1,0x7fc00c02d010,1) 20.30us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SSCAL(1024,0x7fff87de4aa8,0x7fc00c02d010,1) 14.06us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SCOPY(1024,0x1b9a280,1,0x7fc00c02d010,1) 773ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SSCAL(1024,0x7fff87de4aa8,0x7fc00c02d010,1) 389ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SDOT(1024,0x1bf7d00,1,0x1bf8d10,1) 14.50us CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000
MKL_VERBOSE SDOT(1,0x1b2aa00,1,0x1b87c70,1) 905ns CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:24 WDiv:HOST:+0.000

Let me know if you need more information.
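
A log like the one above gets long quickly, so a small script can help summarize it. The following is a rough sketch that assumes the line format seen in this post (routine name, parenthesized arguments, then a time with a unit); the names parse_call and summarize are hypothetical helpers, not part of MKL.

```python
import re
from collections import defaultdict

# Matches call lines such as
#   MKL_VERBOSE SCOPY(1024,0x...,1,0x...,1) 20.30us CNR:OFF ...
# The banner line ("MKL_VERBOSE Intel(R) MKL ...") does not match.
CALL_RE = re.compile(r"^MKL_VERBOSE ([A-Z0-9_]+)\(([^)]*)\)\s+([\d.]+)(ns|us|ms|s)\b")
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_call(line):
    """Return (routine, argument list, seconds) or None for non-call lines."""
    m = CALL_RE.match(line.strip())
    if m is None:
        return None
    name, args, value, unit = m.groups()
    return name, args.split(","), float(value) * UNIT_SECONDS[unit]


def summarize(lines):
    """Map each routine name to (number of calls, total seconds)."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        call = parse_call(line)
        if call is not None:
            name, _args, seconds = call
            totals[name][0] += 1
            totals[name][1] += seconds
    return {name: tuple(v) for name, v in totals.items()}
```

For SCOPY the arguments follow the BLAS order n, x, incx, y, incy, so args[1] and args[3] are the source and destination addresses. Each logged line includes the time the call took, so a call that crashes partway through may never appear in the log.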

 

 


Hi Sujay,

According to the MKL verbose output you provided, the MKL routines are already being called successfully. I suspect the problem occurs somewhere that Caffe calls the scopy function with an invalid argument or pointer. Please check whether your Caffe program passes any invalid or wrong arguments to the Caffe APIs, or use GDB to find the Caffe stack frames that lead to this runtime error.

Best regards,
Fiona
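
To make the "invalid argument" idea concrete, here is a small Python sketch of the bounds a scopy(n, x, incx, y, incy) call relies on. It only models the standard BLAS vector layout; the function name scopy_arg_problems and the x_len and y_len parameters are assumptions for illustration and do not exist in MKL or Caffe.

```python
def scopy_arg_problems(n, x_len, incx, y_len, incy):
    """Return a list of problems with a planned scopy(n, x, incx, y, incy) call.

    x_len and y_len are the numbers of floats actually allocated for x and y.
    """
    problems = []
    if n <= 0:
        # BLAS copy routines do nothing for n <= 0, so nothing can overrun.
        return problems
    for name, length, inc in (("x", x_len, incx), ("y", y_len, incy)):
        # A strided vector of n elements spans 1 + (n - 1) * |inc| slots.
        needed = 1 + (n - 1) * abs(inc)
        if length < needed:
            problems.append(f"{name}: needs {needed} elements, has {length}")
    return problems


if __name__ == "__main__":
    # Destination buffer is half the size that n = 1024 requires.
    print(scopy_arg_problems(1024, 1024, 1, 512, 1))
```

A check like this could be mirrored as assertions in the C++ code just before the copy, using whatever buffer sizes the calling layer knows. It cannot detect a dangling or uninitialized pointer; for that, a debugger backtrace is still needed.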
