After installing MKL 11.0 Update 4 over MKL 11.0 Update 2 on Linux, our QA process crashes with SIGSEGV at:
#0 0x00002aaab745874a in mkl_serv_malloc ()
#1 0x00002aaab7f6bbcc in mkl_blas_mc3_dgemm_get_bufs ()
#2 0x00002aaab6ae8a99 in mkl_blas_mc3_xdgemm_par ()
#3 0x00002aaab4c2cf74 in mkl_blas_xdgemm_par ()
#4 0x00002aaab4b81ecb in mkl_blas_dgemm_2d_bsrc ()
#5 0x00002aaab4b7b489 in gemm_host ()
#6 0x00002aaabb92b4f3 in L_kmp_invoke_pass_parms ()
from /opt/intel/composer_xe_2013.4.183/compiler/lib/intel64/libiomp5.so
The crash is 100% reproducible in certain cases.
Reverting to MKL 11.0 Update 2 solves the issue.
It seems to happen after many iterations, once many computation threads have been created and destroyed.
Note that we are running multiple (boost) threads that call MKL. We call MKL_Thread_Free_Buffers at the completion of each thread.
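For readers who want to picture the usage pattern described above, here is a minimal sketch of it, not our actual code. It uses std::thread as a stand-in for boost threads, and the names `multiply`, `worker` and `run_rounds` are hypothetical. When `<mkl.h>` is on the include path it calls `cblas_dgemm` and `mkl_thread_free_buffers`; otherwise it falls back to a naive loop so that it still builds.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

#if defined(__has_include)
#  if __has_include(<mkl.h>)
#    include <mkl.h>
#    define SKETCH_HAVE_MKL 1
#  endif
#endif

// C = A * B for square, row-major n x n matrices.
static void multiply(const std::vector<double>& a, const std::vector<double>& b,
                     std::vector<double>& c, int n) {
#ifdef SKETCH_HAVE_MKL
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n, n,
                1.0, a.data(), n, b.data(), n, 0.0, c.data(), n);
#else
    // Stand-in so the sketch builds without MKL: naive triple loop.
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k) sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
#endif
}

// Body of one short-lived worker: a few multiplications, then release the
// buffers MKL allocated for this thread, then report one result element.
static void worker(int n, int iterations, double* out) {
    const std::size_t elems = static_cast<std::size_t>(n) * n;
    std::vector<double> a(elems, 1.0), b(elems, 2.0), c(elems, 0.0);
    for (int it = 0; it < iterations; ++it) multiply(a, b, c, n);
#ifdef SKETCH_HAVE_MKL
    mkl_thread_free_buffers();
#endif
    *out = c[0];  // every element equals 2 * n for these inputs
}

// Create and join a fresh batch of threads in every round, mimicking the
// "many threads created/destroyed" workload.
std::vector<double> run_rounds(int rounds, int threads_per_round, int n) {
    assert(rounds >= 0 && threads_per_round >= 0 && n > 0);
    std::vector<double> results;
    for (int r = 0; r < rounds; ++r) {
        std::vector<double> out(threads_per_round, 0.0);
        std::vector<std::thread> pool;
        for (int t = 0; t < threads_per_round; ++t)
            pool.emplace_back(worker, n, 3, &out[t]);
        for (std::thread& th : pool) th.join();
        results.insert(results.end(), out.begin(), out.end());
    }
    return results;
}
```

Calling something like `run_rounds(1000, 8, 512)` approximates the create/compute/free/destroy cycle; whether a sketch this small would actually trigger the crash is unknown.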
Andrew, How can we reproduce the issue?
The only way to reproduce is for Intel to have a copy of our software and an evaluation license from us. I will pursue this through premier support.
OK. We will take up this issue as soon as you submit it there.
OK, I created a ticket. I explained that to reproduce the problem Intel will have to download a 400MB installer and a license file, but I have had no response to that point.
No doubt this will be a painful process for everyone to reproduce, but I cannot use MKL 11.0 Update 4 until this is resolved.
Premier support issue # 697704
I hope you put some of the missing details in your issue submission.
I don't see any clues as to which checklists you have followed; there are several good ones, including
http://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors
I can't even guess whether you explored simple remedies such as increasing stack (both global and thread stack) or using heap options.
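As a rough illustration of the two stack sizes mentioned above, the sketch below reads the process-wide soft limit (the value `ulimit -s` reports, in KiB) and starts one thread with an explicitly enlarged stack through the POSIX pthread attributes API. The function names and the 16 MiB / 12 MiB figures are arbitrary choices for the example. Note that this only covers threads the application creates itself; the OpenMP runtime's worker threads are sized separately, typically through the OMP_STACKSIZE or KMP_STACKSIZE environment variables.

```cpp
#include <pthread.h>
#include <sys/resource.h>
#include <cassert>
#include <cstddef>

constexpr std::size_t kThreadStackBytes = 16u * 1024 * 1024;  // requested thread stack
constexpr std::size_t kLocalBytes = 12u * 1024 * 1024;        // above a typical 8 MiB default

// Soft limit on the main thread's stack in bytes; RLIM_INFINITY if unlimited.
rlim_t main_stack_soft_limit() {
    struct rlimit lim{};
    int rc = getrlimit(RLIMIT_STACK, &lim);
    assert(rc == 0);
    (void)rc;
    return lim.rlim_cur;
}

// Thread body: touch both ends of a large local array, which only fits
// because the thread was given a stack bigger than kLocalBytes.
static void* use_large_local(void* done) {
    volatile char scratch[kLocalBytes];
    scratch[0] = 1;
    scratch[kLocalBytes - 1] = 1;
    *static_cast<bool*>(done) = (scratch[0] + scratch[kLocalBytes - 1] == 2);
    return nullptr;
}

// Start one thread with a kThreadStackBytes stack; true if it ran to completion.
bool run_thread_with_large_stack() {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) return false;
    bool done = false;
    pthread_t tid;
    bool started = pthread_attr_setstacksize(&attr, kThreadStackBytes) == 0 &&
                   pthread_create(&tid, &attr, use_large_local, &done) == 0;
    pthread_attr_destroy(&attr);
    if (!started) return false;
    pthread_join(tid, nullptr);
    return done;
}
```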
Not sure what you mean by "obsolete". On Linux, signals such as SIGSEGV are a fundamental part of the OS. A segmentation violation is raised when a process accesses an illegal address, for example by dereferencing a NULL pointer.
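To make that concrete, here is a small sketch (the function name is made up for the example) that forks a child process, has the child write through a null pointer, and lets the parent report which signal terminated it. The fork is only there so that the calling process survives the crash.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <csignal>

// Returns the signal that killed the child, 0 if the child exited normally,
// or -1 if fork/waitpid failed.
int signal_from_null_write() {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        // Child: make sure the default action (terminate) is in place.
        std::signal(SIGSEGV, SIG_DFL);
        // The pointer itself is volatile so the compiler cannot assume it
        // is null and optimize the store away.
        int* volatile p = nullptr;
        *p = 42;   // illegal address: the kernel delivers SIGSEGV here
        _exit(0);  // not expected to be reached
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    assert(WIFSIGNALED(status) || WIFEXITED(status));
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On an ordinary Linux build without sanitizers, the returned value should compare equal to `SIGSEGV`.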
TimP (Intel) wrote:
I hope you put some of the missing details in your issue submission.
I don't see any clues as to which checklists you have followed; there are several good ones, including
http://software.intel.com/en-us/articles/determining-root-cause-of-sigse...
I can't even guess whether you explored simple remedies such as increasing stack (both global and thread stack) or using heap options.
The details are that MKL 11 Update 2 passes 300-400 QA tests without failure, MKL Update 4 fails 6+ of those tests with a segmentation violation inside MKL, reproducibly. I have supplied premier support with a reproducible example. I will update this thread with the results.
Currently I am having to give the Premier support person a tutorial in GDB.
But here's a clue for anyone at Intel who cares about this issue.
Does this look like a race condition in MKL?
Thread 1 is crashing with a segmentation violation in:
#11 0x00002aaab75d40da in mkl_serv_malloc ()
from /opt/intel/composer_xe_2013.4.183/mkl/lib/intel64/libmkl_core.so
#12 0x00002b93a4980aec in mkl_blas_mc3_dgemm_get_bufs ()
Thread 2 is calling
#0 0x00002aaab75dfe00 in mkl_blas_dgemm_set_blks_size ()
#1 0x00002aaab66135d9 in gemm_host ()
Hi Andrew, we definitely care and the local MKL team is now looking into the issue. We will report back once we have more information. -Shane
I just installed MKL 11 Update 5 and the problem has gone away. It looks like someone found and fixed the issue.
To close the loop on this issue: Intel Premier Support confirmed there was an issue in Update 4 and that it was fixed in Update 5. Thanks guys!
We are always happy to help you :)
