Dear all,
I am experiencing a segfault on Linux with an application of mine when I link it against the MKL shipped with composer_xe_2013.3.163 (Update 3 - March 2013), which should be MKL 11.0.3 according to http://software.intel.com/en-us/articles/which-version-of-the-intel-ipp-intel-mkl-and-intel-tbb-libraries-are-included-in-the-intel.
My application is multi-threaded and it uses pthreads. The segfault happens in cblas_dgemm when I spawn 8 threads: runs with 1, 2 or 4 threads work fine. I am linking against libmkl_intel_lp64.so, libmkl_core.so, libmkl_sequential.so. I have the following environment:
MKL_DISABLE_FAST_MM=1
MKL_SERIAL=YES
MKL_NUM_THREADS=1
The same binary compiled with composer_xe_2013.3.163 runs perfectly on 8 threads if I point LD_LIBRARY_PATH to the MKL libraries shipped with Intel Compilers version 11.1.069. So it really seems to be a version-specific issue.
I have tried to set:
ulimit -s unlimited
MKL_DOMAIN_NUM_THREADS="MKL_DOMAIN_ALL=1"
OMP_NUM_THREADS=1
OMP_DYNAMIC=FALSE
MKL_DYNAMIC=FALSE
OMP_NESTED=FALSE
but it makes no difference. Here I attach the valgrind trace:
==20175== Thread 3:
==20175== Invalid read of size 8
==20175== at 0x53DB0DA: mkl_serv_malloc (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so)
==20175== by 0x860C01B: mkl_blas_mc_dgemm_get_bufs (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175== by 0x8684768: mkl_blas_mc_xdgemm_par (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175== by 0x8683B4B: mkl_blas_mc_xdgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175== by 0x53ED8DB: mkl_blas_xdgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so)
==20175== by 0x662A7CE: mkl_blas_dgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_sequential.so)
==20175== by 0x4CF0AA8: DGEMM (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so)
==20175== by 0x4D02452: cblas_dgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so)
==20175== by 0x455204: pred_y_values (in /mnt/XI/home/toscopa1/open3dtools/bin/open3dqsar)
==20175== by 0x4689F7: lmo_cv_thread (in /mnt/XI/home/toscopa1/open3dtools/bin/open3dqsar)
==20175== by 0x3B0A00683C: start_thread (in /lib64/libpthread-2.5.so)
==20175== by 0x3B094D4F8C: clone (in /lib64/libc-2.5.so)
==20175== Address 0xd0 is not stack'd, malloc'd or (recently) free'd
==20175==
==20175==
==20175== Process terminating with default action of signal 11 (SIGSEGV)
==20175== Access not within mapped region at address 0xD0
==20175== at 0x53DB0DA: mkl_serv_malloc (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so)
==20175== by 0x860C01B: mkl_blas_mc_dgemm_get_bufs (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175== by 0x8684768: mkl_blas_mc_xdgemm_par (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175== by 0x8683B4B: mkl_blas_mc_xdgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_mc.so)
==20175== by 0x53ED8DB: mkl_blas_xdgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_core.so)
==20175== by 0x662A7CE: mkl_blas_dgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_sequential.so)
==20175== by 0x4CF0AA8: DGEMM (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so)
==20175== by 0x4D02452: cblas_dgemm (in /mnt/XI/prog/compilers/intel/ics_2013/composer_xe_2013.3.163/mkl/lib/intel64/libmkl_intel_lp64.so)
==20175== by 0x455204: pred_y_values (in /mnt/XI/home/toscopa1/open3dtools/bin/open3dqsar)
==20175== by 0x4689F7: lmo_cv_thread (in /mnt/XI/home/toscopa1/open3dtools/bin/open3dqsar)
==20175== by 0x3B0A00683C: start_thread (in /lib64/libpthread-2.5.so)
==20175== by 0x3B094D4F8C: clone (in /lib64/libc-2.5.so)
I consistently get this error at address 0xd0.
As I mentioned, my program works perfectly when linked against older Intel MKL versions, as well as against ATLAS or the Sun Performance Library.
I would be very glad if you could indicate a way to solve my problem.
Thanks, best regards,
Paolo
Just wish to add a detail which may help: the issue happens only when I make repeated calls to cblas_dgemm during an iteration. The iteration is in a function where I call cblas_dgemm 3-4 times, and the crash does not always happen on the same call, but in a random fashion. Calling mkl_free_buffers() after each iteration does not make a difference (as expected, since I'm running with MKL_DISABLE_FAST_MM=1).
Paolo
Sorry for replying to myself, but I just realized that the problem disappears after updating to the latest composer_xe_2013 bundle, 2013.5.192, so it really looks like it was a bug in 2013.3.163.
Thanks all the same, cheers
Paolo
Yes, it may be the well-known issue introduced in MKL 11.0 Update 4. Please see more details here: http://software.intel.com/en-us/articles/svd-multithreading-bug-in-mkl
Dear Gennady,
even using the latest ICC compiler and MKL libraries, I still get a bunch of valgrind warnings about possible data races (I am using 8 threads). The same program compiled with gcc and linked against the ATLAS libraries does not raise any valgrind warning. The two programs give exactly the same results, and the results do not change when running on 1 thread or 8 threads, so I would conclude the warnings on the ICC build are harmless. Could you please confirm that? Can you guess what might be the reason for those warnings?
Many thanks in advance, best regards
Paolo
$ valgrind --tool=helgrind open3dqsar.icc2013 -i sample_input_MM2.inp -o sample_input_MM2.out.icc2013
==9381== Helgrind, a thread error detector
==9381== Copyright (C) 2007-2011, and GNU GPL'd, by OpenWorks LLP et al.
==9381== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==9381== Command: /home/ptosco/open3dtools/bin/open3dqsar.icc2013 -i sample_input_MM2.inp -o /dev/null
==9381==
==9382== Warning: invalid file descriptor 1014 in syscall close()
==9383== Warning: invalid file descriptor 1014 in syscall close()
==9384== Warning: invalid file descriptor 1014 in syscall close()
==9385== Warning: invalid file descriptor 1014 in syscall close()
==9381== ---Thread-Announcement------------------------------------------
==9381==
==9381== Thread #2 was created
==9381== at 0x36B8EE769E: clone (in /lib64/libc-2.12.so)
==9381== by 0x36B960673F: do_clone.clone.0 (in /lib64/libpthread-2.12.so)
==9381== by 0x36B9606C21: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.12.so)
==9381== by 0x4A0B97C: pthread_create_WRK (hg_intercepts.c:255)
==9381== by 0x4A0BA90: pthread_create@* (hg_intercepts.c:286)
==9381== by 0x4224C6: calc_field (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x4174ED: parse_input (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x406940: main (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381==
==9381== ---Thread-Announcement------------------------------------------
==9381==
==9381== Thread #4 was created
==9381== at 0x36B8EE769E: clone (in /lib64/libc-2.12.so)
==9381== by 0x36B960673F: do_clone.clone.0 (in /lib64/libpthread-2.12.so)
==9381== by 0x36B9606C21: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.12.so)
==9381== by 0x4A0B97C: pthread_create_WRK (hg_intercepts.c:255)
==9381== by 0x4A0BA90: pthread_create@* (hg_intercepts.c:286)
==9381== by 0x4224C6: calc_field (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x4174ED: parse_input (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x406940: main (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381==
==9381== ----------------------------------------------------------------
==9381==
==9381== Possible data race during read of size 8 at 0x7E4B20 by thread #2
==9381== Locks held: none
==9381== at 0x4FB9C0: __svml_rint2 (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x4259D8: calc_mm_thread (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x4A0BB19: mythread_wrapper (hg_intercepts.c:219)
==9381== by 0x36B9607850: start_thread (in /lib64/libpthread-2.12.so)
==9381== by 0x7A646FF: ???
==9381==
==9381== This conflicts with a previous write of size 8 by thread #4
==9381== Locks held: none
==9381== at 0x4FBA22: __svml_rint2_dispatch_table_init (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381==
==9381== ---Thread-Announcement------------------------------------------
==9381==
==9381== Thread #19 was created
==9381== at 0x36B8EE769E: clone (in /lib64/libc-2.12.so)
==9381== by 0x36B960673F: do_clone.clone.0 (in /lib64/libpthread-2.12.so)
==9381== by 0x36B9606C21: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.12.so)
==9381== by 0x4A0B97C: pthread_create_WRK (hg_intercepts.c:255)
==9381== by 0x4A0BA90: pthread_create@* (hg_intercepts.c:286)
==9381== by 0x446816: parallel_cv (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x46A2B5: uvepls (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x40E059: parse_input (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x406940: main (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381==
==9381== ---Thread-Announcement------------------------------------------
==9381==
==9381== Thread #18 was created
==9381== at 0x36B8EE769E: clone (in /lib64/libc-2.12.so)
==9381== by 0x36B960673F: do_clone.clone.0 (in /lib64/libpthread-2.12.so)
==9381== by 0x36B9606C21: pthread_create@@GLIBC_2.2.5 (in /lib64/libpthread-2.12.so)
==9381== by 0x4A0B97C: pthread_create_WRK (hg_intercepts.c:255)
==9381== by 0x4A0BA90: pthread_create@* (hg_intercepts.c:286)
==9381== by 0x446816: parallel_cv (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x46A2B5: uvepls (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x40E059: parse_input (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x406940: main (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381==
==9381== ----------------------------------------------------------------
==9381==
==9381== Possible data race during read of size 4 at 0x6575EDC by thread #19
==9381== Locks held: none
==9381== at 0x53E2207: mkl_serv_lock (in /opt/intel/composer_xe_2013.5.192/mkl/lib/intel64/libmkl_core.so)
==9381== by 0x7F: ???
==9381==
==9381== This conflicts with a previous write of size 4 by thread #18
==9381== Locks held: none
==9381== at 0x53E2220: mkl_serv_unlock (in /opt/intel/composer_xe_2013.5.192/mkl/lib/intel64/libmkl_core.so)
==9381== by 0x7F: ???
==9381==
==9381== ----------------------------------------------------------------
==9381==
==9381== Possible data race during read of size 4 at 0x6575EA0 by thread #19
==9381== Locks held: none
==9381== at 0x53DFD5D: mkl_serv_malloc (in /opt/intel/composer_xe_2013.5.192/mkl/lib/intel64/libmkl_core.so)
==9381== by 0x4D846BD: DGETRF (in /opt/intel/composer_xe_2013.5.192/mkl/lib/intel64/libmkl_intel_lp64.so)
==9381== by 0x458970: pred_y_values (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x46C1C7: lmo_cv_thread (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x4A0BB19: mythread_wrapper (hg_intercepts.c:219)
==9381== by 0x36B9607850: start_thread (in /lib64/libpthread-2.12.so)
==9381== by 0x98676FF: ???
==9381==
==9381== This conflicts with a previous write of size 4 by thread #18
==9381== Locks held: none
==9381== at 0x53DFD88: mkl_serv_malloc (in /opt/intel/composer_xe_2013.5.192/mkl/lib/intel64/libmkl_core.so)
==9381== by 0x4D846BD: DGETRF (in /opt/intel/composer_xe_2013.5.192/mkl/lib/intel64/libmkl_intel_lp64.so)
==9381== by 0x458970: pred_y_values (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x46C1C7: lmo_cv_thread (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x4A0BB19: mythread_wrapper (hg_intercepts.c:219)
==9381== by 0x36B9607850: start_thread (in /lib64/libpthread-2.12.so)
==9381== by 0xA2686FF: ???
==9381==
==9381== ----------------------------------------------------------------
==9381==
==9381== Possible data race during write of size 4 at 0x6575EA0 by thread #19
==9381== Locks held: none
==9381== at 0x53DFD88: mkl_serv_malloc (in /opt/intel/composer_xe_2013.5.192/mkl/lib/intel64/libmkl_core.so)
==9381== by 0x4D846BD: DGETRF (in /opt/intel/composer_xe_2013.5.192/mkl/lib/intel64/libmkl_intel_lp64.so)
==9381== by 0x458970: pred_y_values (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x46C1C7: lmo_cv_thread (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x4A0BB19: mythread_wrapper (hg_intercepts.c:219)
==9381== by 0x36B9607850: start_thread (in /lib64/libpthread-2.12.so)
==9381== by 0x98676FF: ???
==9381==
==9381== This conflicts with a previous write of size 4 by thread #18
==9381== Locks held: none
==9381== at 0x53DFD88: mkl_serv_malloc (in /opt/intel/composer_xe_2013.5.192/mkl/lib/intel64/libmkl_core.so)
==9381== by 0x4D846BD: DGETRF (in /opt/intel/composer_xe_2013.5.192/mkl/lib/intel64/libmkl_intel_lp64.so)
==9381== by 0x458970: pred_y_values (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x46C1C7: lmo_cv_thread (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
==9381== by 0x4A0BB19: mythread_wrapper (hg_intercepts.c:219)
==9381== by 0x36B9607850: start_thread (in /lib64/libpthread-2.12.so)
==9381== by 0xA2686FF: ???
==9381==
==9381== ----------------------------------------------------------------
==9381==
==9381== Possible data race during write of size 4 at 0x6575EDC by thread #19
==9381== Locks held: none
==9381== at 0x53E2220: mkl_serv_unlock (in /opt/intel/composer_xe_2013.5.192/mkl/lib/intel64/libmkl_core.so)
==9381== by 0x7F: ???
==9381==
==9381== This conflicts with a previous write of size 4 by thread #18
==9381== Locks held: none
==9381== at 0x53E2220: mkl_serv_unlock (in /opt/intel/composer_xe_2013.5.192/mkl/lib/intel64/libmkl_core.so)
==9381== by 0x7F: ???
==9381==
These cases might be just false positives. We have noticed similar cases before, and sometimes the Valgrind team confirmed them as such. I would recommend checking the problem with Intel Inspector ( http://software.intel.com/en-us/intel-inspector-xe ) - you can evaluate it and verify the problem with this tool.
==9381== Possible data race during read of size 8 at 0x7E4B20 by thread #2
==9381== Locks held: none
==9381== at 0x4FB9C0: __svml_rint2 (in /home/ptosco/open3dtools/bin/open3dqsar.icc2013)
Dear Sergey,
doesn't this mean that the data race was inside __svml_rint2()? I don't get this warning when I build with gcc (which uses the glibc rint()). So I thought this was related to the ICC libraries, though I am pretty sure it is harmless, as Gennady also said.
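If my reading of the trace is right, what Helgrind flags looks like a lazy one-time initialization of a CPU-dispatch table: the first caller writes a function pointer, later callers read it without synchronization. This is a sketch of that pattern under my assumptions (the names are invented, this is not the actual SVML code), together with the pthread_once idiom that would keep Helgrind quiet:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical dispatch slot: the real __svml_rint2 presumably selects a
   CPU-specific implementation in a similar way; this is only an
   illustration. */
static double (*rint_impl)(double);

static double rint_generic(double x)
{
    /* trivial round-half-away-from-zero, good enough for the demo */
    long long n = (long long)(x >= 0.0 ? x + 0.5 : x - 0.5);
    return (double)n;
}

/* Racy lazy init: thread A's write to rint_impl and thread B's read are
   unsynchronized -- exactly the read/write pair Helgrind reports, even
   though the race is benign (every thread writes the same value). */
static double rint_racy(double x)
{
    if (rint_impl == NULL)          /* unsynchronized read  */
        rint_impl = rint_generic;   /* unsynchronized write */
    return rint_impl(x);
}

/* Race-free variant: pthread_once guarantees exactly-once initialization
   with a proper happens-before edge for all callers. */
static pthread_once_t once = PTHREAD_ONCE_INIT;
static void init_impl(void) { rint_impl = rint_generic; }

static double rint_safe(double x)
{
    pthread_once(&once, init_impl);
    return rint_impl(x);
}
```

Both variants compute the same result; the only difference is whether the initialization is visible to a race detector, which would explain a warning that is real in the data-race sense but harmless in practice.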
Cheers,
p.
Dear Sergey,
I am using single-threaded MKL, since I set OMP_NUM_THREADS=1 and MKL_NUM_THREADS=1. There are no performance issues, and cblas_dgemm is not involved in the valgrind complaints, which are only about __svml_rint2 and dgetrf. The problem with cblas_dgemm was solved by updating to the latest MKL version. Here I was just wondering whether some multithreading-related issue was still present (in spite of correctly computed results), since I got a few warnings from Valgrind. But admittedly Valgrind often complains about Intel binaries that work just fine, so I guess they are false alarms. I think Valgrind works best with gcc-compiled binaries.
Thanks for your interest in this matter, best regards
Paolo
