Hi,
I have a minimal working example (MWE) that reproduces the bug. I reproduced it with the following setup:
export MKL_CBWR=COMPATIBLE
export MKL_VERBOSE=1
./foo
Output (the program hangs after the last line):
Foo is called
MKL_VERBOSE Intel(R) MKL 2018.0 Update 1 Product build 20171007 for Intel(R) 64 architecture Intel(R) Architecture processors, Lnx 3.20GHz lp64 intel_thread NMICDev:0
MKL_VERBOSE DGETRF(27,27,0x1f34280,27,0x7ffd36f28c20,27) 22.90ms CNR:COMPATIBLE Dyn:1 FastMM:1 TID:0 NThr:6 WDiv:HOST:+0.000
Calling MPI_Init:
foo compiled with:
mpicxx -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_intel_thread -lmkl_blacs_intelmpi_lp64 -liomp5 -ldl -lpthread -o foo foo.cc
mpicxx tested with: OpenMPI 2.1.2 and 3.0.1
It works on another computer with Intel MKL 2015 installed:
./foo
Foo is called
MKL_VERBOSE Intel(R) MKL 11.2 Update 2 Product build 20150120 for Intel(R) 64 architecture Intel(R) Architecture processors, Lnx 2.40GHz lp64 intel_thread NMICDev:0
MKL_VERBOSE DGETRF(27,27,0xdc7c20,27,0x7ffc8b6f1870,27) 8.38ms CNR:COMPATIBLE Dyn:1 FastMM:1 TID:0 NThr:12 WDiv:HOST:+0.000
Calling MPI_Init:
MPI_Init done...
Foo is called
MKL_VERBOSE DGETRF(27,27,0xf15610,27,0x7ffc8b6f18e0,27) 127.27us CNR:COMPATIBLE Dyn:1 FastMM:1 TID:0 NThr:12 WDiv:HOST:+0.000
It also works if I set MKL_NUM_THREADS=1.
Thanks,
Eric
Eric, could you please check this case with MKL 2018 Update 2, which we released two weeks ago?
Hi M. Gennady,
We installed the latest update this morning and I tested it:
Foo is called
MKL_VERBOSE Intel(R) MKL 2018.0 Update 2 Product build 20180127 for Intel(R) 64 architecture Intel(R) Architecture processors, Lnx 3.20GHz lp64 intel_thread
MKL_VERBOSE DGETRF(27,27,0x1bf4280,27,0x7ffff1fce520,27) 17.55ms CNR:COMPATIBLE Dyn:1 FastMM:1 TID:0 NThr:6
Calling MPI_Init:
It hangs at the same place. It was suggested that I change -lmkl_blacs_intelmpi_lp64 to -lmkl_blacs_openmpi_lp64, but that changed nothing! I also changed -lmkl_intel_thread to -lmkl_gnu_thread, but I still have the same problem.
Here is the backtrace when it hangs:
(gdb) bt
#0  0x00007fffef681e47 in sched_yield () from /lib64/libc.so.6
#1  0x00007ffff0a5fe74 in _INTERNAL_26_______src_z_Linux_util_cpp_d7ee2e5e::__kmp_atfork_prepare () at ../../src/z_Linux_util.cpp:1534
#2  0x00007fffef66852d in fork () from /lib64/libc.so.6
#3  0x00007fffea07c842 in rte_init.part () from /opt/openmpi-3.0.1/lib/openmpi/mca_ess_singleton.so
#4  0x00007fffef30f8c6 in orte_init () from /opt/openmpi-3.0.1/lib/libopen-rte.so.40
#5  0x00007ffff028f4ec in ompi_mpi_init () from /opt/openmpi-3.0.1/lib/libmpi.so.40
#6  0x00007ffff02b7bdb in PMPI_Init () from /opt/openmpi-3.0.1/lib/libmpi.so.40
#7  0x0000000000400851 in main ()
I also followed a suggestion from a reply on the OpenMPI issue I opened:
https://github.com/open-mpi/ompi/issues/5070#issuecomment-381572059
But as reported there, it did not fix anything...
Thanks,
Eric
Hi,
Gilles Gouaillardet from OpenMPI narrowed the problem down to a simple call to "fork", which triggers the bug independently of OpenMPI itself. Please try his reproducer:
cat a.cpp:
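#include <unistd.h>
#include <cstdio>
#include <cstdlib>  // malloc, exit
#include <mkl.h>

// Factorizes a small 27x27 matrix with LAPACK dgetrf via MKL.
int foo()
{
  printf("Foo is called\n");
  const int lN = 27;
  double* lA = (double*)malloc(lN*lN*sizeof(double));
  for (int i = 0; i < lN*lN; ++i) {
    lA[i] = i;
  }
  int lPiv[lN];
  int lRes;
  dgetrf_(&lN, &lN, lA, &lN, lPiv, &lRes);
  return lRes;
}

int main(int pArgc, char* pArgv[])
{
  foo();                 // first MKL call, before the fork
  printf("Forking:\n");
  pid_t pid = fork();    // the hang is reported here
  if (0 == pid) {
    exit(0);             // child exits immediately
  }
  printf("Forked...\n");
  foo();                 // second MKL call, in the parent
  return 0;
}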
compiled and launched with:
g++ a.cpp -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_gnu_thread -liomp5
$ MKL_CBWR=COMPATIBLE ./a.out
(extracted from Gilles's reply: https://github.com/open-mpi/ompi/issues/5070#issuecomment-382224431)
Thanks,
Eric
Thanks, Eric. I see a similar hang with Intel threading (-lmkl_intel_thread) as well. The case has been escalated. We will keep you updated.
Hi All,
Is this the same issue as the one reported here? https://software.intel.com/en-us/forums/intel-c-compiler/topic/758961
Thanks,
Alex
Eric, the suggested workaround for the problem is to use export KMP_INIT_AT_FORK=FALSE till compiler team will not fix the problem.
Ok, thanks, it works for me!
You wrote:
"...FALSE till compiler team will not fix the problem."
but I hope you meant "till compiler team will fix the problem."... ???
:)
Thanks,
Eric
