may_ka
Beginner
112 Views

Deadlock/zombie from ifort executable

Hi there,

I am not 100% sure whether this is the appropriate forum for this question.

I have programmed a linear model preconditioned conjugate gradient solver in Fortran 2008, compiled with ifort 17 and linked against MKL.

Compiler options are:

-mkl=parallel -warn nounused -warn declarations -O3 -static -qopenmp

with MKL linker options:

MKLPARA=  ${MKLROOT}/lib/intel64/libmkl_blas95_lp64.a \
	${MKLROOT}/lib/intel64/libmkl_lapack95_lp64.a -Wl,--start-group \
	${MKLROOT}/lib/intel64/libmkl_intel_lp64.a \
	${MKLROOT}/lib/intel64/libmkl_core.a \
	${MKLROOT}/lib/intel64/libmkl_intel_thread.a -Wl,--end-group -lpthread -lm -ldl

The MKL options were generated some time ago with the related tool on the Intel website.

I have installed the compiler in a VirtualBox environment with a CentOS 7 guest. The only warning I get when building the executable is

/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64/libmkl_core.a(mkl_memory_patched.o): In function `mkl_serv_set_memory_limit':
mkl_memory.c:(.text+0x599): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

When I transfer the executable to a server running CentOS 7 and run it (solving a system of about 150 million equations), the program runs fine until iteration 7500. Then it freezes. Note that the program does the same work in every round of the iterative procedure.

Running

pidof MyProgram | xargs -i gdb -p {}

yields at the end

[New LWP 210465]
[New LWP 210464]
[New LWP 210463]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x0000000001b75055 in pthread_cond_wait ()
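The attach above only reports the frame of the thread gdb happened to stop in. A backtrace of every thread usually says more about which threads are waiting on each other. The following is a minimal sketch, assuming gdb is installed on the server; the helper name dump_thread_backtraces is made up for illustration, and MyProgram stands for the executable as in the command above.

```shell
#!/bin/sh
# Hypothetical helper: print a backtrace of every thread of each
# running process whose name is given as the first argument.
dump_thread_backtraces() {
    prog="$1"
    for pid in $(pidof "$prog"); do
        echo "=== backtraces for PID $pid ==="
        # -batch makes gdb run the -ex commands and then exit,
        # which detaches from the process again.
        gdb -p "$pid" -batch -ex "thread apply all bt"
    done
}

# Usage: dump_thread_backtraces MyProgram > backtraces.txt
```

With a fully static, optimized binary the frames may show few symbol names, so the output is a starting point rather than a diagnosis.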

Initially I thought this happened because I compiled under a more recent kernel version (e.g. 4.8) and then ran the executable on a system with an older kernel (e.g. 3.10). But, as mentioned above, compiling in a virtual machine with the same operating system as the target server did not help.

Does anyone have any idea what causes this?

Thanks a lot.

4 Replies
Laura_S_3
Beginner

From the algorithm side, which this isn't the forum for, but oh well :-) Is there any chance you have satisfied the termination condition for the gradient descent and it is waiting for input or something? If you vary the starting point, does it change the number of iterations until it hangs, or is it always the same? Are you maybe storing values in an array and you've overflowed the array (resulting in weird behavior)?

From the Fortran side, if you run with 1 thread does it hang? Have you tried compiling with the optimization level dialed down to 0 and debug turned on?
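A debug build along those lines might look like the sketch below. It assumes the ifort options -O0, -g, -traceback and -check bounds (the last one to catch the array overflow case); solver.f90 and MyProgram_debug are placeholder names, and the MKL link line from the original post would still need to be appended.

```shell
#!/bin/sh
# Debug-oriented ifort flags: no optimization, debug symbols, readable
# tracebacks on runtime errors, and array bounds checking.
DEBUG_FLAGS="-O0 -g -traceback -check bounds"

# Flags carried over from the original build line (without -O3).
BUILD_FLAGS="-mkl=parallel -static -qopenmp"

# solver.f90 and MyProgram_debug are hypothetical names.
DEBUG_CMD="ifort $DEBUG_FLAGS $BUILD_FLAGS solver.f90 -o MyProgram_debug"
echo "$DEBUG_CMD"
```

Bounds checking slows the run down a lot, so with a hang that only shows up around iteration 7500 it may be worth trying a smaller system first.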

Good luck. :-)

Laura

Steven_L_Intel1
Employee

Since the Fortran compile options don't ask for threading, it seems more likely to be an MKL issue so I moved the thread to the MKL forum.

Ying_H_Intel
Employee

Hi,

Could you please tell us which MKL functions you are calling, and try

> export MKL_NUM_THREADS=1

and let us know if there is any change?

As I understand it, the key problem is at run time: the program runs fine until iteration 7500, right? Could you please provide a test case so we can try it in our lab environment?

The problem may be related to the compile process, but since MKL is a precompiled library, it is seldom affected by the compile environment once the build has completed.
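The single-thread test can be sketched as follows. Setting OMP_NUM_THREADS as well is an assumption on top of the suggestion above: the executable was built with -qopenmp, so its own OpenMP regions would otherwise still run multi-threaded. MyProgram is the placeholder name used earlier in the thread.

```shell
#!/bin/sh
# Limit MKL to one thread, as suggested in this reply.
export MKL_NUM_THREADS=1
# Assumption: also limit the program's own OpenMP regions (-qopenmp).
export OMP_NUM_THREADS=1

# Confirm the settings, then rerun the solver from the same shell, e.g.:
#   ./MyProgram
echo "MKL_NUM_THREADS=$MKL_NUM_THREADS OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

If the hang disappears with one thread, that points toward a threading problem rather than the algorithm itself.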

Best Regards,

Ying  

may_ka
Beginner

Hi there, many thanks for the proposal.

However, at the moment it looks as if there was a bug in glibc (which, as far as I understand, also ships libpthread).

I had compiled the executable on a freshly updated CentOS 7 system running in a VirtualBox environment. The target server, also running CentOS 7, had version 2.17-106.el7_2.6 installed, while the virtual machine had version 2.17-206.el7_2.8. After updating the target server the program seems to run (currently at round 10000+). If the error occurs again, I'll provide you with an executable and all the input files it needs so you can run it in your lab.
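For anyone wanting to compare the glibc builds on two machines, here is a small sketch, assuming an RPM-based system such as the CentOS 7 hosts described here. The function name show_glibc_version is made up for illustration; run it on both the build machine and the target server and compare the output.

```shell
#!/bin/sh
# Print the glibc version and, where available, the exact package release.
show_glibc_version() {
    # getconf reports the C library version on glibc systems.
    getconf GNU_LIBC_VERSION 2>/dev/null || echo "glibc version unknown"
    # On RPM-based systems such as CentOS, rpm shows the full package release.
    if command -v rpm >/dev/null 2>&1; then
        rpm -q glibc 2>/dev/null || true
    fi
}

show_glibc_version
```

The package release string is the part that matters in this case, since both machines report the same upstream version 2.17.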

One thing I noticed was that when compiling in the CentOS 7 environment I got the message:

/usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libpthread.a(libpthread.o): In function `sem_open':
(.text+0x682b): warning: the use of `mktemp' is dangerous, better use `mkstemp'

This did not occur when compiling in an Arch Linux or Ubuntu 14.04 environment.

Thanks and best regards
