Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

OMP error

Alfredo
Beginner
1,254 Views

Hello,

I'm developing a threaded application in Fortran and OpenMP and using the Intel ifort compiler. Some runs of my code end up in an error message of the type below. This also happens when only one thread is used.

Could anyone please explain to me what this error means?

Thanks

forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libpthread.so.0 00002ACFACA95CB0 Unknown Unknown Unknown
libpthread.so.0 00002ACFACA95B5D Unknown Unknown Unknown
libiomp5.so 00002ACFAC95C4A2 Unknown Unknown Unknown
OMP: Error #141: Monitor did not reap properly.
OMP: System error #35: Resource deadlock avoided
forrtl: error (76): Abort trap signal
Image PC Routine Line Source
dtest 00000000004BBBED Unknown Unknown Unknown
dtest 00000000004BA6F5 Unknown Unknown Unknown
dtest 000000000046D689 Unknown Unknown Unknown
dtest 0000000000430B0A Unknown Unknown Unknown
dtest 0000000000435323 Unknown Unknown Unknown
libpthread.so.0 00002ACFACA95CB0 Unknown Unknown Unknown
libc.so.6 00002ACFACBD1DA5 Unknown Unknown Unknown
libc.so.6 00002ACFACBD31A0 Unknown Unknown Unknown
libiomp5.so 00002ACFAC93D220 Unknown Unknown Unknown
Aborted
Grant_H_Intel
Employee

This is an internal error reported by the OpenMP run-time library. It means that the "monitor" thread was not available for a POSIX thread "join" operation while the run-time library was being shut down. The most common cause is code that calls fork()/exec(): fork() copies the library's data structures into the child process, but not the actual POSIX threads they describe. Since these functions are rarely used in Fortran code, this may not be the cause of the internal error in your case.
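The "System error #35" line in the log is a C-library errno value. As a rough illustration (a minimal Python sketch, assuming Linux with glibc and a /proc filesystem; the helper names are hypothetical and not part of the OpenMP runtime), the code below decodes that number and demonstrates the fork() behavior described above: after fork(), the child process contains only the thread that called fork(), even though the parent had an extra thread running. EDEADLK is what pthread_join() returns when it detects a deadlock, for example a thread trying to join itself; the log alone does not show which condition the runtime actually hit.

```python
import errno
import os
import threading


def decode_system_error(code: int) -> str:
    """Render the N in 'OMP: System error #N' as an errno name and C-library message."""
    # EDEADLK is the error pthread_join() reports when it detects a deadlock,
    # e.g. a thread asked to join itself. Its numeric value is 35 on Linux.
    name = "EDEADLK" if code == errno.EDEADLK else f"errno {code}"
    return f"{name}: {os.strerror(code)}"


def count_os_threads() -> int:
    """Linux-specific: each entry in /proc/self/task is one kernel thread."""
    return len(os.listdir("/proc/self/task"))


def thread_counts_across_fork() -> tuple[int, int]:
    """Start one extra thread, fork(), and return (parent_threads, child_threads)."""
    stop = threading.Event()
    worker = threading.Thread(target=stop.wait)  # idles until stop is set
    worker.start()
    try:
        parent_threads = count_os_threads()
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child process: only the thread that called fork() was copied,
            # so the worker thread does not exist here.
            try:
                os.close(read_fd)
                os.write(write_fd, str(count_os_threads()).encode())
            finally:
                os._exit(0)  # leave without running interpreter cleanup
        os.close(write_fd)
        child_threads = int(os.read(read_fd, 32).decode())
        os.close(read_fd)
        os.waitpid(pid, 0)
    finally:
        stop.set()
        worker.join()
    return parent_threads, child_threads


if __name__ == "__main__":
    print(decode_system_error(35))  # on Linux/glibc: EDEADLK: Resource deadlock avoided
    print(thread_counts_across_fork())
```

A runtime that keeps bookkeeping for a thread that was never copied into the child can later try to join a thread that does not exist there, which is the kind of mismatch the fork()/exec() explanation refers to.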

If you are not using fork()/exec(), then the best thing to do would be to submit a bug report, including a test case that reproduces the problem, to the Intel Premier Support web site. Without a test case, it may be nearly impossible for support engineers to reproduce the problem.

Please let me know if you have any further questions.

Alfredo
Beginner

Dear Grant,

I'm not using fork/exec. I'll try to reproduce the problem with a smaller code, since what I have now is quite complicated, and then submit a bug report.

Thanks for helping.

Alfredo

Alfredo
Beginner

Just a little update on this topic. I found out that the problem only happens when the job is launched on a cluster through PBS and does not have exclusive access to the node allocated to it. Conversely, when the job has exclusive access, there are no problems at all. I'll check with the sysadmin and, if possible, add more useful info to this discussion.

best regards

alfredo

Grant_H_Intel
Employee
Alfredo,

Did you find out anything more about this problem after talking to the sysadmin? It sounds to me like it may be some buggy interaction between the PBS system and POSIX threads, but I could be mistaken. Please let me know if you have any more info.

Thanks,