mecej4
Black Belt
67 Views

Bug in Ifort or MKL (?), example ex_nlsqp_f.f

With IFORT 12.0.2, MKL 10.3.2, I started adapting one of the examples (../MKL/examples/solverf/source/ex_nlsqp_f.f), at first just adding a WRITE statement to monitor the first few variables. I changed the function subroutine as follows:

[fxfortran]      SUBROUTINE EXTENDET_POWELL (M, N, X, F)
      IMPLICIT NONE
      INTEGER M, N
      DOUBLE PRECISION X (*), F (*)
      INTEGER I, ICNT
      DATA ICNT/0/

      DO I = 1, N/4
         F (4*I-3) = X(4*I - 3) + 10.D0 * X(4*I - 2)
         F (4*I-2) = 2.2360679774998D0*(X(4*I-1) - X(4*I))
         F (4*I-1) = (X(4*I-2) - 2.D0*X(4*I-1))**2
         F (4*I) = 3.1622776601684D0*(X(4*I-3) - X(4*I))**2
      END DO
      ICNT=ICNT+1
      write(*,10)ICNT,X(1),X(2),X(3),X(4)
   10 format(1x,i3,' x = ',4F10.5)
      END SUBROUTINE EXTENDET_POWELL
[/fxfortran]
Compiling the program and running it with either the 32-bit or 64-bit compiler on SUSE 11.3 as follows

[bash]$ ifort -traceback -mkl ex_nlsqp_f.f
$ ./a.out
[/bash]
produced an unexpected abort after 42 calls to the subroutine, with the message
[bash]forrtl: severe (40): recursive I/O operation, unit -1, file unknown
Image PC Routine Line Source
a.out 000000000047865A Unknown Unknown Unknown
a.out 00000000004771D5 Unknown Unknown Unknown
a.out 0000000000443B86 Unknown Unknown Unknown
a.out 0000000000429A15 Unknown Unknown Unknown
a.out 000000000040A8D3 Unknown Unknown Unknown
a.out 0000000000404634 extendet_powell_ 236 ex_nlsqp_f.f
libmkl_intel_thre 00007F0EC8288423 Unknown Unknown Unknown
[/bash]
Note that line 236 is the WRITE statement. Changing the unit from '*' to a number such as 37 causes that unit number to appear in the abort message.

The same problem occurs with the Windows versions, but the error message is worded slightly differently:

[bash]forrtl: severe (152): unresolved contention for Intel Fortran RTL global resource
Image PC Routine Line Source
ex_nlsqp_f.exe 0044A71A Unknown Unknown Unknown
ex_nlsqp_f.exe 00410EBA Unknown Unknown Unknown
ex_nlsqp_f.exe 00407F2F Unknown Unknown Unknown
ex_nlsqp_f.exe 0040182B _EXTENDET_POWELL 232 ex_nlsqp_f.f
ex_nlsqp_f.exe 00450CE0 Unknown Unknown Unknown
libiomp5md.dll 100621F5 Unknown Unknown Unknown
libiomp5md.dll 10046BDA Unknown Unknown Unknown
libiomp5md.dll 100446C3 Unknown Unknown Unknown
libiomp5md.dll 100632C8 Unknown Unknown Unknown
kernel32.dll 77073677 Unknown Unknown Unknown
ntdll.dll 77EA9F02 Unknown Unknown Unknown
ntdll.dll 77EA9ED5 Unknown Unknown Unknown
[/bash]
Once again, the error is at the line with the WRITE statement.
7 Replies
Gennady_F_Intel
Moderator
67 Views

If you comment out all the MKL function calls, do you get the same results?
mecej4
Black Belt
67 Views

"If you comment out all the MKL function calls, do you get the same results?"

Not at all, since this is an example program that does nothing much if the calls to MKL routines (and the invocations of the DTRNLSP_xxxx functions) are commented out. Since this particular solver routine is called through a Reverse Communication Interface (RCI), commenting out the calls to MKL routines would also mean that no calls are made to the EXTENDET_POWELL routine, and the program would essentially do nothing.
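
To make the reverse-communication idea concrete, here is a minimal, self-contained sketch. It does not use MKL: TOY_STEP, its request codes (0 = start, 1 = "please evaluate the function", -1 = done), and the bisection method are all hypothetical choices made for illustration, not the MKL solver's actual interface. The point it shows is that the solver never calls the user's function itself; it returns to the caller with a request, and the caller does the evaluation.

```fortran
      PROGRAM RCI_DEMO
      IMPLICIT NONE
      DOUBLE PRECISION X, F
      INTEGER REQ
      X = 0.D0
      F = 0.D0
! REQ = 0 tells the (hypothetical) solver that this is the first call.
      REQ = 0
      DO
         CALL TOY_STEP (X, F, REQ)
         IF (REQ .NE. 1) EXIT
! The caller, not the solver, evaluates f(x) = x**2 - 2 on request.
         F = X*X - 2.D0
      END DO
      WRITE (*,'(1X,A,F10.6)') 'root estimate: ', X
      END PROGRAM RCI_DEMO

      SUBROUTINE TOY_STEP (X, F, REQ)
! Hypothetical reverse-communication bisection step on [0, 2].
! Assumes f(0) < 0 < f(2), which holds for f(x) = x**2 - 2.
      IMPLICIT NONE
      DOUBLE PRECISION X, F
      INTEGER REQ
      DOUBLE PRECISION A, B
      SAVE A, B
      IF (REQ .EQ. 0) THEN
! First call: set up the bracket.
         A = 0.D0
         B = 2.D0
      ELSE IF (F .GT. 0.D0) THEN
! f(X) > 0: the root lies to the left of X.
         B = X
      ELSE
! f(X) <= 0: the root lies to the right of X.
         A = X
      END IF
! Next trial point is the midpoint of the current bracket.
      X = 0.5D0*(A + B)
      IF (B - A .LT. 1.D-10) THEN
         REQ = -1
      ELSE
         REQ = 1
      END IF
      END SUBROUTINE TOY_STEP
```

If the CALL TOY_STEP line were commented out, the function evaluation would never be requested, which mirrors the point made above about the example program.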

I do not think that the WRITE statement in EXTENDET_POWELL is causing I/O problems of the type that one sees when a DLL written in a language other than Fortran calls a Fortran routine, when the runtimes of the two languages can interact in odd ways.

Thanks

ADDED 9.05 AM PDT:

The problem goes away if the environment variable OMP_NUM_THREADS is set to 1.
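
For readers who prefer to set the thread count in code rather than in the environment, the sketch below uses the standard OpenMP routine OMP_SET_NUM_THREADS. It is only an illustration: the program name is made up, it must be compiled with OpenMP enabled, and whether a call like this affects the MKL example in the same way as the environment variable was not tested in this thread.

```fortran
      PROGRAM ONE_THREAD_DEMO
      USE OMP_LIB
      IMPLICIT NONE
      INTEGER NT
! Request a single thread for subsequent parallel regions.
! This is meant to mirror OMP_NUM_THREADS=1 (an assumption here).
      CALL OMP_SET_NUM_THREADS (1)
      NT = 0
!$OMP PARALLEL
!$OMP MASTER
! Record how many threads the parallel region actually uses.
      NT = OMP_GET_NUM_THREADS ()
!$OMP END MASTER
!$OMP END PARALLEL
      WRITE (*,*) 'threads in parallel region: ', NT
      END PROGRAM ONE_THREAD_DEMO
```

Like the environment variable, this trades away parallel speed-up in exchange for serialized calls.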
A__Valle
Beginner
67 Views

We encounter a problem with the same diagnostics here.
The last post was more of a workaround than a solution to the observed behavior. Do you know whether a root cause has been found?

Thanks in advance for any comments
Dirk van Meeuwen
mecej4
Black Belt
67 Views

Another possible workaround, with less of a performance penalty, is to use the /Qopenmp (Windows) or -fopenmp (Linux/Mac) option.
67 Views

Hi mecej4,

Just an idea: the function EXTENDET_POWELL is called from a parallel region when OpenMP is not disabled. So it seems that your global variable ICNT is modified and printed to the screen by different threads. I am not sure that this is the cause of the problem, and I have not reproduced this issue yet, but it could be.

With best regards,

Alexander Kalinkin

mecej4
Black Belt
67 Views

You are probably correct!

When I first ran into this problem, it never occurred to me that threading issues could cause problems.

Now I see that multiple threads could update the variable ICNT and produce unpredictable values of that variable. However, is it not a bug that, instead of incorrect values being printed, a run-time abort occurs?
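
One way to guard the shared counter and the WRITE against concurrent access is an OpenMP critical section. The sketch below is a stand-alone illustration, not the MKL example itself: LOG_CALL, the section name LOG_IO, and the driver loop are hypothetical, and the code must be compiled with OpenMP enabled, otherwise the directives are treated as plain comments. Whether this also removes the run-time abort in the MKL example was not confirmed in this thread.

```fortran
      PROGRAM CRITICAL_DEMO
      IMPLICIT NONE
      INTEGER I
      DOUBLE PRECISION X(4)
      X = (/ 3.D0, -1.D0, 0.D0, 1.D0 /)
! Call the logging routine from several threads at once, to mimic
! a user routine being invoked from inside a parallel region.
!$OMP PARALLEL DO
      DO I = 1, 8
         CALL LOG_CALL (X)
      END DO
!$OMP END PARALLEL DO
      END PROGRAM CRITICAL_DEMO

      SUBROUTINE LOG_CALL (X)
      IMPLICIT NONE
      DOUBLE PRECISION X (*)
      INTEGER ICNT
! DATA gives ICNT the SAVE attribute, so all threads share it.
      DATA ICNT/0/
! Only one thread at a time may update ICNT and write to the unit.
!$OMP CRITICAL (LOG_IO)
      ICNT = ICNT + 1
      WRITE (*,10) ICNT, X(1), X(2), X(3), X(4)
!$OMP END CRITICAL (LOG_IO)
   10 FORMAT (1X, I3, ' x = ', 4F10.5)
      END SUBROUTINE LOG_CALL
```

Because the increment and the WRITE sit in the same critical section, each printed line carries a distinct counter value and lines from different threads do not interleave.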
67 Views

Hi,
As I wrote, I have not reproduced your issue, so I cannot say anything specific about it. But when you remove the variable ICNT, does the abort in the RCI solver disappear or not? If yes, then the problem was the incorrect use of a global variable; if not, we will try to find the cause somewhere else :)
With best regards,
Alexander Kalinkin