Dear all.
I have a problem with the Intel Fortran compiler (version 16.0.2 20160204) combined with IntelMPI (version 5.1.3 Build 20160120). The error does not appear without MPI.
The code attached below runs correctly when compiled with -O0, -O1, and -O3, but not with -O2. With "-fno-inline-functions", the problem does not occur even with -O2. Also, the compiler versions 18.0.0 20170811 (with IntelMPI 2018 Build 20170713) and 12.1.3 20120212 (with MPICH 3.2) do not show this problem. Unfortunately, I was not able to create a test case, because a small change to the code (e.g., adding a write statement or removing an if statement, even one with a false condition) often makes the error disappear.
I should also mention that the code runs correctly in most cases, and the error appears only in very few cases. But if it does, the error is reproducible.
As you can see in the attached (simplified) code snippet, the subroutine "qp_adiabatic" modifies the array "hqp". It should increase the diagonal elements by 1 and set the off-diagonal elements to zero. In the problematic case, the array "hqp" is nonzero in all elements (including the diagonal ones) before the call. After the modification, the diagonal elements are all exactly 1 except for the first; see the output file "fort.123" below.
Of course, I could simply use a different compiler, but there is always the nagging suspicion that the error might be caused by wrong coding (perhaps by an error elsewhere in the code). So, my questions are the following:
- Is there an obvious breach of the Fortran standard? (Obviously, I modify an array that is not passed to the subroutine explicitly. In fact, the error goes away if I declare it with INTENT(INOUT). However, the subroutine is contained in a parent routine, and all variables and arrays of the parent routine should be accessible in my understanding.)
- Is this a known problem of the present compiler and MPI versions?
- Would it be possible that this error is caused by wrong coding elsewhere in the code? (Since it seems to be quite clear-cut: the array is obviously modified incorrectly.)
- Is there something in the code that would suggest that the -O2 optimization might create a wrong executable (by optimizing something away or so)?
- Is there a way to analyze this problem in more detail, to see exactly what the program does with the array?
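One way to narrow down the last question is to let the routine check its own result. The following is only a minimal sketch under assumptions, not the original code: the program name, the routine name `qp_adiabatic_checked`, the test values, the small `n`, and the tolerance `tol` are all hypothetical. It keeps the host-associated access to `hqp` used in the post, saves the diagonal before the update, and reports every element that does not match the expected result.

```fortran
program check_qp_update
  implicit none
  integer, parameter :: dp = kind(1d0)
  integer :: n
  complex(dp), allocatable :: hqp(:,:)
  integer :: j, k

  n = 4                          ! small size for illustration (the post uses n=60)
  allocate(hqp(n,n))
  ! Arbitrary nonzero test values (assumption: stand-in for the real data)
  do k = 1, n
    do j = 1, n
      hqp(j,k) = cmplx(0.1d0*j, 0.01d0*k, kind=dp)
    enddo
  enddo

  call qp_adiabatic_checked()

contains

  ! Hypothetical variant of the update loop that verifies its own result.
  ! hqp and n are accessed by host association, as in the post.
  subroutine qp_adiabatic_checked()
    complex(dp) :: diag_old(n)   ! copy of the diagonal before the update
    real(dp), parameter :: tol = 1d-12
    integer :: j, k, nbad

    do j = 1, n
      diag_old(j) = hqp(j,j)
    enddo

    ! Same update as in the post: diagonal + 1, off-diagonal = 0
    do k = 1, n
      do j = 1, n
        if (j == k) then
          hqp(j,j) = hqp(j,j) + 1d0
        else
          hqp(j,k) = 0
        endif
      enddo
    enddo

    ! Compare every element with the expected value
    nbad = 0
    do k = 1, n
      do j = 1, n
        if (j == k) then
          if (abs(hqp(j,j) - (diag_old(j) + 1d0)) > tol) then
            nbad = nbad + 1
            write(*,*) 'diagonal mismatch at j =', j, hqp(j,j), diag_old(j)
          endif
        else if (hqp(j,k) /= 0) then
          nbad = nbad + 1
          write(*,*) 'off-diagonal not zero at j,k =', j, k, hqp(j,k)
        endif
      enddo
    enddo
    if (nbad == 0) write(*,*) 'update verified'
  end subroutine qp_adiabatic_checked

end program check_qp_update
```

Since the post notes that even an added write statement can make the error disappear, such a check may itself change the optimizer's behavior, so a clean run with it does not prove that the unchecked routine is correct.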
This issue might be related to:
https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/606074
https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/685354
The code has already been simplified a bit:
integer :: n
complex(8), allocatable :: hqp(:,:)
minstep=10
n=60
allocate(hqp(n,n))
call qp_adiabatic(21,0)
...
contains
  subroutine qp_adiabatic(i,istep)
    implicit none
    integer, intent(in) :: i,istep
    integer :: j,k
    if(istep.lt.minstep) then
      if(istep.eq.0.and.i.eq.21) then
        write(123,*) (hqp(j,j),j=1,n)
      endif
      if(istep.lt.0) call fatal('qp_scale: istep<0. (bug?)')
      do k = 1,n ; do j = 1,n
        if(j.eq.k) then
          hqp(j,j) = hqp(j,j) + 1d0
        else
          hqp(j,k) = 0
        endif
      enddo ; enddo
      if(istep.eq.0.and.i.eq.21) then
        write(123,*)
        write(123,*) (hqp(j,j),j=1,n)
      endif
    endif
  end subroutine qp_adiabatic
The file fort.123 looks like this if the error occurs:
(-0.156621693392732,1.681941124276988E-003)
(-0.156621693392732,1.681941124276986E-003)
(-0.156040111750621,1.717955673865940E-003)
... nonzero complex values ...
(0.253993187912219,6.920763917916860E-004)
(0.262222706526087,7.572720081067049E-004)
(0.262222706526087,7.572720081067215E-004)
(0.843378306607268,1.681941124276988E-003)
(1.00000000000000,0.000000000000000E+000)
(1.00000000000000,0.000000000000000E+000)
(1.00000000000000,0.000000000000000E+000)
... all values are 1 ...
(1.00000000000000,0.000000000000000E+000)
(1.00000000000000,0.000000000000000E+000)
(1.00000000000000,0.000000000000000E+000)
Thank you.
Best regards
Christoph
At least for the code that you showed, here is the explanation:
When the statement hqp(j,j) = hqp(j,j) + 1d0 is executed, the value of hqp(j,j) on the right-hand side is undefined, because you allocated hqp and then called the subroutine immediately, without assigning any values to the array.
The WRITE statement at the beginning of the subroutine will just print garbage for the same reason.
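To illustrate the point about undefined values, here is a minimal sketch, assuming a plain zero fill as a placeholder for whatever calculation defines the array in the real code (the program name is hypothetical as well). ALLOCATE only reserves storage, so the array has to be assigned before any element is read.

```fortran
program init_before_use
  implicit none
  integer, parameter :: dp = kind(1d0)
  integer :: n
  complex(dp), allocatable :: hqp(:,:)

  n = 60
  allocate(hqp(n,n))
  ! After ALLOCATE the elements are undefined; reading them here would be an error.
  hqp = (0d0, 0d0)             ! assumption: placeholder for the real calculation
  hqp(1,1) = hqp(1,1) + 1d0    ! well defined now, because hqp(1,1) has a value
  write(*,*) hqp(1,1)
end program init_before_use
```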
OK, good point. Sorry, I did not make this clear enough. The code outside the subroutine should be understood as pseudocode. In the real code, the array hqp is actually calculated before qp_adiabatic is called. I should have written:
allocate(hqp(n,n))
... array hqp is calculated ...
call qp_adiabatic(21,0)
So, the array is completely defined, and the diagonal elements are written to fort.123. The program should then increase them by 1, but instead it writes 1d0 into the array elements.
By the way, as written in my first posting, the computer does it correctly in most cases. The error appears only occasionally.
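For comparison, the first post mentions that the error goes away when the array is declared with INTENT(INOUT). The exact form of that change is not shown in the thread, so the following is only a sketch of one possible reading: the array is passed as an explicit assumed-shape dummy argument instead of being accessed by host association. The names `explicit_arg`, `shift_diagonal`, and `a`, and the test values, are hypothetical.

```fortran
program explicit_arg
  implicit none
  integer, parameter :: dp = kind(1d0)
  integer :: n
  complex(dp), allocatable :: hqp(:,:)

  n = 3
  allocate(hqp(n,n))
  hqp = (0.5d0, 0.25d0)        ! assumption: placeholder for the calculated array
  call shift_diagonal(hqp)
  write(*,*) hqp(1,1), hqp(2,1)

contains

  ! The array is an explicit INTENT(INOUT) argument here,
  ! rather than a host-associated variable as in the post.
  subroutine shift_diagonal(a)
    complex(dp), intent(inout) :: a(:,:)
    integer :: j, k
    do k = 1, size(a,2)
      do j = 1, size(a,1)
        if (j == k) then
          a(j,j) = a(j,j) + 1d0   ! diagonal: add 1
        else
          a(j,k) = 0              ! off-diagonal: set to zero
        endif
      enddo
    enddo
  end subroutine shift_diagonal

end program explicit_arg
```

Both forms are valid ways to reach the array from a contained subroutine; the explicit argument simply makes the data flow visible at the call site.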
In any case, thank you for your reply.
