There seems to be an Intel Fortran compiler bug introduced sometime around version 19.1.0.
It's a bit difficult to pin down, but the problem occurs in our electronic structure code Elk (http://elk.sourceforge.net/) with direct-access files that many threads write to, though never at the same time.
Here is a snippet of the code that writes the file:
!$OMP CRITICAL(u122)
open(122,file=trim(fname),form='UNFORMATTED',access='DIRECT',action='WRITE', &
 recl=recl)
write(122,rec=ik) vkl(:,ik),nmatmax,nstfv,nspnfv,evecfv
close(122)
!$OMP END CRITICAL(u122)
And the code that reads the file:
!$OMP CRITICAL(u122)
open(122,file=trim(fname),form='UNFORMATTED',access='DIRECT',action='READ', &
 recl=recl)
read(122,rec=ik) vkl_,nmatmax_,nstfv_,nspnfv_,evecfv
close(122)
!$OMP END CRITICAL(u122)
The named OMP CRITICAL sections ensure that only one thread reads or writes the file at a time. Unit 122 is never used anywhere else in the code, only in these two subroutines.
Now the bug. It occurs when I run the code with several threads.
The error is:
forrtl: severe (554): direct I/O not consistent with OPEN options
The Intel Fortran versions which are affected (and to which I have access) are 19.1.0 and 19.1.1.
Intel Fortran versions 18.0.0, 18.0.5, 19.0.4 and 19.0.5 work fine. GFortran also works fine.
To test this, you'll have to download Elk, compile it and run 'make test'. Most tests pass but several crash because of this problem.
(Thanks to Pavlov Nikita for pointing out the bug in the first place.)
(Max-Planck Institute, Germany.)
This may not be related to the error message, but in your read you use evecfv. Shouldn't this be evecfv_? (Or is evecfv PRIVATE?)
FWIW I found this:
severe (554): Direct I/O not consistent with OPEN options
FOR$IOS_F6203. A REC= option was included in a statement that transferred data to a file that was opened with the ACCESS='SEQUENTIAL' option.
Do you have any files opened elsewhere that use SEQUENTIAL access?
If so, you might want to insert some ASSERT code to make sure that none of them grabbed unit 122.
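A minimal sketch of such an assertion, assuming it is called as the first statement inside CRITICAL(u122), right before the OPEN. The module and routine names (unit_check, unit_is_free, assert_unit_free) are hypothetical and not part of Elk; the check relies only on the standard INQUIRE(OPENED=...) and INQUIRE(ACCESS=...) specifiers.

```fortran
! Hypothetical helper (not part of Elk): verify, just before the OPEN inside
! the CRITICAL section, that unit 122 is not already connected elsewhere.
module unit_check
  implicit none
contains
  ! True if no file is currently connected to unit iu.
  logical function unit_is_free(iu)
    integer, intent(in) :: iu
    logical :: connected
    inquire(unit=iu, opened=connected)
    unit_is_free = .not. connected
  end function unit_is_free

  ! Stop the program, reporting the unit's current ACCESS mode, if iu is in use.
  subroutine assert_unit_free(iu)
    integer, intent(in) :: iu
    character(len=32) :: acc
    if (.not. unit_is_free(iu)) then
      inquire(unit=iu, access=acc)
      write(*, '(a,i0,2a)') 'assert_unit_free: unit ', iu, &
        ' already connected with ACCESS=', trim(acc)
      error stop 1
    end if
  end subroutine assert_unit_free
end module unit_check
```

If the assertion ever fires inside the CRITICAL section, something else connected unit 122 between the previous CLOSE and this OPEN; the reported ACCESS value shows whether that connection was SEQUENTIAL.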
Reading evecfv is the intention of the code. The other variables have the underscore to avoid conflict with their global equivalents.
Unit 122 is not used anywhere else in the code. Also, this code has been working fine for the past decade or so with several generations of Intel compilers. The error appears to be a recent development.
Which compile options are you using?
All I can think of is that the CRITICAL section isn't doing what it is supposed to, and that you end up with this code executing in more than one thread at the same time. Regardless, reporting here using a snippet isn't going to be productive - I suggest you open a ticket at https://supporttickets.intel.com/servicecenter?lang=en-US and see if you can provide a complete reproducible example.
>>Unit 122 is not used anywhere else in the code
The point I was trying to make is: you know what the code is *supposed* to be doing.
All I asked was that you verify programmatically that the unit is not being used.
Examples of potential points of error:
1) You are using (elsewhere) a variable unit number (not a parameter, not a literal) that you assume holds something other than 122, but which actually holds 122.
2) You are using NEWUNIT, and for some reason internal to IVF the unit number it assigns (which is supposed to be negative) somehow corrupts non-NEWUNIT unit numbers (a bug in IVF).
The suspicion that CRITICAL is the problem can be tested by wrapping the open...close sections that CRITICAL is supposed to protect in an OpenMP lock/unlock on a global shared lock variable. (Also note that if this is the issue, you now have a workaround.)
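A minimal sketch of that test, assuming a module-level lock that is initialized once before the parallel region. The module and routine names (u122_lock, write_record, read_record) are hypothetical, and a plain real(8) array stands in for Elk's actual record contents. The lock is acquired before entering CRITICAL(u122), so if the error disappears with the lock in place, that would suggest CRITICAL is not serializing the I/O; keeping only the lock would then be the workaround. The `!$` sentinel lines are compiled only when OpenMP is enabled.

```fortran
! Hypothetical sketch (not Elk code): a global OpenMP lock wrapped around the
! named CRITICAL section that guards unit 122.
module u122_lock
  !$ use omp_lib
  implicit none
  private
  public :: init_u122_lock, destroy_u122_lock, write_record, read_record
  !$ integer(omp_lock_kind), save :: lock122
contains
  subroutine init_u122_lock()
    !$ call omp_init_lock(lock122)
  end subroutine init_u122_lock

  subroutine destroy_u122_lock()
    !$ call omp_destroy_lock(lock122)
  end subroutine destroy_u122_lock

  subroutine write_record(fname, ik, recl, dat)
    character(len=*), intent(in) :: fname
    integer, intent(in) :: ik, recl
    real(8), intent(in) :: dat(:)
    !$ call omp_set_lock(lock122)     ! extra serialization on top of CRITICAL
    !$omp critical(u122)
    open(122, file=trim(fname), form='UNFORMATTED', access='DIRECT', &
         action='WRITE', recl=recl)
    write(122, rec=ik) dat
    close(122)
    !$omp end critical(u122)
    !$ call omp_unset_lock(lock122)
  end subroutine write_record

  subroutine read_record(fname, ik, recl, dat)
    character(len=*), intent(in) :: fname
    integer, intent(in) :: ik, recl
    real(8), intent(out) :: dat(:)
    !$ call omp_set_lock(lock122)
    !$omp critical(u122)
    open(122, file=trim(fname), form='UNFORMATTED', access='DIRECT', &
         action='READ', recl=recl)
    read(122, rec=ik) dat
    close(122)
    !$omp end critical(u122)
    !$ call omp_unset_lock(lock122)
  end subroutine read_record
end module u122_lock
```

Callers can obtain recl portably with INQUIRE(IOLENGTH=recl) on a sample record, which sidesteps the bytes-versus-words difference in RECL units between compilers.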
All file open statements in Elk have an explicit unit number. NEWUNIT is not used anywhere in the code.
The fact that all the code works on most versions of Intel Fortran and all versions of GFortran suggests that this is a compiler problem.
I'll see if I can create a small test program which reproduces the error.
The compile options are
ifort -O3 -ip -qopenmp -traceback
Unfortunately, try as I might, I can't produce a self-contained example.
I do have some more information though. When I insert:
!$OMP CRITICAL(u122)
open(122,file=trim(fname),form='UNFORMATTED',access='DIRECT',action='READ',recl=recl)
!**************
inquire(122,action=action_,form=form_,name=name_,named=named_,opened=opened_,access=access_)
print *
print *,'action ',action_
print *,'form ',form_
print *,'name ',trim(name_)
print *,'named ',named_
print *,'opened ',opened_
print *,'access ',access_
!**************
read(122,rec=ik) vkl_,nmatmax_,nstfv_,nspnfv_,evecfv
close(122)
!$OMP END CRITICAL(u122)
... then every so often, I get
action READ
form UNFORMATTED
name /cobra/u/jdewhurs/elk/src/EVECFV.OUT
named T
opened T
access DIRECT

action READ
form UNFORMATTED
name /cobra/u/jdewhurs/elk/src/EVECFV.OUT
named T
opened T
access SEQUENTIAL
forrtl: severe (554): direct I/O not consistent with OPEN options
Image              PC                Routine            Line        Source
elk                00000000011DA68B  Unknown               Unknown  Unknown
elk                00000000011FA83C  Unknown               Unknown  Unknown
elk                000000000046DFCC  getevecfv_                 81  getevecfv.f90
elk                000000000041BF0E  rhomagv_                   46  rhomagv.f90
libiomp5.so        00002B01CD110CC3  __kmp_invoke_micr     Unknown  Unknown
libiomp5.so        00002B01CD096283  Unknown               Unknown  Unknown
libiomp5.so        00002B01CD09524E  Unknown               Unknown  Unknown
libiomp5.so        00002B01CD11119C  Unknown               Unknown  Unknown
libpthread-2.22.s  00002B01CD3ED6DA  Unknown               Unknown  Unknown
libc-2.22.so       00002B01CD8F527D  clone                 Unknown  Unknown
As you can see, the access to the file has been randomly switched to SEQUENTIAL despite being opened in the previous line with DIRECT. As I mentioned before, unit 122 is only ever used with DIRECT access.
It appears that the latest versions of Intel Fortran (19.1.0 and 19.1.1) are opening unit 122 unbeknownst to the rest of the code and not respecting the CRITICAL statement. I also tried removing the name of the CRITICAL section (u122) but that didn't work either.
A long shot, but maybe the data stored for the OPEN is being corrupted by some unrelated problem. Try putting the ACCESS value in a variable just before the OPEN, and print that variable as well. To my mind, that will confirm whether or not this is a problem in the Fortran RTL.
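A minimal sketch of that probe, assuming a file that already exists and is opened read-only, as in getevecfv. The names access_probe and probe_access are hypothetical. It keeps the requested ACCESS in a variable, opens with it, and compares against what INQUIRE reports; following Andrew's reasoning, a mismatch while the variable still reads DIRECT would point at the run-time library rather than at corrupted program data.

```fortran
! Hypothetical probe (not Elk code): OPEN with ACCESS taken from a variable,
! then compare the variable against what INQUIRE reports for the unit.
module access_probe
  implicit none
contains
  subroutine probe_access(iu, fname, recl, ok)
    integer, intent(in) :: iu, recl
    character(len=*), intent(in) :: fname
    logical, intent(out) :: ok
    character(len=16) :: access_req, access_now
    access_req = 'DIRECT'
    open(iu, file=trim(fname), form='UNFORMATTED', access=access_req, &
         action='READ', recl=recl)
    ! INQUIRE returns upper-case values such as 'DIRECT' or 'SEQUENTIAL'
    inquire(unit=iu, access=access_now)
    ok = (access_now == access_req)
    if (.not. ok) then
      ! Print both: if access_req still says DIRECT, the variable itself
      ! was not corrupted before the OPEN.
      print *, 'requested: ', trim(access_req), '  INQUIRE: ', trim(access_now)
    end if
    close(iu)
  end subroutine probe_access
end module access_probe
```

In Elk, the equivalent check would sit between the existing OPEN and READ in the CRITICAL(u122) read path, printing the variable alongside the INQUIRE result shown earlier in the thread.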
Also try adding -threads. I don't think this ought to be necessary, but it can't hurt to try. This adds a call from the main program to have the run-time library protect itself against multithread access. (I assume the main program is in Fortran.)
I also have a sneaking suspicion that OPEN on a connected unit is involved somehow. If I had to guess, the bug is in the run-time library and not the compiler. I suggest opening a ticket with Intel support and give them what you have so far.
Similar to Andrew's suggestion of an "unrelated problem":
The cause of this anomaly may be memory corruption. Compile your program with full compile-time and run-time diagnostics. This may catch a programming error (incorrect interface, array bounds exceeded, etc.). Note that the diagnostics will not catch all such errors.
One particular problem that can show up when the serial build works but the parallel one fails (or at least used to) is using PRIVATE(unallocatedArray) as opposed to FIRSTPRIVATE(unallocatedArray). The problem used to be that the contents of the private array descriptor were not initialized.
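A minimal sketch of the FIRSTPRIVATE form Jim recommends, assuming a work array that is unallocated on entry to the parallel loop and allocated per iteration inside it. The module and function names (firstprivate_demo, parallel_sum) are hypothetical, and this only illustrates the pattern; it does not reproduce the older compiler behavior he describes.

```fortran
! Sketch of the pattern described above: an allocatable work array that is
! unallocated on entry is listed as FIRSTPRIVATE, so each thread's copy starts
! with the same (unallocated) status as the original.
module firstprivate_demo
  implicit none
contains
  ! Returns m * (1 + 2 + ... + n), computed with a per-thread work array.
  function parallel_sum(n, m) result(total)
    integer, intent(in) :: n, m
    real(8) :: total
    real(8), allocatable :: work(:)   ! deliberately unallocated here
    real(8) :: s
    integer :: i
    s = 0d0
    !$omp parallel do firstprivate(work) reduction(+:s)
    do i = 1, n
      allocate(work(m))               ! each thread allocates its own copy
      work = real(i, 8)
      s = s + sum(work)
      deallocate(work)                ! leave the copy unallocated again
    end do
    !$omp end parallel do
    total = s
  end function parallel_sum
end module firstprivate_demo
```

Allocating and deallocating inside each iteration keeps every thread's copy unallocated at both entry and exit of the region, which avoids relying on how a given OpenMP version treats allocatables still allocated at the end of the construct.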
I tried recompiling the code with -threads but the problem persisted. (The entire code is in Fortran.)
I also followed your suggestion and opened a ticket with Intel support.
Yes, that is what I meant: was the value of the variable still "direct" when INQUIRE said the unit was sequential and the program crashed? I think filing a ticket is the way to go.
I wonder whether there has been any progress on this. I have the same problem with the same code.
severe (554): direct I/O not consistent with OPEN options
I think I have verified that it is not a problem with the run-time library. If I compile with
ifort version 126.96.36.1991
The code runs fine, even when I run it under the new compiler environment:
ifort version 188.8.131.52
But compiling under the newest compiler gives the error reported above.
Is there a workaround yet to use the new compiler, or a bug fix coming?
I opened a ticket with Intel Support and they confirmed that it is indeed a compiler bug.
The versions affected are 19.1.0, 19.1.1, 19.1.2 and 19.1.3.
I'll update this thread as soon as a fix or workaround is available. In the meantime, you'll have to use version 19.0.5 or earlier.
Regarding the status of this compiler bug: the compiler developer has root-caused the problem and is testing the fix. As you know, the failure is intermittent. Expect a fix in a future release.