Hello,
I am having trouble deploying the Telemac code (http://opentelemac.org) with Intel Fortran Compiler 16. Every routine that uses an old trick to skip lengthy records in an unformatted sequential file, in order to reach the place one actually wants to read, produces a read error when compiled with optimisation higher than -O0. The trick is this: instead of reading the thousands of numbers in a record, read just one number from it, let the next READ skip to the next record, and so on until you get where you want. This trick (a typical old Fortran way of doing things...) stopped working with optimised Intel 16.
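For readers unfamiliar with the idiom, here is a minimal sketch of the record-skipping trick described above. It is not taken from the Telemac sources: the file name `skip_demo.bin`, unit 10 and the record length are made up for illustration. The program writes its own test file, reads a single value from the long record, and relies on the next READ to start at the following record.

```fortran
! Sketch only: file name, unit number and record sizes are illustrative
! assumptions, not taken from the Telemac sources.
program skip_demo
  implicit none
  integer, parameter :: n = 100000   ! length of the "lengthy" record
  real    :: big(n), first(1), wanted
  integer :: i

  do i = 1, n
    big(i) = real(i)
  end do

  ! Write a long record followed by the short record we actually want.
  open (10, file='skip_demo.bin', form='unformatted', access='sequential', &
        status='replace', action='write')
  write (10) big
  write (10) 42.0
  close (10)

  open (10, file='skip_demo.bin', form='unformatted', access='sequential', &
        status='old', action='read')
  read (10) first(1)   ! take one value; the rest of the record is skipped
  read (10) wanted     ! this READ starts at the next record
  close (10, status='delete')

  print *, 'first value of the long record: ', first(1)
  print *, 'value from the following record:', wanted
end program skip_demo
```

With a run-time library that handles partial reads correctly, the two values reported should be 1.0 and 42.0. A READ with an empty input list, `read (10)`, skips a whole record in the same way without transferring any data.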
I have written a short program containing the original Telemac routine (skipgeo) and a routine that is a simple workaround allocating large enough buffers (skipgeo_improved). The code works well with Intel Fortran 14 and with gfortran (gcc 4.8.4), but yields a read error with optimised Intel 16, caught by a wrapper routine (lit) for the Fortran READ. If you use -warn all -check all and/or -O0, everything goes well.
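The buffer-based workaround can be sketched roughly as follows. This is not the actual skipgeo_improved routine; it only assumes, for illustration, that the length of the long record is known from an earlier header record, so that a buffer of exactly that size can be allocated and the whole record consumed. The file name and sizes are hypothetical.

```fortran
! Sketch only: not the real skipgeo_improved; file layout is hypothetical.
program full_read_demo
  implicit none
  real, allocatable :: buf(:)
  real    :: wanted
  integer :: i, n, ios

  ! Build a small test file: a count, a long record of that length, a target.
  n = 100000
  allocate (buf(n))
  do i = 1, n
    buf(i) = real(i)
  end do
  open (10, file='full_read_demo.bin', form='unformatted', &
        access='sequential', status='replace', action='write')
  write (10) n
  write (10) buf
  write (10) 42.0
  close (10)
  deallocate (buf)

  ! Read it back, consuming every record completely.
  open (10, file='full_read_demo.bin', form='unformatted', &
        access='sequential', status='old', action='read')
  read (10, iostat=ios) n
  if (ios /= 0) stop 'cannot read record length'
  allocate (buf(n))
  read (10, iostat=ios) buf          ! whole record, no partial read
  if (ios /= 0) stop 'cannot read long record'
  read (10, iostat=ios) wanted
  if (ios /= 0) stop 'cannot read target record'
  close (10, status='delete')

  print *, 'last value of the long record:', buf(n)
  print *, 'value from the next record:   ', wanted
end program full_read_demo
```

The cost is memory and I/O time proportional to the record size, which is exactly what the skipping trick was meant to avoid.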
I wonder if this is not an optimisation bug. I have not found in the documentation anything about changes in unformatted sequential file treatment.
Please find included code and the file to be read (big endian).
Looking forward to your reactions, best regards,
jaj
Your program contains at least these errors: In subroutine SKIPGEO, arrays XBID, W and IBID are declared with actual size=1, but these arrays are passed as arguments to LIT, where these variables are declared to have size=NVAL, and NVAL may be as high as 72. Similarly for the character variable CBID. Reading more than one element of any of these arrays from the file, e.g., reading XBID(2), would cause array overrun and possibly cause memory corruption.
Whether this actually happens or not would be known only from a detailed examination of the program behavior. In general, optimized buggy code has unpredictable run time behavior.
It may be an optimization bug, possibly where the optimizer mistakenly removed what it thought was dead code. As a potential workaround and confirmation of this, try the following:
After you READ the one number, insert the statement
IF(ISNAN(TheOneNumber)) PRINT *,"Not supposed to happen"
This ensures that the optimizer cannot assume TheOneNumber (whatever its name) is never used, and therefore cannot treat the code that produces its value as subject to removal.
Jim Dempsey
mecej,
Thanks for examining the code for blatant errors. If the code still exhibits this symptom after it has been corrected, he can try my diagnostic.
Jim Dempsey
Hello,
@mecej4: Thank you for your interest. The routine skipgeo is not supposed to read the lengthy records (positions marked (1) to (4)) in full, but to skip them: it reads just one number from a record and moves on to the next record when the next READ is executed. Hence the short declarations of the dummy arrays you mention. Please note that at every LIT call for the records marked (1) to (4), the value of NVAL is 1. There are no memory overruns at all. Please also note that this is veteran legacy code, in service since ca. 1985...
Best regards, jaj
Hello,
@Jim Dempsey: Please note that I am referring to errors raised by the Fortran READ statement itself. The record in LIT is read with
READ(CANAL,END=100,ERR=101)(W(J),J=1,NVAL)
so it jumps to the label given by ERR. Please run the code; you should get output like the following (on other platforms the error may already occur at (1)):
jaj@neo:~/prog/telemac/v6p3r2/work/litanie$ ./generr_intel
opening the geometry file
skipping geometry improved
(1)
(2)
(3)
(4)
skipping geometry original
(1)
(2)
(3)
(4)
LIT : ABNORMAL END OF FILE
ONE INTENDED TO READ
A RECORD OF 1 VALUES
OF TYPE : R4
ON LOGICAL UNIT : 10
PLANTE: PROGRAM STOPPED AFTER AN ERROR
2
Best regards, jaj
Jacek,
The system I am using to inspect your program is Windows 7 Pro x64 with IVF V16 update 1.
The program reads the first header record, then experiences an EOF on the next read. In examining the contents of the geo_wesel.slf file using a hex dump it appears that the records were written using Big-endian format. When I change your open statement:
!*OPEN (inp, FILE='geo_wesel.slf', FORM='unformatted', STATUS='unknown', ACTION='read')
OPEN (inp, FILE='geo_wesel.slf', FORM='unformatted', STATUS='unknown', ACTION='read', CONVERT='BIG_ENDIAN')
The output becomes:
opening the geometry file
skipping geometry improved
(1)
(2)
(3)
(4)
skipping geometry original
(1)
(2)
(3)
(4)
closing the geometry file
Jim Dempsey
Hello,
@Jim Dempsey : Thank you for your time. Yes, the file is big endian, as given in the compilation instructions in the comments at the very beginning of the code included:
! please note the file geo_wesel.slf is written in big endian
! it is a sequential unformatted file with records of mixed type
! called "Telemac Serafin format"
! => use export F_UFMTENDIAN=big for reading correctly!
! => or set appropriate compiler flags:
!
! ifort -convert big_endian generr.f90 -o generr_intel
! gfortran -fconvert=big-endian generr.f90 -o generr_gfortran
!
! (the error occurs as well when using little endian files)
!
! NOTICE:
! ifort -warn all -check all -convert big_endian generr.f90 -o generr_intel
! ifort -O0 -convert big_endian generr.f90 -o generr_intel
! deliver correctly running executables... Optimisation problem?
ifort (IFORT) 16.0.1 20151021
Jacek, I think that the problem may be closer to a Fortran RTL problem than an optimizer problem, in that I can reliably generate the premature end-of-file with the test code given below on your data file (sequential, big-endian, unformatted, variable record sizes), even with the /Od option on Windows, using the 32- and 64-bit 16.0.1 compilers. The bug is not seen with the 11.1.070 compiler, Lahey and Gfortran, all of which use the same unformatted file format.
The test code is a greatly stripped-down version of your code. The correct output should be:
W = 3678.738
Normal end
but, because of the bug, the actual output is:
forrtl: severe (24): end-of-file during read, unit 10, file s:\lang\Jacek\geo_wesel.slf
Image            PC        Routine            Line     Source
libifcoremd.dll  5FF819A2  Unknown            Unknown  Unknown
libifcoremd.dll  5FFBE60F  Unknown            Unknown  Unknown
gen.exe          00341295  _GENERR_ip_SKIPGE  25       gen.f90
gen.exe          003410B8  _MAIN__            7        gen.f90
...
PROGRAM generr
  IMPLICIT NONE
  OPEN (10, FILE='geo_wesel.slf', FORM='unformatted', STATUS='OLD', &
        ACTION='read', CONVERT='BIG_ENDIAN')
  CALL skipgeo ()
  CLOSE(10)
  STOP 'Normal end'
CONTAINS
  SUBROUTINE SKIPGEO ()
    REAL W(1)
    INTEGER IB(10),I
    !
    REWIND 10
    do i=1,6
      read(10)
    end do
    CALL LIT(IB,1)
    read(10)w(1)
    read(10)w(1)
    write(*,*)'W = ',w(1)
    RETURN
  END SUBROUTINE SKIPGEO
  SUBROUTINE LIT (I, NVAL)
    INTEGER, INTENT(IN) :: NVAL
    INTEGER, INTENT(INOUT) :: I(NVAL)
    !
    read(10)i(1:nval)
    return
  END SUBROUTINE LIT
END PROGRAM generr
Furthermore, replacing the call to LIT() by the equivalent line
read(10)ib(1)
makes the bug disappear. Similarly, if I compile the source using the 16.0.1 compiler and then link the OBJ file to the runtime library of the 14.0.4.237 compiler, the bug goes away.
Dear Mecej4,
Thank you very much for your time and work. I can reproduce your results: erroneous with Intel 16.0.1 and correct with gfortran based on gcc 4.8.4 on my Linux laptop:
Linux neo 3.13.0-77-generic #121-Ubuntu SMP Wed Jan 20 10:50:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Sadly, if you suspect Fortran RTL, this is very bad news indeed. I've also had correct runs with Intel 14 in the past, so it seems that your workaround with compiling with Intel 16 and linking with 14 libraries is thoroughly consistent...
So what should I do now? Submit this bug to some appropriate "complaints booth" at Intel? Wait for a new compiler release with fingers crossed? ,^)
Confused, but thankful for your analysis, with best regards, jaj
Reporting the bug here should be sufficient. The Intel personnel usually respond within a day or two (not counting weekends). Since we now have a short reproducer, and reproducing the bug does not seem to be affected by the optimization level, it should be easy for them to see that there is a problem and file a bug report.
On the other hand, the fix for the bug may not become available until one or two compiler updates have been released.
Here is an even shorter reproducer, and a little-endian unformatted input file to go with it.
PROGRAM generr
  IMPLICIT NONE
  integer :: ib(1),wi
  OPEN (10, FILE='geo_wesel.lit', FORM='unformatted', STATUS='OLD', &
        ACTION='read')
  REWIND 10
  call lit(ib,1) ! replacing by "read(10)ib(1)" makes bug go away
  write(*,'(A8,2x,Z8)')'IB(1) = ',ib
  read(10)wi
  write(*,'(A8,2x,Z8)')'W = ',wi
CONTAINS
  SUBROUTINE LIT (V, n)
    integer, intent(in) :: n
    INTEGER, INTENT(OUT) :: V(n)
    read(10)v(1:n)
    return
  END SUBROUTINE LIT
END PROGRAM generr
IFort 14.0.4.237 (Windows) output:
IB(1) = 1000000
W = DE009B44
IFort 16.0.1 (Windows) output:
IB(1) = 1000000
W = 6B22D045
Thanks for the smaller test cases. I will send this on to the developers.
Hello,
@ Steve Lionel : Thank you for your interest and passing the case to the developers.
The very essence of this error is that READ behaves differently depending on whether we read every record in full with the correct field lengths, or just skip a record by reading one number and moving on to the next record. We encounter READ errors or a premature(?) end-of-file. Note that this has nothing to do with the endianness of the input file.
Please note as well that, although the conclusions of mecej4 (thank you for your help) might be perfectly right (Fortran RTL?), his/her way of reading the provided Telemac input file might be confusing when searching for errors, because the behaviour might depend on the type of the variable being read or on whether it is a field (array) or not(?). The original input file structure is described in comments in the routine skipgeo, and reading with correctly set field lengths is implemented in skipgeo_improved.
Looking forward to Intel developers reactions, best regards, jaj
Escalated as issue DPD200381641. Another data point - If I build with the 15.0 compiler, I get an "internal error" in the run-time library.
I expect this to be fixed in Parallel Studio XE 2016 Update 3. The underlying problem was that the run-time library was incorrectly positioning the file when only part of a very long record was read, due to a bug in the buffering implementation.
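To make "positioning the file" concrete, the sketch below walks the on-disk record structure by hand using stream access. It rests on assumptions that are not stated in this thread: that the compiler brackets each unformatted sequential record with 4-byte length markers (the default for ifort and gfortran as far as I know), that default REAL is 4 bytes, and that the file is in native byte order. File name and sizes are hypothetical, and this is an illustration of the layout rather than a description of how the Intel run-time library is implemented.

```fortran
! Sketch only: assumes 4-byte leading/trailing record markers and
! native byte order; names and sizes are illustrative.
program stream_skip
  use iso_fortran_env, only: int32
  implicit none
  integer, parameter :: n = 100000
  real           :: big(n), wanted
  integer(int32) :: head, tail
  integer        :: i, pos

  do i = 1, n
    big(i) = real(i)
  end do

  ! Write a long record and a short one with ordinary sequential I/O.
  open (10, file='stream_demo.bin', form='unformatted', access='sequential', &
        status='replace', action='write')
  write (10) big
  write (10) 42.0
  close (10)

  ! Reopen as a byte stream and step over the first record manually.
  open (10, file='stream_demo.bin', form='unformatted', access='stream', &
        status='old', action='read')
  pos = 1
  read (10, pos=pos) head              ! leading marker: payload size in bytes
  read (10, pos=pos + 4 + head) tail   ! trailing marker of the same record
  if (head /= tail) print *, 'unexpected record markers:', head, tail
  pos = pos + 4 + head + 4             ! first byte of the next record
  read (10, pos=pos) head              ! leading marker of the short record
  read (10) wanted                     ! its payload follows immediately
  close (10, status='delete')

  print *, 'payload bytes of skipped record:', tail
  print *, 'value from the following record:', wanted
end program stream_skip
```

After a partial read of a long record, a correct run-time library has to leave the file at the byte computed in `pos` above, however much of the record it had buffered. Under the stated assumptions the skipped record's payload is 400000 bytes and the value read is 42.0.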
Dear Steve,
Sorry for not answering, I was on strictly non-internet holidays... Thank you for solving the problem; please pass on my thanks to the developers' team ,^) As far as I understand, all that remains for me is to wait for the new update... Could you tell me what approximate release date would be realistic?
Best(!) regards, Jacek
I think May 2016.
Hello,
I am extremely sorry, but with Parallel Studio Update 3 the problem remains as before. Disappointed...
Best regards, Jacek
Indeed, it seems that part of the fix didn't get into 16.0.3. It is fixed in 17.0 Beta (I tried it) and should also be fixed in 16.0.4.
