Solved: IVF 13 bug report - compile-time error not flagged as such

Wayne_L_ · ‎02-09-2017

Many "division by zero" NaNs found in test-case output. This turned out to be a data-structure error, undiagnosed by IVF.

The code had once worked; an executable version had been available at one time. So it wasn't a matter of debugging, so much as repairing coding technique that is no longer supported. But no amount of setting IVF 13 Properties would avoid the problem (NO error was returned!). After many, many rather frustrating tries, we found the names of the offending data arrays by locating the origin of the NaNs. (This requires setting no fewer than 3 Properties correctly.) But recoding the logic which uses the arrays AT THOSE LINES, to eliminate all division-by-zero statements, was futile!!

A careful use of the debugging capability eventually pointed to the now-erroneous lines of code relating to these arrays. The original programmer (no longer on contract) had been using the EQUIVALENCE statement to fill an essential array, eliminating nearly all zeros. The zeros intended to be filled by the second array resulted in NaNs if uncorrected. The code had originally been compiled with "Microsoft Powerstation 4" ~1997, which apparently allowed an 'unconventional' type of "EQUIVALENCE" statement... which the IVF debugger recognized.

But the compiler itself returned NO error message - not even a runtime error. No direct reference to the problem code.

Such an occurrence is reminiscent of old-school FORTRAN 77 compilers, which frequently indicated the offending lines of code incorrectly. Thus I consider this "missing error" to be a BUG in IVF 13. Please ensure that future versions (or patches, updates) repair this bug by implementing a USEFUL Error Message. At least then someone else might benefit from the many wasted hours, and years of useless modeling capability, that we experienced.

andrew_4619 · ‎02-09-2017

Note that many checking options, both at compile time and at run time, are not switched on by default. You should post a code example that demonstrates the problem, and the build options (or better, the build log), so that if there is a problem it can be fixed. I find the checking to be pretty robust, so I am surprised you have a problem.

Wayne_L_ · ‎02-10-2017

I'm well aware of the many compiler Properties options. I've made 30-odd codes work again by careful use of them, ensuring that any unneeded Properties remain at default setting. We (2-3 of us, over the course of 3 years) spent several weeks checking every sensible compiler Property option without success. There was no 'checking option' found which detects this odd data-structure method:

"Was able to compile the original source code ('IP_info.f') with a minor correction that removed the NaNs in the output. Within the "BLOCK DATA INITIAL" block, P21F, EC21F, etc, are all (15,15) arrays that are initialized by DATA statements. They all utilize EQUIVALENCE statements along with _an additional local array_ to initialize the large 15x15 array. The initialization of local arrays within the BLOCK DATA is not permitted. A way around this is to remove the EQUIVALENCEs and merely initialize the entire array in one DATA statement." I don't think that I can post any of the proprietary code.

It is not clear to me why the older compiler permitted it and the new one does not.

The under-filled matrices' effect on the downstream code can be located with Properties ('checking options' or 'build options'?) set to:

Fortran/FloatingPoint/FlPtExceptionHandler = Abort on IEEE exceptions

Fortran/Runtime/GenerateTraceback = Yes

Linker/General/EnableIncLink = No

which, as I said, merely locates the lines where division by zero actually occurs. So I conclude that the compiler simply cannot detect this anomaly. I don't have much experience working with the debugging utility (we don't develop code, per se), and eventually found a colleague who did. Same guy who provided the CVF Library files which worked so well.

I was surprised as well, thus felt compelled to submit a bug report.

andrew_4619 · ‎02-10-2017

"It is not clear to me why the older compiler permitted it and the new one does not." - This is quite simple: the newer compilers are more likely to reject invalid Fortran. Use of non-standard Fortran (does your source compile with /stand?) might have worked in the past, but if you are relying on a behaviour that is actually undefined then a different compiler is liable to give different outcomes.

It is not clear what the exact problem is, as you have posted no examples. If you have some Fortran that relies on trying lots of different compiler options to get it to work, that is probably indicative of buggy, non-standard code.

mecej4 · ‎02-10-2017

Wayne L wrote:
I was surprised as well, thus felt compelled to submit a bug report.

Please do assemble a "reproducer" and put in the effort to submit a bug report.

The description that you have provided so far is not complete and sounds too circumstantial to act upon. If I had to guess, I'd say that it fits in the class of problems where, according to the standard, the program is "nonconforming" and/or the behavior of the running program is "undefined". In such a situation, small and seemingly unrelated changes to the program, compiler options or previous machine state can induce unexpected changes or, conversely, fail to induce expected changes.

Most of the errors that you describe (e.g., division by zero) do not seem to be of such a nature that one can expect them to be detected at compile time, so I find the title of the thread not quite apt to the content of the thread. Again, providing a reproducer would help clarify these questions.

Wayne_L_ · ‎08-28-2017

I don't think that a "reproducer" is required, and you'd have to explain how to create one! (esp. without it including any proprietary info). I also don't know any other way to submit a bug report.

Please try compiling the code snippet below, which will produce unexpected "Infinity" values, and then again without the four lines of code. These lines include an EQUIVALENCE statement which was not compiled or reported as an error by Intel 13.

{I tried attaching it but the browser wouldn't allow it}

C      COMPUTER PROGRAM INTENDED TO REPLICATE UNDIAGNOSED ERROR.
C         Div-by-0, Infinity results are in error due to DATA EQUIVALENCE
C         which Intel 12 did NOT report during Compile Time.  Wayne R Lundberg
      PROGRAM  MAIN
      COMMON/CAN/ERR(15,15),P21F(15,15)           
C     The original code was developed using Microsoft Powerstation 4
c         and is known to compile and run correctly using the older compiler      
C     ******************************************************************
C     THE FOLLOWING 2 LINES must be COMMENTED OUT AS THE
C     EQUIVALENCE STATEMENTS DO NOT WORK PROPERLY AND ARE NOT NEEDED
C     RON BUHRMAN - 7 DEC 2016
      DIMENSION P21FX(105)                                              00010580
      EQUIVALENCE (P21F(121),P21FX(1))                                  00010590
      DATA P21F/                                                        00011220
     $1.1374,1.1342,1.1273,1.1174,1.1096,1.0959,1.0587,1.0372,          00011230
     $1.0117,1.0000,1.0000,1.0000,1.0000,1.0000,1.0000,                 00011240
     $1.3038,1.3035,1.2984,1.2935,1.2816,1.2640,1.2357,1.2056,          00011250
     $1.1713,1.1498,1.1351,1.1254,1.1195,1.1069,1.0980,                 00011260
     $1.4645,1.4645,1.4644,1.4558,1.4432,1.4259,1.4104,1.3892,          00011270
     $1.3612,1.3322,1.3081,1.2907,1.2616,1.2278,1.1823,                 00011280
     $1.5747,1.5700,1.5623,1.5528,1.5422,1.5317,1.5134,1.4895,          00011290
     $1.4578,1.4279,1.4030,1.3780,1.3568,1.3212,1.2798,                 00011300
     $1.7310,1.7186,1.7032,1.6830,1.6658,1.6465,1.6244,1.6004,          00011310
     $1.5725,1.5389,1.5071,1.4831,1.4561,1.4253,1.3916,                 00011320
     $1.8772,1.8663,1.8495,1.8299,1.8103,1.7908,1.7713,1.7499,          00011330
     $1.7228,1.6937,1.6665,1.6548,1.6305,1.6062,1.5672,                 00011340
     $2.0060,2.0010,1.9866,1.9672,1.9499,1.9210,1.9095,1.8835,          00011350
     $1.8642,1.8277,1.7988,1.7650,1.7169,1.6784,1.6108,                 00011360
     $2.1716,2.1572,2.1372,2.1123,2.0941,2.0760,2.0587,2.0453,          00011370
     $2.0281,1.9726,1.9247,1.8768,1.8097,1.7713,1.7234,                 00011380
C     These next two lines also must be commented out for the code to work      
     & 105*0./ 
      DATA P21FX/
     $2.2767,2.2585,2.2327,2.2022,2.1735,2.1525,2.1257,2.1085,          00011400
     $2.0827,2.0300,2.0013,1.9534,1.8960,1.8481,1.8097,                 00011410
     $2.3842,2.3592,2.3323,2.3006,2.2728,2.2537,2.2317,2.2040,          00011420
     $2.1867,2.1005,2.0621,1.9997,1.9374,1.9038,1.8749,                 00011430
     $2.5300,2.4900,2.4500,2.4030,2.3650,2.3460,2.3160,2.2990,          00011440
     $2.2770,2.2300,2.1960,2.1620,2.0890,2.0420,1.9690,                 00011450
     $2.6610,2.6230,2.5750,2.5130,2.4630,2.4240,2.3790,2.3590,          00011460
     $2.3450,2.2650,2.2310,2.2060,2.1570,2.0970,2.0380,                 00011470
     $2.8750,2.8280,2.7780,2.7230,2.6700,2.6440,2.6070,2.5860,          00011480
     $2.5600,2.5020,2.4340,2.3450,2.2670,2.1990,2.1260,                 00011490
     $3.0300,2.9850,2.9400,2.8980,2.8450,2.8060,2.7720,2.7440,          00011500
     $2.7210,2.6610,2.5930,2.4820,2.3880,2.3160,2.2380,15*0./           00011510
      OPEN(6,file='IntelErr.txt',status='UNKNOWN')
C DO loop will produce incorrect div-by-0 without the corrections to arrays      
      DO 1 I=1,14                                                       00044400
      DO 1 J=1,15                                                       00044410
      DELTA=4/P21F(J,I)*2                                               00044430
      ERR(J,I)=1+DELTA                                                  00044440
    1 CONTINUE   
      DO 2 I=1,14                                             
c      DO 2 J=1,15                                            
    2 WRITE(6,3) ERR(J,I)                                               00046350
    3 FORMAT (1H ,14F12.4)                                              00047100      
      END

andrew_4619 · ‎08-28-2017

I think EQUIVALENCE (P21F(121),P21FX(1)) is invalid in more than one way, but the compiler (Intel(R) Visual Fortran Compiler 17.0.4.210 [IA-32]) does not say so; I think it probably should.

The fact that the compiler 'seems' to ignore it is secondary, as non-conforming code cannot really have a defined result. Clearly Powerstation interpreted this in a particular way and either copied p21fx into p21f or made the p21f array not in contiguous memory within the common block (which seems unlikely)!

That said, there are many ways of quickly fixing this to initialise p21f properly.
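One such fix can be sketched as follows. This is only an illustration under stated assumptions: a 3x3 array stands in for the 15x15 P21F, the values are made up, the unit name INIT_CAN is hypothetical, and free-form source is used for brevity. The idea is to drop the EQUIVALENCE and the helper array, and to initialise disjoint parts of the COMMON array directly, inside a BLOCK DATA unit, so that no element is initialised twice.

```fortran
! Sketch only: a 3x3 stand-in for the 15x15 P21F, with made-up values.
block data init_can
  ! Default implicit typing applies here, so the implied-DO
  ! indices i and j are integers without a declaration.
  real p21f(3,3)
  common /can/ p21f
  ! Columns 1-2 and column 3 are initialised by separate DATA
  ! statements that touch disjoint elements (no overlap, no EQUIVALENCE).
  data ((p21f(j,i), j=1,3), i=1,2) / 1.1, 1.2, 1.3, 2.1, 2.2, 2.3 /
  data  (p21f(j,3), j=1,3)         / 3.1, 3.2, 3.3 /
end block data init_can

program show_init
  implicit none
  real :: p21f(3,3)
  integer :: i
  common /can/ p21f
  ! Print one column of the array per output line.
  do i = 1, 3
    print '(3f8.4)', p21f(:,i)
  end do
end program show_init
```

Both units are assumed to live in the same source file here; in a multi-file build the object containing the BLOCK DATA unit has to be linked in explicitly.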

gib · ‎08-28-2017

As a general point, it can be helpful in bug tracking to build the program with another compiler. Since gfortran is free it is easy to use this to provide a "second opinion".

mecej4 · ‎08-28-2017

Neither FPS4 nor Gfortran will produce an EXE from the code that was given in #6:

s:\lang>fl32 lund.f
Microsoft (R) Fortran PowerStation  Version 4.00
Copyright (C) Microsoft Corp 1982-1995. All rights reserved.

lund.f
lund.f(13): error FOR2979: expected 2 subscripts but found 1 for array P21F
Errors in declarations, no further processing for MAIN

s:\lang>gfortran lund.f
lund.f:13:23:

       EQUIVALENCE (P21F(121),P21FX(1))                                  00010590
                       1
Error: Rank mismatch in array reference at (1) (1/2)

Please provide test code that does what you said: "The original code was developed using Microsoft Powerstation 4 and is known to compile and run correctly using the older compiler".

With the Intel compiler, you may wish to use the option /stand:f95 when compiling code of dubious correctness.

andrew_4619 · ‎08-29-2017

mecej4 wrote:
With the Intel compiler, you may wish to use the option /stand:f95 when compiling code of dubious correctness.

At a glance, the code has many things that make me wince in pain; however, I think the point is that Ifort should spit its dummy out with actual errors, and it doesn't. With /stand:f95 we get:

Source1.for(4): warning #7373: Fixed form source is an obsolescent feature in Fortran 95.
Source1.for(13): warning #6920: In this context, Fortran 95 does not allow a single subscript in an EQUIVALENCE statement.   [P21F]
Source1.for(14): warning #6243: In Fortran 95, this DATA statement object cannot appear in either a blank COMMON block or a named COMMON block.   [P21F]
Source1.for(33): warning #6243: In Fortran 95, this DATA statement object cannot appear in either a blank COMMON block or a named COMMON block.   [P21FX]
Source1.for(53): warning #6033: Sharing of a DO termination statement by more than one DO statement is an obsolescent feature in Fortran 95.  Use an END DO or CONTINUE statement for each DO statement.   [1]
Source1.for(56): warning #6028: DO termination on a statement other than an END DO or CONTINUE is an obsolescent feature in Fortran 95.
Source1.for(57): warning #7359: The cH edit descriptor has been deleted in Fortran 95.   [1H ]
Source1.for(5): warning #5436: Overlapping storage initializations encountered with P21FX

It is clear that the program is non-conforming, and the fact that Ifort does not correctly guess what the programmer had in mind with some meaningless code is not a problem, IMO.

Wayne_L_ · ‎08-29-2017

I agree with "the compiler (Intel(R) Visual Fortran Compiler 17.0.4.210 [IA-32].) does not say so, I think it probably should."!!

THAT would have saved hundreds of man-hours!

I/we got the problem fixed several months ago, so further customer-service recommendations don't help, esp those calling for use of OLDER compilers that we are _barred_ from using. I had enough trouble with the Compaq dfor.lib problem... thanks?

The code 'snippet' is meaningless on purpose! Sorry if you winced... I created/hacked it to produce the erroneous "Infinity" (often actually NaN) output so that the problem IVF had with compiling the EQUIVALENCE statement would be obviated. Also, at one point I'm sure that I tried /stand:f95 ; but several mere warnings which point to the wrong lines were unhelpful, as none explain the actual error. They don't even hint at anything that could cause the erroneous EXE.

Devorah_H_Intel · ‎08-29-2017

If you have paid support then please submit a bug or feature request at https://supporttickets.intel.com/?lang=en-US

andrew_4619 · ‎08-29-2017

Wayne L. wrote:
Also, at one point I'm sure that I tried /stand:f95 ; but several mere warnings which point to the wrong lines were unhelpful, as none explain the actual error. They don't even hint at anything that could cause the erroneous EXE.

In fairness, warnings 2, 3 and 4 in my last post do actually tell you that the code is woefully bad and to expect unexpected results. This is not conforming code. You chose to ignore these warnings, I presume because there are so many nonconformances that it is too much work to fix the code. In the past a compiler acted on this code in a manner that the vendors saw fit (or it just behaved in a particular way by chance) and a coder made use of that fact. Some other compiler might behave in an entirely different way because the code is not conforming, so there is no defined 'correct way'.

I will also add that you are using the Ifort compiler with all the modern checks switched off, implicit typing, obsolete language features, no standard compliance etc, etc etc. You are using a circular saw with the safety guards removed, expect to lose some fingers.

Wayne_L_ · ‎08-29-2017

I didn't write these old codes, and am aware that they are rife with coding techniques that do not conform to modern software engineering practice. I haven't even done any Fortran coding in many, many, years!

While ideally all the non-conforming practices could be resolved, our/my objective is/was simply to get them working again. That I did (in this case with some help), and yes, I did lose money over it... but at least now I have professional oversight which agrees that the problems this particular code had were VERY odd, unusual, nonconformist, whatever.

Thanks

Devorah_H_Intel · ‎12-10-2017

I have escalated this issue.

The following is the result of the investigation:

----------------

The problem occurs because the last 105 elements of the array are each initialized twice, once with the value 0.0 (line 32 of the test program), and once with a non-zero value (lines 33-45). Only one value can be put in memory for each element. The 0.0 values are being used now, while the desired values were being used in the past. Multiple initializations of the same location in memory is a violation of Fortran standards, I believe all the way back to Fortran 77 or possibly Fortran 66, but no compiler that I am aware of can fully diagnose this as it can happen across multiple compilation units, and across files.

There is another violation of the standard, in that the array P21F is dimensioned (15,15), a two-dimensional array, but is referenced in an EQUIVALENCE statement at line 13 as a one-dimensional array. This violation of the standard can be diagnosed by compiling with the -stand (Linux) or /stand (Windows) option. This is NOT the cause of the errors, however. P21F(121) can be replaced with P21F(1,9) (the 121st element of the array in column-major order) and the last 105 elements still get initialized to 0.0 instead of the desired values.
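Fortran stores arrays in column-major order, so the position of an element in storage sequence maps to a (row, column) pair by simple integer arithmetic. A minimal sketch of that mapping for a 15-row array follows; the program and variable names are illustrative, not taken from the original code.

```fortran
program element_121
  implicit none
  integer, parameter :: nrows = 15   ! first dimension of P21F(15,15)
  integer :: k, row, col

  k = 121                            ! position in storage sequence
  col = (k - 1) / nrows + 1          ! integer division selects the column
  row = mod(k - 1, nrows) + 1        ! the remainder selects the row

  ! prints: element 121 is (1,9)
  print '(a,i0,a,i0,a,i0,a)', 'element ', k, ' is (', row, ',', col, ')'
end program element_121
```

Element 121 is therefore the first element of column 9, and columns 9 to 15 hold the 7 x 15 = 105 trailing elements that the investigation refers to.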

A third violation of the standard is that a common block is being initialized in the main program. Common block initialization is permitted by the Fortran standards only in BLOCKDATA subprograms.

The code is obviously an old code which was written in this manner out of necessity. Fortran 77 allowed code in columns 7 – 72 only, with only 19 continuation lines. The style of data statement used here to initialize the array would not fit in 19 continuation lines, so it had to be broken up. The only valid way to do this across multiple data statements would be to provide individual subscripts for each initial value, such as

DATA P21F(1,1) /1.1374/, P21F(2,1) /1.1342/ …

DATA P21F(1,2) /1.3038/, P21F(2,2) /1.3035/ …

Because of these limitations, various extensions to F77 compilers were introduced by vendors, and other vendors supported them for portability. Unfortunately, for backward compatibility vendors must continue to support them.
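The per-element form described above can be sketched in full on a small array. This is an illustration only: the 2x2 array, its values and the program name are made up, a local array is used so that no BLOCK DATA unit is needed, and free-form source is used for brevity.

```fortran
! Sketch only: a 2x2 stand-in array with made-up values.
program per_element_data
  implicit none
  real :: a(2,2)
  ! Each DATA statement names the elements it initialises, so the
  ! initialisation can be split over several statements without overlap.
  data a(1,1) /1.0/, a(2,1) /2.0/
  data a(1,2) /3.0/, a(2,2) /4.0/
  ! Print one column per output line.
  print '(2f6.1)', a
end program per_element_data
```

For a 15x15 array this style is verbose; an implied-DO list in each DATA statement is a more compact way to name a block of elements explicitly.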