Is there a problem in Fortran 15.4 that has been fixed in 15.6?

Stephen_Painchaud · 05-12-2016

We have a modeling code that we have been using since 2010 and for years we compiled it with Fortran 11.x. This year we upgraded to 15.4 and noticed the code became unstable, sometimes giving incorrect answers. Today I tested with 15.6 and the problem seems to be gone. I mentioned to those in charge that we should upgrade to 15.6 but I was told to find the bug instead. So now I have two computers running side by side with the different compilers as I try to find the line(s) of code that need to be modified to work in 15.4.

Before I get in too deep I thought someone might know what has been changed, so I know what to look for. Otherwise this could take days, and would be a waste of time if there was an actual bug in 15.4. Can I get some ideas what the problem might be?

Steven_L_Intel1 · 05-12-2016

You can read the list of fixed issues at https://software.intel.com/en-us/articles/intel-composer-xe-2015-compilers-fixes-list

I'll comment that the symptom you describe is so vague that you are unlikely to find anything helpful there.

Stephen_Painchaud · 05-12-2016

I was hoping for some clue as to what might cause incorrect results. I see one item in update 5, but it does not tell me much.

DPD200369981 (Fortran): Incorrect results disappear if unused variables renamed or removed

TimP · 05-12-2016

I see differences even in 15.0.6 (Intel64) according to whether I build inside VS or from the command line, with the 15.0.6 GUI compilation being the most bullet-proof of any recent version. I'd be happy if I could just sit back and avoid problems by using that version.

15.0.4 doesn't work with current VS2015, so I'm not inclined to try to revert to it.

I suspect some differences may be associated with the degree of inter-procedural optimization, which doesn't always appear to respond to the Qip or Ob options. VS properties show Qip being off by default. In some versions, subscript range checking (which is enabled by default in debug builds), combined with the normal optimizations for release mode, will flag some problems (which I haven't been able to relate to any fault in my customer's source code).

Interprocedural optimizations will aggravate problems with range over-run or failure to set SAVE; in the obvious cases, the fault will show up with range checks or Inspector even without Qip.
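The "failure to set SAVE" case can be illustrated with a minimal sketch. The module and routine names here (`counter_mod`, `add_unsafe`, `add_safe`) are hypothetical and not taken from any code in this thread. The unsafe routine is shown only for contrast; its result is undefined by the standard, so it may appear to work in one build and fail in another.

```fortran
module counter_mod
  implicit none
contains
  ! Relies on `total` surviving between calls. Without SAVE its value is
  ! undefined on re-entry, even if an unoptimized build happens to keep it.
  subroutine add_unsafe(x, first, res)
    real, intent(in)    :: x
    logical, intent(in) :: first
    real, intent(out)   :: res
    real :: total                 ! BUG: no SAVE attribute
    if (first) total = 0.0
    total = total + x
    res = total
  end subroutine add_unsafe

  ! Same logic with the persistence stated explicitly.
  subroutine add_safe(x, first, res)
    real, intent(in)    :: x
    logical, intent(in) :: first
    real, intent(out)   :: res
    real, save :: total = 0.0     ! SAVE guarantees the value is retained
    if (first) total = 0.0
    total = total + x
    res = total
  end subroutine add_safe
end module counter_mod

program save_demo
  use counter_mod
  implicit none
  real :: r
  integer :: i
  do i = 1, 3
     call add_safe(1.5, i == 1, r)   ! only the safe variant is exercised
  end do
  write(*,*) 'running total = ', r
end program save_demo
```

Array over-runs are the other half of that remark. A build with subscript checking enabled (for example `/check:bounds` with Intel Fortran on Windows) is a reasonable first pass before suspecting the compiler.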

Stephen_Painchaud · 05-12-2016

I think I found the source of the problem in 15.4. The CDEXP function is giving incorrect answers in my code. I can run the 15.6 and 15.4 versions side by side and see a big difference. I tried to isolate the problem in a smaller piece of code below.

    program TestBadFortran
    implicit none
    COMPLEX*16 P, EXZ
    DOUBLE PRECISION z1, z2, pr, pi
    COMPLEX*16 CDEXP
    pr = Z'402EA029A106D663'
    pi = Z'BF7564F8D446D22A'
    P = DCMPLX(pr,pi)
    z1 = Z'40A1A687BBCB6316'
    z2 = Z'40A14728F84CA97A'    
    EXZ=CDEXP(-P*(z1-z2))
    WRITE(*,*) 'answer = ', EXZ
    end program

I used hexadecimal to get the exact numbers used in the modeling code that failed. The correct answer is

EXZ = (7.372096780634534E-318,1.875112523691908E-318)

In my modeling code I was getting

EXZ = (2.619391604368174E-310,9.019330909918942E-310)

Unfortunately this example does not give the incorrect result when run under 15.4, so I guess there must be some other complication in the full code.

Steven_L_Intel1 · 05-12-2016

You're into denormal territory and losing precision. How can you claim that either of those results is "correct"?

Math library changes don't make it into the fixes list, but so far you haven't shown that it is a math library issue.

TimP · 05-13-2016

Among the measures needed to get predictable results in this example are setting Qftz- or an equivalent option (fp:precise?) when compiling the main program, or calling ieee_set_underflow_mode(.true.). -stand throws complaints.

I might have thought such a setting could be part of -standard-semantics, since the most serious performance implications were fixed several years ago along with introduction of AVX.
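A minimal sketch of the underflow-mode measure applied to the test case above, assuming a compiler that provides the Fortran 2003 `ieee_arithmetic` module. The program name and the variable `pim` (renamed from `pi`) are choices made for this sketch. It loads the bit patterns with `REAL(boz, kind)` rather than assigning a BOZ literal directly, and uses the generic `exp` in place of `CDEXP`. Whether the math library honours the requested underflow mode is something to verify on each build, not something this sketch guarantees.

```fortran
program underflow_demo
  use, intrinsic :: ieee_arithmetic
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  complex(real64) :: p, exz
  real(real64)    :: pr, pim, z1, z2

  ! Load the exact bit patterns from the original test case.
  pr  = real(Z'402EA029A106D663', real64)
  pim = real(Z'BF7564F8D446D22A', real64)
  z1  = real(Z'40A1A687BBCB6316', real64)
  z2  = real(Z'40A14728F84CA97A', real64)
  p   = cmplx(pr, pim, kind=real64)

  ! Request gradual underflow only if the processor allows control of it.
  if (ieee_support_underflow_control(pr)) then
     call ieee_set_underflow_mode(gradual=.true.)
  end if

  exz = exp(-p*(z1 - z2))

  write(*,*) 'answer = ', exz
  write(*,*) 'smallest normal double = ', tiny(pr)
  ! ieee_is_normal is false for denormal values (zero counts as normal).
  write(*,*) 'real part normal? ', ieee_is_normal(real(exz))
  write(*,*) 'imag part normal? ', ieee_is_normal(aimag(exz))
end program underflow_demo
```

Comparing the printed components against `tiny(pr)` shows directly whether a result has fallen into the denormal range, where only a few significant bits remain.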

Stephen_Painchaud · 05-13-2016

It is true that I have not proved that one answer is correct vs. the other. I suppose I could do the calculation, but it is not necessary. The example program I show always gives what I claim is the correct answer, no matter what compiler I use. It is only in the context of the complete Fortran DLL that the code provides a different answer depending on compiler used. In the complete Fortran DLL this piece of code is looped over 29 times, each time accumulating a dozen values, and it is only on the 29th loop that the two versions of the DLL differ, and only for 4 of those dozen values.

So I am not saying that there is something wrong with the CDEXP function in 15.4. If there was a problem my group would have seen many more numerical codes fail as CDEXP is used in all of them.

What I am saying is that I have a Fortran project that can be compiled with two versions of Intel Fortran and that 15.4 sometimes gives results different from 11.x and 15.6. In the context of a numerical simulation we will see a few isolated points with large deviations from the correct answer. These bad points are clustered in an area where the calculations are "sensitive". I need to understand if this is a problem with that version of the compiler, in which case we will upgrade, or a defect in the code, which has been revealed after many years of use and millions of simulations. Running the two compilations side by side revealed the first difference occurring in a line of code that has already been looped over 28 times. I cannot replicate the problem with my test code above.

It would be nice to know the reason for this issue, which is why I am mentioning my findings here. I don't see a problem with the code and I don't think there is a problem with CDEXP, so that leaves a mystery. I will recommend to my bosses that we upgrade to 15.6.

Steven_L_Intel1 · 05-13-2016

As a point of information, the versions you are referring to are 15.0.4 and 15.0.6. 16.0.3 is current.

What I usually do in such cases is "instrument" the code and run it both ways, finding which key calculation differs between the versions. It's entirely possible that the difference is due to an optimization change that orders things differently. See the attached document.
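One way to "instrument" the code as described might look like the sketch below. The module `trace_mod`, its routines, and the file name `trace.log` are hypothetical names invented for this illustration, and the loop in `trace_demo` is a stand-in for the real calculation. Each value is written both in decimal and as its raw 64-bit pattern, so logs from two builds can be compared exactly with a diff tool.

```fortran
module trace_mod
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  private
  public :: trace_open, trace_real, trace_close
  integer :: trace_unit
  logical :: trace_is_open = .false.
contains
  subroutine trace_open(filename)
    character(len=*), intent(in) :: filename
    open(newunit=trace_unit, file=filename, status='replace', action='write')
    trace_is_open = .true.
  end subroutine trace_open

  ! Log a label, the loop index, the value in decimal, and its raw bits in hex.
  subroutine trace_real(label, iter, x)
    character(len=*), intent(in) :: label
    integer, intent(in)          :: iter
    real(real64), intent(in)     :: x
    if (.not. trace_is_open) return
    write(trace_unit, '(A,1X,I0,1X,ES25.16E3,1X,Z16.16)') &
         label, iter, x, transfer(x, 0_int64)
  end subroutine trace_real

  subroutine trace_close()
    if (trace_is_open) close(trace_unit)
    trace_is_open = .false.
  end subroutine trace_close
end module trace_mod

program trace_demo
  use, intrinsic :: iso_fortran_env, only: real64
  use trace_mod
  implicit none
  real(real64) :: acc
  integer :: i
  call trace_open('trace.log')
  acc = 0.0_real64
  do i = 1, 29
     ! Stand-in for the real accumulation step.
     acc = acc + exp(-25.0_real64*real(i, real64))
     call trace_real('acc', i, acc)
  end do
  call trace_close()
end program trace_demo
```

Build the same source with each compiler version, run both, and diff the two log files; the first differing line points at the calculation to examine. Added I/O can itself perturb optimization, so if the discrepancy vanishes once the trace calls are in place, that is a clue in its own right.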
