Access Violation WIN32(Debug), X64(Debug) & x64(Rel) ok but not WIN32(Rel)

Moore__John1 · ‎03-22-2020

I've just upgraded to the 2020.0.166 version, and on testing this version with existing code I found some unexplained behavior.

The offending code section is shown below. The arrays are all dynamic and this code is located in the main program.

The compiled program runs OK for WIN32 debug, x64 debug and x64 release but fails for WIN32 release.

The WIN32 release fails with: forrtl: severe (157): Program Exception - access violation

This is the problem section of code.

      DO 305 ICASE=1,NLCASE
      TOTFACT(ICASE)=TFAC(LCURVE(ICASE))*CFACT(ICASE)
      DO 305 I=1,NXE
      DO 305 J=1,12
  305 FRC(I,J)=FRC(I,J)+EXFRC(I,J)+CUFRC(I,J,ICASE)*TFAC(LCURVE(ICASE))
     &*CFACT(ICASE)

Re-arranging the code has shown that the issue is in addressing the 3-D array. If a print statement is placed in either of the two inner loops (I or J) the code runs OK. Just adding print* is enough and it runs.

Is there any explanation for this behavior?

jimdempseyatthecove · ‎03-24-2020

I've seen some issues with nested "DO nnn ..." where all the nests use the same tag.
Also, while the compiler optimization should be able to swap the I and J loops, it won't hurt to help it make the decision.

       DO ICASE=1,NLCASE
         TOTFACT(ICASE)=TFAC(LCURVE(ICASE))*CFACT(ICASE)
         DO J=1,12
           DO I=1,NXE
             FRC(I,J)=FRC(I,J)+EXFRC(I,J)+CUFRC(I,J,ICASE)*TFAC(LCURVE(ICASE))
     &*CFACT(ICASE)
           END DO
         END DO
       END DO

Jim Dempsey

Moore__John1 · ‎03-24-2020

I've tried numerous arrangements of the loops all with the same result. Even including dummy variables.

It only works when diagnostic print statements are included in an attempt to catch the error, which then fails to occur. It works even when just printing a blank line.

      DO 305 ICASE=1,NLCASE
      TOTFACT(ICASE)=TFAC(LCURVE(ICASE))*CFACT(ICASE)
      DO 305 I=1,NXE
      print*
      DO 305 J=1,12
  305 FRC(I,J)=FRC(I,J)+EXFRC(I,J)+CUFRC(I,J,ICASE)*TFAC(LCURVE(ICASE))
     &*CFACT(ICASE)

jimdempseyatthecove · ‎03-24-2020

Did you try changing the nested DO 305 statements to plain DO (without the 305) and adding the three END DOs after the 305 FRC(... statement?

(the 305 on that line can be removed)

If this does not work, then if you can, make a simplified reproducer and post it on the Bug page (button on forum section page).

Jim Dempsey

mecej4 · ‎03-24-2020

If the array FRC is zero before entering the lines of code shown in #1, you could try using array expressions:

frc(1:nxe,1:12) = nlcase * exfrc(1:nxe,1:12)
totfact(1:nlcase) = tfac(lcurve(1:nlcase))*cfact(1:nlcase)

do icase = 1, nlcase
   frc(1:nxe,1:12) = frc(1:nxe,1:12) + cufrc(1:nxe,1:12,icase) * totfact(icase)
end do
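
A quick way to gain confidence in the rewrite above is to run both forms on the same made-up data and compare. The sketch below does that; the sizes, the data values and the name frc_loop are invented for illustration, and static arrays are used only to keep it short (the real program uses dynamic arrays).

```fortran
program compare_forms
  implicit none
  integer, parameter :: nxe = 4, nlcase = 3        ! hypothetical sizes
  real :: frc_loop(nxe,12), frc(nxe,12), exfrc(nxe,12), cufrc(nxe,12,nlcase)
  real :: tfac(2), cfact(nlcase), totfact(nlcase)
  integer :: lcurve(nlcase), icase, i, j

  ! Made-up inputs, chosen to be exactly representable in binary
  exfrc = 1.0
  cufrc = 2.0
  tfac = [0.5, 0.25]
  cfact = 4.0
  lcurve = [1, 2, 1]

  ! Loop form from post #1, starting from FRC = 0
  frc_loop = 0.0
  do icase = 1, nlcase
    do i = 1, nxe
      do j = 1, 12
        frc_loop(i,j) = frc_loop(i,j) + exfrc(i,j) &
                      + cufrc(i,j,icase)*tfac(lcurve(icase))*cfact(icase)
      end do
    end do
  end do

  ! Array-expression form, valid only because FRC starts at zero
  frc(1:nxe,1:12) = nlcase * exfrc(1:nxe,1:12)
  totfact(1:nlcase) = tfac(lcurve(1:nlcase))*cfact(1:nlcase)
  do icase = 1, nlcase
    frc(1:nxe,1:12) = frc(1:nxe,1:12) + cufrc(1:nxe,1:12,icase) * totfact(icase)
  end do

  ! Stop with an error if the two forms differ by more than rounding noise
  if (maxval(abs(frc - frc_loop)) > 1.0e-6) error stop 'forms disagree'
  print *, 'both forms agree, frc(1,1) =', frc(1,1)
end program compare_forms
```

With real data the two forms add terms in a different order, so small rounding differences are to be expected; that is why the comparison uses a tolerance rather than exact equality.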

GVautier · ‎03-25-2020

Moore, John wrote:
It only works when diagnostic print statements are included in an attempt to catch the error, which then fails to occur. It works even when just printing a blank line.

It is often symptomatic of a stack corruption due to array overflow or invalid function or subroutine calls somewhere in the program.

Moore__John1 · ‎03-25-2020

There is definitely no issue with the specific code as I tried so many variations.

It works OK if I set the Fortran optimization to Minimum Size and Favor Fast Code. With the previous compiler version I used Maximum Speed and Favor Fast Code with this program. However, the newer compiler would appear to produce faster code with the Minimum Size option - so all good.

Just hope there's not a ticking time bomb because in my experience irregular behaviour is usually due to dubious coding.

GVautier · ‎03-26-2020

If changing optimization options "solves the problem", your problem lies elsewhere.

LCURVE, CFACT, EXFRC, CUFRC, TFAC: are they arrays or functions?

I still think that you have an array overflow, probably somewhere in a subroutine or function call.

Try to enable all runtime check options.
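
As a hedged illustration of the kind of overrun meant here, the sketch below writes one element further than the caller intended through an assumed-size dummy argument. All names and sizes are invented. The actual array is deliberately larger than the intended range, so the extra write stays inside valid storage and the program is well defined; with a 4-element actual argument the same call would silently damage whatever sits next in memory. As far as I know, bounds checking cannot verify the last dimension of an assumed-size dummy such as a(*), so a clean run with checks enabled does not rule this kind of bug out.

```fortran
program silent_overrun
  implicit none
  real :: buffer(8)

  buffer = 0.0
  ! The caller means FILL to touch only elements 1..4,
  ! but passes a count that is off by one (hypothetical bug).
  call fill(buffer, 5)

  print '(8f5.1)', buffer
  ! Element 5 was overwritten even though it was outside the intended range
  if (buffer(5) /= 1.0) error stop 'expected element 5 to be overwritten'
  ! Elements 6..8 are untouched
  if (any(buffer(6:8) /= 0.0)) error stop 'unexpected write past element 5'

contains

  subroutine fill(a, n)
    real, intent(inout) :: a(*)   ! assumed size: the callee does not know the extent
    integer, intent(in) :: n
    integer :: i
    do i = 1, n
      a(i) = 1.0
    end do
  end subroutine fill

end program silent_overrun
```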

Moore__John1 · ‎03-26-2020

All variables are dynamic arrays and nothing comes up when all runtime checks are enabled.

After a bit more probing, cause and effect indicate to me a fast-code optimisation issue.

The program runs when speed optimisation is not activated, and it has always run in debug mode, which has no speed optimisation; both facts indicate that speed optimisation is a controlling factor. Additionally, the fact that a simple print* statement enables the program to run would imply that the presence of the print* statement prevents the optimisation of this section of code.

With this in mind I re-arranged i.e. eliminated the inner loop to that shown below and the program runs ok.

DO ICASE=1,NLCASE
TOTFACT(ICASE)=TFAC(LCURVE(ICASE))*CFACT(ICASE)
CUFAC=TOTFACT(ICASE)
DO I=1,NXE
! DO J=1,12
FRC(I,1:12)=FRC(I,1:12)+EXFRC(I,1:12)+CUFRC(I,1:12,ICASE)*CUFAC
! ENDDO
ENDDO
ENDDO

Logic would indicate that there is an issue with the speed optimisation of CUFRC, a dynamic single precision 3D array, in this section of code (using Do constructs). Is it a 3D dynamic array issue? There are about 120 dynamic arrays in the program but only one 3D array.

Replacing CUFRC with a static array scufrc in the above code by using scufrc(1:nxe,1:12,1:1)=cufrc(1:nxe,1:12,1:1) beforehand shows that a static array works and the solution runs ok with the J loop active. Then surprisingly with the J loop active cufrc worked providing scufrc(1:nxe,1:12,1:1)=cufrc(1:nxe,1:12,1:1) is still present beforehand.

Conclusion: I shall avoid where possible using Do constructs to equate arrays in future code. e.g. the original 5 line code section is now in the following 5 line format:

DO ICASE=1,NLCASE
TOTFACT(ICASE)=TFAC(LCURVE(ICASE))*CFACT(ICASE)
CUFAC=TOTFACT(ICASE)
FRC(1:NXE,1:12)=FRC(1:NXE,1:12)+EXFRC(1:NXE,1:12)+CUFRC(1:NXE,1:12,ICASE)*CUFAC
ENDDO

jimdempseyatthecove · ‎03-26-2020

John,

the reason that I suggested you swap the i and j loops to thus:

  DO ICASE=1,NLCASE
    TOTFACT(ICASE)=TFAC(LCURVE(ICASE))*CFACT(ICASE)
    DO J=1,12
      DO I=1,NXE
        FRC(I,J)=FRC(I,J)+EXFRC(I,J)+CUFRC(I,J,ICASE)*TFAC(LCURVE(ICASE))
     &*CFACT(ICASE)
      END DO
    END DO
  END DO

is because the varying of the 1st index advances one cell in memory. IOW

array(I+1,J) immediately follows array(I,J)
whereas array(I,J+1) follows array(I,J) by the size of the 1st dimension
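
The layout rule just stated can be checked with a few lines of code. In this sketch (array name and sizes are arbitrary) each element is filled with its own position in storage order, relying on the fact that RESHAPE fills an array in element order, first index varying fastest.

```fortran
program column_major_demo
  implicit none
  integer, parameter :: n1 = 3, n2 = 2     ! arbitrary sizes for the demo
  integer :: a(n1, n2)
  integer :: i, j, k

  ! RESHAPE fills in array element order, so a(i,j) ends up holding
  ! its own 1-based position in memory.
  a = reshape([(k, k = 1, n1*n2)], [n1, n2])

  do j = 1, n2
    do i = 1, n1
      ! Storage position of a(i,j) is i + (j-1)*n1
      if (a(i, j) /= i + (j - 1)*n1) error stop 'unexpected layout'
    end do
  end do

  print '(a,i0)', 'a(1,1) is storage element ', a(1, 1)
  print '(a,i0)', 'a(2,1) is storage element ', a(2, 1)   ! next cell
  print '(a,i0)', 'a(1,2) is storage element ', a(1, 2)   ! n1 cells further on
end program column_major_demo
```

With n1 = 3 this reports positions 1, 2 and 4: stepping the first index moves one cell, stepping the second index moves n1 cells.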

Your original code was using strided references. This complicates opportunities for vectorization, which the compiler assesses when compiling with optimizations and, when opportune, uses to generate vectorized code. And in this case.... bad code.

By rearranging the indexing, you would have eliminated the strided reference, and thus may have not only eliminated the error, but also improved performance.

Your latest format, eliminating the I and J loops, will permit the compiler to see it can favorably vectorize the code.

The reason I mention this here, while this specific problem is resolved, other places in your code may be using unfavorable loop nest level order. Loop order is something you should be paying attention to.

Jim Dempsey

Steve_Lionel · ‎03-26-2020

I will throw out the usual caution that a behavior change under optimization is not necessarily an indication of invalid optimization. More often it reveals a program source bug that was hidden otherwise. The same goes for changes when you insert a print statement.

What I would sometimes do is capture the inputs to the suspect code and create a standalone test case that uses the input to perform whatever the operation was. Ideally this would be at the subroutine/function level with a "driver" main program that sets things up.
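
A minimal sketch of such a driver for the loop in this thread might look like the following. The module name, routine name, sizes and data values are all invented for illustration; in practice the inputs would be the values captured from the real program just before the failure, not constants.

```fortran
module frc_kernel
  implicit none
contains
  ! The suspect triple loop, isolated so it can be compiled and run on its own
  subroutine accumulate(frc, exfrc, cufrc, tfac, cfact, lcurve, totfact)
    real, intent(inout) :: frc(:, :)
    real, intent(in)    :: exfrc(:, :), cufrc(:, :, :), tfac(:), cfact(:)
    integer, intent(in) :: lcurve(:)
    real, intent(out)   :: totfact(:)
    integer :: icase, i, j
    do icase = 1, size(cufrc, 3)
      totfact(icase) = tfac(lcurve(icase))*cfact(icase)
      do i = 1, size(frc, 1)
        do j = 1, size(frc, 2)
          frc(i, j) = frc(i, j) + exfrc(i, j) &
                    + cufrc(i, j, icase)*tfac(lcurve(icase))*cfact(icase)
        end do
      end do
    end do
  end subroutine accumulate
end module frc_kernel

program driver
  use frc_kernel
  implicit none
  integer, parameter :: nxe = 5, nlcase = 3, ncurve = 2   ! hypothetical sizes
  real, allocatable :: frc(:, :), exfrc(:, :), cufrc(:, :, :)
  real, allocatable :: tfac(:), cfact(:), totfact(:)
  integer, allocatable :: lcurve(:)
  real :: expected

  allocate(frc(nxe, 12), exfrc(nxe, 12), cufrc(nxe, 12, nlcase))
  allocate(tfac(ncurve), cfact(nlcase), totfact(nlcase), lcurve(nlcase))

  ! Stand-in inputs (exactly representable); replace with captured data
  frc = 0.0
  exfrc = 1.0
  cufrc = 2.0
  tfac = 0.5
  cfact = 4.0
  lcurve = [1, 2, 1]

  call accumulate(frc, exfrc, cufrc, tfac, cfact, lcurve, totfact)

  ! Each case adds 1 + 2*0.5*4 = 5 to every element
  expected = nlcase*(1.0 + 2.0*0.5*4.0)
  if (any(frc /= expected)) error stop 'result differs from expected value'
  print *, 'frc(1,1) =', frc(1, 1)
end program driver
```

Building this one file with the same optimization settings as the failing configuration keeps everything else in the program out of the picture.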

It is inappropriate to assume that some particular construct should be avoided just because, in your program, that construct was related to a problem.

As always, if you can't identify a program bug, send support a test case and let them poke at it.

Moore__John1 · ‎03-27-2020

Thanks for advice and suggestions, much appreciated.

Jim, Yes changing the loop order did eliminate the conflict. The array does nothing special other than act as a fast memory dump for time dependent data so the loop order is not significant. But this is rarely the case and more often the loop order cannot be changed when e.g. manipulating matrices. What you're implying is that if the code cannot be optimised the compiler can/will produce junk if optimisation is active. This would be very worrying.

Steve, your comment regarding the non-avoidance of legal constructs is very true and thankfully so. The original Do construct works perfectly well if speed optimisation is not active or when the fixes below are implemented.

I have two compilers side by side. Both use the same optimisation settings on the same code. When Maximum Speed Optimisation is active the later version does not run but the earlier version does. This indicates a change in compiler behaviour with identical, perfectly legal code - a bug?

The later version would appear to be attempting to optimise a section of code that it should not, whereas the earlier version is not. Each of the following measures has prevented optimisation with the new compiler and enabled the original Do construct code to run.

Adding a print* statement in either of the inner loops
Changing the loop order
Switching the index order of 3D array CUFRC(I,J,ICASE) to CUFRC(ICASE,I,J)
Using array subscript expressions
Removing the EXFRC array in line 305

In my book this indicates a compiler bug with but a simple workaround.

Although the upgrade is always a pain, the latest compiler is worth the effort. Indications are that the 64 bit compilation produces code that is 40% faster than the 32 bit version (both with fast code optimisation). The older compiler did not produce such improvements.

GVautier · ‎03-28-2020

If it is a compiler bug, the following simple code completed by the arrays and parameters declarations

      DO 305 ICASE=1,NLCASE
      TOTFACT(ICASE)=TFAC(LCURVE(ICASE))*CFACT(ICASE)
      DO 305 I=1,NXE
      DO 305 J=1,12
  305 FRC(I,J)=FRC(I,J)+EXFRC(I,J)+CUFRC(I,J,ICASE)*TFAC(LCURVE(ICASE))
     &*CFACT(ICASE)

should reproduce the error. Does it?

What are the values of NLCASE and NXE? Is NXE greater than 12? If it is, what happens if you set NXE to 12?

Moore__John1 · ‎03-29-2020

Rather than compiler bug a more appropriate description would be irregular behaviour under a specific range of circumstances. In other words "I can't demonstrate the problem in a sample of simple code".

Provided the values of nxe (1-32000) and nlcase (1-999) are within the bounds of the array declarations, this code would be expected to run, and it always does except when speed optimisation is active in the latest compiler and with this particular program. Just to provide a bit of background, this section of code was added to this program in the late 90's (Watcom 77 compiler) and has never been touched since that time. Other parts of the code have obviously been changed but not these specific lines. The program has since been compiled on Compaq/Intel compilers.

I have re-created this section of code in a small program using the same data declarations and even reading the data from the original program prior to the crash. This test program runs without issue - perfect behaviour. Furthermore, the difference between Speed and Code Size optimisation is very evident from cpu timings. Contrary to an earlier opinion that this section of code may not be optimised, it is. The timings are 2.68, 0.32 and 0.125 for no optimisation, size and speed respectively - on fire!

Here are another couple of fixes that I discovered prevent the access violation error. Consider the J loop. If J is a 2-byte integer it will run provided 12 is replaced by a variable. If J is a 4-byte integer it runs with either the 12 or a variable. That makes 7 minor changes that make this section of code behave as expected. Is this not strange?

However, this behaviour is only on one particular model. On another validation model I was surprised to see that the model's solution did not converge. No violation error, the program ran, but no valid solution.

In my experience WIN32 compiled with Maximum Speed Optimisation is not reliable.

Speed optimised WIN32 will not be used again. 64 bit for future release code (long overdue).

GVautier · ‎03-29-2020

If there is a stack corruption somewhere, the error message may not be pertinent, or an array descriptor containing the bounds may have been altered.

If you cannot reproduce the problem in a simple case, that reinforces the suspicion that the problem comes from elsewhere.

That kind of problem is very difficult to fix.

GVautier · ‎03-29-2020

I encountered a similar problem 25 years ago. The only way I found was to analyze the assembly source code. I have been able to determine it was a compiler bug.

But today, I don't know if the assembly source code generated by optimization can be understood.

jimdempseyatthecove · ‎03-29-2020

/Qipo-S generates a multi-file assembly file (ipo_out.asm)

Jim Dempsey

Moore__John1 · ‎04-01-2020

The non-convergence I commented on is not related to the compilation of the speed optimised 32 bit code; it was data related.

The bottom line is that simply removing the EXFRC array from the original nested loops eliminates the runtime access violation error. Using a static array, i.e. making EXFRC static within the loop, also causes no conflict. This behavior cannot be demonstrated on smaller sample code.

Reliable results are obtained for both 32 and 64 bit release code across a wide range of validation solutions when EXFRC is removed, indicating the code fix is stable.