Moving code segment changes results?

chstoyer · 05-19-2011

I am working on a 3-D time-domain electromagnetic finite difference program using VS 2010 and Composer XE (v 12 IVF). I have code that runs just fine in a short stand-alone program that just reads one ASCII file, performs the calculations, and writes another. When I put it into a much larger program with a Winteracter user interface, it gives garbage results. I tried moving the code over twice, getting what looks like the same bad results. The second time was after reorganizing the stand-alone code to include some of the code from the larger program and getting that to work. I also compiled this second attempt under CVF and got the same bad results.

The last thing I tried was to move some code within the stand alone program. Basically, I moved some code from a subroutine and removed the subroutine and call:

...
...
CALL SUB
....
....

SUBROUTINE SUB

CODE...
RETURN
end

Replacing CALL SUB with CODE... (the headers are the same) changes the results in the fifth significant figure in some cases by 1 digit. For this test the grid is 131 x 131 x 64 and there are 1379 time steps, so to be sure there is a lot of room for roundoff error, but I am wondering why it depends on the location of some of the code? Code was originally in different files.

The code I moved does not do any of the FD calculations, it just sets up some stuff. I am not sure which result is "better".

I tried the same experiment in CVF. The results before the move were different from the IVF results, by about 3-4 digits in the fifth place (no big surprise here), and there was no difference between the results before and after the code was moved.

I am running IVF on an HP Win 7 Home Premium machine with 6 GB RAM and CVF on a machine with Win XP Pro and 2 GB RAM.

Comments welcome. Thanks for any advice.

Charles

jimdempseyatthecove · 05-19-2011

If the problem relates to a location dependency then check for uninitialized variables.
Otherwise, compiler option differences relating to FP may be an issue. Is the default REAL type (4) or (8)?

Jim Dempsey

chstoyer · 05-19-2011

Jim,

Thanks for your reply.

I have set the compiler options to save all variables and init them to zero. But I will check that again.

All allocated arrays are set by the compiler to a large negative number, all the same. I am sure I am initializing them, I have been very careful about this.

I am using single precision, REAL 4. I found that for practical applications, double precision is not necessary. It is better, but not necessary.

Charles

jimdempseyatthecove · 05-20-2011

First thing to do is to verify that you are working with the same input data files.

Add diagnostic code to both the working stand-alone app and the integrated app such that each prints out the input data as it reads it. What you want to check for is whether you are running in different folders and/or with different default directories. Also, if you are running with updated versions of very old code, the old code may be using externally defined I/O units and the integration process may have redefined these (resulting in the wrong files being read).
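
A minimal sketch of that kind of diagnostic, assuming a hypothetical unit number and file name (`iu`, `model_input.txt`) that are not from the original program. It asks the runtime which file is connected to the unit and echoes each record as it is read. The exact form of the name returned by INQUIRE is processor dependent.

```fortran
program echo_input
  implicit none
  integer, parameter :: iu = 11          ! hypothetical unit number
  character(len=512) :: fname, line
  logical :: is_open
  integer :: ios, nline

  ! Create a small file so the sketch runs on its own (made-up contents)
  open(unit=iu, file='model_input.txt', status='replace', action='write')
  write(iu, '(A)') '131 131 64'
  write(iu, '(A)') '1379'
  close(iu)

  open(unit=iu, file='model_input.txt', status='old', action='read', iostat=ios)
  if (ios /= 0) stop 'could not open input file'

  ! Ask the runtime which file is really connected to the unit
  inquire(unit=iu, opened=is_open, name=fname)
  write(*, '(A,L1,2A)') 'unit opened: ', is_open, '  file: ', trim(fname)

  ! Echo every record as it is read
  nline = 0
  do
    read(iu, '(A)', iostat=ios) line
    if (ios /= 0) exit
    nline = nline + 1
    write(*, '(A,I0,2A)') 'line ', nline, ': ', trim(line)
  end do
  close(iu, status='delete')
end program echo_input
```

Running the same echo in both the stand-alone and the integrated build and comparing the two logs should show quickly whether both are reading the same file.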

Does your integration add some degree of multi-threadedness? My guess is it does, even if it is a simple dialog box or GUI window dressing. Is your code multi-thread safe?

Next: Does your code use COMMONs?

If so, you may be experiencing a name collision, especially with unnamed commons.
In this case, create a new project that builds a copy of your working application using copies of the original code. Build and verify that this produces the correct results. It should, and if it does, now modify the code to convert the COMMONs into MODULEs. Use a MODULE name that you know will not conflict with anything in the application that you wish to integrate the stand-alone program into.
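
As a rough illustration of the COMMON-to-MODULE conversion, here is a sketch with made-up names (`fdgrid_data_mod`, `set_grid`, and the `/GRID/` block are all hypothetical). The point is only that a distinctive module name replaces a block name that could collide.

```fortran
! Before (legacy style, shown as a comment):
!     COMMON /GRID/ NX, NY, NZ
module fdgrid_data_mod        ! hypothetical, deliberately distinctive name
  implicit none
  integer :: nx = 0, ny = 0, nz = 0
end module fdgrid_data_mod

subroutine set_grid()
  use fdgrid_data_mod, only: nx, ny, nz
  implicit none
  ! Grid size taken from the test case described in the thread
  nx = 131; ny = 131; nz = 64
end subroutine set_grid

program common_to_module
  use fdgrid_data_mod, only: nx, ny, nz
  implicit none
  call set_grid()
  write(*, '(3(A,I0))') 'nx=', nx, ' ny=', ny, ' nz=', nz
end program common_to_module
```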

Once this is running correctly, then in the stand-alone project convert the PROGRAM into a SUBROUTINE and add a stub PROGRAM that calls the SUBROUTINE (use separate files). Compile and test the new stand-alone program. What you are checking for is a subtlety between local variables declared in a PROGRAM section as opposed to within a SUBROUTINE. Test; if broken, figure out why; if not broken, incorporate the SUBROUTINE into your larger code (no code changes).

If this fails, then suspect that the code you are integrating the SUBROUTINE into is stomping on data "owned" by your subroutine. This type of situation is harder to detect, since any changes to your code (diagnostic asserts) would change the location of the symptoms. In this situation note that the same SUBROUTINE is now portable between the stand-alone app and the integrated app. Add diagnostic code in various places in the SUBROUTINE that computes HASH values of the input and working data. Run the stand-alone app and produce a list of hash values. Now run the integrated app and produce its list of hash values. Compare the lists. If a difference is found, then the error is occurring (timewise) between the two hash code intervals. Run the test again to see if the error in the hash occurs at the same time interval (consistent error). My guess is it may not.
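
One possible shape for such a hash, as a sketch only: `checksum_r4` is a hypothetical helper, not something from the original code, and the rotate-and-XOR scheme is just one simple choice. It works on the bit patterns of a REAL(4) array, so any change in any bit changes the printed value.

```fortran
module checksum_mod
  implicit none
contains
  ! Order-sensitive 32-bit checksum over the bit patterns of a REAL(4) array
  function checksum_r4(a) result(h)
    real(4), intent(in) :: a(:)
    integer(4) :: h
    integer :: i
    h = 0
    do i = 1, size(a)
      ! rotate the running value, then XOR in the raw bits of the next element
      h = ieor(ishftc(h, 5), transfer(a(i), h))
    end do
  end function checksum_r4
end module checksum_mod

program hash_demo
  use checksum_mod
  implicit none
  real(4) :: ex(4)
  ex = [1.0, 2.0, 3.0, 4.0]
  write(*, '(A,Z8.8)') 'stage 1 hash: ', checksum_r4(ex)
  ex(3) = nearest(ex(3), 1.0)     ! change the last bit of one element
  write(*, '(A,Z8.8)') 'stage 2 hash: ', checksum_r4(ex)
end program hash_demo
```

A multi-dimensional array can be passed as `reshape(arr, [size(arr)])`. Writing one such line per stage to a log file in both builds gives two lists that can be compared with a diff tool.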

Jim Dempsey

chstoyer · 05-20-2011

There is only one data input file.

I am running in the same folder. I compile, run, move code, compile, run, undo the code move, compile, run. The difference is sometimes 1 digit in the fifth significant figure, like 5.8743E-08 vs. 5.8744E-08. Seems to me that if I am stepping on something, the differences should be much greater.

There is a window open and a dialog that shows grid size and calculation progress. This could be it?

There are no COMMON blocks. Data (only) in MODULES or passed as arguments.

I ran in the debugger and all variables are initialized to something (in a data statement in the MODULE) or initialized to zero by the compiler.

I am pushing ahead to get this so it is easy to separate and move the guts to the bigger program, working step by step. Thanks for your help. If you have any more ideas, let me know. I will let you know if I discover anything.

Charles

jimdempseyatthecove · 05-20-2011

Make your standalone version as:

PROGRAM SHELL
CALL OldProgramAsSubroutine
END

With your OldProgramAsSubroutine in a separate project, built as a static library. Build this with interprocedural optimizations disabled.

Then build your integrated application using this static library also with interprocedural optimizations disabled. IOW same object files - no recompilation into new app using IPO. Verify that the library does not rebuild between building each application.

See if this produces different results.

Note, I assume you are not making library calls that will affect the results. An example would be calling a random number generator in both parts of the integrated application. Another would be using a timer for convergence (e.g. exiting a routine on convergence or when nnn ms have expired).

Code placement should make no difference except in multi-threaded applications and where you have a bug in your code relating to shared variables (not properly protected). Is your stand alone app multi-threaded?

Jim Dempsey

Steven_L_Intel1 · 05-21-2011

In my experience, code that changes behavior as to where it is placed in the source has one of the following two issues:

1. It changes the relative layout of variables in memory (because it changes the "lifetime" of a variable), masking or revealing data corruption elsewhere in the program
2. The program is compiled for x87 arithmetic and depends on higher precision intermediate results being carried in registers. This latter is much less frequent nowadays as the default is to not use x87 code.

jimdempseyatthecove · 05-22-2011

>>or initialized to zero by the compiler

This option initializes static data not dynamic (stack) data.

Depending on compiler implementation (which I haven't checked), the local variables in the PROGRAM _may_ be static as opposed to dynamic (stack). This is an implementation issue. Therefore, if you have any variables in the former PROGRAM section that require zeroing, then when moving this code to a SUBROUTINE you will need to explicitly initialize these variables.
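
A small sketch of the explicit initialization being described, using hypothetical names (`accumulate`, `work`). The local variable is assigned on every call instead of relying on a compiler option or on static storage to supply a zero.

```fortran
subroutine accumulate(n, total)
  implicit none
  integer, intent(in) :: n
  real, intent(out) :: total
  real :: work          ! local: not guaranteed to start at zero
  integer :: i
  work = 0.0            ! explicit initialization, done on every call
  do i = 1, n
    work = work + real(i)
  end do
  total = work
end subroutine accumulate

program shell
  implicit none
  real :: t
  call accumulate(10, t)
  write(*, '(A,F6.1)') 'total = ', t
end program shell
```

Note that writing `real :: work = 0.0` in the declaration is not equivalent: in Fortran that implies SAVE, so the zero is applied only once, not on each call.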

Error in the 5th place? 5.8743E-08 vs. 5.8744E-08
Are there remaining digits not printed by your format statement?
Try using F15.10, a larger precision than necessary, but you will see 5.8743998xxxxE-08 vs. 5.8744001xxxxxE-08.
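
To make the rounding point concrete, here is a sketch with two made-up values (not Charles's data) that sit close together but on either side of a rounding boundary at five significant figures. It uses an ES descriptor for the wide output, since for magnitudes near 1E-08 a fixed-point descriptor such as F15.10 leaves room for only about three significant digits.

```fortran
program show_digits
  implicit none
  real(4) :: a, b
  ! Hypothetical values chosen to straddle a rounding boundary
  a = 5.87434e-08
  b = 5.87436e-08
  ! The narrow format from the thread rounds to five significant figures
  write(*, '(A,2(1PE13.4))') 'short format: ', a, b
  ! A wider format shows how close the two values really are
  write(*, '(A,2(ES16.8))')  'wide format:  ', a, b
  ! Raw bit patterns, for an exact comparison
  write(*, '(A,2(1X,Z8.8))') 'bit patterns: ', transfer(a, 1), transfer(b, 1)
end program show_digits
```

Two results that differ by a few parts per million can therefore show up as a one-digit change in the last printed place.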

Since you are compiling the two versions of the program, presumably with the same options, using the same compiler the results should be the same. Especially since if you followed my suggestion to make a static library out of the body of your console app you will be using the same object files. If you are running on the same system this leaves:

a) you have multi-thread issues that do not show up in the console app
b) you are linking in a different runtime library that is used by the computational section
c) you have a latent uninitialized variable issue with both programs

Jim Dempsey

chstoyer · 05-23-2011

I have been slowly changing the code to see when something changes and what I have discovered is bizarre. I will try to answer some questions from Steve as well.

If I understand your "x87" question correctly, the code is a finite difference code. The math (co)processor is 80 bits wide, as I understand it. If I were running this in total single precision (4 byte float), the results would likely deteriorate very quickly. The fact that the math is done in 80 bits and stuff is retained in the registers allows these calculations to be done in single precision. Double precision does give better results (valid for later times), but real field measurements can never be made beyond what single precision can produce due to ambient noise problems, so for practical purposes double precision is not necessary.

Yes, these are 4-byte numbers that have more significant figures than I am writing out in a 1PE13.4 format.

I have replaced all the stuff that was being read from an ASCII file with data statements. Eventually this will become an interactive program and these initial data would be the defaults, and all data would be stored in binary files and edited with a user interface using interactive dialogs and graphical methods. I successfully eliminated every READ statement and still produced identical results.

Things changed when I eliminated the OPEN statement. Now the differences are in some cases in the third significant figure. I cannot figure out why, because nothing is being READ. I have put the OPEN statement back and, by trial and error, determined where a CLOSE statement can be placed without affecting the results. It is after an ALLOCATE statement.

This is in the section of code that generates the finite difference grid in x, y, z. The dynamically allocated data are immediately filled with zeros before the grid values themselves are inserted. Shouldn't matter anyway because the compiler always fills allocated data with something (in CVF it was BAAD F00D Hex).

I do not know if my code is multi-threaded. (How do I find out?) I know it is significantly faster (than the clock speeds suggest) on an Intel duo core processor than on an AMD, so I am thinking there is some parallel processing going on here.

To reiterate, the code is not in two separate projects or directories, I am comparing results before I start modifying code with results after modification, so everything is compiled with the same options, etc.

I am attaching a code segment and the comparison of results from WinDiff. The OPEN statement is in the calling program. I first narrowed down the location of the CLOSE by finding it worked if it came after the CALL to GridGen3D but not if it came before. It works if it comes after the allocate statement but not if it comes before. The present position is the last place the CLOSE can be placed and still get different results.

Suggestions welcome, I am not sure where to go from here.

Steven_L_Intel1 · 05-23-2011

Are you using the /arch:ia32 compiler option? If not, then x87 code has nothing to do with it. But that also suggests to me that SSE2 vectorization can change results and if the compiler decided to vectorize the code differently in different parts of the program, that could introduce subtle differences. Try adding the /fp:strict option.
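
The effect of evaluation order can be seen without any compiler options at all. The following sketch (array contents and names are made up) adds the same single-precision numbers once strictly in sequence and once as four interleaved partial sums, which is similar in spirit to what a 4-wide SIMD reduction does. The two totals may differ in the last digits, and an optimizing compiler may itself reorder the first loop.

```fortran
program sum_order
  implicit none
  integer, parameter :: n = 100000      ! multiple of 4, chosen for the sketch
  real(4) :: x(n), s_seq, s_lane(4)
  integer :: i

  do i = 1, n
    x(i) = 1.0 / real(i)
  end do

  ! Strictly sequential accumulation
  s_seq = 0.0
  do i = 1, n
    s_seq = s_seq + x(i)
  end do

  ! Four interleaved partial sums, combined at the end
  s_lane = 0.0
  do i = 1, n, 4
    s_lane = s_lane + x(i:i+3)
  end do

  write(*, '(A,ES16.8)') 'sequential: ', s_seq
  write(*, '(A,ES16.8)') 'four-lane:  ', sum(s_lane)
end program sum_order
```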

chstoyer · 05-23-2011

As far as I know, I am not. I looked it up and poked around and found it in Enable Enhanced Instruction Set under Code Generation. There are four options there: SSE2, SSE3, IA32 and "Not Set". I am using "Not Set". My command line is

/nologo /Qsave /Qzero /module:"Release\" /object:"Release\" /Fd"Release\vc100.pdb" /libs:static /threads /winapp /c

Most of the options say Fast, Not Set or None. I have specifically added the Qsave and Qzero options for compatibility with CVF, as the code was originally developed there. Most of the others are whatever the default was.

I have been modifying the code and what I have right now does not behave differently when I comment out the OPEN statement. I tried the fp:strict option and it does not seem to make any difference.

But if I run into some funky behavior further along, I will try it and see what it does.

Thanks,

Charles

jimdempseyatthecove · 05-24-2011

Charles,

I downloaded your sample code. It was all mashed onto one line - you need to figure out how to post code on this forum.

This aside, after editing your file to insert line breaks (at least white space remained to separate lines), I find that the code has mismatched IFs and ENDIFs:

[fortran]    IF(XWidth.GT.YWidth)THEN
        IF(YWidth/XWidth.LT.SkinnyFact)THEN
            WIDTH=XWidth*SkinnyFact 
            FixFactor=WIDTH/YWidth 
            GYAvg=0.5*(GYMin+GYMax) 
            GYMax=(GYMax-GYAvg)*FixFactor+GYAvg 
            GYMin=(GYMin-GYAvg)*FixFactor+GYAvg 
            YWidth=GYMax-GYMin 
        endif 
    ELSE IF(XWidth/YWidth.LT.SkinnyFact)THEN 
            WIDTH=YWidth*SkinnyFact 
            FixFactor=WIDTH/XWidth 
            GXAvg=0.5*(GXMin+GXMax) 
            GXMax=(GXMax-GXAvg)*FixFactor+GXAvg 
            GXMin=(GXMin-GXAvg)*FixFactor+GXAvg 
            XWidth=GXMax-GXMin 
    endif 
    endif 
[/fortran]

and later

[fortran]    !EdgeSig(1:NX,1:NY,1:NZ,1:3), IF(IALERR.NE.0)THEN LCANCEL=.TRUE. 
    WRITE(CNUM,"(I5)",IOSTAT=IOERR),IALERR 
    CALL WINMSG('Allocation of x, y, z grid specs failed: '//CNUM) 
    RETURN 
    endif 
    ! in order to avoid index checking, these arrays are defind beyond where they are specified and used. 
    ! set all elements to zero before generating values so all are defined. 
    DX(0:NX)=0. 
[/fortran]

I am surprised your code compiled without error (I get errors here)
Also missing was your GridData module file (which I dummied up)

Jim Dempsey

IanH · 05-24-2011

Jim - you have far more patience than me if you've reconstituted that file without line breaks! That said, the download has come good here - the "ELSE IF (condition)" bit on line ten (and maybe elsewhere) should have been two separate statements, which swallows the stray end ifs. But without the module it's impossible to diagnose further.
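
For reference, a sketch of how the first fragment reads when ELSE and IF are separate statements, so that both trailing ENDIFs have a matching IF. The variable names are from the posted fragment. The declarations, the starting values and the final WRITE are assumptions added only so that it compiles on its own.

```fortran
program skinny_fix
  implicit none
  real :: XWidth, YWidth, SkinnyFact, WIDTH, FixFactor
  real :: GXMin, GXMax, GXAvg, GYMin, GYMax, GYAvg
  ! Hypothetical values, only so the fragment runs
  GXMin = 0.0; GXMax = 100.0; GYMin = 0.0; GYMax = 10.0
  SkinnyFact = 0.25
  XWidth = GXMax - GXMin
  YWidth = GYMax - GYMin

  IF (XWidth.GT.YWidth) THEN
    ! Grid is too narrow in y: stretch y about its midpoint
    IF (YWidth/XWidth.LT.SkinnyFact) THEN
      WIDTH = XWidth*SkinnyFact
      FixFactor = WIDTH/YWidth
      GYAvg = 0.5*(GYMin+GYMax)
      GYMax = (GYMax-GYAvg)*FixFactor+GYAvg
      GYMin = (GYMin-GYAvg)*FixFactor+GYAvg
      YWidth = GYMax-GYMin
    ENDIF
  ELSE
    ! Grid is too narrow in x: stretch x about its midpoint
    IF (XWidth/YWidth.LT.SkinnyFact) THEN
      WIDTH = YWidth*SkinnyFact
      FixFactor = WIDTH/XWidth
      GXAvg = 0.5*(GXMin+GXMax)
      GXMax = (GXMax-GXAvg)*FixFactor+GXAvg
      GXMin = (GXMin-GXAvg)*FixFactor+GXAvg
      XWidth = GXMax-GXMin
    ENDIF
  ENDIF

  write(*, '(A,2F10.3)') 'XWidth, YWidth = ', XWidth, YWidth
end program skinny_fix
```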

(Steve - if you are reading - why do we (now?) "lose" the file name when our browser prompts for the directory to save the file in (we see the forum software's internal numeric ID instead)? I don't recall this behaviour previously.)

This statement about allocated variables in a previous post bothers me a bit: "Shouldn't matter anyway because the compiler always fills allocated data with something (in CVF it was BAAD F00D Hex)." This isn't true. I can see where you zero initialise in the posted code, but if you are (accidentally) counting on this elsewhere then trouble is not far away.
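
A minimal sketch of the safe pattern, loosely modelled on the `DX(0:NX)=0.` line in the posted code (the array bounds and the fill value are assumptions). The contents of an array are undefined after ALLOCATE, so every element is assigned before anything reads it.

```fortran
program alloc_init
  implicit none
  real(4), allocatable :: dx(:)
  integer :: nx, ialerr
  nx = 131
  allocate(dx(0:nx), stat=ialerr)
  if (ialerr /= 0) stop 'allocation failed'
  dx = 0.0                 ! contents are undefined until assigned
  dx(1:nx) = 10.0          ! hypothetical cell size
  write(*, '(A,F8.1)') 'sum of dx = ', sum(dx)
end program alloc_init
```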

Silly question, have you done a run with all the runtime diagnostics enabled and seen what happens? Even if the slower debug version of the program takes a week or so to run, it can be useful to see if it trips up over anything.

This bit also worries me: "The math (co)processor is 80 bits wide, as I understand it. If I were running this in total single precision (4 byte float), the results would likely deteriorate very quickly. The fact that the math is done in 80 bits and stuff is retained in the registers allows these calculations to be done in single precision."

If you need better than single precision for your calculations then you need to ask for it by appropriate typing (kinding) of your REAL variables.
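
For instance, the working precision can be requested through a kind parameter. This is a sketch with hypothetical names `sp` and `dp`. In real code the kind parameters would normally live in one module so that the whole program can be switched in one place.

```fortran
program kinds_demo
  implicit none
  ! Request kinds by the number of decimal digits and exponent range needed
  integer, parameter :: sp = selected_real_kind(6, 37)
  integer, parameter :: dp = selected_real_kind(15, 307)
  real(sp) :: xs
  real(dp) :: xd
  xs = 1.0_sp / 3.0_sp
  xd = 1.0_dp / 3.0_dp
  write(*, '(A,I0,A,ES24.16)') 'single: ', precision(xs), ' digits, 1/3 = ', xs
  write(*, '(A,I0,A,ES24.16)') 'double: ', precision(xd), ' digits, 1/3 = ', xd
end program kinds_demo
```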

(The 80 bit thing was sort of true for x87 code, but even then, you as programmer had very little control over when the compiler moved things to and from the x87 floating point registers to memory (at any stage during the evaluation of an expression the results could be truncated back to 32 (or maybe 64) bit precision). Add a variable here, move a bit of code there, compile on a Tuesday while facing south, and suddenly you've got different results. Worse, these days (for over a decade?) there's a whole heap of silicon on the chips dedicated to doing lots of floating point calculations relatively quickly in 32 bit and 64 bit precision, which any optimising compiler worth its salt is going to use (you asked for single precision, you get it!) and so the whole 80 bit precision thing goes completely out the window.

Hence the previous questions about the IA32 option - which prevents the compiler from using all that fancy, dedicated, fast, floating point stuff. You need to explicitly enable this with recent versions of the compiler if you want it ("Not set" on the Enhanced Instruction Set option still allows the compiler to use some of the vectorised floating point capability - I suspect PCs without the required capability would be getting rather ancient). But you don't want to go down this path.)