32-bit versus 64-bit binary data files

gregfi04 · 11-27-2008

Should there be any difference between the binary output of a code compiled with the 32-bit compiler and the 64-bit compiler?

My problem is this: Code A is compiled using the 32-bit version of ifort (10.1.017). It doesn't pass its V&V suite when compiled with the 64-bit compiler. Code A is rather ugly old Fortran, and I have no desire to chase down exactly what's wrong with it.

Code B creates binary data files that are read by Code A. When Code B is compiled and executed with the 64-bit version of ifort (10.1.015), Code A starts behaving in a way that is obviously not correct (spitting out lots of "NaN"s). When Code B is compiled and executed with the 32-bit version of ifort (10.1.017), things look normal. Absolutely nothing changes in the input or Makefile of Code B between compilations and runs. The compile flags are -traceback, -vec-report0, and -O1.

Any ideas as to what may be happening? How different are these two compilers?

(I've noticed that the 64-bit version of ifort tends to be a little more picky about things. I found a bug in Code B where it was attempting to write beyond the allocated bounds of an ALLOCATABLE array. The 32-bit version of ifort tolerated this, but the 64-bit version caused a crash. Same compile flags and everything. There's a slight difference between the versions of the compilers: 10.1.015 vs 10.1.017. Could that be causing it?)

Thanks,

Greg

Steven_L_Intel1 · 11-28-2008

Data files should be identical assuming that you don't have data items such as pointers which vary in size. As for being picky, it's really that whether you get an error on writing out of bounds depends on memory contents and instruction sequences - it isn't a deliberate thing. If you turn on bounds checking (-check bounds) then there should be no difference.
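To make the pointer caveat concrete, here is a minimal Python sketch with a hypothetical record layout (one address-sized integer followed by three REAL*4 values). It packs the same content once with a 4-byte address field, as a 32-bit build might store it, and once with an 8-byte field, as a 64-bit build would, then decodes the 64-bit bytes using the 32-bit layout.

```python
import struct

# Hypothetical record: one pointer-sized integer followed by three REAL*4 values.
values = (1.0, 2.0, 3.0)
address = 0x1000  # illustrative value only

rec32 = struct.pack("<i3f", address, *values)  # 4-byte "address" + 3 floats
rec64 = struct.pack("<q3f", address, *values)  # 8-byte "address" + 3 floats

print(len(rec32), len(rec64))  # 16 20

# A reader that assumes the 32-bit layout takes only 4 bytes for the address,
# so every float after it is read 4 bytes too early.
_, *misread = struct.unpack("<i3f", rec64[:16])
print(misread)  # [0.0, 1.0, 2.0]
```

Items with an explicit size, such as INTEGER(KIND=8), occupy the same number of bytes in both builds; only genuinely address-sized items shift the layout like this.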

It sounds as if you're skating on thin ice here if your application is writing out of bounds. You need to resolve that first before worrying about data differences.

gregfi04 · 11-28-2008

Steve,

I've done the "-check bounds" thing, cleaned up the out-of-bounds issues, but I'm still having the same problem. Could you elaborate a little more on what else may be causing the problem?

The code does use pointers, of sorts. All of the real data is read into a huge container array. The array size, in this particular case that's failing, is around 1/2 GB. Locations of where various data elements are stored are tracked with KIND=8 integers. But all of that is internal to Code B. I can't imagine how it would affect the binary output.

The binary output simply consists of a single 4-dimensional array. There is a series of DO loops that loop through each of the subscripts, but that's really all there is to it.
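If the whole array goes out in one unformatted sequential WRITE (an assumption; a separate WRITE per element inside the DO loops would produce one record per element), the file can be checked independently of either Fortran build. Below is a minimal Python sketch assuming little-endian data, 4-byte record length markers (the layout Intel Fortran uses for ordinary records), REAL*4 elements, and hypothetical array extents. The helper names are made up for this example, and the continuation scheme for very large records is deliberately not handled.

```python
import io
import math
import struct

def write_record(stream, payload: bytes) -> None:
    """Write one sequential unformatted record: 4-byte length, data, 4-byte length."""
    marker = struct.pack("<i", len(payload))
    stream.write(marker + payload + marker)

def read_record(stream) -> bytes:
    """Read one record and check that the leading and trailing markers agree."""
    head = stream.read(4)
    if len(head) < 4:
        raise EOFError("no more records")
    (n,) = struct.unpack("<i", head)
    if n < 0:
        raise ValueError("continued (very large) records are not handled here")
    payload = stream.read(n)
    tail = stream.read(4)
    if len(payload) != n or len(tail) != 4 or struct.unpack("<i", tail)[0] != n:
        raise ValueError("truncated record or mismatched length markers")
    return payload

def fortran_offset(subscripts, dims):
    """0-based position of 1-based subscripts in column-major (Fortran) order."""
    offset, stride = 0, 1
    for sub, extent in zip(subscripts, dims):
        offset += (sub - 1) * stride
        stride *= extent
    return offset

# Hypothetical extents for A(2,3,2,2); element number n holds the value n.
dims = (2, 3, 2, 2)
count = math.prod(dims)
buf = io.BytesIO()
write_record(buf, struct.pack(f"<{count}f", *(float(n) for n in range(count))))

buf.seek(0)
values = struct.unpack(f"<{count}f", read_record(buf))
print(values[fortran_offset((2, 3, 2, 2), dims)])  # 23.0
```

For a real check, open the file from Code B with `open(path, "rb")` in place of the in-memory buffer and compare the decoded values from the 32-bit and 64-bit runs.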

Greg

Steven_L_Intel1 · 11-28-2008

No, I can't elaborate as I don't have your application. Probably the first thing I'd do is run "od -t x4" on the file on both systems and see what's different. I'd then find the spots in the program that write the data where the difference is and figure out why it's different. The run-time errors also need debugging. It's not a problem amenable to general recommendations.
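The same comparison can be scripted so that only the differing spots are listed. This is a minimal Python sketch, not an existing tool: it walks two files in 4-byte words decoded as little-endian (the byte order `od -t x4` shows on x86) and reports the byte offsets where they differ. The function name, chunk size and the 20-line cap are arbitrary choices for this example.

```python
import itertools
import os
import struct
import sys

def diff_words(path_a, path_b, chunk_words=1 << 18):
    """Yield (byte_offset, word_a, word_b) for each 4-byte word that differs.

    Words are decoded as little-endian unsigned integers (x86 byte order).
    Only the common, word-aligned length of the two files is compared, and
    the files are read in chunks so large outputs need not fit in memory.
    """
    size = min(os.path.getsize(path_a), os.path.getsize(path_b)) // 4 * 4
    step = chunk_words * 4
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        offset = 0
        while offset < size:
            n = min(step, size - offset)
            a, b = fa.read(n), fb.read(n)
            if a != b:
                words_a = struct.unpack(f"<{n // 4}I", a)
                words_b = struct.unpack(f"<{n // 4}I", b)
                for i, (x, y) in enumerate(zip(words_a, words_b)):
                    if x != y:
                        yield offset + 4 * i, x, y
            offset += n

if __name__ == "__main__" and len(sys.argv) == 3:
    file_a, file_b = sys.argv[1], sys.argv[2]
    size_a, size_b = os.path.getsize(file_a), os.path.getsize(file_b)
    if size_a != size_b:
        print(f"sizes differ: {size_a} vs {size_b} bytes")
    # Show at most the first 20 differing words, in hex like od -t x4.
    for off, x, y in itertools.islice(diff_words(file_a, file_b), 20):
        print(f"{off:12d}  {x:08x}  {y:08x}")
```

Subtracting the 4-byte leading record marker from a reported offset and dividing by the element size points back to the array element, and hence to the loop indices in the writing program, where the two builds diverge.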

roddur · 11-30-2008

Well, as far as I have understood (with a high chance of being wrong), you are trying to read a binary data file (FORM='UNFORMATTED') generated on a 32-bit machine on a 64-bit one! Am I right? If I am, that is simply not possible.

Steven_L_Intel1 · 12-01-2008

Sorry, I don't agree. It is not only possible but expected and you should not have to do anything special to make it work.

gregfi04 · 12-01-2008

As an addendum, I've been fiddling with the compiler flags for Code A and found that when all optimization is turned off (-O0), the problem goes away. Bizarre.

The downside is that the code executes at about 35-40% of the optimized speed (i.e. the speed observed with -O2 and 32-bit-generated data files from Code B). I think I'm willing to live with that in exchange for not having to track down the real problem in Code A.

Interestingly, Code C, which is closely related to Code A and also reads data files from Code B, was working just fine with 64-bit-generated data files from Code B. Given the problems with Code A, I thought it might be prudent to back off the optimization level for Code C, also. When I go from -O2 to -O1, Code C starts doing the same thing that Code A did with 32-bit-generated data files. If I go all the way back to -O0, it works, but again, I pay a very heavy performance penalty.

Greg

Steven_L_Intel1 · 12-01-2008

You either have errors in the code or there is a compiler bug. If you can provide a complete test case along with any data files needed, we'd be glad to take a look.

gregfi04 · 12-01-2008

I'm fairly sure it's the former, but I'm not at liberty to transmit the code or the data.

I am able to get Code C to run with -O1, -O2, and -O3 now by using the "-save" and "-zero" compiler flags, which, I believe, I was supposed to do originally. By including -save and -zero in Code A, I can successfully run -O1, but not -O2 or -O3. I'm pretty satisfied with this.

Steven_L_Intel1 · 12-01-2008

You may want to play with the Static Verifier feature - -diag-enable sv2 . For Fortran code I find it gives a lot of false errors and warnings, but every once in a while it turns up something interesting. Read more about it in the documentation.

roddur · 12-01-2008

Quoting - Steve Lionel (Intel)

Sorry, I don't agree. It is not only possible but expected and you should not have to do anything special to make it work.

Hello Steve, I am probably pursuing my own question more than actually helping to solve the thread, but many books (offhand, Chapman's book) clearly state that unformatted files cannot be "moved between different type of procs."

So, am I missing something?

Steven_L_Intel1 · 12-02-2008

The books are being overly cautious. There is no standard for the on-disk structure of a Fortran unformatted file. Different compilers on various platforms implement these differently. You also need to worry about issues such as byte ordering and floating point types.

That said, when using the Intel compilers, the unformatted data formats are the same across all the platforms we support: 32 or 64-bit, Windows, Linux and Mac OS X. The format we use is the same as that of many (but not all) other compilers on those platforms. The exceptions are g77, early versions of gfortran, and perhaps Sun's compiler on 64-bit platforms. These use a different on-disk structure that Intel Fortran does not support. gfortran adopted Intel's structure a couple of years ago, where the record lengths are always 32 bits except when the record is large (over 1GB), in which case a special flag indicates that the record length is larger. (This format is supported on both 32 and 64-bit platforms.) Those other compilers always used a 64-bit length on 64-bit platforms no matter what the record size. Last I heard, Sun either used 32-bit lengths only or also used 64-bit lengths; the irony here is that the format we use was suggested by a Sun engineer about 10 years ago, but he couldn't convince his own company to use it!
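The two layout questions above (4-byte versus 8-byte record lengths, and byte order) can be illustrated with a minimal Python sketch. It is not Intel's or gfortran's code: it builds a sequential unformatted record with either marker width in either byte order, and guesses a file's layout by checking that the first record's leading and trailing markers agree. The function names and the heuristic are assumptions for this example, and the flag used for very large records is not handled.

```python
import struct

def make_record(payload: bytes, marker_size: int = 4, order: str = "<") -> bytes:
    """Build one sequential unformatted record: length marker, payload, length marker."""
    if marker_size not in (4, 8):
        raise ValueError("marker_size must be 4 or 8")
    fmt = order + ("i" if marker_size == 4 else "q")
    marker = struct.pack(fmt, len(payload))
    return marker + payload + marker

def guess_layout(data: bytes):
    """Guess (marker_size, byte_order) from the first record, or return None.

    Heuristic only: a layout is accepted when the leading marker gives a
    non-negative length and an equal trailing marker sits right after that
    many payload bytes. Negative (continuation) markers are not handled.
    """
    for size in (4, 8):
        if len(data) < 2 * size:
            continue
        for order in ("<", ">"):
            fmt = order + ("i" if size == 4 else "q")
            (n,) = struct.unpack_from(fmt, data, 0)
            end = size + n
            if n < 0 or end + size > len(data):
                continue
            (tail,) = struct.unpack_from(fmt, data, end)
            if tail == n:
                return size, order
    return None

if __name__ == "__main__":
    payload = struct.pack(">3f", 1.0, 2.0, 3.0)  # REAL*4 data, big-endian
    record = make_record(payload, marker_size=8, order=">")
    print(guess_layout(record))  # (8, '>')
    # Decoding the same bytes with the wrong byte order gives tiny
    # subnormal numbers instead of 1.0, 2.0, 3.0.
    print(struct.unpack("<3f", payload))
```

A marker-agreement check like this is cheap but can be fooled if payload bytes happen to match the length, so it is best treated as a first diagnostic rather than proof of a file's format.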