32-bit to 64-bit Fortran Conversion. Runs Slower??

yood · ‎11-23-2004

Hi, I'm converting a large Fortran code base from 32-bit to 64-bit Windows. I have Intel Fortran Compiler 8.1 on an HP zx6000 Itanium2 workstation.

Comparing execution times (via ETime), the 64-bit code on the Itanium2 generally runs about 3 times faster than the 32-bit code. However, there are cases where the 64-bit code runs about 3 times slower than the 32-bit code does on a 32-bit machine.

For example, one such case I'm investigating deals with a Read loop:

open(3,file=datafile,err=9020,buffered='YES')

loop here
read(3,*,iostat=ios) (z(j), j=1,ncol)
end loop

compiled as follows:
ifort myProg.for /O3 /G2 /Qparallel /assume:buffered_io /link /out:myProg.exe

The 32-bit code is generated using MS Visual Suite with Compaq Visual Fortran Professional Edition 6.6A (on Windows 2000).

Below are two tables. Both show the execution times (as retrieved by ETime()) around the read statement in the loop, for the 32-bit and 64-bit builds respectively.

32-bit (times in seconds)
checkpoint           total time     user time      system time
open data file       0.062500000    0.015625000    0.046875000
before read loop     0.078125000    0.015625000    0.062500000

before read - b      0.093750000    0.015625000    0.078125000
after read - b       0.109375000    0.015625000    0.093750000

before read - b      0.125000000    0.015625000    0.109375000
after read - b       0.125000000    0.015625000    0.109375000

before read - b      0.156250000    0.015625000    0.140625000
after read - b       0.156250000    0.015625000    0.140625000

64-bit (times in seconds)
checkpoint           total time     user time      system time
open data file       0.029999999    0.000000000    0.029999999
before read loop     0.050000001    0.000000000    0.050000001

before read - a      0.050000001    0.000000000    0.050000001
after read - b       0.059999999    0.000000000    0.059999999

before read - a      0.070000000    0.000000000    0.070000000
after read - b       0.079999998    0.000000000    0.079999998

before read - a      0.089999996    0.010000000    0.079999998
after read - b       0.100000001    0.010000000    0.090000004
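For context, the checkpoints in the timing tables come from ETime calls placed around the read. The following is only a minimal sketch of that kind of instrumentation, not my actual code: the program name, the stamp helper, the file name, the value of ncol, and the exit-on-iostat loop are all hypothetical, and it assumes the Intel IFPORT module (which provides ETIME; Compaq Visual Fortran uses DFPORT instead).

```fortran
program time_reads
   use ifport                      ! Intel portability module providing ETIME (assumed available)
   implicit none
   integer, parameter :: ncol = 4  ! hypothetical number of columns per record
   real(8) :: z(ncol)
   integer :: ios, j
   character(len=256) :: datafile

   datafile = 'data.txt'           ! hypothetical file name
   ! BUFFERED is an Intel/Compaq extension to OPEN
   open(3, file=datafile, status='old', err=9020, buffered='YES')
   call stamp('open data file')
   call stamp('before read loop')
   do
      call stamp('before read - a')
      read(3,*,iostat=ios) (z(j), j=1,ncol)
      if (ios /= 0) exit           ! stop at end of file or on a read error
      call stamp('after read - b')
   end do
   close(3)
   stop

9020 write(*,*) 'could not open ', trim(datafile)

contains

   ! Print total, user and system CPU time (seconds) at a labelled checkpoint.
   subroutine stamp(label)
      character(len=*), intent(in) :: label
      real(4) :: tarray(2), total
      total = etime(tarray)        ! tarray(1) = user time, tarray(2) = system time
      write(*,'("- ",a,3(", ",f11.9))') label, total, tarray(1), tarray(2)
   end subroutine stamp

end program time_reads
```

Note that ETime reports CPU time rather than wall-clock time, so time spent waiting on the disk may not show up fully in these figures.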

It seems to me that the 32-bit code reads in a block initially, after which successive reads take effectively no time(?): only the first read shows any elapsed time (about 0.016 s), and the next two show none. The 64-bit code seems to cause a physical read on every pass through the read loop, with each read costing about 0.01 s. I've tried setting the BLOCKSIZE and BUFFERCOUNT specifiers in the OPEN statement, but the results do not improve.
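To make the buffering attempt concrete, here is a rough sketch of an OPEN using those specifiers. BUFFERED, BLOCKSIZE and BUFFERCOUNT are Intel/Compaq extensions to OPEN; the sizes, the file name and the line-counting loop are illustrative assumptions only, not the values or code from my tests.

```fortran
program open_with_buffers
   implicit none
   character(len=256)  :: datafile
   character(len=1024) :: line
   integer :: ios, nrec

   datafile = 'data.txt'           ! hypothetical file name
   ! Illustrative sizes only: a 64 KB block and 4 buffers (assumed values).
   open(3, file=datafile, status='old', err=9020, buffered='YES', &
        blocksize=65536, buffercount=4)

   ! Read every record as raw text and count them, so the timing reflects
   ! file access rather than list-directed parsing.
   nrec = 0
   do
      read(3,'(a)',iostat=ios) line
      if (ios /= 0) exit
      nrec = nrec + 1
   end do
   close(3)
   write(*,*) 'records read: ', nrec
   stop

9020 write(*,*) 'could not open ', trim(datafile)
end program open_with_buffers
```

Reading the records as plain text like this is one way to separate the cost of the file access itself from the cost of the list-directed conversion in read(3,*).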

Any thoughts as to what I'm seeing here?
Any way to improve my reading time?
Any other ways to diagnose what is happening here?

Thanks,
Mark Wood