Intel® Fortran Compiler

slow I/O on AMD Athlon MP2800+

tsaue
Beginner
863 Views
I have been comparing the performance of g77 and ifort on an
AMD Athlon MP2800+. ifort generally produces slightly faster
code, but is incredibly slow on I/O! To test this further,
I wrote the following small Fortran program:

      PROGRAM IOTST
      IMPLICIT REAL*8(A-H,O-Z)
      INTEGER*4 TIME
      REAL*4 ETIME, TARRAY(2)
      PARAMETER(N=10000)
      DIMENSION A(N,N)
      CPU1 = ETIME(TARRAY)
      WALL1 = TIME()
      WRITE(1,*) A
      CPU2 = ETIME(TARRAY)
      WALL2 = TIME()
      CPUT = CPU2-CPU1
      WALLT = WALL2-WALL1
      WRITE(6,*) 'CPU: ',CPUT, 'Wall: ',WALLT
      END

and then I do

compute-0-1.local 66>g77 -O3 -ffast-math -fautomatic -fno-f2c -fno-globals -Wno-globals io.f
compute-0-1.local 67>time a.out
CPU: 16.5900002Wall: 17.
13.820u 4.030s 0:18.81 94.8% 0+0k 0+0io 137pf+0w
compute-0-1.local 68>ifort -O3 -ip -w -tpp6 io.f
compute-0-1.local 69>time a.out
CPU: 279.440000305176 Wall: 326.000000000000
61.430u 218.030s 5:26.16 85.6% 0+0k 0+0io 178pf+0w

There is clearly a dramatic difference in timings! What is the origin of this, and can it be fixed? How can I investigate this further?

Best regards,
Trond Saue
tsaue
Beginner
I found part of the answer to the problem that I posted. The code
writes formatted files, and comparing file sizes shows that the file
produced by g77 is 0.4GB whereas the file from ifort is 2.3GB. The
difference arises because ifort writes many decimal places by default, whereas g77 writes only 0.0. Can this be modified?
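
On the question of controlling the output: list-directed output (the * format) leaves field width and precision to the compiler, whereas an explicit format fixes them. A minimal sketch, assuming four decimal places are enough for the data; the program name FMTTST, the small N, and the F10.4 descriptor are illustrative choices and not part of the original test:

```fortran
      PROGRAM FMTTST
      IMPLICIT REAL*8(A-H,O-Z)
      PARAMETER(N=4)
      DIMENSION A(N,N)
      DO J = 1,N
        DO I = 1,N
          A(I,J) = 0.0D0
        ENDDO
      ENDDO
C     List-directed: width and digits are chosen by the compiler
      WRITE(6,*) A
C     Explicit format: 4 values per line, each 10 wide, 4 decimals
      WRITE(6,'(4F10.4)') A
      END
```

With an explicit format, both compilers should produce fields of the same width, so the file sizes become comparable. The descriptor has to be wide enough for the largest value that will actually be written.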

I next tested performance for unformatted output using the program:

      PROGRAM IOTST
      IMPLICIT REAL*8(A-H,O-Z)
      INTEGER*4 TIME
      REAL*4 ETIME, TARRAY(2)
      PARAMETER(N=10000)
      DIMENSION A(N,N)
      OPEN(1,STATUS='UNKNOWN',FORM='UNFORMATTED',FILE='FILE')
      CPU1 = ETIME(TARRAY)
      WALL1 = TIME()
      WRITE(1) A
      CPU2 = ETIME(TARRAY)
      WALL2 = TIME()
      CPUT = CPU2-CPU1
      WALLT = WALL2-WALL1
      WRITE(6,*) 'CPU: ',CPUT, 'Wall: ',WALLT
      END

Now I get

compute-0-1.local 41>g77 -O3 -ffast-math -fautomatic -fno-f2c -fno-globals -Wno-globals io2.f
compute-0-1.local 42>time a.out
CPU: 5.21999979Wall: 8.
0.000u 5.240s 0:08.53 61.4% 0+0k 0+0io 137pf+0w
compute-0-1.local 45>ll FILE
-rw-rw-r-- 1 saue saue 800000008 Jul 7 15:27 FILE
compute-0-1.local 46>ifort -O3 -ip -w -tpp6 io2.f
compute-0-1.local 47>time a.out
CPU: 2.91000000000000 Wall: 8.00000000000000
0.000u 2.930s 0:08.20 35.7% 0+0k 0+0io 180pf+0w
compute-0-1.local 48>ll FILE
-rw-rw-r-- 1 saue saue 800000008 Jul 7 15:28 FILE

and the performance of ifort is quite satisfactory!
Steven_L_Intel1
Employee
You are using unformatted I/O, which does not have the concept of "number of decimals".
I am confused by your second post, as the correct size of the file should be 0.8GB. Neither 0.4GB nor 2.3GB is correct, and this should not change due to the compiler.
Be aware that Linux caches file writes and the program may complete before all the writing is actually done.
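
The 0.8GB figure, and the exact 800000008 bytes shown in the unformatted listings, can be checked with a short calculation. A minimal sketch, assuming the sequential unformatted record is framed by one 4-byte length marker at each end (a common default, but compiler-dependent); the program name UNFSIZ is illustrative:

```fortran
      PROGRAM UNFSIZ
      INTEGER*8 N, NBYTES
      PARAMETER(N=10000)
C     Payload: N*N elements of 8 bytes each (REAL*8)
      NBYTES = 8*N*N
C     Assumed framing: a 4-byte length marker before and after
C     the single record written by WRITE(1) A
      NBYTES = NBYTES + 4 + 4
      WRITE(6,*) 'Expected size in bytes: ', NBYTES
      END
```

Under these assumptions the result is 800000008, which matches the size reported for FILE with both compilers.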
Steven_L_Intel1
Employee
The difference is that the array is uninitialized. When you ran with g77, you were apparently writing zeroes. With ifort, the data was not zero.
If you initialized the data, the file sizes should be comparable.
As you found, using unformatted I/O is a better approach for large volumes of data.
tsaue
Beginner
I tried your suggestion, that is, initializing the matrix, using the code:

      PROGRAM IOTST
      IMPLICIT REAL*8(A-H,O-Z)
      PARAMETER (D0=0.0D0)
      INTEGER*4 TIME
      REAL*4 ETIME, TARRAY(2)
      PARAMETER(N=10000)
      DIMENSION A(N,N)
      DO J = 1,N
        DO I = 1,N
          A(I,J)=D0
        ENDDO
      ENDDO
      CPU1 = ETIME(TARRAY)
      WALL1 = TIME()
      WRITE(1,*) A
      CPU2 = ETIME(TARRAY)
      WALL2 = TIME()
      CPUT = CPU2-CPU1
      WALLT = WALL2-WALL1
      WRITE(6,*) 'CPU: ',CPUT, 'Wall: ',WALLT
      END

However, I get the same result as before:

compute-0-11.local 26>g77 -O3 -ffast-math -fautomatic -fno-f2c -fno-globals -Wno-globals io.f
compute-0-11.local 27>time a.out
CPU: 17.2200003Wall: 18.
15.600u 4.200s 0:20.38 97.1% 0+0k 0+0io 136pf+0w
compute-0-11.local 28>ll fort.1
-rw-rw-r-- 1 saue saue 405263158 Jul 8 13:01 fort.1
compute-0-11.local 29>ifort -O3 -ip -w -tpp6 io.f
compute-0-11.local 30>time a.out
CPU: 245.780001306534 Wall: 295.000000000000
63.720u 184.500s 4:57.09 83.5% 0+0k 0+0io 177pf+0w
compute-0-11.local 31>ll fort.1
-rw-rw-r-- 1 saue saue 2433333334 Jul 8 13:07 fort.1

g77 writes 0.0, whereas ifort writes 0.000000000000000E+000
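
The two fort.1 sizes are consistent with a simple per-value layout. A minimal sketch of the arithmetic, assuming (inferred from the file sizes, not confirmed from either compiler's documentation) that g77 uses 4 characters per value with 19 values per line, that ifort uses 24 characters per value with 3 values per line, and that each line ends in a single newline byte; the program name FMTSIZ is illustrative:

```fortran
      PROGRAM FMTSIZ
      INTEGER*8 NVAL, NG77, NIFORT
C     Number of array elements written: 10000 x 10000
      NVAL = 100000000
C     Assumed g77 layout: 4 chars per value, 19 values per line,
C     one newline per (possibly partial) line
      NG77 = 4*NVAL + (NVAL + 18)/19
C     Assumed ifort layout: 24 chars per value, 3 values per line,
C     one newline per (possibly partial) line
      NIFORT = 24*NVAL + (NVAL + 2)/3
      WRITE(6,*) 'g77   bytes: ', NG77
      WRITE(6,*) 'ifort bytes: ', NIFORT
      END
```

Under these assumptions the totals are 405263158 and 2433333334 bytes, the sizes listed above, so the sixfold difference in file size comes entirely from the width of each formatted zero.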

All the best,
Trond Saue
Steven_L_Intel1
Employee
Interesting. g77 is violating the Fortran 77 standard, which requires use of an E format for a value of zero. (The Fortran 95 standard's wording in this area is unchanged.)