Binary read/write

I am attempting to optimise a binary file read/write using an unformatted, direct access binary file.  The data set to be written/retrieved is of size 22x7826x8086.  Current read times are of the order of 50 minutes.  A comparative time using the MATLAB fread function is 157 seconds.  Why is the MATLAB binary read/write function more efficient than Fortran?  Any suggestions very much appreciated.  Regards, Craig

Black Belt

Did you change from the default (single record) I/O to buffered I/O, either with the -assume:buffered_io compile option or the equivalent OPEN or environment variable keyword?  The Fortran standard doesn't specify this, and compilers other than ifort have different defaults.

The MATLAB file format is more likely to resemble Fortran access='stream' than direct access.
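A minimal sketch of the stream-access idea, assuming a Fortran 2003 compiler. The file name stream_data.bin and the small array dimensions are illustrative only and do not come from the original post. With access='stream' there is no RECL and no record structure, so the file holds just the raw bytes of the array in column-major order.

```fortran
program stream_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: ni = 4, nj = 5, nk = 6   ! small illustrative sizes
  real(dp), allocatable :: a(:,:,:), b(:,:,:)
  integer :: k, u

  allocate (a(ni,nj,nk), b(ni,nj,nk))
  call random_number (a)

  ! Stream access: no RECL and no record markers, just raw bytes.
  open (newunit=u, file='stream_data.bin', access='stream', &
        form='unformatted', status='replace')
  do k = 1, nk
    write (u) a(:,:,k)          ! one contiguous 2D slice per write
  end do
  close (u)

  ! Read the slices back in the same order they were written.
  open (newunit=u, file='stream_data.bin', access='stream', &
        form='unformatted', status='old')
  do k = 1, nk
    read (u) b(:,:,k)
  end do
  close (u, status='delete')    ! remove the demo file

  if (any(a /= b)) error stop 'stream round trip failed'
  print *, 'stream round trip OK'
end program stream_demo
```

Because nothing in the file depends on a record length, the question of whether RECL is counted in bytes or 4-byte words does not arise with this access mode.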
 

New Contributor II

You did not indicate the byte size of the array elements, but you appear to be writing an array of 5 to 10 GB.  Depending on the disk type (SSD, HDD or networked), 157 seconds could be reasonable, but 50 minutes is not.

I would expect you have broken the write down into too many (small) or too few (huge) records. You should post the OPEN and WRITE statements you are using.

Also, if you are using access='direct', check that the length in recl=length is given in the appropriate unit: it could be expected in bytes or in 4-byte words, depending on compiler options. (Check the file size after it has been written.)

Tim has mentioned buffering options, but I would expect that default buffering settings should not cause a problem like this, assuming you have not changed these.

Finally, if it is a real(8) array, this is a 10 GB array and a 10 GB file. I first tested this on a PC with only 8 GB of memory, which stopped for a while.
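As a quick check of the sizes quoted here, the following sketch computes the element and byte counts for the 22x7826x8086 array. It uses 64-bit integers because the byte counts do not fit in a default 32-bit integer. The program name is made up for illustration.

```fortran
program array_size
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer(int64), parameter :: ni = 22, nj = 7826, nk = 8086
  integer(int64) :: n_elem

  n_elem = ni*nj*nk      ! 64-bit: the byte counts below exceed 2**31-1
  print '(a,i0)', 'elements      : ', n_elem
  print '(a,i0)', 'bytes, real(4): ', 4_int64*n_elem
  print '(a,i0)', 'bytes, real(8): ', 8_int64*n_elem
  print '(a,f6.2)', 'GiB,   real(8): ', &
        real(8_int64*n_elem, kind(1.0d0)) / 2.0d0**30
end program array_size
```

The array has 1392182792 elements, so it occupies 5568731168 bytes as real(4) and 11137462336 bytes (about 10.4 GiB) as real(8).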

The following example should perform OK. It ran in 48 seconds on an SSD with sufficient memory.

  real*8, allocatable :: array(:,:,:)
!
  integer*4 wl, length, i,j,k, iostat
  integer*4 ni,nj,nk
  real*4    dtime, t
  external  dtime
!
  ni = 22
  nj = 7826
  nk = 8086
!
! *** Compiler dependent ***
  wl     = 2           ! number of 4-byte units in a single real(8) element
                       ! (use wl = 8 if recl is in bytes, e.g. /assume:byterecl)
!
  length = ni*nj*wl    ! the length of each record
!
  open (unit=11, file='binary_data.bin', access='direct', recl=length, iostat=iostat)

  write (*,*) 'Allocating array', dtime()
  allocate ( array(ni,nj,nk) )
!
  write (*,*) 'Initialising array', dtime()
  array = 1
!  
  write (*,*) 'writing array', dtime()
  do K = 1, nk
    write (11, rec=k) ((array(i,j,k),i=1,ni), j=1,nj)
  end do
!  
  write (*,*) 'reading array', dtime()
  do K = 1, nk
    read (11, rec=k) ((array(i,j,k),i=1,ni), j=1,nj)
  end do
!
  write (*,*) 'end reading array', dtime()
  end

real*4 function dtime ()
  integer*8 :: clock, tick
  integer*8 :: last = -1
!
  call system_clock ( clock, tick )
!
  if ( last > 0 ) then
    dtime = dble (clock-last) / dble (tick)
  else
    dtime = 0
  end if
  last = clock    
end function dtime

 

Beginner

John, your code worked very well - many thanks.  My only question is why wl=2 and not 8 if I explicitly compile with /assume:byterecl?  Once again, thank you for any feedback.  Craig

New Contributor II

I investigated further and tried 3 record sizes for:
 write (11, rec=rec) array(i,j,k)
 write (11, rec=rec) (array(i,j,k),i=1,ni)
 write (11, rec=rec) ((array(i,j,k),i=1,ni),j=1,nj)

The larger the record, i.e. the fewer records in the file, the faster the write is expected to be, but even the first had nothing like the problem you report.
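To put numbers on the three layouts, this sketch prints how many records each one produces for the 22x7826x8086 array and how many bytes each record holds. It assumes real(8) elements (8 bytes each); the program name is illustrative.

```fortran
program record_layouts
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer(int64), parameter :: ni = 22, nj = 7826, nk = 8086
  integer(int64), parameter :: elem_bytes = 8    ! assumes real(8) data

  print '(a)', 'layout               records  bytes/record'
  ! One array element per record.
  print '(a,i12,i12)', 'one element     ', ni*nj*nk, elem_bytes
  ! One run of ni elements per record.
  print '(a,i12,i12)', 'one column (i)  ', nj*nk,    ni*elem_bytes
  ! One full (i,j) slice per record.
  print '(a,i12,i12)', 'one slice (i,j) ', nk,       ni*nj*elem_bytes
end program record_layouts
```

The first layout needs 1392182792 records of 8 bytes, the second 63281036 records of 176 bytes, and the third only 8086 records of 1377376 bytes each.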

I also found that if the binary file already exists, then it can be slower to overwrite than to write a new file, probably because the existing file has to be read in before being overwritten. However, no tests approached 50 minutes.

I also did the test on 2 different PCs, both with Win 7, but one with 32 GB of memory. The extra memory provided more disk cache and ran faster. For the 8 GB PC, I reduced the test to half the file, so that the array did not go to virtual memory. (If that happens it is a very noticeable problem, but again nothing like 50 minutes.)

All tests involved sequential writing or reading of records.

All I can suspect is that you have a very slow disk or you are not processing the disk records sequentially. I have attached the ifort versions of the 3 tests I have carried out. All ran in 2-4 minutes. I would not use the approach in write_test_1.f90, as I consider the record length to be too small (too many records), but the others should be OK. Perhaps something in your setup differs from what I have assumed.

John

New Contributor II

Craig B. wrote:

John, your code worked very well - many thanks.  My only question is why wl=2 and not 8 if I explicitly compile with /assume:byterecl?  Once again, thank you for any feedback.  Craig

Craig, 

You are clearly modifying the compiler options. You should review what other options are selected as perhaps that is the cause of the problem.

/assume:byterecl is not the default for ifort, but is what most other compilers assume. wl=8 is correct for your selection.

Black Belt

Note that you can use an INQUIRE statement to get the record length required for a particular input/output item list.  This avoids any compiler dependent (or compiler option dependent) behaviour to do with the size of a file storage unit.

INTEGER :: length
REAL(xx), ALLOCATABLE :: array(:,:,:)
...
ALLOCATE(array(a,b,c))
INQUIRE (IOLENGTH=length) array(:,:,1)    ! 2D slice, for example
OPEN(unit, ... RECL=length)
...
WRITE(unit, REC=k) array(:,:,k) 
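A complete, runnable version of the outline above might look like the following sketch. The small dimensions, the file name iolength_demo.bin and the verification loop are assumptions added for illustration; only the INQUIRE (IOLENGTH=...) pattern comes from the outline.

```fortran
program iolength_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: ni = 3, nj = 4, nk = 5   ! small illustrative sizes
  real(dp), allocatable :: a(:,:,:), b(:,:)
  integer :: length, k, u

  allocate (a(ni,nj,nk), b(ni,nj))
  call random_number (a)

  ! Ask the compiler for the RECL of one 2D slice, in whatever
  ! file storage unit it uses (bytes or 4-byte words).
  inquire (iolength=length) a(:,:,1)
  print *, 'record length in file storage units:', length

  open (newunit=u, file='iolength_demo.bin', access='direct', &
        form='unformatted', recl=length, status='replace')
  do k = 1, nk
    write (u, rec=k) a(:,:,k)
  end do

  ! Read each record back and compare it with the original slice.
  do k = 1, nk
    read (u, rec=k) b
    if (any(b /= a(:,:,k))) error stop 'record mismatch'
  end do
  close (u, status='delete')    ! remove the demo file
  print *, 'all records verified'
end program iolength_demo
```

The printed record length differs between compilers and options, which is the point: the same source works whether the unit is a byte or a 4-byte word.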

 

 

Valued Contributor II

INQUIRE (IOLENGTH=length)

How useful! Something new to learn every day :-)


IanH,

#7 How is the record length determined without a format for how the data is written?
If it only applies to unformatted I/O, then why not use SIZE(array(:,:,1))?

Jim Dempsey

Black Belt

The IOLENGTH value only applies to unformatted transfers of the given input/output list.

SIZE gives you the number of elements in that array section argument.  You need to multiply that number of elements by the file storage units required per element to get the total number of file storage units required for the array section.  The file storage units per element will depend on the type and kind of the array, and the compiler and compile options in use, hence the existence of this particular form of INQUIRE.
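The relationship can be inspected with a short sketch like this one, which prints the per-element and per-slice values reported by INQUIRE alongside the SIZE-based product. The array shape is arbitrary and the program name is made up. The printed numbers depend on the compiler and its options, so no particular output is claimed here.

```fortran
program storage_units
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(3,4,5)
  integer :: per_elem, per_slice

  a = 0.0_dp

  inquire (iolength=per_elem)  a(1,1,1)   ! units for one element
  inquire (iolength=per_slice) a(:,:,1)   ! units for one 2D slice

  print *, 'bits per element (storage_size)  :', storage_size(a)
  print *, 'file storage units per element   :', per_elem
  print *, 'size(slice) * units per element  :', size(a(:,:,1))*per_elem
  print *, 'IOLENGTH of the slice            :', per_slice
end program storage_units
```

With a compiler that counts RECL in bytes the per-element value would typically be 8 for real(8), and with 4-byte units it would typically be 2, which matches the wl=8 versus wl=2 discussion earlier in the thread.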


Misspoke, SIZEOF...

It might be handy to have

! ****** hypothetical *****
INQUIRE(IOLENGTH=len, FORMAT=1234) array(:,:,1)
or
INQUIRE(IOLENGTH=len, FORMAT='(I5,*(",",F5.2))')  iRow, array(:,:,iRow)

Jim Dempsey

Black Belt

A file storage unit may not be the same thing as a byte (and it isn't, by default, with ifort).

Variable length format specifiers, like "G0", aside, you can typically figure out the number of characters required for a particular combination of format and item list manually, and while that calculation might be fiddly, it isn't processor dependent. 

If you are using G0 and friends, then the number of characters depends on the specific value of the corresponding item anyway.
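For a fixed-width format the manual calculation is simple, as this sketch suggests. It uses a format similar to the hypothetical one above, but with an explicit repeat count instead of `*` so that the character count is fully determined by the format. The values and variable names are made up for illustration.

```fortran
program formatted_length
  implicit none
  integer, parameter :: n = 3
  real :: vals(n) = [1.0, 2.5, 3.25]
  integer :: irow = 7
  character(len=32) :: fmt
  character(len=80) :: line

  ! Build '(I5,3(",",F5.2))' for n = 3 values.
  write (fmt, '(a,i0,a)') '(I5,', n, '(",",F5.2))'

  ! I5 contributes 5 characters; each value adds "," plus F5.2 = 6.
  write (line, fmt) irow, vals
  print '(a,i0)', 'predicted length: ', 5 + 6*n
  print '(a,i0)', 'actual length   : ', len_trim(line)
  print '(3a)', '[', trim(line), ']'
end program formatted_length
```

Both lengths come out as 23 characters here, independent of the compiler, because a formatted record is measured in characters rather than file storage units.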


John Campbell, thank you a lot for the code. Can somebody please help me plot the 3D array "array(i,j,k)" for all values of i, j at a particular k from the binary file, where the 3D array "array(i,j,k)" is stored in the manner presented by John Campbell? I tried gnuplot with the following commands:

  • "splot 'filename.bin' binary array=(128,128) w pm3d": this plots array(i,j,k) for all i,j at a particular unknown k (probably k = 1). Please let me know, if possible, how one could specify k so that it plots array(i,j,k) at that particular k. This command generates the following plot: [Image_1]
  • "splot 'filename.bin' binary array=(128,128) u 1:2 w pm3d": this command generates the following plot: [Image_2]
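One possible approach, sketched here under several assumptions, is to copy the wanted slice into its own small file and plot that file instead. The sketch assumes default real (4-byte) data, a 128x128 slice per record as in the gnuplot command, and hypothetical file names demo_cube.bin and slice_k.bin. It builds its own small test file so that it runs on its own; with real data only the read and the final write would be needed.

```fortran
program extract_slice
  implicit none
  integer, parameter :: ni = 128, nj = 128, nk = 4   ! assumed sizes
  integer, parameter :: k_wanted = 3                 ! slice to extract
  real :: slice(ni,nj)
  integer :: length, i, j, k, u_in, u_out

  inquire (iolength=length) slice

  ! Build a small test file, one (i,j) slice per record.
  open (newunit=u_in, file='demo_cube.bin', access='direct', &
        form='unformatted', recl=length, status='replace')
  do k = 1, nk
    do j = 1, nj
      do i = 1, ni
        slice(i,j) = real(k) * sin(0.05*i) * cos(0.05*j)
      end do
    end do
    write (u_in, rec=k) slice
  end do

  ! Pick out record k_wanted: direct access reads it without
  ! touching the other records.
  read (u_in, rec=k_wanted) slice
  close (u_in)

  ! Save the slice as raw bytes with no record structure.
  open (newunit=u_out, file='slice_k.bin', access='stream', &
        form='unformatted', status='replace')
  write (u_out) slice
  close (u_out)
  print *, 'wrote slice', k_wanted, 'to slice_k.bin'
end program extract_slice
```

Since each record holds one full (i,j) slice, slice k starts (k-1)*ni*nj elements into the file, which is why reading rec=k is enough to select it.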