Hello,
I have two chunks of code here that are reading the same ~800 MB unformatted binary file. The file in question is sequentially-accessed and has variable record lengths. Both codes are compiled at the same optimization level with the 11.1 version of the Intel Fortran compiler. Both codes have something to the effect of:
write(*,*) "reading"
do g1=1,igm
  do k1=1,km
    read(read_unit) data(file)%flux_3d(g1,:,:,k1)
  enddo
enddo
write(*,*) "done reading"
However, one code appears to be completing this operation about 10-20 times faster than the other. (Determined using a series of unscientific "stopwatch" tests.) The one that seems to be moving more slowly is compiled with -fPIC, loaded into a shared object, and reads the data into a derived type. Could any of this be causing the performance discrepancy, or is something else probably at work here?
Thanks,
Greg
5 Replies
Quoting - tim18
On the more popular platforms, the penalty for -fPIC and shared objects is unlikely to reach 5%. I would guess that a large stride (data storage interval) could be a problem, particularly if the faster one stores with better data locality.
Is there any way to mitigate this?
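To make the stride tim18 mentions concrete: Fortran stores arrays in column-major order, so the first index varies fastest in memory. The sketch below is only an illustration; the module and function names are hypothetical and not from the original code. It computes the zero-based element offset of a(g,i,j,k) for an array declared a(ng,ni,nj,nk).

```fortran
module stride_demo
  implicit none
contains
  ! Zero-based linear offset (in elements) of a(g,i,j,k) for an array
  ! declared a(ng,ni,nj,nk), following Fortran's column-major storage.
  ! Hypothetical helper for illustration only.
  pure function linear_offset(g, i, j, k, ng, ni, nj) result(off)
    integer, intent(in) :: g, i, j, k, ng, ni, nj
    integer :: off
    off = (g-1) + ng*((i-1) + ni*((j-1) + nj*(k-1)))
  end function linear_offset
end module stride_demo
```

With this formula, stepping the second index by one moves ng elements through memory, whereas stepping the first index moves one element. A section such as flux_3d(g1,:,:,k1) therefore touches memory at intervals of the first extent (igm in the code above, if that is how the array is declared).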
Quoting - gregfi04
Is there any way to mitigate this?
Bump. Is there anything I can do to improve performance in this situation?
Greg
>>read(read_unit) data(file)%flux_3d(g1,:,:,k1)
Allocate a temporary array the size of the (:,:) section in data(file)%flux_3d(g1,:,:,k1),
read into that array,
then copy that temporary array back into data(file)%flux_3d(g1,:,:,k1).
That is essentially what your original read is doing, except that it may be using smaller buffers (and more partial reads of the data). The main performance problem is that each element you read in (or copy from the temporary array) is placed into the memory block for data(file)%flux_3d(g1,:,:,k1) with a stride equal to the extent of the first dimension of your array (possibly igm).
If you can rework your indexing scheme to use data(file)%flux_3d(:,:,k1,g1), then the read/write portion will see a performance boost (however, the computation section of your code may or may not be affected).
Jim Dempsey
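The two suggestions above might look roughly like the following sketch. It makes several assumptions that are not in the original posts: a plain default-real array stands in for data(file)%flux_3d, the array extents are taken from the dummy argument, and the module and subroutine names are hypothetical. Both routines expect the same record order as the original loop (g1 outer, k1 inner).

```fortran
module flux_read_demo
  implicit none
contains

  ! Original layout flux(g,i,j,k): read each record into a contiguous
  ! temporary, then scatter it into the strided section flux(g1,:,:,k1).
  subroutine read_strided(unit, flux)
    integer, intent(in)  :: unit
    real,    intent(out) :: flux(:,:,:,:)   ! assumed (igm, im, jm, km)
    real, allocatable    :: buf(:,:)
    integer :: g1, k1
    allocate(buf(size(flux,2), size(flux,3)))
    do g1 = 1, size(flux,1)
      do k1 = 1, size(flux,4)
        read(unit) buf
        flux(g1,:,:,k1) = buf               ! elements land igm apart
      end do
    end do
  end subroutine read_strided

  ! Reordered layout flux(i,j,k,g): when the actual argument is a whole
  ! array, each record fills one contiguous block, so no scatter occurs.
  subroutine read_contiguous(unit, flux)
    integer, intent(in)  :: unit
    real,    intent(out) :: flux(:,:,:,:)   ! assumed (im, jm, km, igm)
    integer :: g1, k1
    do g1 = 1, size(flux,4)
      do k1 = 1, size(flux,3)
        read(unit) flux(:,:,k1,g1)
      end do
    end do
  end subroutine read_contiguous

end module flux_read_demo
```

The first routine only makes the temporary explicit; the strided copy is still there. The second removes the stride from the read entirely, at the cost of changing how every other part of the program indexes the array.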
Jim,
Awesome, thanks! Reading the data into a temporary array didn't make much of a difference, but restructuring the array made one hell of a big one. I'll need to take a fresh look at some of my more performance-sensitive codes.
Greg