Intel® Fortran Compiler

performance penalty for -fPIC, -shared, or derived data types?

gregfi04
Beginner
Hello,

I have two chunks of code here that are reading the same ~800 MB unformatted binary file. The file in question is sequentially-accessed and has variable record lengths. Both codes are compiled at the same optimization level with the 11.1 version of the Intel Fortran compiler. Both codes have something to the effect of:

write(*,*) "reading"
do g1=1,igm
   do k1=1,km
      read(read_unit) data(file)%flux_3d(g1,:,:,k1)
   enddo
enddo
write(*,*) "done reading"

However, one code appears to be completing this operation about 10-20 times faster than the other. (Determined using a series of unscientific "stopwatch" tests.) The one that seems to be moving more slowly is compiled with -fPIC, loaded into a shared object, and reads the data into a derived type. Could any of this be causing the performance discrepancy, or is something else probably at work here?

Thanks,
Greg
1 Solution
jimdempseyatthecove
Honored Contributor III

Greg

>>read(read_unit) data(file)%flux_3d(g1,:,:,k1)

Allocate a temporary array the size of the (:,:) section of data(file)%flux_3d(g1,:,:,k1).
Read into that array.
Then copy the temporary array into data(file)%flux_3d(g1,:,:,k1).

That is essentially what your original read is doing, except that it may be using smaller buffers (and more partial reads of the data). The main performance problem is that each element you read in (or copy from the temporary array) will be placed into the memory block for data(file)%flux_3d(g1,:,:,k1) with a stride equal to the extent of the first dimension of your array (possibly igm).

If you can rework your index scheme to use data(file)%flux_3d(:,:,k1,g1), then the read/write portion will see a performance boost (however, the computation section of your code may or may not be affected).

Jim Dempsey
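
The two suggestions above can be sketched roughly as follows. This is only an illustration, not Greg's actual code: the extents (igm, im, jm, km), the array names flux_strided and flux_contig, and the scratch file are all assumptions made up for the example, and it uses Fortran 2008 features (newunit=, error stop) that a compiler as old as 11.1 may not accept.

```fortran
program read_layout_sketch
  implicit none
  ! Hypothetical small extents; the thread does not state the real ones.
  integer, parameter :: igm = 4, im = 5, jm = 6, km = 3
  real :: flux_strided(igm, im, jm, km)  ! original index order (g, i, j, k)
  real :: flux_contig(im, jm, km, igm)   ! reworked index order (i, j, k, g)
  real, allocatable :: buf(:,:)          ! contiguous temporary for one record
  integer :: u, g1, k1, i, j

  ! Build a scratch unformatted sequential file with one (im x jm) record
  ! per (g1, k1) pair, standing in for the real data file.
  open(newunit=u, status='scratch', form='unformatted', access='sequential')
  allocate(buf(im, jm))
  do g1 = 1, igm
     do k1 = 1, km
        do j = 1, jm
           do i = 1, im
              buf(i, j) = real(i + 10*j + 100*k1 + 1000*g1)
           end do
        end do
        write(u) buf
     end do
  end do

  ! Suggestion 1: read each record into the contiguous temporary, then
  ! copy it into the strided section of the original layout.
  rewind(u)
  do g1 = 1, igm
     do k1 = 1, km
        read(u) buf
        flux_strided(g1, :, :, k1) = buf
     end do
  end do

  ! Suggestion 2: with the reworked index order the section (:,:,k1,g1)
  ! is contiguous in memory, so the record is read straight into place.
  rewind(u)
  do g1 = 1, igm
     do k1 = 1, km
        read(u) flux_contig(:, :, k1, g1)
     end do
  end do
  close(u)

  ! Both layouts must end up holding the same values.
  do g1 = 1, igm
     do k1 = 1, km
        if (any(flux_strided(g1, :, :, k1) /= flux_contig(:, :, k1, g1))) then
           error stop "layouts disagree"
        end if
     end do
  end do
  print *, "both layouts hold the same data"
end program read_layout_sketch
```

With extents this small there is nothing to time; the point is only to show what each variant looks like. Any speed-up would have to be measured on the real file and array sizes.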



5 Replies
TimP
Honored Contributor III
On the more popular platforms, the penalty for -fPIC and shared objects is unlikely to reach 5%. I would guess that a large stride (data storage interval) could be a problem, particularly if the faster one stores with better data locality.
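
To make the stride concrete: Fortran stores arrays in column-major order, so the first index varies fastest in memory. The sketch below computes element offsets by hand for a 4-D array. The extents and the helper offset4 are hypothetical, chosen only for illustration, and the array is assumed to be declared flux_3d(igm, im, jm, km).

```fortran
program stride_sketch
  implicit none
  ! Hypothetical extents; the thread does not state the real ones.
  integer, parameter :: igm = 4, im = 5, jm = 6, km = 3
  integer :: s_orig, s_new

  ! Original order (g, i, j, k): distance in elements between neighbours
  ! of the section (g1,:,:,k1), i.e. between i and i+1 at fixed g1, j, k1.
  s_orig = offset4(1, 2, 1, 1, igm, im, jm) - offset4(1, 1, 1, 1, igm, im, jm)

  ! Reworked order (i, j, k, g): the same two neighbours in (:,:,k1,g1).
  s_new = offset4(2, 1, 1, 1, im, jm, km) - offset4(1, 1, 1, 1, im, jm, km)

  print '(a,i0)', 'stride in (g,i,j,k) layout: ', s_orig   ! equals igm
  print '(a,i0)', 'stride in (i,j,k,g) layout: ', s_new    ! equals 1

contains

  ! Zero-based column-major offset of element (i1,i2,i3,i4) in an array
  ! whose first three extents are n1, n2, n3.
  pure integer function offset4(i1, i2, i3, i4, n1, n2, n3) result(off)
    integer, intent(in) :: i1, i2, i3, i4, n1, n2, n3
    off = (i1 - 1) + n1*((i2 - 1) + n2*((i3 - 1) + n3*(i4 - 1)))
  end function offset4

end program stride_sketch
```

In the original index order each record is scattered through memory at intervals of igm elements. In the reworked order it occupies one contiguous block.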
gregfi04
Beginner
Quoting - tim18
On the more popular platforms, the penalty for -fPIC and shared objects is unlikely to reach 5%. I would guess that a large stride (data storage interval) could be a problem, particularly if the faster one stores with better data locality.

Is there any way to mitigate this?
gregfi04
Beginner
Quoting - gregfi04

Is there any way to mitigate this?

Bump. Is there anything I can do to improve performance in this situation?
gregfi04
Beginner

Jim,

Awesome, thanks! Reading the data into a temporary array didn't make much of a difference, but restructuring the array made one hell of a big one. I'll need to take a fresh look at some of my more performance-sensitive codes.

Greg