I'm running Intel Fortran 14 on Linux.
In my code I have three allocatable 2D arrays (one integer and the other two logical) that are declared in a module, then allocated before calling subroutine X. The arrays are then initialised (to zero and false) at the start of X. Profiling tools like gprof and callgrind indicate that the time to initialise the arrays is significantly longer than the actual runtime of the calculation in X, which seems like nonsense given that the arrays are small (50x50). I don't see any performance issues with array initialisation elsewhere in the code.
Could anyone offer any suggestions as to what may be happening?
I should have added that the array initialisation is simple, e.g.
A=0
B=.false.
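For reference, a minimal sketch of the setup being described (the module name, array names, and dimensions here are assumptions for illustration, not the actual code):

```fortran
module work_data
    implicit none
    integer, allocatable :: A(:,:)
    logical, allocatable :: B(:,:), C(:,:)
end module work_data

subroutine X()
    use work_data
    implicit none
    ! Initialisation at the start of X -- the part profiling
    ! reports as unexpectedly slow.
    A = 0
    B = .false.
    C = .false.
    ! ... actual calculation ...
end subroutine X

program main
    use work_data
    implicit none
    allocate( A(50,50), B(50,50), C(50,50) )  ! allocated before calling X
    call X()
end program main
```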
What happens if you combine allocation and initialization via the sourced allocation facility in Fortran 2003:
allocate( A(n,m), source=0, stat=istat, ..)
allocate( B(n,m), source=.false., stat=istat, ..)
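A self-contained sketch of that suggestion (the dimensions and the error-handling branches are assumptions, filled in for illustration):

```fortran
program sourced_alloc
    implicit none
    integer, parameter :: n = 50, m = 50
    integer, allocatable :: A(:,:)
    logical, allocatable :: B(:,:)
    integer :: istat

    ! Allocation and initialisation in one statement (Fortran 2003
    ! sourced allocation): each element takes the value of source=.
    allocate( A(n,m), source=0, stat=istat )
    if (istat /= 0) stop 'allocation of A failed'
    allocate( B(n,m), source=.false., stat=istat )
    if (istat /= 0) stop 'allocation of B failed'
end program sourced_alloc
```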
Could you show us the code and compile options for this section?
The fact that you say performance is not a problem for other initialisations in your code indicates something else may also be occurring.
I should also add that, in my experience, doing this
allocate( A(n), source=0)
is slower than this
allocate( A(n) )
A = 0
At least it was in the quick test I just ran to confirm this; the compiler is 15.0.3.187 on Linux. If speed is an issue, perhaps parallelise?
Again, in my experience, once your vector dimension exceeds about 15,000 elements and you have 4 CPUs, an OpenMP loop can do this faster. I have no idea why those particular values, either.
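A rough way to check this on your own machine: time both variants with system_clock. The sketch below is a hypothetical harness (array size and repetition count are arbitrary choices); results will vary by compiler version and system.

```fortran
program alloc_timing
    implicit none
    integer, parameter :: n = 1000000, reps = 100
    integer, allocatable :: A(:)
    integer(kind=8) :: t0, t1, rate
    integer :: i

    call system_clock(count_rate=rate)

    ! Variant 1: sourced allocation.
    call system_clock(t0)
    do i = 1, reps
        allocate( A(n), source=0 )
        deallocate( A )
    end do
    call system_clock(t1)
    print *, 'sourced allocation:', real(t1-t0)/real(rate), ' s'

    ! Variant 2: allocate, then assign.
    call system_clock(t0)
    do i = 1, reps
        allocate( A(n) )
        A = 0
        deallocate( A )
    end do
    call system_clock(t1)
    print *, 'allocate + assign: ', real(t1-t0)/real(rate), ' s'
end program alloc_timing
```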
The "problem" with array initialization has to do with what happens at process start: while the virtual address space of the process may contain all the addresses it will use, those addresses are not mapped until "first touched". That is, when a region of addresses is first allocated to the process, the pages (at page granularity) are not mapped to physical RAM and/or the page file until you first access the memory (typically by a first write). First touch takes a relatively long time: a fault to the O/S for accessing unmapped memory, the O/S locating an available page in RAM, possibly swapping something else out, the O/S optionally wiping the page to prevent inter-process snooping, possibly a page-file remapping as well, and then a return to the user application.
Take a look at the early part of the video (right side) on http://www.lotsofcores.com/. The initial combing effect in the display reflects the first-touch overhead. On the left side the effect is present but not visually apparent (other than a time lag).
Jim Dempsey
As the last two responses suggest, with a large array an OpenMP default schedule should show an advantage on a multi-CPU NUMA platform, particularly if subsequent use of the array is scheduled consistently with memory locality. Intel compilers may engage streaming stores automatically (-opt-streaming-stores auto) if that appears appropriate.
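A sketch of first-touch-aware initialisation with OpenMP, along the lines of the last two responses (the array size is an arbitrary assumption). A static schedule gives each thread a contiguous chunk, so each thread faults in, and thereby maps, the pages it will later work on, ideally on its own NUMA node:

```fortran
program first_touch
    implicit none
    integer, parameter :: n = 20000000
    real, allocatable :: A(:)
    integer :: i

    allocate( A(n) )   ! address space reserved; pages not yet mapped

    ! First touch: each thread writes (and so maps) its own chunk.
    !$omp parallel do schedule(static)
    do i = 1, n
        A(i) = 0.0
    end do
    !$omp end parallel do

    ! Subsequent compute loops should use the same schedule(static)
    ! so each thread keeps working on the memory it mapped locally.
    !$omp parallel do schedule(static)
    do i = 1, n
        A(i) = A(i) + 1.0
    end do
    !$omp end parallel do
end program first_touch
```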
