I came across a problem with the following statement (simplified for clarity) in an F95 code:
A%vector(:) = A%vector(:) * value
The array component is a vector (1D array) of 3,200,000 double complex elements (~50MB of data). Using the array assignment above to scale the vector values results in a segmentation fault, whereas when I re-coded the assignment as an explicit do loop, i.e.
do i=1,size(A%vector)
A%vector(i) = A%vector(i) * value
end do
the code executed without error and gave the correct results. Incidentally this code is run on a high performance shared-memory system with 256GB of global RAM.
Could anyone explain this behaviour? I prefer the F90 array assignment syntax but I have now lost confidence in the compiler's ability to handle it (certainly for large array sizes).
Thanks in advance for any feedback,
Regards,
Tim Stitt
HPC Support Scientist
ICHEC
I saw this in comp.lang.fortran. In the first case, the compiler did not detect that the left and right sides overlap completely, so it created a temporary array on the stack to hold the result. This exceeded your stack size limit and you got the segfault.
In the next update to the compiler, due next week, this is fixed. There may still be cases where the compiler cannot prove an array temp is not required and it will still generate the temp. In many cases like this, you can use the -heap-arrays switch to tell the compiler to allocate these temps on the heap. It is better, of course, not to have the temp at all.
By the way, the use of (:) in the first example is not a good coding practice and should be avoided.
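For illustration, here is a minimal sketch of the whole-array form of the statement from the original post. The type name `holder` and the allocatable component are assumptions made for the example (a strict F95 code would need a pointer component or the allocatable-components extension); only the element count and the scaling operation come from the post. Whether a given compiler version still creates a temporary for this form is something you would have to check yourself.

```fortran
program scale_whole_array
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 3200000   ! element count quoted in the original post

  ! Hypothetical stand-in for the poster's derived type.
  type :: holder
     complex(dp), allocatable :: vector(:)
  end type holder

  type(holder) :: a
  complex(dp)  :: value

  allocate(a%vector(n))               ! the array itself lives on the heap
  a%vector = (1.0_dp, 2.0_dp)
  value    = (0.5_dp, 0.0_dp)

  ! Whole-array reference: no (:) section on either side.
  a%vector = a%vector * value

  ! (1,2) * (0.5,0) = (0.5,1); stop with an error if that is not what we got.
  if (any(a%vector /= (0.5_dp, 1.0_dp))) error stop "unexpected result"
  print *, "scaled", size(a%vector), "elements"
end program scale_whole_array
```

If a build of this still segfaults, that suggests the temporary is being created on the stack, and the -heap-arrays switch mentioned above is the thing to try.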
Just a few questions in return though:
1. Is it a general principle that temporaries of allocated arrays are placed on the stack? I would have thought that compilers would be "smart" enough to use heap memory for array temps particularly if they are of substantial size. I assume there is greater overhead in this approach than just a stack-based solution.
2. I found your comment on the use of A(:)=A(:)*value interesting. Is this due to style or performance reasons? I thought I remembered reading somewhere that the compiler can recognize potential optimisations better when using this syntax. Is A=A*value a better alternative, or the explicit do loop?
Thanks again.
1. The default is that temps are created on the stack, as this is the fastest method. There is an option to have them created on the heap, -heap-arrays, added within the past few months. You can specify a size threshold, but this is used only when the compiler knows the fixed size to be allocated - if the size is not known till runtime, it always uses one or the other depending on the switch. We don't generate a run-time test for this.
2. The use of (:) turns the reference into an array slice, which complicates optimization. The compiler has to go to extra lengths to recognize that the (:) can be discarded. For a long time, our compiler did not do that, though it does now. Another use I see sometimes is array(1:N) where N is the upper bound. This is even harder for a compiler to recognize as being discardable. The simpler you can make things for the compiler, the easier time it has. A whole array reference is always preferable to an array slice reference.
In the early days of F90 compilers, a DO loop was almost always faster. An array assignment has different semantics to a DO loop in that the right hand side is completely evaluated before any of the left hand side is modified. Earlier compilers did not know how to analyze the cases to avoid creating a temp and two loops. Nowadays, most compilers are pretty good at this, at least for the straightforward cases. When you start adding pointer and allocatable components, things get tougher again, but compilers keep getting smarter.
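The difference in semantics is easiest to see when the two sides overlap with a shift. The following is a small sketch with made-up names (`a`, `b`, `n`), not code from the thread. It shows why a compiler that cannot prove the overlap is harmless has to fall back on a temporary.

```fortran
program assignment_semantics
  implicit none
  integer, parameter :: n = 5
  integer :: a(n), b(n), i

  a = [(i, i = 1, n)]
  b = a

  ! Array assignment: the right-hand side is evaluated in full first,
  ! so every element receives its left neighbour's ORIGINAL value.
  a(2:n) = a(1:n-1)

  ! Forward DO loop: each iteration reads a value that the previous
  ! iteration already overwrote, so b(1) propagates down the array.
  do i = 2, n
     b(i) = b(i-1)
  end do

  print *, a   ! values: 1 1 2 3 4
  print *, b   ! values: 1 1 1 1 1
end program assignment_semantics
```

In the scaling statement from the original post, each element depends only on itself, so the array assignment and the DO loop give the same answer and no temporary is needed in principle.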
My usual advice is to write the code with what looks the most natural for the language, which would be an array assignment, and let the compiler figure it out. If you find that performance is bad, let the compiler vendor know.
