in the main body of the code:
...
CALL MySubroutine(NZONES)
...
and then a subroutine:
SUBROUTINE MySubroutine(NZONES)
IMPLICIT NONE
INTEGER NZONES,AnArray(NZONES)
...
END SUBROUTINE MySubroutine
This seems like a really dangerous construct and I was surprised to note that it actually works at all! I was under the impression that memory was allocated either at compile time, in the case of a static array, or when an ALLOCATE() statement is executed during run time. This seems like a hybrid that could lead to two subroutines accessing the same memory addresses without knowing it. Can anyone offer some clarification as to what is going on?
Thanks in advance,
David
Also, for Intel Fortran, I believe the automatic arrays will go on the stack unless /heap-arrays is used. For large sizes, this may cause a stack overflow.
Abhi
Actually I thought that this was the standard way of using arrays as arguments in subroutines, i.e.
subroutine Asub(Array1,n,m)
integer :: n, m
real :: Array1(n,m)
..Perform calculations..
end subroutine
I always use this with my subroutines. It is mentioned above that a possible drawback (well, is it a drawback at all?) is that Array1 goes on the stack. Are there any other drawbacks? Or maybe some advantages over other methods, e.g. with regard to speed? What is the preferred syntax for calling a subroutine with, e.g., a two-dimensional array of known shape and size as an argument (i.e. Array1 in the above example)?
j_clausen
You appear to be confusing two things.
1. automatic local arrays
subroutine mysub(n)
...
integer :: n
integer :: array(n)
will automatically make space for a local array of size n on the stack or heap, depending on compiler settings.
subroutine mysub2(array,n)
...
integer :: n
integer :: array(n)
means the subroutine will accept any (suitable) array as the actual argument along with its size 'n'
integer :: array1(20)
integer :: array2(100)
...
call mysub2(array1, 20)
call mysub2(array2, 100)
Basically the address of the array is what goes on the stack.
There are other things (generally to do with array slices or contiguous memory) which may complicate matters as to whether a copy of the array is made - known as copy-in/copy-out - and whether the copy goes on the stack.
If, as you say, you pass the array and its size(s) as subroutine arguments, then there should be no problem.
Les
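Les's two cases can be put together in one small complete program (a sketch; the names and values here are illustrative, not from the thread):

```fortran
! Sketch of the two cases described above: an automatic local array
! and an adjustable-size dummy argument.
program demo
    implicit none
    integer :: array1(20)
    array1 = 1
    call mysub(10)             ! case 1: automatic local array
    call mysub2(array1, 20)    ! case 2: array and its size passed in
contains
    subroutine mysub(n)
        integer, intent(in) :: n
        integer :: work(n)     ! automatic array: space created on entry,
        work = 0               ! released on exit (stack or heap,
        print *, size(work)    ! depending on compiler settings)
    end subroutine mysub

    subroutine mysub2(array, n)
        integer, intent(in)    :: n
        integer, intent(inout) :: array(n)  ! adjustable dummy: only the
        array = array + 1                   ! caller's address is passed
    end subroutine mysub2
end program demo
```

Note that in the second case no new array storage is created at all; the dummy argument is just a view of the caller's array with the declared bounds.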
Thank you all very much for your insights, and thank you Les for clarifying. Apparently I can relax, as it seems that I am not writing bad code. At least not when it comes to passing arrays as arguments.
J,
There is a potential gotcha lurking around should you eventually convert the code to multi-threaded. Per Les's notes
1. automatic local arrays
subroutine mysub(n)
...
integer :: n
integer :: array(n)
will create the array either on the heap or on the stack, depending on an option switch.
*** however here is the gotcha ***
Depending on options, that array may require an array descriptor, and depending on options, that array descriptor could be placed in static memory OR placed on the stack. Under most circumstances when compiling for OpenMP the array descriptor will go on the stack (good), but not always (a bug for multi-threaded programming). Should the array descriptor be placed in static memory, then regardless of the data being allocated on the stack or on the heap, the last thread's array descriptor initialization will be used by all threads.
To correct for this, include "automatic" on the array declaration
integer, automatic :: array(n)
also use automatic on temp arrays
integer, automatic :: colors(3)
When you are certain that the code is, and forever will be, single threaded then remove the "automatic" since the static arrays have slightly faster access.
Jim Dempsey
Best regards, J.
Jim, I can't agree with the recommendation of a non-standard feature when there's a perfectly good standard alternative. Even Steve does not recommend all the bells and whistles supported by Ifort (even if it were for a practical reason -- their life would be much simpler had they not had to support quite a number of archaic features and extensions to maintain compatibility with customers' code base). At least, let's not propagate them where they're not needed.
I refer to the AUTOMATIC attribute -- RECURSIVE procedures are guaranteed to provide the same semantics in a portable manner.
J,
Virtually every recent desktop and notebook has at least 2 cores, and many have 4. Multi-threading using OpenMP is relatively easy to do. Once you get the conversion from MATLAB to Fortran working to your satisfaction, I suggest you use a profiler (VTune or PTU from Intel, or CodeAnalyst from AMD) to look for the program hot spots. Then examine each hot spot to see if the code (or, generally, one, two or more call levels up) is suitable to be run in parallel. FE code is often quite easy to parallelize. From my experience in parallelizing a Fortran FE application, it is not unusual to hit 85% efficiency and better; i.e. a 4-core desktop might be expected to run the FE 3.4x faster in parallel than serial. On SMP systems (shared-memory systems) the OpenMP work you do for your desktop will port as-is.
For large clusters (or small clusters), the systems are generally configured as a bunch of SMP systems wired together with a high-speed interconnect. To use the processors outside the SMP system in which your application launches, you must use a different programming paradigm. The usual one is MPI (Message Passing Interface). MPI has significantly higher overhead. Applications that can run different parts independently (with intermittent reconciliation) can take advantage of MPI across the high-speed interconnect (you can run MPI on a single SMP system, but OpenMP will be better there).
As you complete phases of your conversion, don't be afraid to sit down and review your code with "could I have done better". The answer is usually yes. Get the conversion working first. Do not waste time optimizing or threading non-working or incomplete code. Until you have a working conversion you will not have "the whole picture" of the application in action.
Jim Dempsey
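As a minimal sketch of the kind of independent loop Jim describes, parallelized with OpenMP (the routine and array names here are hypothetical, not from the thread):

```fortran
! Hypothetical sketch: an OpenMP-parallel loop over independent points.
! Compile with ifort /Qopenmp (Windows) or -qopenmp (Linux).
subroutine update_points(n, strain, stress)
    implicit none
    integer, intent(in) :: n
    real, intent(in)    :: strain(n)
    real, intent(inout) :: stress(n)
    integer :: i
!$omp parallel do private(i) shared(stress, strain)
    do i = 1, n
        ! each iteration is independent of the others,
        ! so the iterations may run concurrently
        stress(i) = stress(i) + 2.0 * strain(i)
    end do
!$omp end parallel do
end subroutine update_points
```

Because no iteration reads or writes another iteration's elements, no synchronization is needed inside the loop, which is why such loops parallelize so efficiently.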
"Automatic arrays" is a Fortran 90 concept (though popularly implemented as an extension to F77.) It means that the array becomes allocated at run-time on entry to the routine, and goes away at exit. There is nothing in the standard that says this means stack-allocated, though in practice it usually is.
I definitely agree that if there is a standard way to say something, then that is preferable over an extension. Automatic arrays (F90 sense) are a fine thing to use, except that you can run the risk of a stack overflow. The /heap-arrays option will cause these to be allocated on the heap, though if you think this is going to be an issue for you my preference is to make the local variable ALLOCATABLE and to allocate it to the proper size.
We just discovered an interesting "hole" in the implementation of /heap-arrays. The documentation says that this is applied to "automatic arrays and temporary arrays". There is also an optional [size] value, specified in KB, which is used in determining whether or not to allocate the array on the heap (default is everything.)
There are two catches here. The first is that [size] is used only if the compiler knows, at compile-time, what the size is going to be. There is no run-time test. So under what circumstances would this happen? Consider this:
recursive subroutine sub
real x(1000)
call sub2 (x(1:100:3))
end
Array x is, by default, allocated on the stack because it's a recursive routine. But is it an "automatic array"? No! It does not have run-time bounds! So the compiler knows how big it is. Will /heap-arrays cause this to be allocated on the heap? No! It's not an "automatic array"!
What about the temp created for the non-contiguous array slice in the call to sub2? (Assume no explicit interface.) Yes, /heap-arrays will take effect here and, more interestingly, this is the only situation where [size] is used, because the compiler knows how big the temp needs to be. In the case of all automatic arrays, [size] is ignored because the compiler does not know how large the array will be.
In my opinion, it was the original intent that /heap-arrays apply to all arrays that would otherwise be stack-allocated, as the idea was to eliminate stack overflows. I have asked that this be changed and that the documentation be clarified with respect to [size].
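The ALLOCATABLE alternative Steve prefers might look like this (a sketch; the routine and variable names are made up):

```fortran
! Sketch: an ALLOCATABLE local instead of an automatic array, so the
! data goes on the heap regardless of the /heap-arrays setting.
subroutine mysub(n)
    implicit none
    integer, intent(in) :: n
    real, allocatable :: work(:)
    allocate (work(n))     ! heap allocation, no stack-overflow risk
    work = 0.0
    ! ... computations using work ...
    deallocate (work)      ! an unsaved allocatable local is also
end subroutine mysub       ! deallocated automatically on exit (F95+)
```

This keeps the convenience of a run-time-sized local while making the allocation behavior explicit rather than dependent on a compiler option.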
Jugoslav,
You are correct (Steve L. mentioned this in a different thread). I made a few test compilations permuting a subroutine using a local array with: FPU no qualification, FPU automatic, FPU recursive, SSE no qualification, SSE automatic, SSE recursive, then all the preceding with/without optimization (/O3).
The no-qualification builds produced code using static temporary arrays (not thread safe), and the automatic generated (for what I could see) code identical to the recursive. That is, the subroutine itself was identical, but the startup called an initialization routine to set a recursive flag. I do not know what effect this has on the runtime system when compared with non-recursive with OpenMP enabled.
I will reform my ways and use RECURSIVE from here on.
Other than for the "call _for_set_reentrancy" I could see no negative impact in using RECURSIVE.
Jim Dempsey
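The standard-conforming form Jugoslav recommends could be sketched like this (names are illustrative):

```fortran
! Sketch: RECURSIVE guarantees that unsaved local variables get a
! fresh copy per invocation (automatic semantics), which also makes
! them thread-safe, without the non-standard AUTOMATIC attribute.
recursive subroutine mysub(n)
    implicit none
    integer, intent(in) :: n
    integer :: array(n)    ! automatic by virtue of its run-time bounds
    integer :: colors(3)   ! automatic because the routine is RECURSIVE
    array = 0
    colors = 0
    ! ... computations ...
end subroutine mysub
```

Without RECURSIVE (and without recursion-safe compile options), a fixed-size local such as colors(3) may be given static storage, which is exactly the thread-safety hazard discussed above.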
Thank you very much for the advice, Jim. From my MATLAB code I know that a particular loop over all the Gauss points (the plastic stress update) takes up roughly 80% of the running time. From your post it seems that this loop would be especially well suited for parallelization, as each iteration of the loop is independent of the others. (Sorry for straying from the subject of this thread.) I am sure that I will return to this forum later with more questions as the work on my FE code progresses.
J.
J,
One consideration you will need to take into account in programming is that when you divide your element domain up n ways for n threads, you may have boundary conditions that require atomic operations. The elements within the bounds of each partition can safely be manipulated without atomic operations. One technique is to do the interiors in parallel and then divide up the perimeters (as opposed to using atomic operations on the perimeters).
With a little Google searching you will be able to find papers on this subject.
Jim
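A minimal illustration of the atomic updates Jim mentions, for the case where elements on a partition boundary accumulate into the same shared node (the names here are hypothetical):

```fortran
! Hypothetical sketch: accumulating element contributions into a shared
! nodal array, where two elements on a partition boundary may touch the
! same node. The atomic directive serializes only the conflicting update.
subroutine assemble(nelem, node_of, contrib, force)
    implicit none
    integer, intent(in) :: nelem, node_of(nelem)
    real, intent(in)    :: contrib(nelem)
    real, intent(inout) :: force(:)
    integer :: e
!$omp parallel do private(e)
    do e = 1, nelem
!$omp atomic
        force(node_of(e)) = force(node_of(e)) + contrib(e)
    end do
!$omp end parallel do
end subroutine assemble
```

The interior/perimeter split Jim describes avoids even this per-update cost: interior elements never share nodes across threads, so only the (much smaller) perimeter pass needs any protection.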