
Are arrays whose dimensions are PARAMETER constants considered automatic arrays?

efnacy
New Contributor I
500 Views

I have a moderately sized code that seg faults unless I set ulimit -s unlimited. From this I guess it is a stack overflow issue. I made sure that no array temporaries are created by using -check arg_temp_created in all compilation lines. What I cannot tell with 100% certainty is whether there are any automatic arrays. AFAIK, automatic arrays are arrays in a subroutine or function whose dimensions cannot be determined at compile time, e.g. when a dimension is itself one of the dummy arguments of that subroutine. I have not yet gone through my entire source code to look for such arrays, but I am almost sure there are none.

However, most of my local arrays and dummy array arguments have dimensions that are constant expressions, i.e. the dimensions are declared in another module with the parameter attribute. Are these arrays also classified as automatic? If not, what else could cause a stack overflow? Is it simply that my data are too large? FYI, some arrays are 3D complex*16 arrays with dimensions of about 700x700x40. My code is also compiled with OpenMP directives, if that is worth considering.

I need to avoid relying on an unlimited stack because this program will be made public, and I don't want users to have to set ulimit -s unlimited themselves when running it.
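For scale, the storage of one such array can be estimated directly. The sketch below is only illustrative: the extents are the approximate ones quoted in the post, and the names (i8, n1, n2, n3, bytes_per_element) are made up for this example.

```fortran
program array_size_estimate
  implicit none
  ! 64-bit integer kind so that larger extents cannot overflow the byte count
  integer, parameter :: i8 = selected_int_kind(18)
  ! Approximate extents quoted in the post; complex*16 takes 16 bytes per element
  integer(i8), parameter :: n1 = 700_i8, n2 = 700_i8, n3 = 40_i8
  integer(i8), parameter :: bytes_per_element = 16_i8
  integer(i8) :: total_bytes

  total_bytes = n1 * n2 * n3 * bytes_per_element
  print '(a,i0,a)', 'One array needs ', total_bytes, ' bytes'
  print '(a,f0.1,a)', 'which is about ', real(total_bytes) / 2.0**20, ' MiB'
end program array_size_estimate
```

That is 313,600,000 bytes per array, roughly 300 MiB. A default stack limit on Linux is commonly only a few MiB, so a single array of this size placed on the stack would be enough to exceed it.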

15 Replies
jimdempseyatthecove
Honored Contributor III

Automatic arrays can be allocated from the stack or from the heap, depending on compiler options and/or attributes of the function/subroutine.

If you are using OpenMP, the reserved area (address space) for main thread and additional thread(s) is not necessarily the same. Use an OMP_... environment variable and/or omp_set... API to specify additional thread(s) stack requirement before your first parallel region.

You could consider making the large arrays SAVE (if room in 2GB static data area) or allocatable.

Jim Dempsey
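A minimal sketch of the allocatable suggestion, assuming the extents come from a module as in the original question. The module contents (dp, n1, n2, n3) and the array name big are hypothetical, chosen only for illustration.

```fortran
module data_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n1 = 700, n2 = 700, n3 = 40   ! assumed extents
end module data_mod

program allocatable_demo
  use data_mod
  implicit none
  ! Deferred-shape allocatable: the data lives on the heap, not the stack
  complex(dp), allocatable :: big(:,:,:)
  integer :: istat

  allocate(big(n1, n2, n3), stat=istat)
  if (istat /= 0) stop 'allocation of big failed'

  big = (0.0_dp, 0.0_dp)
  print *, 'elements allocated:', size(big)

  deallocate(big)
end program allocatable_demo
```

Checking stat= lets the program report an out-of-memory condition cleanly instead of crashing, which matters for code that will be distributed to other users.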

TimP
Honored Contributor III

As you use OpenMP, you are responsible for avoiding race conditions when adding SAVE to arrays that require an independent copy for each thread.

Regardless of whether the size of an array is known at compile time due to (constant) PARAMETER extents, use of stack or heap is determined elsewhere.

If the PARAMETER in another procedure determines the size, it is still technically an automatic array when the size is passed in the procedure arguments.  By inter-procedural analysis, the compiler might make it effectively the same as a local fixed size array, but you shouldn't be concerned about that.

jimdempseyatthecove
Honored Contributor III

Tim P>>If the PARAMETER in another procedure determines the size, it is still technically an automatic array when the size is passed in the procedure arguments.

Correct, the point I am making is the placement (heap or stack) in this case is dependent upon compiler options (or attributes).

Jim Dempsey

efnacy
New Contributor I

Yes, I know that a compiler option controls whether automatic arrays are placed on the stack or on the heap. With ifort, automatic arrays are placed on the stack by default, and the -heap-arrays option moves them to the heap. In fact I have tried this and still got the seg fault.

Tim P. wrote:
If the PARAMETER in another procedure determines the size, it is still technically an automatic array when the size is passed in the procedure arguments.  By inter-procedural analysis, the compiler might make it effectively the same as a local fixed size array, but you shouldn't be concerned about that.

No, the sizes are not among the arguments. They are simply used to declare the size of array arguments in the procedure, so in such a procedure the declaration section looks like

subroutine mysub(arr1, arr2)
use data_mod
implicit none
complex*16, intent(in) :: arr1(nn), arr2(nn)

...

end subroutine

where nn is declared with parameter attribute in another module file data_mod.f90.

jimdempseyatthecove wrote:
You could consider making the large arrays SAVE (if room in 2GB static data area) or allocatable.

Tim P. wrote:
As you use OpenMP, you would be responsible yourself to avoid creating a race condition by adding SAVE on arrays which require an independent copy for each thread.

AFAIK, declaring local variables in a procedure with the save attribute means that their values are retained across successive calls to that procedure. Can you explain how this property has anything to do with a parallel section within that procedure in which the saved arrays are arguments to the PRIVATE clause (hence copies of these arrays are created)? I do have some 1D arrays set as PRIVATE in one parallel section, though.

jimdempseyatthecove wrote:
Use an OMP_... environment variable and/or omp_set... API to specify additional thread(s) stack requirement before your first parallel region.

Do you mean the OMP_STACKSIZE environment variable? If so, I have also tried setting OMP_STACKSIZE=700M (is it not big enough?) before compiling anything, without ulimit -s unlimited, and the seg fault still persists.

TimP
Honored Contributor III

700m would be too large for OMP_STACKSIZE. The largest I have seen used successfully is 40m. There are reasons for the defaults of 4m (2m in 32-bit mode). Even a reasonable increase in OMP_STACKSIZE may require a corresponding ulimit adjustment. Increasing it to 9m, for example, consumes an additional 5m times num_threads.

As you imply, a typical reason for requiring an OMP_STACKSIZE adjustment would be private arrays. If those amount to hundreds of megabytes, you probably need a better way.
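One possible "better way" for large private work arrays is to make them allocatable and allocate them inside the parallel region, so each thread's copy comes from the heap rather than from its thread stack. This is a sketch under assumptions, not a statement about the original code: the names (scratch, n, total) are hypothetical, and it relies on the OpenMP rule that a private allocatable array starts out unallocated in each thread and is unallocated again on exit.

```fortran
program private_scratch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 100000          ! hypothetical work array size
  complex(dp), allocatable :: scratch(:)
  real(dp) :: total
  integer :: i

  total = 0.0_dp
  !$omp parallel private(scratch) reduction(+:total)
  allocate(scratch(n))                      ! each thread allocates its own heap copy
  !$omp do
  do i = 1, 8
    scratch = cmplx(i, 0, dp)               ! fill this thread's work array
    total = total + real(scratch(1), dp)
  end do
  !$omp end do
  deallocate(scratch)
  !$omp end parallel

  print '(a,f0.1)', 'sum of first elements: ', total
end program private_scratch
```

The loop adds 1 through 8, so the printed sum is 36.0 whether or not OpenMP is enabled. The point of the design is that the per-thread memory no longer scales OMP_STACKSIZE with the size of the work arrays.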

mecej4
Honored Contributor III

Let's clear the air regarding "automatic". The following is a quote from the F2003 standard:

If the data object being declared depends on the value of a specification-expr that is not an initialization expression, and it is not a dummy argument, such an object is called an automatic data object.
There is a rather long section that describes what a specification-expr is supposed to be, and another section that describes initialization expressions. In short, an initialization expression is some sort of constant that is known to the compiler to be a constant and whose value is available at compile time.
 
By this definition, an array variable such as those that you described, that is not a dummy argument and whose size is a named or literal constant, either given locally or by association (use- and host-), is not an automatic data object.
 

P.S. Edited wording based on IanH's comments below. Sorry about the mix-up!
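The distinction in the quoted definition can be illustrated with a small sketch. The subroutine name classify, the extra dummy argument m, and the local array names are hypothetical; only nn and data_mod follow the declarations posted earlier in the thread.

```fortran
module data_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nn = 1000
end module data_mod

subroutine classify(m, arr)
  use data_mod
  implicit none
  integer, intent(in) :: m
  complex(dp), intent(in) :: arr(nn)   ! dummy argument: never an automatic object
  complex(dp) :: fixed(nn)             ! constant extent via use association: not automatic
  complex(dp) :: scratch(m)            ! extent depends on the dummy m: automatic

  fixed = arr
  scratch = (0.0_dp, 0.0_dp)
  print *, size(fixed), size(scratch)
end subroutine classify

program automatic_demo
  use data_mod
  implicit none
  complex(dp) :: a(nn)

  a = (1.0_dp, 0.0_dp)
  call classify(10, a)
end program automatic_demo
```

Only scratch is an automatic data object here. Note that fixed, although not automatic, is still a local array, and where a compiler places such an array is a separate question from whether it is automatic.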

efnacy
New Contributor I

OK, thanks for the replies, everyone. I have just discovered that even without OpenMP parallelization my program crashes with a seg fault somewhere around the middle of the code (again without setting the stack limit to unlimited); with OpenMP it crashes earlier. I then changed some arrays to allocatable, and the program now runs (OpenMP not yet activated). Later I will try declaring all big arrays as allocatable and see whether enabling parallelization causes problems.

Anyway, a question that arose from one of the comments: can someone explain how arrays with the SAVE attribute may be affected by parallelism (see post #5)?

mecej4
Honored Contributor III

I'll make an attempt, though I am no parallelization expert. If your subprogram has local variables with the SAVE attribute, module variables or common block variables, or reads data from files into variables, it is not PURE. The values of the variables depend on factors other than the values of the subprogram arguments. The SAVE feature has been with us since the 1960s, and it was the default on old mainframes and standard on machines without stacks.

Suppose we have a subroutine S that has a saved variable V, and that your program runs two threads T1 and T2 (or uses two CPUs C1 and C2, or nodes N1 and N2). Thread T1 calls S, and the value of V is changed to V11 in S. The next time T1 reenters S, it expects V to have that value, V11. Unfortunately for it, thread T2 also entered S after T1 returned from its first call to S and before T1's second call to S, and sneaky T2 changed V to V21 and, as a result, T1 finds the value of V during its second run into S to be V21, which is not what was intended.

Once a variable has been changed in an unforeseen way, all kinds of bad consequences can follow. If V is used as an index into a local array, and V21 is outside the bounds of the array, an access violation may occur. If V is a file unit number, the wrong file may be read or written, or the file may not be connected. If V is a real variable, and its value is < 0 instead of positive, as expected, and is raised to a real exponent, a floating point exception occurs.

An often used analogy (in the context of databases) is the following: you charge an online purchase to a credit card and, a few minutes later, find the same article available for less from another source. You cancel the first order, and place a new order with the second source. The charge is refused as "over limit".
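The shared-state hazard with a saved variable can be sketched as follows. The module, subroutine, and variable names are hypothetical. The saved counter v exists once for the whole program, so every thread that calls bump updates the same storage, and the critical section shown here is one way (an assumption for this example, not the only remedy) to serialize those updates.

```fortran
module counter_mod
  implicit none
contains
  subroutine bump(total)
    integer, intent(out) :: total
    integer, save :: v = 0          ! one copy, shared by all threads
    ! Without this critical section, concurrent updates of v could be lost
    !$omp critical (bump_lock)
    v = v + 1
    total = v
    !$omp end critical (bump_lock)
  end subroutine bump
end module counter_mod

program save_race_demo
  use counter_mod
  implicit none
  integer :: i, t

  !$omp parallel do private(t)
  do i = 1, 1000
    call bump(t)
  end do
  !$omp end parallel do

  call bump(t)
  print *, 'final count:', t
end program save_race_demo
```

With the critical section in place the final count is 1001. If the directives protecting v are removed and the program is run with several threads, the count may come out lower, because two threads can read the same old value of v before either writes its update.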

 

IanH
Honored Contributor II

mecej4 wrote:

Let's clear the air regarding "automatic". The following is a quote from the F2003 standard:

If the data object being declared depends on the value of a specification-expr that is not an initialization expression, and it is not a dummy argument, such an object is called an automatic data object.

There is a rather long section that describes what a specification-expr is supposed to be. In short, it is some sort of constant that is known to the compiler to be a constant and whose value is available at compile time.

By this definition, an array variable such as those that you described, that is not a dummy argument and whose size is a named or literal constant, either given locally or by association (use- and host-), is an automatic data object.

Something is mixed up in the last two paragraphs. Specification expressions don't have to be constant, and automatic objects are non-dummy objects that have such non-constant expressions in their declaration.

The array declarations upthread are not automatic objects.

(F2003 initialization expression == F2008 constant expression)

jimdempseyatthecove
Honored Contributor III

SAVE has two properties:

a) As the name implies, stored values are saved (kept) upon RETURN for potential reuse upon subsequent call.
b) As the name does not imply, forces the compiler to place the data (and/or array descriptor) in the private static data area of the procedure.

For serial sections of code, and/or shared arrays in parallel sections, SAVE'd data (or array descriptors) can conserve stack space (at the expense of capacity in the static data segment of the program), as well as permit once-only allocations from the heap for arrays (which can reduce heap memory fragmentation). Whether SAVE is desirable is a design issue.

The OP opined that he was having stack overflow issues. I suspect that the segmentation fault is something entirely different.

The OP also did not state the Fortran Standard being imposed upon the compiler/code. Default placement of local arrays differs amongst standards.

The OP should have, by now, run his program with full runtime diagnostics, then later with stack check enabled. This will confirm if the seg fault is due to stack overflow or not.

Jim Dempsey
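The once-only heap allocation mentioned above can be sketched with a saved allocatable array. The module, subroutine, and array names are hypothetical, and the sketch is for serial use only: as written it is not safe to call from several threads at once, because the allocation check and the array itself are shared.

```fortran
module work_mod
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine step(n)
    integer, intent(in) :: n
    ! SAVE keeps the descriptor (and the allocation) alive between calls
    complex(dp), allocatable, save :: work(:,:)

    if (.not. allocated(work)) then
      allocate(work(n, n))          ! happens on the first call only
      print *, 'work allocated once, elements:', size(work)
    end if
    work = (0.0_dp, 0.0_dp)         ! reuse the same storage on every call
  end subroutine step
end module work_mod

program save_alloc_demo
  use work_mod
  implicit none
  integer :: k

  do k = 1, 3
    call step(200)
  end do
end program save_alloc_demo
```

The allocation message is printed once even though step is called three times, which is the behavior that avoids repeated allocate/deallocate cycles.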

efnacy
New Contributor I

jimdempseyatthecove wrote:

The OP also did not state the Fortran Standard being imposed upon the compiler/code. Default placement of local arrays differs amongst standards.

The OP should have, by now, run his program with full runtime diagnostics, then later with stack check enabled. This will confirm if the seg fault is due to stack overflow or not.

Actually I am not sure which standard I am imposing on ifort. I compiled it with

ifort -openmp -check arg_temp_created -i8 myprog.f90 -L/opt/intel/composerxe-2011/mkl/10.2.3.029/lib/em64t -lmkl_solver_ilp64_sequential -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lpthread -lm

The train of options following myprog.f90 is produced by link advisor.

This is going to be my first attempt at compiling with runtime diagnostics. Will the diagnostics be close to complete if I follow this?

https://software.intel.com/en-us/articles/tips-for-debugging-run-time-failures-in-intel-fortran-applications

If there is more I should try, could you please tell me which compiler options to use for this purpose? As for the stack checking, do you mean -fp-stack-check?

Juergen_R_R
Valued Contributor I

This looks like an awfully old ifort version.

jimdempseyatthecove
Honored Contributor III

To confirm a suspected stack overflow, add:

-check stack

If stack overflow is not the cause of the Seg Fault, then replace the -check... with

-check all

This will enable all runtime error checking. It will not find all errors in your program, but it will find most.

*** Check your compiler options with ifort -help; your option names may differ ***

Also, it is advisable to compile with the following at least once, and again after major revisions:

-warn all

*** earlier versions of the compiler may require:

-gen-interfaces

Newer versions imply -gen-interfaces with -warn interfaces (-warn all).

2011 is quite old (but likely quite usable).

Jim Dempsey

 

 

TimP
Honored Contributor III

The problem with SAVE, when not using recent features like BLOCK, occurs in procedures that may modify data when called in parallel regions. For COMMON in such situations, OpenMP requires the use of threadprivate, which can be somewhat more difficult. If the SAVE'd array is used only outside the parallel region and can be designated firstprivate, it should be OK. Bugs due to race conditions caused by mistakes in these areas can be difficult to track down. Intel Parallel Inspector is designed to help, but is prone to omissions and false positives.
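A minimal sketch of the threadprivate approach for COMMON mentioned above. The common block name /state/, the variable counter, and the subroutine tick are hypothetical. The directive has to be repeated in every program unit that declares the common block.

```fortran
program threadprivate_common
  implicit none
  integer :: counter, i, total
  common /state/ counter
  !$omp threadprivate(/state/)

  total = 0
  !$omp parallel reduction(+:total)
  counter = 0                     ! each thread resets its own copy
  !$omp do
  do i = 1, 100
    call tick()
  end do
  !$omp end do
  total = total + counter         ! combine the per-thread counts
  !$omp end parallel

  print *, 'total ticks:', total
end program threadprivate_common

subroutine tick()
  implicit none
  integer :: counter
  common /state/ counter
  !$omp threadprivate(/state/)

  counter = counter + 1           ! touches only the calling thread's copy
end subroutine tick
```

Each thread counts the iterations it executed, and the reduction adds them up, so the total is 100 regardless of the number of threads. No locking is needed because no two threads ever write the same copy of counter.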

mecej4
Honored Contributor III

Efnacy, -fp-stack-check applies to the use of the 80x87 coprocessor and its 8-register stack, and has no use for 64-bit program compilation.
