Intel® Fortran Compiler

Difference in how an array is allocated in a subroutine makes the code work or hang

Wee_Beng_T_
Beginner

Hi,

I am running an MPI CFD code on my school's cluster using 100 CPUs.

Due to my program's structure, this is the maximum number of CPUs I can use. At the same time, because of the number of grids used in my code, I am reaching the available memory limit.

The code ran and then hung at one spot. After debugging, I found that it hangs in one of the subroutines. In this subroutine, I have to update the values of different variables across all CPUs using MPI.

There are some local arrays which I need to create. If I declare them using:

subroutine mpi_var_...

real(8) :: var_ksta(row_num*size_x*size_y), ...

... 

end subroutine 

The code hangs.

However, if I do this:

subroutine mpi_var_...

real(8), allocatable :: var_ksta(:) ...

allocate (var_ksta(row_num*size_x*size_y))

...

deallocate (var_ksta, STAT=status(1))

end subroutine

The code works. So how differently is memory allocated in these 2 situations?

If I am not constrained by the memory limit, is the 1st subroutine faster than the 2nd one, or about the same (with allocation / deallocation slowing the 2nd one down)?

Thanks!

4 Replies
Xiaoping_D_Intel
Employee

In situation 1 the array is an automatic array, which is allocated on the stack by default. If its size is larger than your stack size setting, it will cause a runtime stack overflow error. The stack size can be checked with the shell command "ulimit -s". The compiler option "-heap-arrays [size]" can be used to make the compiler put automatic arrays larger than a given size (in kilobytes) on the heap.

In situation 2 the array is allocated on the heap, which is much larger than the default stack size.

Regarding performance, the allocate/deallocate calls will introduce some overhead.

Thanks,

Xiaoping
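
To make the two situations concrete, here is a minimal sketch of both declarations side by side. The program name, the subroutine names, and the size n = 1000 are made up for illustration (in the original code the size would be row_num*size_x*size_y). A small size is used on purpose so that the example runs under any stack limit; with Intel Fortran defaults the first array would go on the stack and the second on the heap.

```fortran
program stack_vs_heap
  implicit none
  integer :: n

  ! Hypothetical size, standing in for row_num*size_x*size_y
  n = 1000

  call with_automatic(n)
  call with_allocatable(n)

contains

  subroutine with_automatic(n)
    integer, intent(in) :: n
    ! Automatic array: its size comes from a dummy argument, and
    ! by default the storage is taken from the stack on entry.
    real(8) :: work(n)

    work = 1.0d0
    print *, 'automatic sum   =', sum(work)
  end subroutine with_automatic

  subroutine with_allocatable(n)
    integer, intent(in) :: n
    ! Allocatable array: storage is requested from the heap.
    real(8), allocatable :: work(:)
    integer :: ierr

    allocate(work(n), stat=ierr)
    if (ierr /= 0) error stop 'allocation failed'

    work = 1.0d0
    print *, 'allocatable sum =', sum(work)

    ! Optional: a local allocatable without SAVE is released
    ! automatically when the subroutine returns.
    deallocate(work)
  end subroutine with_allocatable

end program stack_vs_heap
```

Checking the stat= result on allocate is worth the extra line when running close to the memory limit, since it turns an out-of-memory condition into a clear message instead of a crash.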

 

jimdempseyatthecove
Honored Contributor III

If the subroutine is NOT intended to be called concurrently by multiple threads then consider

real(8), allocatable, save :: var_ksta(:)...
...
size_needed = row_num*size_x*size_y
if(.not.allocated(var_ksta)) then
  allocate(var_ksta(size_needed))
else
  if(size(var_ksta) .lt. size_needed) then
    deallocate(var_ksta)
    allocate(var_ksta(size_needed))
  endif
endif

Note, the newer feature for reallocation of the left-hand side likely won't be effective when consolidating ranks. While it can be done, it would then require the creation of a temporary array (in other words, an unnecessary copy operation).

Also, be mindful that var_ksta could potentially be larger than size_needed.

Jim Dempsey
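
For anyone who wants to try the pattern above in isolation, here is a small self-contained sketch. The module name scratch_demo, the routine use_scratch, and the three sizes in the driver are invented for the example. It also shows the caveat about the buffer possibly being larger than size_needed: only the slice var_ksta(1:size_needed) is used.

```fortran
module scratch_demo
  implicit none
contains

  subroutine use_scratch(size_needed)
    integer, intent(in) :: size_needed
    ! SAVE keeps the buffer alive between calls.
    ! Not safe if called concurrently by multiple threads.
    real(8), allocatable, save :: var_ksta(:)

    if (.not. allocated(var_ksta)) then
      allocate(var_ksta(size_needed))
    else if (size(var_ksta) < size_needed) then
      ! Grow only: a buffer that is already big enough is reused.
      deallocate(var_ksta)
      allocate(var_ksta(size_needed))
    end if

    ! The buffer may be larger than needed, so work on a slice.
    var_ksta(1:size_needed) = 1.0d0
    print *, 'needed =', size_needed, ' capacity =', size(var_ksta)
    print *, 'sum    =', sum(var_ksta(1:size_needed))
  end subroutine use_scratch

end module scratch_demo

program demo
  use scratch_demo
  implicit none

  call use_scratch(100)   ! first call allocates
  call use_scratch(50)    ! reuses the larger buffer
  call use_scratch(200)   ! too small now, so it is reallocated
end program demo
```

On the second call the capacity stays at 100 while only 50 elements are needed, which is why whole-array operations such as sum(var_ksta) would give the wrong answer here.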

Wee_Beng_T_
Beginner

Hi Xiaoping,

I added:

ulimit -s unlimited

and now the code works. Thanks for your suggestion.

 

Hi Jim,

I'm using MPI with domain decomposition, so each CPU has its own region of interest. Can I still use your subroutine?

If my "size_needed" is always the same in the code, I will not need to allocate and deallocate repeatedly, right?

Also, the new subroutine creates the array once, after which it always stays in memory until the code ends, is that so?

jimdempseyatthecove
Honored Contributor III

The code snippet I presented was for single-threaded use, where the array is allocated only once, on the first call to the subroutine. It may also be used in multi-threaded code where the array is shared.

Notes: The allocated array is not deallocated upon termination of the program (process). On most modern systems this is of no consequence. The advantage of this technique is that the size need not be known at compile time. However, if the size is known at compile time, the size (in bytes) is permitted to be larger than 2GB.

An alternate method is to create a module that contains the array in the data section, and the allocation, deallocation, and optionally manipulation routines in the CONTAINS section. Then on program start you call the allocation routine, during the program run you call the manipulation routines (or directly manipulate the array), and on program end (or when done with the array) you call the deallocation/cleanup routine. The main PROGRAM (where the call to the allocation routine occurs) and any routine using the data/functions/subroutines will have to USE this module.

The module route eliminates the test for allocated and size_needed on each entry (when size_needed does not change).

The ulimit change is fine provided that you never intend to also enable multi-threading with private use of the entire array.

Jim Dempsey
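
A rough sketch of the module route described above, under a few assumptions: the module name ksta_buffer and the routines ksta_init and ksta_finalize are hypothetical names chosen for this example, and the size 1000 stands in for row_num*size_x*size_y. In an MPI code, each rank would call the init routine once with its own local size.

```fortran
module ksta_buffer
  implicit none
  private
  public :: var_ksta, ksta_init, ksta_finalize

  ! Data section: the array lives in the module, so every
  ! routine that USEs the module sees the same storage.
  real(8), allocatable :: var_ksta(:)

contains

  ! Call once at program start (hypothetical helper name).
  subroutine ksta_init(n)
    integer, intent(in) :: n
    integer :: ierr

    if (allocated(var_ksta)) deallocate(var_ksta)
    allocate(var_ksta(n), stat=ierr)
    if (ierr /= 0) error stop 'ksta_init: allocation failed'
    var_ksta = 0.0d0
  end subroutine ksta_init

  ! Call once when the array is no longer needed.
  subroutine ksta_finalize()
    if (allocated(var_ksta)) deallocate(var_ksta)
  end subroutine ksta_finalize

end module ksta_buffer

program main
  use ksta_buffer
  implicit none

  call ksta_init(1000)      ! allocate once, up front

  ! Work routines can now use var_ksta directly, with no
  ! allocated() or size test on each entry.
  var_ksta(1) = 3.0d0
  print *, 'size =', size(var_ksta), ' first =', var_ksta(1)

  call ksta_finalize()      ! explicit cleanup at the end
end program main
```

Compared with the SAVE version, the allocation cost and the checks are paid once at start-up, at the price of having to remember the init and finalize calls.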
