Solved: Re: Allocatable arrays of different type and EQUIVALENCE

Petr_Parik · ‎01-18-2021

I work on a very old code which relies on two big working arrays IW and RW, which are equivalenced in the main program and then used directly or propagated to called subroutines like this:

      program test
        implicit none
        external inner
        integer(4),parameter :: li = 100
        integer(4) iw(li)
        real(8)    rw(li/2)
        equivalence (iw,rw)
        rw = -10.20
        print *,iw
        print *,rw
!       call someroutine(iw,li,rw,lr) will also work
      end

When I replace the static arrays with dynamically allocated, then I of course cannot equivalence the arrays. I came up with this code:

      program test
        implicit none
        external inner,outer
        integer(4),parameter :: li = 100
        call outer(inner,li)
      end
      subroutine outer(proc,li)
        implicit none
        external proc
        integer(4),intent(in)  :: li
        integer(4),allocatable :: iw(:)
        allocate(iw(li))
        call proc(iw,li,iw,li/2)
        deallocate(iw)
      end
      subroutine inner(iw,li,rw,lr)
        implicit none
        integer(4),intent(in)    :: li,lr
        integer(4),intent(inout) :: iw(li)
        real(8),intent(inout)    :: rw(lr)
        rw = -10.20
        print *,iw
        print *,rw
!       call someroutine(iw,li,rw,lr) will also work
      end

So here the main program is reduced to a call to a helper subroutine outer, which allocates a single array and then passes it as two arrays to subroutine inner, which in turn contains the code of the original main program. The reason the compiler does not complain about the third parameter to inner not being a real(8) array is that in outer, inner is hidden in proc as an external subroutine with arbitrary arguments.

Is there any other, maybe more simpler or clean, way?

FortranFan · ‎01-19-2021

@Petr_Parik ,

Also, if your interest in Fortran is limited to doing the least with this legacy codebase to get it to work with a larger problem size beyond the stack limits, you are essentially done by using the approach with the IW allocatable+target attributes, RW as pointer, and C_F_POINTER memory mapping shown above. No need to read any further, just "let the sleeping dogs lie" with the legacy code.

But if that ain't the case and you need to do more with Fortran, you may want to first start with a close review of a book such as this:

https://www.amazon.com/Fortran-Scientists-Engineers-Stephen-Chapman-ebook/dp/B06XCTY8KR

And pick up on modern Fortran concepts that can open up a world of options to write better new code and improve upon existing/legacy codebases for efficient and manageable applications in scientific and technical computing.

View solution in original post

Arjen_Markus · ‎01-18-2021

One thing you should ask yourself is whether the equivalence is necessary or not. In old code this sort of things was often used to save on the (back then very precious) memory. If that is not the case, then simply allocate two different arrays.

If the equivalence is used to actually transfer information back and forth between the two arrays, you are probably out of luck and, modulo a rewrite of the parts that rely on that, you are stuck with the solution you found. Document it properly, so that future programmers have a clue as to why it was arranged in this way.

If that transfer is taking place in well identifiable parts, then you might use the transfer() function explicitly to demonstrate what is going on.

Petr_Parik · ‎01-18-2021

Well, it is not feasible to rewrite the code due to its complexity. Allocating two separate arrays might work, but due to the way the integer/real data is interleaved in those arrays there would be a lot of wasted space.

Statically allocated arrays have a limit of 2 GB and that is primary reason why I want dynamic allocation without changing anything else.

I'm just wondering if there are other options for this heresy

mecej4 · ‎01-18-2021

Your code in Subroutine Inner is in violation of the Fortran Standard:

15.5.2.13 Restrictions on entities associated with dummy arguments
31 1 While an entity is associated with a dummy argument, the following restrictions hold.
...
37 (3) Action that affects the value of the entity or any subobject of it shall be taken only through the
38 dummy argument unless ...

You can read through the section in more detail to see if the exceptions following "unless ..." are applicable to your code.

What the compiler will do when given such code may vary with the compiler options in effect, and the behavior may change when a new version of the compiler is used in place of an older one.

Petr_Parik · ‎01-18-2021

I should clarify that the code never accesses the same region of memory from both IW and RW, nor is data ever copied between IW and RW. The code just assigns (slices) the memory to integer or real data, keeping start and end indexes of each region. That was the only way to keep memory requirements as low as possible in FORTRAN IV/77 while allowing arbitrary assignment of integer and real data.

The PRINT statement in the example is just for checking the equivalence; the arrays are never used in this way in the code.

jimdempseyatthecove · ‎01-18-2021

>>I should clarify that the code never accesses the same region of memory from both IW and RW, nor is data ever copied between IW and RW

Because your example illustrates the equivalence of the entirety of each array, and both with starting index==1, it is reasonable to assume one of IW or RW takes the front part of the memory and the other takes the back part of the memory. IOW one used the memory indexes starting at an offset.

When IW is in the front half

integer :: N, iRWbegin, iRWend
integer, allocatable :: IW(:)
real(8), allocatable :: RW(:)
...
iRWbegin = (N+1)/2 + 1
iRWend = iRWbegin + N - 1
allocate(IW(N),RW(iRWbegin:iRWend))

I'd also suggest rigorous testing with bounds checking enabled verify your assumption of "never accesses the same region of memory"

Jim Dempsey

Petr_Parik · ‎01-18-2021

It seems I have skipped a significant portion of the explanation.

This code illustrates the assignment of memory and why the regions never access each other (unless the programmer messes it up). Let's note this was designed in '80s and the authors quite knew what they were doing.

      PROGRAM SAMPLE
      IMPLICIT INTEGER*4 (I-N), REAL*8 (A-H,O-Z)
      PARAMETER (LI = 300 000 000)
      DIMENSION IW(LI),RW(LI/2)
      EQUIVALENCE (IW(1),RW(1)) ! core arrays share memory
      ICL = 2
C     determine some actual array sizes
      LINET = ...
      LNNET = ...
      LNNDF = ...
      LFIXV = ...
      NFIXV = ...
C     assign memory
      LBINE = 1
      LBNNE = LBINE+LINET
      LBNND = LBNNE+LNNET
      LBFIX = LBNND+LNNDF
      LBFIX = (LBFIX-1)/ICL+1 ! convert index IW->RW
      LBIFI = LBFIX+LFIXV
      LBIFI = (LBIFI-1)*ICL+1 ! convert index RW->IW
      LIUS  = LBIFI+NFIXV-1   ! calculate the size of used memory after this assignment
C     load data into sub-arrays or whatever
C     ...
C     do some work
      CALL STUFF(IW,LI,RW,LR,LIUS,IW(LBINE),LINET,IW(LBNNE),LNNET,
     &IW(LBNND),LNNDF,RW(LBFIX),LFIXV,IW(LBIFI),NFIXV,)
      END

Each program starts with the first element of IW or RW and contiguously assigns sub-arrays as needed (possibly leaving small gaps of 1 integer). This process can be repeated inside subroutine STUFF, because it receives both core arrays including the core size already used (LIUS). Most subroutines just receive the sub-arrays and work on them, but some subroutines may need additional memory, so they can use the remaining parts of IW and RW (or stop if there is not enough space).

Because the whole system is designed this way, the code is oblivious to the way the core arrays IW and RW are allocated in the main program as long as they share memory. But you cannot say in advance how much integer data and real data you will need for a particular program. You can finish one part of a computation, move the results to the beginning of IW (RW), then start another part of computation with maximum available space. This level of efficiency is impossible with dynamic allocation as random allocation and freeing of arrays inevitably creates fragmentation and subsequent request for large allocation might fail even if the sum of free memory blocks would be sufficient.

Also because of this design it is relatively simple to extend the original core size LI to INTEGER*8 and work with 64-bit memory. This is why I try to replace the original static allocation with dynamic, because all statically allocated array are placed in a specific program segment and collectively limited to 2 GiB (or something like that, I am no expert).

If I am not mistaken this problem is nonexistent in C where you can simply recast any variable to the desired type while passing it (by reference) to a function (subroutine).

jimdempseyatthecove · ‎01-18-2021

So your system has a blob of memory that it is treating as a heap.

I would imagine (hope) that you have something (function or subroutine) that amounts to an allocate and deallocate. Where the analog to allocate, examines tell tails in the blob (aka your heap) for unused space, if found, identifies it as in use, then returns the starting index. Your deallocate does the inverse.

Should this be the case (central procedure for allocate/deallocate), you could still use the two allocatable arrays, and then reallocate them as they need to grow (all the indexes would remain). All the existing code that calls your allocate/deallocate need not change.

For performance purposes, instead of having what amounts to linked lists of free nodes, that you maintain tables of allocations by size. Only when no prior allocation and deallocation of same size exists in the table, you would then realloc the blob (or split larger free subsections).

Jim Dempsey

Petr_Parik · ‎01-18-2021

Yes, it is essentially a blob of memory, but it is not a heap, more like a queue. It is as simple as in the sample program posted above, the only required datum for more partitioning is the size of the already 'allocated' blob. The allocation and deallocation is really only in the mind of the programmer; it is their task to correctly partition the blob for the particular problem and ensure there is no overlapping. But it is really simple.

We are talking thousands of lines of code, NASTRAN-level complexity. I am no expert, but I think that when I have tens of arrays of tens of gigabytes randomly allocated and deallocated (or enlarged) through a program, I could quickly end with no memory due to fragmentation.

I am really interested in other ways how to solve this, but it seems that my 'hack' is so far the simplest, albeit non-standard compiler-dependent solution.

jimdempseyatthecove · ‎01-18-2021

>>but it is not a heap, more like a queue.

Do you mead to say only allocations are performed? If this is the case, then the problem is much simpler.

You use two allocatable arrays to an initial size, have an index indicating next available index and use the size(RW or IW) of the array to indicate last available index. The (your) allocate checks remaining size (size(RW) - nextRWindex +1) against the allocation request, and when available, return the current nextRWindex, and advance nextRWindex by the "allocation" request. Should the size remaining be insufficient, you would then reallocate (larger than what is requested), then redo your allocation from unallocated space of the blob.

A potential gotcha would be if your application is multi-threaded, then you would have to be careful as to when you perform the allocations.

There are other less savory methods such as using allocate(), then using LOC() to obtain the location of a pseudo array ( say LOC(RW(1)) and the LOC(newAlloc(1))), compute the difference and divide by sizeof(RW(1)). This will produce an out-of bounds index of RW that can be used to reference the data in the newAlloc array. You would have to .NOT. use bounds checking for this to work. You'd be playing with matches so to speak.

Third option.

Allocations occur in two phases:

At time of allocate() virtual address space is obtained from the process heap. (barring debug build wiping the allocation) this results it possibly two virtual memory pages being referenced, that at the front of the node, and that at the back of the node.

Resource allocation does not occur until "first touch", iow, while the virtual addresses are allocated from the heap, likely only 1 page of resource has been allocated (resource == one page of RAM and/or one page of pagefile).

This is to say, limited to available page file space (under your control), you can allocate a humongous blob (of virtual address space), and then let the "first touch" take care of the actual resource allocation.

Jim Dempsey

Petr_Parik · ‎01-18-2021

@jimdempseyatthecove, thank you for your suggestions. You have mentioned LOC(); actually, my first solution was to use %REF(), see below.

@FortranFan, thank you, C_F_POINTER looks interesting, it seems it would not require moving the code to an extra subroutine.

Please don't be mad at me, but I'm gonna stick with "if it ain't broken, don't fix it" approach. The original code worked flawlessly 40+ years with EQUIVALENCEd arrays (which are still standard Fortran AFAIK) so there is little need to change the internal memory partitioning. The problem is that such arrays are limited in size by the compiler, which actually does matter on modern memory-abundant computers. This discussion is about how to enlarge these equivalenced arrays. I know I have to allocate them dynamically, but I need to retain the equivalence somehow.

Frankly, if tricking the compiler with a few lines of code works, why rewrite hundreds of lines of code while risking introducing possibly serious bugs? I am an engineer and the extra work and time is simply not worth it.

Right now I see three possibilities:

%REF: tell the compiler not to check the argument type by passing %REF(IW).
EXTERNAL: trick the compiler into not checking the argument type by passing IW to an external subroutine. I admit this is somewhat complicated, requiring two extra subroutines.
C_F_POINTER: Set RW to the address of IW (?). I need to investigate this tomorrow. Once, I played with pointers and targets but RW => IW obviously did not work due to type mismatch.

So can any Fortran expert please comment on whether %REF is a 'cleaner' approach than the EXTERNAL CALL approach or the C_F_POINTER approach?

FortranFan · ‎01-18-2021

Note the Fortran standard requires "An equivalence-object shall not be ... an allocatable variable .."

..
35 C8106 (R872) An equivalence-object shall not be a designator with a base object that is a dummy argument, a function result,
36              a pointer, an allocatable variable, a derived-type object that has an allocatable or pointer ultimate component, an object
37              of a nonsequence derived type, an automatic data object, a coarray, a variable with the BIND attribute, a variable in a
38              common block that has the BIND attribute, or a named constant.
..

If you want to make the least change to your code "architecture" whilst using dynamic memory, I suggest the approach I showed above using C interoperability facilities in Fortran. Play around with the modified 'SAMPLE' program I pasted above and see how far you can get.

Note you cannot pointer associate your 'RW' directly with your 'IW' target. The interoperability option allows you a workaround this via the C processor structure where objects can all be seen as references to some arbitrary base type, say a void * pointer.

FortranFan · ‎01-18-2021

@Petr_Parik wrote:
.. So can any Fortran expert please comment on whether %REF is a 'cleaner' approach than the EXTERNAL CALL approach or the C_F_POINTER approach?

@Petr_Parik ,

Unless your interest is in an exclusively Intel-specific solution, you may also want to inquire at the Fortran Discourse site for wider feedback:

https://fortran-lang.discourse.group/

Petr_Parik · ‎01-19-2021

Ok, so first here's a simple solution with %REF(). It has one drawback though - you have to recalculate the starting indices LBRA1, LBRA2 to match the indices of IW. And the compiler will possibly hurl warnings at you for the use of %REF().

      PROGRAM TEST_REF
      IMPLICIT INTEGER(4) (I-N), REAL(8) (A-H,O-Z)
      PARAMETER (LI = 100)
      ALLOCATABLE IW(:)
      ALLOCATE(IW(LI))
      LBIA1 =  1; LIA1 = 10
      LBIA2 = 21; LIA2 = 10
      LBRA1 = 11; LRA1 = 5
      LBRA2 = 31; LRA2 = 5
      CALL WORK1(IW(LBIA1),LIA1,IW(LBIA2),LIA2,
     &           %REF(IW(LBRA1)),LRA1,%REF(IW(LBRA2)),LRA2)
      CALL WORK2(IW(LBIA1),LIA1,IW(LBIA2),LIA2,
     &           %REF(IW(LBRA1)),LRA1,%REF(IW(LBRA2)),LRA2)
      END
      SUBROUTINE WORK1(IA1,LIA1,IA2,LIA2,RA1,LRA1,RA2,LRA2)
      IMPLICIT INTEGER(4) (I-N), REAL(8) (A-H,O-Z)
      DIMENSION IA1(LIA1),IA2(LIA2),RA1(LRA1),RA2(LRA2)
      IA1 = 1
      IA2 = 2
      RA1 = .123456789
      RA2 = .987654321
      END
      SUBROUTINE WORK2(IA1,LIA1,IA2,LIA2,RA1,LRA1,RA2,LRA2)
      IMPLICIT INTEGER(4) (I-N), REAL(8) (A-H,O-Z)
      DIMENSION IA1(LIA1),IA2(LIA2),RA1(LRA1),RA2(LRA2)
      PRINT *,"IA1 ="
      PRINT *,IA1
      PRINT *,"IA2 ="
      PRINT *,IA2
      PRINT *,"RA1 ="
      PRINT *,RA1
      PRINT *,"RA2 ="
      PRINT *,RA2
      END

And finally here's a second, extremely neat solution with C_F_POINTER(). Subroutines WORK1 and WORK2 are the same as above.

      PROGRAM TEST_PTR
      USE ISO_C_BINDING
      IMPLICIT INTEGER(4) (I-N), REAL(8) (A-H,O-Z)
      PARAMETER (LI = 100)
      ALLOCATABLE IW(:),RW(:)
      ALLOCATE(IW(LI))
      CALL C_F_POINTER(C_LOC(IW),RW,(/LI/2/)) ! RW => IW
      LBIA1 =  1; LIA1 = 10
      LBIA2 = 21; LIA2 = 10
      LBRA1 =  6; LRA1 = 5
      LBRA2 = 16; LRA2 = 5
      CALL WORK1(IW(LBIA1),LIA1,IW(LBIA2),LIA2,
     &           RW(LBRA1),LRA1,RW(LBRA2),LRA2)
      CALL WORK2(IW(LBIA1),LIA1,IW(LBIA2),LIA2,
     &           RW(LBRA1),LRA1,RW(LBRA2),LRA2)
      END

Note that it does not work if RW is declared POINTER RW(:) - you then get errors on subroutine calls. I have no idea what's the problem here (ifort 19.1.3.304 20200925):

test3.f(13): error #8284: If the actual argument is scalar, the dummy argument shall be scalar unless the actual argument is of type character or is an element of an array that is not assumed shape, pointer, or polymorphic. [RA1]
CALL WORK1(IW(LBIA1),LIA1,IW(LBIA2),LIA2,
-----------^

@FortranFan thank you so much! I am very well aware that we are dealing with dark magic here... but hey, it works!

FortranFan · ‎01-19-2021

@Petr_Parik wrote:
.. @FortranFan thank you so much! I am very well aware that we are dealing with dark magic here... but hey, it works!

@Petr_Parik ,

Oddly enough, it's a compiler bug that allowed your tweak to work. Please do take note:

There is no "dark magic" here really. What is coming into play are 2 aspects: a) the Fortran processor is interoperating with a companion C processor which stipulates certain requirements on it and b) the simply contiguous nature of Fortran arrays as per the Fortran standard,
ALLOCTABLE attribute on RW does not make sense, rather it's the POINTER attribute that is called for
On the other hand, IW does require a TARGET attribute

Note when you wrote, "it does not work if RW is declared POINTER RW(:) - you then get errors on subroutine calls. I have no idea what's the problem here (ifort 19.1.3.304 20200925)", it was because of interface checking. In your code, a scalar [e.g., an array element arr(iarr)] is passed as actual argument to a subprogram where the dummy argument is declared as an array. This does not conform to the Fortran standard even though it was a rather common practice in legacy FORTRAN code. With Intel Fortran and with interface generation turned on, this leads to an error - see this:

https://software.intel.com/content/www/us/en/develop/documentation/fortran-compiler-oneapi-dev-guide-and-reference/top/compiler-reference/compiler-options/compiler-option-details/compiler-diagnostic-options/gen-interfaces.html

This is what you can try:

      PROGRAM TEST_PTR
      USE ISO_C_BINDING
      IMPLICIT INTEGER(4) (I-N), REAL(8) (A-H,O-Z)
      PARAMETER (LI = 100)
      ALLOCATABLE IW(:)
      TARGET IW
      REAL(8), POINTER :: RW(:)
      ALLOCATE(IW(LI))
      CALL C_F_POINTER(C_LOC(IW),RW,(/LI/2/)) ! RW => IW
      LBIA1 =  1; LIA1 = 10
      LBIA2 = 21; LIA2 = 10
      LBRA1 =  6; LRA1 = 5
      LBRA2 = 16; LRA2 = 5
      CALL WORK1(IW(LBIA1),LIA1,IW(LBIA2),LIA2,
     &           RW(LBRA1),LRA1,RW(LBRA2),LRA2)
      CALL WORK2(IW(LBIA1),LIA1,IW(LBIA2),LIA2,
     &           RW(LBRA1),LRA1,RW(LBRA2),LRA2)
      END
      SUBROUTINE WORK1(IA1,LIA1,IA2,LIA2,RA1,LRA1,RA2,LRA2)
      IMPLICIT INTEGER(4) (I-N), REAL(8) (A-H,O-Z)
      DIMENSION IA1(LIA1),IA2(LIA2),RA1(LRA1),RA2(LRA2)
      IA1 = 1
      IA2 = 2
      RA1 = .123456789
      RA2 = .987654321
      END
      SUBROUTINE WORK2(IA1,LIA1,IA2,LIA2,RA1,LRA1,RA2,LRA2)
      IMPLICIT INTEGER(4) (I-N), REAL(8) (A-H,O-Z)
      DIMENSION IA1(LIA1),IA2(LIA2),RA1(LRA1),RA2(LRA2)
      PRINT *,"IA1 ="
      PRINT *,IA1
      PRINT *,"IA2 ="
      PRINT *,IA2
      PRINT *,"RA1 ="
      PRINT *,RA1
      PRINT *,"RA2 ="
      PRINT *,RA2
      END

C:\Temp>ifort /nogen-interfaces test_ptr.for
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.1.2 Build 20201208_000000
Copyright (C) 1985-2020 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.26.28806.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:test_ptr.exe
-subsystem:console
test_ptr.obj

C:\Temp>test_ptr.exe
 IA1 =
           1           1           1           1           1           1
           1           1           1           1
 IA2 =
           2           2           2           2           2           2
           2           2           2           2
 RA1 =
  0.123456791043282       0.123456791043282       0.123456791043282
  0.123456791043282       0.123456791043282
 RA2 =
  0.987654328346252       0.987654328346252       0.987654328346252
  0.987654328346252       0.987654328346252

C:\Temp>

FortranFan · ‎01-19-2021

@Petr_Parik ,

Also, if your interest in Fortran is limited to doing the least with this legacy codebase to get it to work with a larger problem size beyond the stack limits, you are essentially done by using the approach with the IW allocatable+target attributes, RW as pointer, and C_F_POINTER memory mapping shown above. No need to read any further, just "let the sleeping dogs lie" with the legacy code.

But if that ain't the case and you need to do more with Fortran, you may want to first start with a close review of a book such as this:

https://www.amazon.com/Fortran-Scientists-Engineers-Stephen-Chapman-ebook/dp/B06XCTY8KR

And pick up on modern Fortran concepts that can open up a world of options to write better new code and improve upon existing/legacy codebases for efficient and manageable applications in scientific and technical computing.

Petr_Parik · ‎01-20-2021

@FortranFan , thank you. With TARGET IW and POINTER RW it works both in ifort and gfortran, but still I need to change RW(index) to RW(index:index) in CALLs to make the argument an array. I suppose there is no way to avoid the "the argument must be an array" requirement...?

ALLOCATABLE RW is some bug in ifort, gfortran does not allow it as fptr in C_F_POINTER must be a POINTER.

FortranFan · ‎01-20-2021

@Petr_Parik wrote:
.. With TARGET IW and POINTER RW it works both in ifort and gfortran, but still I need to change RW(index) to RW(index:index) in CALLs to make the argument an array. I suppose there is no way to avoid the "the argument must be an array" requirement...?

@Petr_Parik ,

Re: "I suppose there is no way to avoid the "the argument must be an array" requirement...?" -

As I wrote earlier, do take note if you use the Fortran compilers of interest suitably enough to suppress the modern Fortran aspect of interface checking (e.g., with /nongen-interfaces option with Intel Fortran compiler), then you should not have any issues with continuing to proceed with your non-standard legacy FORTRAN-like usage of `RW(index)` references in subroutine CALLs.

Note this may require you to compile the subroutines/functions of your legacy code separately. Meaning if all the code is in one file (or in combined file units), it may automatically trigger the interface checking since the compiler is then able to see both the callee and caller and will complain about a scalar passed to an array dummy.

So you can look into that vis-a-vis the compilers you intent to use; you may need to consult their reference guides.

FortranFan · ‎01-20-2021

Re: "I need to change RW(index) to RW(index:index) in CALLs" - CAUTION: this is array section notation which has its own semantics as explained in the Fortran standard. With the code you are working with, this can issues - you may run to segmentation faults.

If you want to make the least amount of changes, my suggestion to proceed with no interface checking remains.

Petr_Parik · ‎01-21-2021

@FortranFan , thank you. I think have enough information on extending the arrays.

FortranFan · ‎01-21-2021

@Petr_Parik ,

If you were helped with an answer to your inquiry on this thread, can you please mark it as a "solution" per this Intel community terminology so other readers can benefit from knowing how things worked out for you?