Passing complex Type array variables as parameters without requiring temporary arrays

lanrebr · ‎04-01-2021

Dear community:

I would like to know whether there is a recommended way to pass a "column" of an array of derived type (one component taken across all elements) as an argument to a function without the compiler creating a temporary array. The situation arises when processing business-like data structures read from data blobs that I can't easily modify.

In the simplified attached program, "data" is defined as an array of the complex type "TableA", and we need to pass the non-stride-1 array "data%A" as a simple array to the search function VSRCH:

TYPE TableA
   CHARACTER(5) A
   CHARACTER(8) B
   ....
END TYPE

TYPE(TableA) data(10)

IDX = VSRCH(a, DATA%A, 10)

where the sample function VSRCH uses a simple search to find the index of the string STR=a in the passed array LST=DATA%A of length L=10:

FUNCTION VSRCH(STR, LST, L)
   CHARACTER(*) STR
   CHARACTER(*) LST(L)
   ...
END FUNCTION

Once compiled with options -check all, the sample program reports the performance warning:

forrtl: warning (406): fort: (1): In call to VSRCH, an array temporary was created for argument #2

This is very similar to the situation for non-contiguous arrays that was discussed in https://community.intel.com/t5/Intel-Fortran-Compiler/An-array-temporary-was-created-for-argument-x/td-p/1186619 and the copy is understandable in cases where the user arbitrarily chooses which entries of the array are passed to the function. In this case, however, the layout of the type is known ahead of time, so the static definition of TableA should give the compiler enough information to avoid copying data%A into a temporary array.

I have tried several ways to define the type and to declare how VSRCH receives the data (including adding an INTERFACE and explicitly defining INTENT(IN) for all the arguments, as recommended in the same thread by @jimdempseyatthecove), but none of these seem to help.

Can you think of a way to improve the situation and avoid copying the data?

Any help/guidance is much appreciated.

Thanks

lanrebr

Arjen_Markus · ‎04-01-2021

I first thought turning the argument LST into:

character(len=*) :: lst(:)

would do the trick, but that is not the case.

So I went another route:

module searching
    implicit none

contains
integer function vsrch2( str, list, get )
    character(len=*), intent(in) :: str
    class(*), dimension(:), intent(in) :: list

    interface
        function get(elem)
            character(len=:), allocatable :: get
            class(*), intent(in)          :: elem
        end function get
    end interface

    integer :: i

    vsrch2 = -1
    do i = 1,size(list)
        if ( get(list(i)) == str) then
            vsrch2 = i
            exit
        endif
    enddo
end function vsrch2
end module searching

! main
program test_searching
    use searching
    implicit none

    type :: data_type
        character(len=5) :: a
        character(len=8) :: b
    end type

    type(data_type), dimension(100) :: list

    list%a = 'AA'
    list(4)%a = 'A'

    write(*,*) vsrch2( 'A', list, get_a )
contains
function get_a( elem )
    character(len=:), allocatable :: get_a
    class(*), intent(in)          :: elem

    select type (elem)
        type is (data_type)
            get_a = elem%a
    end select
end function get_a
end program test_searching

Rather than selecting the element you want to search in the call to the function, you provide an auxiliary access function, so that the search function can be as ignorant about the data type as possible.

I checked it with gfortran 10 as well as Intel Fortran oneAPI.

lanrebr · ‎04-02-2021

@Arjen_Markus thank you.

I tried extending your solution a little bit to allow dynamic access to any of the variables in the type using an index (see attached):

select type (elem)
    type is (data_type)
        if (i==1) then
            get_a = elem%a
        else
            get_a = elem%b
        end if
end select
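For completeness, here is a self-contained sketch of how such an indexed accessor might be wired through the search function. The names searching_by_field, getter, vsrch_field and get_field are made up for this illustration and are not taken from the attachment; the accessor still allocates a character result on every call.

```fortran
module searching_by_field
    implicit none

    ! Hypothetical accessor signature: returns component number "field" of elem
    abstract interface
        function getter(elem, field) result(val)
            class(*), intent(in)          :: elem
            integer, intent(in)           :: field
            character(len=:), allocatable :: val
        end function getter
    end interface

contains

    ! Linear search over any array; the accessor decides which component is compared
    integer function vsrch_field(str, list, get, field) result(idx)
        character(len=*), intent(in) :: str
        class(*), intent(in)         :: list(:)
        procedure(getter)            :: get
        integer, intent(in)          :: field
        integer :: i

        idx = -1
        do i = 1, size(list)
            if (get(list(i), field) == str) then
                idx = i
                exit
            end if
        end do
    end function vsrch_field
end module searching_by_field

program test_by_field
    use searching_by_field
    implicit none

    type :: data_type
        character(len=5) :: a
        character(len=8) :: b
    end type data_type

    type(data_type) :: list(100)

    list%a = 'AA'
    list%b = 'BB'
    list(4)%a = 'A'
    list(7)%b = 'B'

    if (vsrch_field('A', list, get_field, 1) /= 4) error stop 'field 1 lookup failed'
    if (vsrch_field('B', list, get_field, 2) /= 7) error stop 'field 2 lookup failed'
    print *, 'ok'

contains

    function get_field(elem, field) result(val)
        class(*), intent(in)          :: elem
        integer, intent(in)           :: field
        character(len=:), allocatable :: val

        val = ''   ! defined result even if the type does not match
        select type (elem)
        type is (data_type)
            if (field == 1) then
                val = elem%a
            else
                val = elem%b
            end if
        end select
    end function get_field
end program test_by_field
```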

I wish there were a way to build a generic solution for any arbitrary type and component, so that we don't have to hardcode the auxiliary access function each time; for example, some new # expression to indicate that indirect access is required:

idx = vsrch3('A', list#a)

But for now, this solves my immediate problem. Thank you so much for a great solution!

jimdempseyatthecove · ‎04-02-2021

In the OP, one of the complaints (possibly the primary one) was that a temporary was created. Your code suggestion performs the equivalent, with the allocation done inside the get (get_a) function.

It might be better (more code execution efficient) to use the C-interop capability to "cast" from one type to another. Note, this can be source code clean through use of a generic interface where the "funny" UDT can be used, then cast, then reused efficiently to perform an in situ search.

Jim Dempsey

lanrebr · ‎04-02-2021

I must report that I went back to the original code to retrofit the proposed solution, and in the process I discovered that we can apparently eliminate the creation of the temporary arrays in the old code by using only some of the constructs that @Arjen_Markus proposed, without having to resort to the auxiliary access function at all, or to the C interoperability constructs proposed for casting the types.

So it looks like the original problem can be resolved if we upgrade the old code to:

- Use a MODULE to encapsulate the search functions, and then USE the module in the main program
- Use INTENT in the search function's argument declarations
- Avoid reshaping the search function's arguments, by removing any assumption that the input arguments are vectors of strings of a given length

You can check that the above are the only differences between the original.f90 and the refactor.f90, attached.
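As a minimal sketch of those three changes (this is not the attached refactor.f90; search_sketch, find_first and table_a are illustrative names), the search function below lives in a module, declares INTENT for every argument, and takes the list as an assumed-shape array, so the caller can hand over a descriptor with bounds and stride instead of a packed copy. Whether a particular compiler actually avoids the temporary still has to be checked, for example with -check all.

```fortran
module search_sketch
    implicit none
contains
    ! Assumed-shape dummy LST(:): no length argument and no requirement
    ! that the actual argument be stored contiguously.
    integer function find_first(str, lst) result(idx)
        character(len=*), intent(in) :: str
        character(len=*), intent(in) :: lst(:)
        integer :: i

        idx = -1
        do i = 1, size(lst)
            if (lst(i) == str) then
                idx = i
                exit
            end if
        end do
    end function find_first
end module search_sketch

program demo_search_sketch
    use search_sketch
    implicit none

    type :: table_a
        character(len=5) :: a
        character(len=8) :: b
    end type table_a

    type(table_a) :: rows(10)
    integer :: idx

    rows%a = 'AA'
    rows%b = 'unused'
    rows(5)%a = 'A'

    ! rows%a is a strided section: consecutive elements are separated by the B component
    idx = find_first('A', rows%a)
    if (idx /= 5) error stop 'unexpected index'
    print *, 'ok, found at index', idx
end program demo_search_sketch
```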

When compiled with -check all, the results indicate that the original issue is now gone:

$>original.exe
forrtl: warning (406): fort: (1): In call to VSRCH, an array temporary was created for argument #2

Image PC Routine Line Source
original.exe 00007FF61754508C Unknown Unknown Unknown
original.exe 00007FF61754147A Unknown Unknown Unknown
original.exe 00007FF617590D3E Unknown Unknown Unknown
original.exe 00007FF617591758 Unknown Unknown Unknown
KERNEL32.DLL 00007FFD664D7974 Unknown Unknown Unknown
ntdll.dll 00007FFD67DEA2D1 Unknown Unknown Unknown
Found 5

$>refactor.exe
Found 5

So it looks like the compiler relies on good use of the more modern MODULE, INTENT, and argument-passing constructs, along with the other recommended good practices. Legacy code written in the original style runs into problems, which reinforces the need to upgrade it.

Is this reasoning correct?

lanrebr · ‎04-08-2021

Dear community:

Any feedback / opinion on this?

Can we be sure that, by using the approach highlighted above, the compiler is producing efficient code? Do you recommend using accessor functions instead? Do you recommend rewriting the code to move the data from complex structures to simpler ones instead?

FortranFan · ‎04-08-2021

@lanrebr wrote:
Dear community:

    Any feedback / opinion on this?

    Can we be sure that, by using the approach highlighted above, the compiler is producing efficient code? Do you recommend using accessor functions instead? Do you recommend rewriting the code to move the data from complex structures to simpler ones instead?

@lanrebr ,

I suggest you also ask this question at the general Fortran community at https://fortran-lang.discourse.group/

My recommendation is to first get up to speed on the Fortran 2018 standard; note that the IFORT "classic" compiler in the free Intel oneAPI HPC Toolkit is Fortran 2018 compliant.

Then strive for simplicity in your code: adopt whatever is easiest for you and anyone reading the code to understand and develop and maintain. Note the use of MODULEs, the INTENTs of dummy arguments, explicit interfaces, etc. all help you and your colleagues in this regard. Generally what helps you understand your code better also assists the compiler which then greatly increases the potential for better code optimization.

Also aim for Fortran standard-compliant code as much as possible - you can start toward this by using compiler options of warn and stand.

Then test, test, test:

- using 2 or more Fortran compilers, if you can; gfortran and NAG are options to consider
- check performance closely by instrumenting your code and/or unit and functional tests suitably
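One possible way to instrument a routine is sketched below with the standard SYSTEM_CLOCK intrinsic. The summation is only a stand-in for whatever routine is being measured, and the array size and repetition count are arbitrary values chosen for this illustration.

```fortran
program timing_sketch
    use, intrinsic :: iso_fortran_env, only: int64, real64
    implicit none

    integer, parameter :: n = 100000, reps = 100
    real(real64)   :: x(n), total
    integer(int64) :: t0, t1, rate
    integer :: r

    call random_number(x)
    total = 0.0_real64

    ! Wall-clock ticks before and after the repeated work
    call system_clock(t0, rate)
    do r = 1, reps
        total = total + sum(x)   ! stand-in for the routine under test
    end do
    call system_clock(t1)

    print '(a,f10.6,a)', 'elapsed: ', real(t1 - t0, real64) / real(rate, real64), ' s'
    ! Printing the result keeps the compiler from discarding the loop
    print *, 'checksum: ', total
end program timing_sketch
```

Repeating the work many times and printing a result that depends on it helps keep the measurement above the timer resolution and stops an optimizing compiler from removing the loop.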

Looking at your refactor.f90, you have taken the right path.

lanrebr · ‎04-09-2021

Thank you for your input. Much appreciated.

I followed your recommendation and posted this question to the general fortran forum at https://fortran-lang.discourse.group/t/on-the-creation-of-temporary-arrays-when-passing-type-element-arguments/1015

I also ran some performance tests, and the results seem to indicate that this solution does indeed perform better.

lanrebr · ‎04-10-2021

Based on the research, tests and discussions here and on the general Fortran Discourse forum, it is now evident that the proposed solution (in which the original F77 code is modernized to use MODULE, INTENT, and DIMENSION(:) constructs) removes the requirement that the legacy function receive a contiguous (stride-1) array. This allows the compiler to create an executable that does not create or use any additional temporary arrays in memory, that does NOT spend valuable CPU time copying data from the original structure into the function's structures each time the function is called, and that runs much faster than before, especially when large and complex data structures are in play.

The result above may give significant improvements in CPU usage, clock time and memory requirements, and may allow handling even larger datasets.

The conclusion: if you have legacy Fortran code in which you need to pass columns of data structures to functions or subroutines that expect simple linear arrays, and you are getting temporary-array warnings with the -check all option, you may want to look carefully into this solution. Invest a little time moving your original functions and subroutines into Fortran modules, declaring their array arguments as assumed-shape (DIMENSION(:)), and explicitly defining the INTENT of each argument. This may not only remove the temporary-array warnings; it can also speed up your code considerably (by a factor of about 1000 in the tests below) and reduce its memory use, since no second copy of the column is needed.

We also learned that this behaviour of Intel Fortran, which lets you keep the old code humming without a total rewrite, is not quite matched by other compilers, which do not seem to take advantage of the above constructs and still use temporary arrays.

To keep this thread tidy, I copy below the original code, problem.f90, in its final test form:

INTEGER FUNCTION VSRCH(STR, LST, L)
   IMPLICIT NONE
   INTEGER L
   CHARACTER(*) STR
   CHARACTER(*) LST(L)
   INTEGER I
   VSRCH = -1
   DO I=1,L
       IF (LST(I) == STR) THEN
            VSRCH = LOC(LST(I))
            EXIT
       END IF
   END DO
END FUNCTION VSRCH

PROGRAM TEST

IMPLICIT NONE
INTEGER SIZ, LNA, LNB
PARAMETER (LNA = 10)
PARAMETER (LNB = 20)
PARAMETER (SIZ = 100000)

TYPE :: TableA
  CHARACTER(LEN=LNA) :: A
  CHARACTER(LEN=LNB) :: B
  INTEGER D
  REAL C
END TYPE

TYPE(TableA), DIMENSION(SIZ) :: list
INTEGER VSRCH
INTEGER IDX,I,N, OL, NL, CL
real T1,T2 
CHARACTER(100) :: num

list%A = 'AA'
list(5)%A = 'A'

N=1
IF(COMMAND_ARGUMENT_COUNT().GE.1)THEN
   CALL GET_COMMAND_ARGUMENT(1,num)
   READ(num,*)N 
END IF

OL = LOC(list(5)%A)
NL = OL
CL = 0
call cpu_time(T1)
DO I=1,N
   IDX = VSRCH("A", list%A, SIZ)
   IF (IDX .NE. NL) THEN
      CL = CL + 1
      NL = IDX
   END IF
END DO
call cpu_time(T2)
PRINT *, "Found ", I, SIZ, OL, IDX, CL, T2-T1

END

When compiled with -check all, problem.f90 produces a run-time warning indicating that a temporary array was created, and indeed the memory address of the element in the main program differs from the address found inside the function:

>ifort /check:all problem.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.28.29913.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:problem.exe
-subsystem:console
problem.obj

>problem.exe 1
forrtl: warning (406): fort: (1): In call to VSRCH, an array temporary was created for argument #2

Image              PC                Routine            Line        Source
problem.exe        00007FF64154553C  Unknown               Unknown  Unknown
problem.exe        00007FF641541712  Unknown               Unknown  Unknown
problem.exe        00007FF6415A157E  Unknown               Unknown  Unknown
problem.exe        00007FF6415A1F98  Unknown               Unknown  Unknown
KERNEL32.DLL       00007FFD85917974  Unknown               Unknown  Unknown
ntdll.dll          00007FFD87E0A2D1  Unknown               Unknown  Unknown
 Found            2      100000  1096726720   -23021592           1
  3.1250000E-02

I also copy here the final solution code, solution.f90, in which small changes are made to avoid the above problems:

MODULE SRCH
CONTAINS
INTEGER FUNCTION VSRCH(STR, LST, L)
   IMPLICIT NONE
   INTEGER, INTENT(IN) :: L
   CHARACTER(*), INTENT(IN):: STR
   CHARACTER(*), DIMENSION(:),INTENT(IN) :: LST
   INTEGER :: I
   VSRCH = -1
   DO I=1,L
       IF (LST(I) == STR) THEN
            VSRCH = LOC(LST(I))
            EXIT
       END IF
   END DO
END FUNCTION VSRCH
END MODULE

PROGRAM TEST

USE SRCH

IMPLICIT NONE
INTEGER SIZ, LNA, LNB
PARAMETER (LNA = 10)
PARAMETER (LNB = 20)
PARAMETER (SIZ = 100000)

TYPE :: TableA
  CHARACTER(LEN=LNA) :: A
  CHARACTER(LEN=LNB) :: B
  INTEGER D
  REAL C
END TYPE

TYPE(TableA), DIMENSION(SIZ) :: list

INTEGER IDX,I,N,OL,NL,CL
REAL T1,T2 
CHARACTER(10) :: num

list%A = 'AA'
list(5)%A = 'A'

N=1
IF(COMMAND_ARGUMENT_COUNT().GE.1)THEN
   CALL GET_COMMAND_ARGUMENT(1,num)
   READ(num,*)N 
END IF

OL = LOC(list(5)%A)
NL = OL
CL = 0
call cpu_time(T1)
DO I=1,N
   IDX = VSRCH("A", list%A, SIZ)
   IF (IDX .NE. NL) THEN
      CL = CL + 1
      NL = IDX
   END IF
END DO
call cpu_time(T2)
PRINT *, "Found ", I, SIZ, OL, IDX, CL, T2-T1

END

When compiled with -check all, solution.f90 does not produce any warnings, and indeed the memory address of the element in the main program is exactly the same address that is later found inside the function:

>ifort /check:all solution.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.28.29913.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:solution.exe
-subsystem:console
solution.obj

>solution.exe 1
 Found            2      100000   435796160   435796160           0
  0.0000000E+00

When the original problem.f90 is compiled with full optimization (/O3) and executed 10k times with SIZ=100k, it takes 2.57 seconds to run, as it makes a copy of the data each time it invokes the function. Fortunately it reuses the exact same temporary memory location, so there is not a lot of thrashing, but it still needs extra memory on top of what was originally allocated in the main program. The tests also confirm that the elapsed time grows linearly with the size of the array, so it takes 0.2 seconds if the array is reduced to SIZ=10k. Worse, if you try to run with SIZ=300k you get a stack overflow error, since the temporary no longer fits on the stack. The elapsed time also grows linearly with the number of executions, taking 25 seconds if you run 100k times:

>ifort /O3 problem.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.28.29913.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:problem.exe
-subsystem:console
problem.obj

>problem.exe 10000
 Found        10001      100000   890218688   553694600           1
   2.578125
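A rough size estimate for the copy, assuming the temporary holds only the A components (one CHARACTER(10) value per element, with LNA = 10 as in problem.f90). The three sizes are the ones mentioned in the tests; the program name is illustrative.

```fortran
program temp_size_sketch
    use, intrinsic :: iso_fortran_env, only: int64
    implicit none

    integer, parameter :: lna = 10   ! length of component A in problem.f90
    integer(int64), parameter :: sizes(3) = [10000_int64, 100000_int64, 300000_int64]
    character(len=lna) :: elem
    integer(int64) :: bytes_per_elem
    integer :: k

    ! storage_size returns the size of one element in bits
    bytes_per_elem = storage_size(elem) / 8

    do k = 1, size(sizes)
        print '(a,i8,a,i10,a)', 'SIZ = ', sizes(k), ' -> temporary of ', &
              sizes(k) * bytes_per_elem, ' bytes'
    end do
end program temp_size_sketch
```

With 10 bytes per element this gives roughly 100 kB, 1 MB and 3 MB per call. For comparison, Windows executables are normally linked with a 1 MB stack unless told otherwise, which would be consistent with SIZ=300k failing; that connection is an assumption here and was not measured in this thread.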

In contrast, when the proposed solution.f90 is compiled with full optimization (/O3) and executed 10k times with SIZ=100k, the reported CPU time is 0 seconds (below the timer resolution) and no copy of the data is made when the function is invoked. The tests also confirm that no copy is made, since the elapsed time does not depend on the array size: it still takes almost 0 seconds if the array is reduced to SIZ=10k. With SIZ=300k it still runs, as the memory required is not duplicated:

>ifort /O3 solution.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.28.29913.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:solution.exe
-subsystem:console
solution.obj

>solution.exe 10000
 Found        10001      100000  -456939328  -456939328           0
  0.0000000E+00

When running 100k searches on an Intel i5 1.6 GHz laptop, one can see the difference in CPU utilization between the original problem (33% for 24.8 seconds) and the proposed solution (practically nothing):

>problem.exe 100000
 Found       100001      100000   890218688   287815272           1
   24.84375

>solution.exe 100000
 Found       100001      100000  -456939328  -456939328           0
  0.0000000E+00

Please note that the above results only measure the time wasted moving data from the main program's complex structure into the function's temporary array, since the search itself was set up so that the string is found after only 5 iterations. The 100k searches therefore amount to about 500k comparisons, the equivalent of searching for a string in a single array of 500k entries.

So this may be a significant amount of extra time and memory that you could shave off your program with a small upgrade to your old Fortran code.

If people familiar with the Intel Fortran compiler's behaviour could verify the above observations and results, it would be most helpful. Thank you all in advance for your input on the above, and thank you to all the people who have provided so much invaluable input on this issue.