Dear community:
I would like to know if there is any recommendation on how to optimize the scenario where I need to pass a column of an array of derived type (one component taken across all elements) as an argument to a function while avoiding the creation of temporary arrays. The situation arises when processing business-like data structures from data blobs that I can't easily modify.
In the simplified attached program, "data" is defined as an array of the derived type "TableA", and we need to pass the non-stride-1 array "data%A" as a simple array to the search function VSRCH:
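The attachment is not reproduced in this thread. A minimal sketch of the situation might look like the following; the program name `demo`, the array name `list` and the array size are assumptions made for illustration, and the type is cut down to two components. The point is the explicit-shape dummy `LST(L)`, which expects contiguous storage.

```fortran
! Legacy-style search: LST is an explicit-shape dummy, so the callee
! assumes its elements are contiguous in memory.
integer function vsrch(str, lst, l)
  implicit none
  integer :: l
  character(*) :: str
  character(*) :: lst(l)
  integer :: i
  vsrch = -1
  do i = 1, l
    if (lst(i) == str) then
      vsrch = i
      exit
    end if
  end do
end function vsrch

program demo
  implicit none
  type :: TableA
    character(len=10) :: a
    character(len=20) :: b
  end type TableA
  type(TableA) :: list(100)        ! hypothetical size
  integer, external :: vsrch
  list%a = 'AA'
  list%b = ' '
  list(5)%a = 'A'
  ! list%a is a strided section: consecutive A values are separated by
  ! the B component, so a contiguous temporary is typically created here.
  print *, 'Found ', vsrch('A', list%a, size(list))
end program demo
```

Compiling this with run-time checks enabled is one way to see whether a given compiler reports an array temporary for the second argument.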
I first thought that turning the argument LST into:
character(len=*) :: lst(:)
would do the trick, but that is not the case.
So I went another route:
module searching
    implicit none
contains
    integer function vsrch2( str, list, get )
        character(len=*), intent(in)       :: str
        class(*), dimension(:), intent(in) :: list
        interface
            function get(elem)
                character(len=:), allocatable :: get
                class(*), intent(in)          :: elem
            end function get
        end interface
        integer :: i
        vsrch2 = -1
        do i = 1,size(list)
            if ( get(list(i)) == str) then
                vsrch2 = i
                exit
            endif
        enddo
    end function vsrch2
end module searching
! main
program test_searching
    use searching
    implicit none
    type :: data_type
        character(len=5) :: a
        character(len=8) :: b
    end type
    type(data_type), dimension(100) :: list
    list%a = 'AA'
    list(4)%a = 'A'
    write(*,*) vsrch2( 'A', list, get_a )
contains
    function get_a( elem )
        character(len=:), allocatable :: get_a
        class(*), intent(in)          :: elem
        select type (elem)
            type is (data_type)
                get_a = elem%a
        end select
    end function get_a
end program test_searching
Rather than selecting the element you want to search in the call to the function, you provide an auxiliary access function, so that the search function can be as ignorant about the data type as possible.
I checked it with gfortran 10 as well as Intel Fortran oneAPI.
@Arjen_Markus thank you.
I tried extending your solution a little bit to allow dynamic access to any of the variables in the type using an index (see attached):
I wish there was a way to create a framework enabling a generic solution for any arbitrary type and component, so that we don't have to hardcode the auxiliary access function each time; for example, some new # expression to indicate that indirect access is required:
idx = vsrch3('A', list#a)
But for now, this solves my immediate problem. Thank you so much for a great solution!
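The attached extension is not reproduced here. A minimal sketch of what an index-based accessor could look like follows; the names `searching_by_index`, `get_field` and `vsrch_idx`, and the convention that index 1 means component A and index 2 means component B, are all assumptions made for illustration rather than the attached code.

```fortran
module searching_by_index
  implicit none
  type :: data_type
    character(len=5) :: a
    character(len=8) :: b
  end type data_type
contains
  ! Hypothetical accessor: field 1 selects component A, field 2 selects B.
  function get_field(elem, field) result(val)
    class(*), intent(in) :: elem
    integer, intent(in)  :: field
    character(len=:), allocatable :: val
    val = ''                       ! returned for unknown types or fields
    select type (elem)
    type is (data_type)
      select case (field)
      case (1)
        val = elem%a
      case (2)
        val = elem%b
      end select
    end select
  end function get_field

  ! Search LIST for the first element whose selected field equals STR.
  integer function vsrch_idx(str, list, field)
    character(len=*), intent(in)       :: str
    class(*), dimension(:), intent(in) :: list
    integer, intent(in)                :: field
    integer :: i
    vsrch_idx = -1
    do i = 1, size(list)
      if (get_field(list(i), field) == str) then
        vsrch_idx = i
        exit
      end if
    end do
  end function vsrch_idx
end module searching_by_index

program test_by_index
  use searching_by_index
  implicit none
  type(data_type) :: list(100)
  list%a = 'AA'
  list%b = 'BB'
  list(4)%a = 'A'
  list(7)%b = 'B'
  write(*,*) vsrch_idx('A', list, 1), vsrch_idx('B', list, 2)
end program test_by_index
```

Note that the accessor still allocates a small string for every element it inspects, so it trades one large temporary for many small ones.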
In the OP, one of the complaints (possibly the primary one) was that a temporary was created. Your code suggestion does the equivalent, with the allocation performed in the get (get_a) function.
It might be better (more efficient at run time) to use the C-interop capability to "cast" from one type to another. Note that this can be kept clean at the source level through a generic interface, where the "funny" UDT can be used, then cast, then reused efficiently to perform an in situ search.
Jim Dempsey
I must report that I went back to the original code to retrofit the proposed solution, and in the process I discovered that we can apparently eliminate the temporary arrays in the old code by using only some of the constructs that @Arjen_Markus proposed in his solution, without having to resort to the proposed auxiliary access function at all, or to the C interoperability constructs proposed for casting the types.
So it looks like the original problem can be resolved if we upgrade the old code to:
- Use Module to encapsulate the search functions, and then use the Module in the main program
- Use INTENT in the search function variable definitions
- Avoid reshaping the search function arguments, by removing any assumptions that the input parameters are vectors of strings
You can check that the above are the only differences between the original.f90 and the refactor.f90, attached.
When compiled with check all, the results indicate that the original issue is now gone:
$>original.exe
forrtl: warning (406): fort: (1): In call to VSRCH, an array temporary was created for argument #2
Image PC Routine Line Source
original.exe 00007FF61754508C Unknown Unknown Unknown
original.exe 00007FF61754147A Unknown Unknown Unknown
original.exe 00007FF617590D3E Unknown Unknown Unknown
original.exe 00007FF617591758 Unknown Unknown Unknown
KERNEL32.DLL 00007FFD664D7974 Unknown Unknown Unknown
ntdll.dll 00007FFD67DEA2D1 Unknown Unknown Unknown
Found 5
$>refactor.exe
Found 5
So it looks like the compiler relies on good use of the more modern MODULE, INTENT, and argument-passing approaches and other recommended practices. Legacy code written in the original style has problems, and this reinforces the need to upgrade it.
Is this reasoning correct?
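One way to probe which of the three changes matters is the sketch below, where the names `shape_demo`, `report` and `check_shape` are made up for illustration. An assumed-shape dummy such as `lst(:)` requires an explicit interface, which the module provides, and it receives the stride from the caller, so a strided section can be used in place. INTENT documents the argument but does not by itself change how it is passed. The F2008 intrinsic `is_contiguous` shows what the callee actually received.

```fortran
module shape_demo
  implicit none
contains
  ! Assumed-shape dummy: the caller passes a descriptor that includes
  ! the stride, so no contiguous copy is needed.
  subroutine report(lst)
    character(*), dimension(:), intent(in) :: lst
    print *, 'contiguous inside callee: ', is_contiguous(lst)
  end subroutine report
end module shape_demo

program check_shape
  use shape_demo
  implicit none
  type :: TableA
    character(len=10) :: a
    character(len=20) :: b
  end type TableA
  type(TableA)      :: list(100)
  character(len=10) :: plain(100)
  list%a = 'AA'
  list%b = ' '
  plain  = 'AA'
  call report(list%a)   ! strided component section
  call report(plain)    ! ordinary contiguous array
end program check_shape
```

If the first call reports a non-contiguous argument, the callee is working directly on the strided data; a contiguous result there would suggest that a copy was made.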
Dear community:
Any feedback / opinion on this?
Can we be sure that, by using the approach highlighted above, the compiler is producing efficient code? Do you recommend using accessor functions instead? Do you recommend rewriting the code to move the data from complex structures to simpler ones instead?
@lanrebr wrote:
Dear community:
Any feedback / opinion on this?
Can we be sure that, by using the approach highlighted above, the compiler is producing efficient code? Do you recommend using accessor functions instead? Do you recommend rewriting the code to move the data from complex structures to simpler ones instead?
@lanrebr ,
I suggest you also ask this question of the general Fortran community at https://fortran-lang.discourse.group/
My recommendation would be to first get up to speed on the Fortran 2018 standard; note that the IFORT "classic" compiler in the free Intel oneAPI HPC toolkit is Fortran 2018 compliant.
Then strive for simplicity in your code: adopt whatever is easiest for you and anyone reading the code to understand and develop and maintain. Note the use of MODULEs, the INTENTs of dummy arguments, explicit interfaces, etc. all help you and your colleagues in this regard. Generally what helps you understand your code better also assists the compiler which then greatly increases the potential for better code optimization.
Also aim for Fortran standard-compliant code as much as possible - you can start toward this by using compiler options of warn and stand.
Then test, test, test:
- using 2 or more Fortran compilers, if you can. gfortran and NAG are options to consider
- check performance closely by instrumenting your code and/or unit and functional tests suitably
Looking at your refactor.f90, you have taken the right path.
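As a minimal sketch of the instrumentation suggested above, the standard `system_clock` intrinsic can be used to time repeated calls. The program name `time_search`, the helper `find_key`, and the sizes are assumptions made for illustration.

```fortran
program time_search
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n = 100000, repeats = 1000   ! hypothetical sizes
  character(len=10), allocatable :: keys(:)
  integer(int64) :: t0, t1, rate
  integer :: r, total

  allocate(keys(n))
  keys = 'AA'
  keys(n) = 'A'          ! worst case: the match is the last entry

  total = 0
  call system_clock(t0, rate)
  do r = 1, repeats
    total = total + find_key('A', keys)
  end do
  call system_clock(t1)

  ! Print the accumulated result so the timed loop is less likely
  ! to be removed by the optimizer.
  print *, 'sum of indices: ', total
  print *, 'elapsed (s):    ', real(t1 - t0, real64) / real(rate, real64)

contains

  ! Linear search returning the index of the first match, or -1.
  integer function find_key(str, lst)
    character(len=*), intent(in) :: str
    character(len=*), intent(in) :: lst(:)
    integer :: i
    find_key = -1
    do i = 1, size(lst)
      if (lst(i) == str) then
        find_key = i
        exit
      end if
    end do
  end function find_key
end program time_search
```

Running the same source through two or more compilers and comparing the elapsed times is a cheap way to spot compiler-specific behaviour.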
Thank you for your input. Much appreciated.
I followed your recommendation and posted this question to the general Fortran forum at https://fortran-lang.discourse.group/t/on-the-creation-of-temporary-arrays-when-passing-type-element-arguments/1015
I also ran some performance tests, and the results seem to indicate that this solution does indeed perform better.
Based on the research, tests, and discussions here and on the general Fortran Discourse forum, it is now evident that the proposed solution (in which the original F77 code is modernized to use MODULE, INTENT, and DIMENSION constructs) removes the requirement to pass contiguous (stride-1) arrays to the legacy function. This allows the compiler to create an executable that does not create or use any additional temporary arrays, that does NOT spend valuable CPU time copying data from the original structure into the function's structures each time the function is called, and that runs much faster than before, especially when large and complex data structures are in play.
The result above may give significant improvements in CPU usage, elapsed time, and memory requirements, and allow handling of even larger datasets.
The conclusion is this: if you have old legacy Fortran code where you need to pass columns of data structures to functions or subroutines that expect simple linear arrays, and you are getting temporary array warnings with the -check all option, you may want to look carefully into this solution. Invest a little time upgrading your original routines: put them in Fortran modules, update their dummy argument declarations so that arrays are assumed-shape, and explicitly state the INTENT of each argument. This may not only remove your temporary array warnings, but can also speed up your code substantially (by a factor of about 1000 in my tests) and reduce its memory use.
We also learned that this behaviour of Intel Fortran, which lets you keep the old code running without a total rewrite, is not fully matched by other compilers, which do not seem to take advantage of the above constructs and still use temporary arrays.
To tidy up this thread, I copy below the original problem.f90 code in its final test form:
INTEGER FUNCTION VSRCH(STR, LST, L)
    IMPLICIT NONE
    INTEGER L
    CHARACTER(*) STR
    ! Explicit-shape dummy: expects contiguous storage
    CHARACTER(*) LST(L)
    INTEGER I
    VSRCH = -1
    DO I=1,L
        IF (LST(I) == STR) THEN
            ! LOC is a compiler extension; the address is returned (instead
            ! of the index) so the caller can compare memory locations
            VSRCH = LOC(LST(I))
            EXIT
        END IF
    END DO
END FUNCTION VSRCH
PROGRAM TEST
    IMPLICIT NONE
    INTEGER SIZ, LNA, LNB
    PARAMETER (LNA = 10)
    PARAMETER (LNB = 20)
    PARAMETER (SIZ = 100000)
    TYPE :: TableA
        CHARACTER(LEN=LNA) :: A
        CHARACTER(LEN=LNB) :: B
        INTEGER D
        REAL C
    END TYPE
    TYPE(TableA), DIMENSION(SIZ) :: list
    INTEGER VSRCH
    INTEGER IDX,I,N, OL, NL, CL
    real T1,T2
    CHARACTER(100) :: num
    list%A = 'AA'
    list(5)%A = 'A'
    ! Number of searches, optionally taken from the command line
    N=1
    IF(COMMAND_ARGUMENT_COUNT().GE.1)THEN
        CALL GET_COMMAND_ARGUMENT(1,num)
        READ(num,*)N
    END IF
    ! OL: address of the matching element as seen in the main program
    OL = LOC(list(5)%A)
    NL = OL
    CL = 0
    call cpu_time(T1)
    DO I=1,N
        IDX = VSRCH("A", list%A, SIZ)
        ! CL counts how often the address seen by VSRCH changes
        IF (IDX .NE. NL) THEN
            CL = CL + 1
            NL = IDX
        END IF
    END DO
    call cpu_time(T2)
    PRINT *, "Found ", I, SIZ, OL, IDX, CL, T2-T1
END
When compiled with -check all, the problem.f90 code produces a warning indicating that a temporary array was created, and indeed, at run time the address of the data in the main program differs from the address seen inside the function:
>ifort /check:all problem.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.28.29913.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:problem.exe
-subsystem:console
problem.obj
>problem.exe 1
forrtl: warning (406): fort: (1): In call to VSRCH, an array temporary was created for argument #2
Image PC Routine Line Source
problem.exe 00007FF64154553C Unknown Unknown Unknown
problem.exe 00007FF641541712 Unknown Unknown Unknown
problem.exe 00007FF6415A157E Unknown Unknown Unknown
problem.exe 00007FF6415A1F98 Unknown Unknown Unknown
KERNEL32.DLL 00007FFD85917974 Unknown Unknown Unknown
ntdll.dll 00007FFD87E0A2D1 Unknown Unknown Unknown
Found 2 100000 1096726720 -23021592 1
3.1250000E-02
I also copy here the final solution code, in which small changes are made to avoid the above problems:
MODULE SRCH
CONTAINS
    INTEGER FUNCTION VSRCH(STR, LST, L)
        IMPLICIT NONE
        INTEGER, INTENT(IN) :: L
        CHARACTER(*), INTENT(IN):: STR
        CHARACTER(*), DIMENSION(:),INTENT(IN) :: LST
        INTEGER :: I
        VSRCH = -1
        DO I=1,L
            IF (LST(I) == STR) THEN
                VSRCH = LOC(LST(I))
                EXIT
            END IF
        END DO
    END FUNCTION VSRCH
END MODULE
PROGRAM TEST
    USE SRCH
    IMPLICIT NONE
    INTEGER SIZ, LNA, LNB
    PARAMETER (LNA = 10)
    PARAMETER (LNB = 20)
    PARAMETER (SIZ = 100000)
    TYPE :: TableA
        CHARACTER(LEN=LNA) :: A
        CHARACTER(LEN=LNB) :: B
        INTEGER D
        REAL C
    END TYPE
    TYPE(TableA), DIMENSION(SIZ) :: list
    INTEGER IDX,I,N,OL,NL,CL
    REAL T1,T2
    CHARACTER(10) :: num
    list%A = 'AA'
    list(5)%A = 'A'
    N=1
    IF(COMMAND_ARGUMENT_COUNT().GE.1)THEN
        CALL GET_COMMAND_ARGUMENT(1,num)
        READ(num,*)N
    END IF
    OL = LOC(list(5)%A)
    NL = OL
    CL = 0
    call cpu_time(T1)
    DO I=1,N
        IDX = VSRCH("A", list%A, SIZ)
        IF (IDX .NE. NL) THEN
            CL = CL + 1
            NL = IDX
        END IF
    END DO
    call cpu_time(T2)
    PRINT *, "Found ", I, SIZ, OL, IDX, CL, T2-T1
END
When compiled with -check all, solution.f90 does not produce any warnings, and indeed, the address of the data in the main program is exactly the same as the address seen inside the function:
>ifort /check:all solution.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.28.29913.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:solution.exe
-subsystem:console
solution.obj
>solution.exe 1
Found 2 100000 435796160 435796160 0
0.0000000E+00
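A side note on the addresses printed in these runs: LOC is a compiler extension that returns an address-sized integer, and storing it in a default INTEGER truncates it on a 64-bit system, which is why some of the printed values are negative. A more portable sketch of the same comparison, using only standard C-interop intrinsics, is shown below. The names `addr_check`, `first_addr` and `rec` are made up for illustration, and an integer component is used instead of a character one to keep the example simple.

```fortran
module addr_check
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  implicit none
contains
  ! Address of the first element as seen inside the callee.
  function first_addr(v) result(addr)
    integer, dimension(:), intent(in), target :: v
    integer(c_intptr_t) :: addr
    addr = transfer(c_loc(v(1)), addr)
  end function first_addr
end module addr_check

program compare_addresses
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  use addr_check
  implicit none
  type :: rec
    integer :: key
    real    :: val
  end type rec
  type(rec), target   :: list(10)
  integer(c_intptr_t) :: outside, inside
  list%key = 0
  list%val = 0.0
  ! Address of the first key as seen by the caller, kept at full width.
  outside = transfer(c_loc(list(1)%key), outside)
  ! Address of the same element as seen through the assumed-shape dummy.
  inside  = first_addr(list%key)
  print *, 'same address: ', outside == inside
end program compare_addresses
```

Holding the addresses in `integer(c_intptr_t)` avoids the truncation, so the equality test compares full addresses.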
When the original problem.f90 is compiled with full optimization (/O3) and executed 10k times with SIZ=100k, it takes 2.57 seconds to run, as it makes a copy of the data each time it invokes the function. Fortunately, it reuses the exact same temporary memory location, so there is not a lot of thrashing, but it still needs extra memory for the copy on top of what is allocated in the main program. The tests also confirm that the elapsed time grows linearly with the array size, so it takes 0.2 seconds if the array is reduced to SIZ=10k. Worse, if you try to run with SIZ=300k you get stack overflow errors, presumably because the temporary no longer fits in the available stack. The elapsed time also grows linearly with the number of executions, taking about 25 seconds if you run 100k times:
>ifort /O3 problem.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.28.29913.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:problem.exe
-subsystem:console
problem.obj
>problem.exe 10000
Found 10001 100000 890218688 553694600 1
2.578125
In contrast, when the proposed solution.f90 is compiled with full optimization (/O3) and executed 10k times with SIZ=100k, the reported time is 0 seconds, and it makes no copies of the data when it invokes the function. The tests also confirm that no copy is made, as the elapsed time does not change with the array size: it still takes almost 0 seconds if the array is reduced to SIZ=10k. With SIZ=300k it still runs, as the memory required is not duplicated:
>ifort /O3 solution.f90
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.2.0 Build 20210228_000000
Copyright (C) 1985-2021 Intel Corporation. All rights reserved.
Microsoft (R) Incremental Linker Version 14.28.29913.0
Copyright (C) Microsoft Corporation. All rights reserved.
-out:solution.exe
-subsystem:console
solution.obj
>solution.exe 10000
Found 10001 100000 -456939328 -456939328 0
0.0000000E+00
When running 100k searches on an Intel i5 1.6 GHz laptop, one can see the difference in CPU utilization between the original problem (33% for 24.8 seconds) and the proposed solution (practically nothing):
>problem.exe 100000
Found 100001 100000 890218688 287815272 1
24.84375
>solution.exe 100000
Found 100001 100000 -456939328 -456939328 0
0.0000000E+00
Please note that the above results only measure the time wasted moving data from the complex structure in the main program into the simple temporary structure used by the function. The search itself was set up so that the string is found after only 5 iterations, so 100k calls are equivalent to searching for a string in a single array of 500k entries.
So this may be significant extra time and memory that you could shave off your program with a small upgrade to your old Fortran code.
If people familiar with the Intel Fortran compiler's behaviour could verify the above observations, it would be most helpful. Thank you all in advance for your input, and thank you to everyone who has provided so much invaluable input on this issue.
