I discovered behavior in ifort and ifx that I find strange; to me it looks like a missed optimization opportunity. Both compilers behave identically.
Given the following example:
SUBROUTINE stupid(kk, jj, ii, arr)
IMPLICIT NONE
INTEGER :: kk, jj, ii
REAL :: arr(kk, jj, ii)
WRITE(*, *) arr(1, 1, 1)
END SUBROUTINE stupid
PROGRAM main
IMPLICIT NONE
REAL, TARGET :: arr(32*32*32)
REAL, POINTER, CONTIGUOUS :: arr_p(:)
INTEGER :: l, kk, jj, ii
arr = 0
arr_p => arr
kk = 16
jj = 16
ii = 16
l = 1
arr_p => arr
WRITE(*,*) SIZE(arr_p((l-1)*kk*jj*ii+1:l*kk*jj*ii))
WRITE(*,*) kk*jj*ii
CALL stupid(kk, jj, ii, arr_p((l-1)*kk*jj*ii+1:l*kk*jj*ii))
END PROGRAM main
With both recent 'ifx' and 'ifort' (on Linux), I get the following when I compile with '-check all':
forrtl: warning (406): fort: (1): In call to STUPID, an array temporary was created for argument #4
Image PC Routine Line Source
output.s 000000000040591C main 32 example.f90
output.s 000000000040515D Unknown Unknown Unknown
libc-2.31.so 00007FF451E6F083 __libc_start_main Unknown Unknown
output.s 000000000040507E Unknown Unknown Unknown
So it seems an array temporary is generated. Strictly speaking this is not necessary, since the slice has unit stride and is therefore perfectly contiguous. GFortran manages to compile this without generating any temporary array.
If the call to 'stupid' is simplified just slightly, no temporary is created:
ip = (l-1)*kk*jj*ii+1
CALL stupid(kk, jj, ii, arr_p(ip:l*kk*jj*ii))
I have experimented with variations, and the behavior seems to depend on the expression for the start of the slice (the part before the colon). If the start is a plain variable (like 'ip'), no temporary is generated. A simple addition such as 'ip + 1' also works, as does multiplication. However, if the start contains a parenthesized subexpression such as (l-1), a temporary is generated. The same logic does not apply to the part after the colon: there you can use parenthesized expressions without affecting whether the compiler generates a temporary.
For example, the following does not seem to generate a temporary:
l = 0
CALL stupid(kk, jj, ii, arr_p(l*kk*jj*ii+1:(l+1)*kk*jj*ii))
The only difference from the original example is that the parenthesized subexpression has moved from before the colon to after it, and that works fine.
See the compiler explorer with a side-by-side GFortran and Intel comparison: https://godbolt.org/z/rv5c7WKo5
I compiled that on Windows and didn't get the warning about a temporary (Intel® Fortran Compiler Classic 2021.7.0 [Intel(R) 64]), but I did get:
warning #8889: Explicit interface or EXTERNAL declaration is required. [STUPID]
If there is an explicit interface (i.e. the compiler knows more about STUPID), does the temp issue go away?
No, putting it in a module does not help:
MODULE stupid_mod
IMPLICIT NONE
CONTAINS
SUBROUTINE stupid(kk, jj, ii, arr)
IMPLICIT NONE
INTEGER :: kk, jj, ii
REAL :: arr(kk, jj, ii)
WRITE(*, *) arr(1, 1, 1)
END SUBROUTINE stupid
END MODULE stupid_mod
PROGRAM main
USE stupid_mod
IMPLICIT NONE
REAL, TARGET :: arr(32*32*32)
REAL, POINTER, CONTIGUOUS :: arr_p(:)
INTEGER :: l, kk, jj, ii
arr = 0
arr_p => arr
kk = 16
jj = 16
ii = 16
l = 1
arr_p => arr
WRITE(*,*) SIZE(arr_p((l-1)*kk*jj*ii+1:l*kk*jj*ii))
WRITE(*,*) kk*jj*ii
CALL stupid(kk, jj, ii, arr_p((l-1)*kk*jj*ii+1:l*kk*jj*ii))
END PROGRAM main
This gives the same warning.
I have seen many cases in the past where slices cause the creation of temps; the compiler does keep improving in this respect.
I think your use case is not so clear-cut: the dependence on l, ii, jj, and kk might be what defeats the general-case rules the compiler applies, since it needs to unpick those expressions; maybe that is deferred to run time.
I realise this is a demo/test program, but why use the arr_p pointer at all? And why specify the upper bound of the slice? With this design you need some in-code bounds checking anyway.
Well, consider the following, further simplified example:
MODULE stupid_mod
IMPLICIT NONE
CONTAINS
SUBROUTINE stupid(n, arr)
INTEGER :: n
REAL :: arr(n)
WRITE(*, *) arr
END SUBROUTINE stupid
END MODULE stupid_mod
PROGRAM main
USE stupid_mod
IMPLICIT NONE
REAL, TARGET :: arr(1000)
REAL, POINTER, CONTIGUOUS :: arr_p(:)
INTEGER :: ip, n
arr = 0
arr_p => arr
ip = 1
n = 10
CALL stupid(n, arr_p((ip):ip+n-1))
END PROGRAM main
Compiler explorer link: https://godbolt.org/z/dMGYa4vxT
This triggers the generation of a temporary.
The funny thing is that it is the parentheses on the left-hand side of the colon that trigger this; the following does not generate a temporary:
CALL stupid(n, arr_p(ip:ip+n-1))
So putting the variable "ip" in parentheses, as in "(ip)", generates a temporary, while a bare "ip" does not. On the right-hand side of the colon this has no effect, i.e. I can add as many parentheses as I wish...
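A compiler-independent way to check whether the callee received a copy is to compare the address of the dummy argument with the address of the original element. The following is only a sketch under assumptions: the module `probe_mod` and the routine `probe` are hypothetical names, and the TARGET attribute on the dummy (needed for C_LOC) might itself influence what a given compiler does, so the '-check all' warning remains the more direct diagnostic.

```fortran
MODULE probe_mod
  USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_LOC, C_PTR, C_ASSOCIATED
  IMPLICIT NONE
CONTAINS
  ! Hypothetical helper: reports whether the dummy array starts at the
  ! address the caller expects (i.e. whether it aliases the original data).
  SUBROUTINE probe(n, arr, expected)
    INTEGER, INTENT(IN) :: n
    REAL, TARGET, INTENT(IN) :: arr(n)   ! TARGET is required for C_LOC
    TYPE(C_PTR), INTENT(IN) :: expected
    IF (C_ASSOCIATED(C_LOC(arr), expected)) THEN
      WRITE(*, *) 'no copy: dummy aliases the original storage'
    ELSE
      WRITE(*, *) 'copy: dummy refers to a temporary'
    END IF
  END SUBROUTINE probe
END MODULE probe_mod

PROGRAM main
  USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_LOC
  USE probe_mod
  IMPLICIT NONE
  REAL, TARGET :: arr(1000)
  REAL, POINTER, CONTIGUOUS :: arr_p(:)
  INTEGER :: ip, n
  arr = 0
  arr_p => arr
  ip = 1
  n = 10
  ! Bare variable as the lower bound of the section
  CALL probe(n, arr_p(ip:ip+n-1), C_LOC(arr(ip)))
  ! Parenthesized lower bound, the form reported to trigger a temporary
  CALL probe(n, arr_p((ip):ip+n-1), C_LOC(arr(ip)))
END PROGRAM main
```

Which message each call prints depends on the compiler and version, so no expected output is given here.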
This is related but a little off topic. Some years back we collected and documented how array-passing methods affect optimization and vectorization, including discussions of when temps are created. It is a bit tangential to this thread, but it gives some insight into the compiler.
I think the compiler is not clever enough: it looks at the slice defined by variable expressions, gives up, assumes it is indirect, and makes the temp. I guess working sub-optimally is better than risking not working at all. Maybe a better design would be to simplify, to maximise the chance of avoiding a temp:
MODULE stupid_mod
IMPLICIT NONE
CONTAINS
SUBROUTINE stupid(n, arr)
INTEGER :: n
REAL :: arr(:)
WRITE(*, *) arr(1:n)
END SUBROUTINE stupid
END MODULE stupid_mod
PROGRAM main
USE stupid_mod
IMPLICIT NONE
REAL, TARGET :: arr(1000)
REAL, POINTER, CONTIGUOUS :: arr_p(:)
INTEGER :: ip, n
arr = 0
arr_p => arr
ip = 1
n = 10
CALL stupid( n, arr_p(ip:) )
END PROGRAM main
Until you manage to convince the Intel team, with your gfortran example, not to create array temporaries here, you may want to consider options that make things easier on the compiler and on the users of your code (who may primarily be you). Among others, the following is one you can think about:
PROGRAM main
IMPLICIT NONE
REAL, TARGET :: arr(32*32*32)
REAL, POINTER, CONTIGUOUS :: arr_p(:)
REAL, POINTER, CONTIGUOUS :: arr_p_slice(:) !<-- use this for your slice?
INTEGER :: l, kk, jj, ii
arr = 0
arr_p => arr !<-- perhaps use this object for the whole object reference?
kk = 16
jj = 16
ii = 16
l = 1
arr_p_slice => arr((l-1)*kk*jj*ii+1:l*kk*jj*ii)
WRITE(*,*) SIZE(arr((l-1)*kk*jj*ii+1:l*kk*jj*ii))
WRITE(*,*) kk*jj*ii
CALL stupid(kk, jj, ii, arr_p_slice)
END PROGRAM main
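In the same spirit, a bounds-remapping pointer assignment (Fortran 2003) can give the rank-1 slice a rank-3 view, so the tile can be indexed directly without relying on an explicit-shape dummy. This is a minimal sketch, not something tested against the Intel compilers discussed in this thread; the names `remap_sketch`, `tile`, and `n` are illustrative only.

```fortran
PROGRAM remap_sketch
  IMPLICIT NONE
  REAL, TARGET :: arr(32*32*32)
  REAL, POINTER, CONTIGUOUS :: tile(:, :, :)   ! hypothetical rank-3 view
  INTEGER :: l, kk, jj, ii, n
  arr = 0
  kk = 16
  jj = 16
  ii = 16
  l = 1
  n = kk*jj*ii
  ! Bounds remapping: view the rank-1 section as a kk x jj x ii array.
  ! No data is copied; tile points into arr.
  tile(1:kk, 1:jj, 1:ii) => arr((l-1)*n+1 : l*n)
  WRITE(*, *) SHAPE(tile)
  WRITE(*, *) tile(1, 1, 1)
END PROGRAM remap_sketch
```

Whether passing `tile` on to an explicit-shape dummy avoids a temporary would still need to be checked with '-check all' on the compiler in question.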
Thanks for the comments, everyone. It's not about making it work, because I already found several ways to trick the compiler into generating code that does what I want without making a temporary (i.e. avoiding parentheses).
I wrote the post because I was puzzled: a 1-D unit-stride slice of a contiguous rank-1 array is always guaranteed to be contiguous, so no temporary should ever be needed. Please correct me if I'm wrong here...
GFortran seems to get this right; I have not found any situation where it generates a temporary in this case. However, as soon as you use non-unit strides or slice arrays of rank higher than 1, temporaries are generated as required. For the Intel compiler(s), this just seems like a missed opportunity...
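The contiguity claim can be checked with the Fortran 2008 intrinsic IS_CONTIGUOUS. The sketch below reuses the declarations from the simplified example; the program name `contiguity_check` is only illustrative. By the standard's rules, the unit-stride section should be reported as contiguous and the stride-2 section (which has more than one element) as not contiguous.

```fortran
PROGRAM contiguity_check
  IMPLICIT NONE
  REAL, TARGET :: arr(1000)
  REAL, POINTER, CONTIGUOUS :: arr_p(:)
  INTEGER :: ip, n
  arr = 0
  arr_p => arr
  ip = 1
  n = 10
  ! Unit-stride section of a contiguous rank-1 pointer,
  ! written with the parenthesized lower bound from the thread
  WRITE(*, *) IS_CONTIGUOUS(arr_p((ip):ip+n-1))
  ! Stride-2 section with more than one element
  WRITE(*, *) IS_CONTIGUOUS(arr_p(ip:ip+n-1:2))
END PROGRAM contiguity_check
```

Note that IS_CONTIGUOUS only describes the section itself; it does not show whether a particular compiler decides to create a temporary when passing that section to an explicit-shape dummy.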
hakostra1, I think you did an excellent job of identifying an optimization issue.
Side comment:
It appears that the linear (1D) array arr is being partitioned into 3D tiles. This being the case, as long as all code uses the same values for ii, jj, kk at all times during the run, the slicing will be correct. Any change to any of the size values will either require a non-unit stride .OR. cannot be described using a non-unit stride (and thus requires a temporary).
Jim Dempsey
I agree that there is a chance to improve the compiler regarding this unneeded argument temporary. I'll open a bug report on this.
I simply combined the two call types, as shown below, just to prove to the developers that the temp is created only for the second call:
MODULE stupid_mod
IMPLICIT NONE
CONTAINS
SUBROUTINE stupid(n, arr)
INTEGER :: n
REAL :: arr(n)
WRITE(*, *) arr
END SUBROUTINE stupid
END MODULE stupid_mod
PROGRAM main
USE stupid_mod
IMPLICIT NONE
REAL, TARGET :: arr(1000)
REAL, POINTER, CONTIGUOUS :: arr_p(:)
INTEGER :: ip, n
arr = 0
arr_p => arr
ip = 1
n = 10
CALL stupid(n, arr_p(ip:ip+n-1))
CALL stupid(n, arr_p((ip):ip+n-1))
END PROGRAM main
The bug ID is CMPLRLLVM-42546.
This bug is fixed in the 2024.0 release.
