I discovered the following behavior in ifort and ifx, which I found strange; it looks to me like a missed optimization opportunity. The behavior is identical in both compilers.
Given the following example:
SUBROUTINE stupid(kk, jj, ii, arr)
  IMPLICIT NONE
  INTEGER :: kk, jj, ii
  REAL :: arr(kk, jj, ii)
  WRITE(*, *) arr(1, 1, 1)
END SUBROUTINE stupid

PROGRAM main
  IMPLICIT NONE
  REAL, TARGET :: arr(32*32*32)
  REAL, POINTER, CONTIGUOUS :: arr_p(:)
  INTEGER :: l, kk, jj, ii
  arr = 0
  arr_p => arr
  kk = 16
  jj = 16
  ii = 16
  l = 1
  arr_p => arr
  WRITE(*,*) SIZE(arr_p((l-1)*kk*jj*ii+1:l*kk*jj*ii))
  WRITE(*,*) kk*jj*ii
  CALL stupid(kk, jj, ii, arr_p((l-1)*kk*jj*ii+1:l*kk*jj*ii))
END PROGRAM main
With both recent 'ifx' and 'ifort' (on Linux), I get the following when I compile with '-check all':
forrtl: warning (406): fort: (1): In call to STUPID, an array temporary was created for argument #4
Image PC Routine Line Source
output.s 000000000040591C main 32 example.f90
output.s 000000000040515D Unknown Unknown Unknown
libc-2.31.so 00007FF451E6F083 __libc_start_main Unknown Unknown
output.s 000000000040507E Unknown Unknown Unknown
So an array temporary is generated. Strictly speaking, this is not necessary: the slice is a unit-stride section of a contiguous array, so it is itself contiguous. GFortran manages to compile this without generating any temporary array.
If the call to the 'stupid' routine is simplified slightly, no temporary is created:
ip = (l-1)*kk*jj*ii+1
CALL stupid(kk, jj, ii, arr_p(ip:l*kk*jj*ii))
I have experimented a bit with variations, and the behavior seems to depend on the expression for the start of the slice (the part before the colon). If the start is a plain variable (like 'ip'), no temporary is generated. A simple addition such as 'ip + 1' before the colon also works, and so does multiplication. However, if the start contains a parenthesized sub-expression such as (l-1), the temporary is generated. The part after the colon does not follow the same logic: there you can use expressions with parentheses without affecting whether the compiler generates a temporary.
For example, the following does not seem to generate a temporary:
l = 0
CALL stupid(kk, jj, ii, arr_p(l*kk*jj*ii+1:(l+1)*kk*jj*ii))
The only difference from the original example is that the parentheses now appear after the colon instead of before it, and that works fine.
See the compiler explorer with a side-by-side GFortran and Intel comparison: https://godbolt.org/z/rv5c7WKo5
---
I compiled that on Windows (Intel® Fortran Compiler Classic 2021.7.0 [Intel(R) 64]) and didn't get the warning about a temporary, but I did get:
warning #8889: Explicit interface or EXTERNAL declaration is required. [STUPID]
If there is an explicit interface (i.e. the compiler knows more about STUPID), does the temporary go away?
---
No, putting it in a module does not help:
MODULE stupid_mod
  IMPLICIT NONE
CONTAINS
  SUBROUTINE stupid(kk, jj, ii, arr)
    IMPLICIT NONE
    INTEGER :: kk, jj, ii
    REAL :: arr(kk, jj, ii)
    WRITE(*, *) arr(1, 1, 1)
  END SUBROUTINE stupid
END MODULE stupid_mod

PROGRAM main
  USE stupid_mod
  IMPLICIT NONE
  REAL, TARGET :: arr(32*32*32)
  REAL, POINTER, CONTIGUOUS :: arr_p(:)
  INTEGER :: l, kk, jj, ii
  arr = 0
  arr_p => arr
  kk = 16
  jj = 16
  ii = 16
  l = 1
  arr_p => arr
  WRITE(*,*) SIZE(arr_p((l-1)*kk*jj*ii+1:l*kk*jj*ii))
  WRITE(*,*) kk*jj*ii
  CALL stupid(kk, jj, ii, arr_p((l-1)*kk*jj*ii+1:l*kk*jj*ii))
END PROGRAM main
This gives the same message/warning.
---
I have seen many cases in the past where slices caused temporaries to be created, although the compiler does keep improving in this respect.
I think your use case is not so clear-cut: the dependencies on l, ii, jj and kk might be what defeats the general rules the compiler applies, since it needs to unpick those expressions; maybe that analysis is deferred to run time.
I realise this is a demo/test program, but why use the arr_p pointer at all? And why specify the upper bound of the slice? With this design you need some in-code bounds checking anyway.
---
Well, consider the following, further simplified example:
MODULE stupid_mod
  IMPLICIT NONE
CONTAINS
  SUBROUTINE stupid(n, arr)
    INTEGER :: n
    REAL :: arr(n)
    WRITE(*, *) arr
  END SUBROUTINE stupid
END MODULE stupid_mod

PROGRAM main
  USE stupid_mod
  IMPLICIT NONE
  REAL, TARGET :: arr(1000)
  REAL, POINTER, CONTIGUOUS :: arr_p(:)
  INTEGER :: ip, n
  arr = 0
  arr_p => arr
  ip = 1
  n = 10
  CALL stupid(n, arr_p((ip):ip+n-1))
END PROGRAM main
Compiler explorer link: https://godbolt.org/z/dMGYa4vxT
This triggers the generation of a temporary.
The funny thing is that it is the parentheses on the left-hand side of the colon that trigger this; the following does not generate a temporary:
CALL stupid(n, arr_p(ip:ip+n-1))
So putting the variable "ip" in parentheses, as "(ip)", generates a temporary, while plain "ip" does not. On the right-hand side, parentheses have no effect on the behavior, i.e. I can add as many as I wish...
---
This is related but a little off topic. Some years back we collected and documented how array-passing methods affect optimization and vectorization, including discussion of when temporaries are created. It is a bit tangential to this thread, but it gives some insight into the compiler.
---
I think the compiler is not clever enough: it looks at a slice defined by variable expressions, gives up, assumes the access is indirect, and makes the temporary. I guess working sub-optimally is better than risking not working at all. A better design might be to simplify, to maximise the chance of avoiding a temporary:
MODULE stupid_mod
  IMPLICIT NONE
CONTAINS
  SUBROUTINE stupid(n, arr)
    INTEGER :: n
    REAL :: arr(:)
    WRITE(*, *) arr(1:n)
  END SUBROUTINE stupid
END MODULE stupid_mod

PROGRAM main
  USE stupid_mod
  IMPLICIT NONE
  REAL, TARGET :: arr(1000)
  REAL, POINTER, CONTIGUOUS :: arr_p(:)
  INTEGER :: ip, n
  arr = 0
  arr_p => arr
  ip = 1
  n = 10
  CALL stupid( n, arr_p(ip:) )
END PROGRAM main
---
Until you manage to convince the Intel team, with your gfortran example, not to create array temporaries here, you may want to consider options that make things easier for the compiler and for the users of your code (who may primarily be you yourself). Among others, the following is one you can think about:
PROGRAM main
  ! Uses the external SUBROUTINE stupid(kk, jj, ii, arr) from the original post.
  IMPLICIT NONE
  REAL, TARGET :: arr(32*32*32)
  REAL, POINTER, CONTIGUOUS :: arr_p(:)
  REAL, POINTER, CONTIGUOUS :: arr_p_slice(:) !<-- use this for your slice?
  INTEGER :: l, kk, jj, ii
  arr = 0
  arr_p => arr !<-- perhaps use this object for the whole object reference?
  kk = 16
  jj = 16
  ii = 16
  l = 1
  arr_p_slice => arr((l-1)*kk*jj*ii+1:l*kk*jj*ii)
  WRITE(*,*) SIZE(arr((l-1)*kk*jj*ii+1:l*kk*jj*ii))
  WRITE(*,*) kk*jj*ii
  CALL stupid(kk, jj, ii, arr_p_slice)
END PROGRAM main
---
Thanks for the comments, everyone. It's not about making it work, because I have already found several ways to trick the compiler into generating code that does what I want without making a temporary (i.e. avoiding parentheses).
I posted here because I was puzzled: a 1-D unit-stride slice of a contiguous rank-1 array is always guaranteed to be contiguous, so a temporary should never be needed. Please correct me if I'm wrong...
GFortran seems to get this right; I have not found any situation where it generates a temporary in this case. As soon as you use non-unit strides or slice arrays of rank higher than 1, temporaries are generated as required. For the Intel compiler(s), this just seems like a missed opportunity...
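A minimal sketch of that contiguity claim, assuming a compiler that supports the Fortran 2008 intrinsic IS_CONTIGUOUS (the program name check_contiguous is illustrative). Both spellings of the section, with and without parentheses around the start index, should be reported as contiguous, which suggests the temporary is a compiler decision rather than something the section itself forces. Note that this checks contiguity at run time only; it does not show whether a given compiler creates a temporary at a call site, which is what ifort's `-check arg_temp_created` (part of `-check all`) or gfortran's `-Warray-temporaries` report.

```fortran
! Sketch: check that a unit-stride section of a CONTIGUOUS rank-1 pointer
! is contiguous, with and without parentheses around the start index.
! Requires Fortran 2008 (IS_CONTIGUOUS, ERROR STOP).
PROGRAM check_contiguous
  IMPLICIT NONE
  REAL, TARGET :: arr(1000)
  REAL, POINTER, CONTIGUOUS :: arr_p(:)
  INTEGER :: ip, n

  arr = 0
  arr_p => arr
  ip = 1
  n = 10

  ! Same section, two spellings of the lower bound.
  WRITE(*, *) IS_CONTIGUOUS(arr_p(ip:ip+n-1)), IS_CONTIGUOUS(arr_p((ip):ip+n-1))

  ! Stop with a nonzero exit status if either section is reported non-contiguous.
  IF (.NOT. IS_CONTIGUOUS(arr_p(ip:ip+n-1))) ERROR STOP 'plain section not contiguous'
  IF (.NOT. IS_CONTIGUOUS(arr_p((ip):ip+n-1))) ERROR STOP 'parenthesized section not contiguous'
END PROGRAM check_contiguous
```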
---
hakostra1, I think you did an excellent job of identifying an optimization issue.
Side comment:
It appears that the linear (1D) array arr is being partitioned into 3D tiles. In that case, as long as all code uses the same values for ii, jj and kk throughout the run, the slicing will be correct. Any change to any of the size values will either require a non-unit stride .OR. cannot be described using a non-unit stride at all (and thus require a temporary).
Jim Dempsey
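A minimal sketch of the tiling described above, assuming fixed tile sizes (declared here as PARAMETERs) and using illustrative names (ntiles, first, last, tile) that do not come from the thread. It uses rank-remapping pointer assignment, a different technique from the slicing discussed in this thread, to view tile l of the rank-1 array as a kk x jj x ii array without copying any data. It needs a Fortran 2008 compiler because of the CONTIGUOUS attribute.

```fortran
! Sketch: view tile l of a rank-1 array as a 3D array via rank-remapping
! pointer assignment; no data is copied.
PROGRAM tile_view
  IMPLICIT NONE
  INTEGER, PARAMETER :: kk = 16, jj = 16, ii = 16
  INTEGER, PARAMETER :: ntiles = 8            ! 8 tiles of 16**3 = 32*32*32 elements
  REAL, TARGET :: arr(kk*jj*ii*ntiles)
  REAL, POINTER, CONTIGUOUS :: tile(:, :, :)
  INTEGER :: l, first, last

  arr = 0
  l = 2
  first = (l-1)*kk*jj*ii + 1                  ! first element of tile l
  last  = l*kk*jj*ii                          ! last element of tile l

  ! The target is a rank-1, unit-stride section, as rank remapping requires.
  tile(1:kk, 1:jj, 1:ii) => arr(first:last)

  ! Writing through the 3D view modifies the original storage
  ! (column-major order: tile(kk,jj,ii) is element arr(last)).
  tile(1, 1, 1) = 1.0
  tile(kk, jj, ii) = 2.0
  WRITE(*, *) arr(first), arr(last), SHAPE(tile)

  IF (arr(first) /= 1.0 .OR. arr(last) /= 2.0) ERROR STOP 'view does not alias arr'
  IF (ANY(SHAPE(tile) /= [kk, jj, ii])) ERROR STOP 'unexpected shape'
END PROGRAM tile_view
```

With such a view, the call could be written as CALL stupid(kk, jj, ii, tile), passing a CONTIGUOUS pointer instead of a section expression; whether a particular compiler version then avoids the temporary is not something this sketch establishes.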
---
I agree that there is an opportunity to improve the compiler here by avoiding this unneeded argument temporary. I'll open a bug report on this.
I simply combined the two call types, as shown below, to show the developers that the temporary is created only for the second call:
MODULE stupid_mod
  IMPLICIT NONE
CONTAINS
  SUBROUTINE stupid(n, arr)
    INTEGER :: n
    REAL :: arr(n)
    WRITE(*, *) arr
  END SUBROUTINE stupid
END MODULE stupid_mod

PROGRAM main
  USE stupid_mod
  IMPLICIT NONE
  REAL, TARGET :: arr(1000)
  REAL, POINTER, CONTIGUOUS :: arr_p(:)
  INTEGER :: ip, n
  arr = 0
  arr_p => arr
  ip = 1
  n = 10
  CALL stupid(n, arr_p(ip:ip+n-1))    ! no temporary
  CALL stupid(n, arr_p((ip):ip+n-1))  ! temporary created
END PROGRAM main
---
Bug ID: CMPLRLLVM-42546
---
This bug is fixed in the 2024.0 release.
