I've found an odd performance regression in ifx/ifort output. It appears when a subroutine passes a contiguous array slice to a procedure argument whose interface also declares the array dummy argument as contiguous.
Minimal complete example (with just enough nontrivial work to make sure the code is all executed as intended):
program qux
  implicit none
  integer, allocatable, dimension(:,:) :: foo
  integer ii, cc
  allocate(foo(100,100))
  cc = 0
  do ii=1,10000000
    call test_a(baz)
  end do
  write(0,'(I0)') foo(1,1)
contains
  subroutine baz(bar)
    implicit none
    integer, dimension(:), intent(inout), contiguous :: bar
    cc = cc+1
    bar(1+mod(cc,100)) = cc
  end subroutine

  subroutine test_a(in_proc)
    implicit none
    interface
      subroutine in_proc(bar)
        implicit none
        integer, dimension(:), intent(inout), contiguous :: bar
      end subroutine
    end interface
    call in_proc(foo(1:,1))
  end subroutine
end program
When run on a Xeon Platinum 8380:
$ ifx --version
ifx (IFORT) 2023.0.0 20221201
Copyright (C) 1985-2022 Intel Corporation. All rights reserved.

$ ifx -O3 foo.F90 && time ./a.out
10000000

real 0m3.880s
user 0m3.864s
sys 0m0.003s
But after replacing in_proc(foo(1:,1)) with in_proc(foo(:,1)) at line 27:
$ ifx -O3 foo.F90 && time ./a.out
10000000

real 0m0.017s
user 0m0.014s
sys 0m0.003s
Or keeping in_proc(foo(1:,1)) but dropping the contiguous at line 24:
$ ifx -O3 foo.F90 && time ./a.out
10000000

real 0m0.019s
user 0m0.015s
sys 0m0.004s
A godbolt comparison of cases #1 and #2 shows that #1 does a great deal more work inside test_a, including several tests and a loop.
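As a sanity check that the two spellings really describe the same storage, here is a minimal sketch (not part of the original reproducer; the program name check_contig is made up for illustration) that asks the Fortran 2008 intrinsic is_contiguous about both sections. Both select the whole first column of an allocated array, so one would expect the same answer for each, which suggests the slowdown is a missed optimization rather than something the language requires.

```fortran
program check_contig
  implicit none
  integer, allocatable :: foo(:,:)

  allocate(foo(100,100))
  foo = 0

  ! Both sections select the entire first column of foo; only the
  ! spelling of the first subscript differs (explicit lower bound or not).
  print '(A,L1)', 'foo(1:,1) contiguous: ', is_contiguous(foo(1:,1))
  print '(A,L1)', 'foo(:,1)  contiguous: ', is_contiguous(foo(:,1))
end program check_contig
```

This only reports what the runtime considers contiguous; it says nothing about whether the compiler proves contiguity at compile time, which is what the generated code for case #1 seems to depend on.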
This behaviour was originally discovered when -qopt-report=5 on ifort 2021.8.0 unexpectedly reported "memcopy generated" for the real code from which this sample was reduced. The generated memcopy seems to involve the array descriptor rather than the array data (that is, no temporary copy of the array itself appears to be created): the running times do not change if the allocation of foo is enlarged.
Here is another issue on the same topic (the compiler generating temporaries for contiguous slices):
If you compile your code with "-check arg_temp_created" or "-check all", you will get a run-time message whenever a temporary is created. That makes such cases easier to discover than waiting for performance bottlenecks to appear.
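To see what the check reacts to, the following is a small sketch (the names temp_demo and fill are invented for this example) with one call that should not need a temporary and one that should. A column of a 2-D array is contiguous in memory, while a row is strided, so passing the row to a contiguous dummy argument forces the compiler to copy in and copy out.

```fortran
program temp_demo
  implicit none
  integer, allocatable :: a(:,:)

  allocate(a(4,4))
  a = 0

  ! Column section: contiguous in memory, so no temporary should be needed.
  call fill(a(:,1))

  ! Row section: elements are a whole column apart, so the contiguous
  ! dummy argument requires copy-in/copy-out through a temporary.
  call fill(a(1,:))

  print '(4I3)', a
contains
  subroutine fill(v)
    integer, intent(inout), contiguous :: v(:)
    v = v + 1
  end subroutine fill
end program temp_demo
```

Built with something like ifx -check arg_temp_created temp_demo.f90 (the file name is an assumption), the second call is the one that would be expected to trigger the message. If the first call, or the foo(1:,1) call in the reproducer above, also triggers it, that points at the compiler rather than the code.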
@Barbara_P_Intel, it would seem you have read my book, "The Wisdom of dealing with Engineers", published by Bantam Press in 1675. Rule number 1313: always allow the engineer to decide, otherwise they cry foul to their mothers.
I had this last week when a contractor told me he never made engineering decisions; that was for the engineer. I then asked him if we could do it that way, and he said no. I laughed. Actually a true story.
Engineers are like two-year-olds: you cannot say anything negative or tell them how it is done. They know already; they were taught by the great professors, and they will follow that bible till a new bible comes along.
When once asked what I did, I laughed and said, write new bibles.