Intel® Fortran Compiler

Bad optimization when passing array slice to procedure argument w/ contiguous parameter

csubich
Novice

Greetings,

I've found an odd performance regression in ifx/ifort output: when a subroutine passes a contiguous array slice to a dummy procedure whose interface also declares that array argument as contiguous, the call becomes far slower than expected.

Minimal complete example (with just enough nontrivial work to make sure the code is all executed as intended):

program qux
    implicit none
    integer, allocatable, dimension(:,:) :: foo
    integer ii, cc

    allocate(foo(100,100))
    cc = 0
    do ii=1,10000000
      call test_a(baz)
    end do
    write(0,'(I0)') foo(1,1)
contains
    subroutine baz(bar)
        implicit none
        integer, dimension(:), intent(inout), contiguous :: bar
        cc = cc+1
        bar(1+mod(cc,100)) = cc
    end subroutine
    subroutine test_a(in_proc)
        implicit none
        interface
            subroutine in_proc(bar)
                implicit none
                integer, dimension(:), intent(inout), contiguous :: bar
            end subroutine
        end interface
        call in_proc(foo(1:,1))
    end subroutine
end program

When run on a Xeon Platinum 8380:

$ ifx --version
ifx (IFORT) 2023.0.0 20221201
Copyright (C) 1985-2022 Intel Corporation. All rights reserved.

$ ifx -O3 foo.F90 && time ./a.out
10000000

real    0m3.880s
user    0m3.864s
sys     0m0.003s

But after replacing in_proc(foo(1:,1)) with in_proc(foo(:,1)) at line 27 of the program (the call inside test_a):

$ ifx -O3 foo.F90 && time ./a.out
10000000

real    0m0.017s
user    0m0.014s
sys     0m0.003s

Or keeping in_proc(foo(1:,1)) but dropping the contiguous attribute at line 24 (the dummy argument declaration in the in_proc interface):

$ ifx -O3 foo.F90 && time ./a.out
10000000

real    0m0.019s
user    0m0.015s
sys     0m0.004s

A godbolt comparison of cases #1 (foo(1:,1)) and #2 (foo(:,1)) shows that #1 does a great deal more work inside test_a, including several tests and a loop.

I originally noticed this behaviour when -qopt-report=5 on ifort 2021.8.0 unexpectedly reported "memcopy generated" for the real-code predecessor of this sample. The generated memcopy seems to involve the array descriptor rather than the array data (which would mean creating a temporary copy of the array): the running times do not change when the allocation of foo is enlarged.
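
Both foo(1:,1) and foo(:,1) select the whole first column of foo, which is contiguous in Fortran's column-major storage, so neither call should need a temporary copy of the array data. A minimal sketch that checks this, assuming a compiler that supports the Fortran 2008 intrinsic is_contiguous (the program name check_slices is hypothetical):

```fortran
! Sketch: report whether the slices from the reproducer are contiguous.
! Assumes a compiler supporting the Fortran 2008 intrinsic IS_CONTIGUOUS.
program check_slices
    implicit none
    integer, allocatable, dimension(:,:) :: foo

    allocate(foo(100,100))
    ! Both sections select all of column 1, which is stored contiguously
    ! in column-major order, so both lines should print T.
    write(*,'(L1)') is_contiguous(foo(1:,1))
    write(*,'(L1)') is_contiguous(foo(:,1))
    ! A row is strided by 100 elements, so this is expected to print F.
    write(*,'(L1)') is_contiguous(foo(1,:))
end program
```

If the first two lines print T, no copy of the data is logically required for either call in the reproducer, which fits the observation that the cost does not grow with the size of foo.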

hakostra1
New Contributor II

Here is another thread on the same topic (the compiler generating temporaries for contiguous slices):

https://community.intel.com/t5/Intel-Fortran-Compiler/Array-temporary-generated-for-contiguous-slice/m-p/1436454#M163914

If you compile your code with "-check arg_temp_created" or "-check all", you will get a run-time message whenever an argument temporary is created. That makes the problem easier to discover than waiting for performance bottlenecks to appear.
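
For comparison, here is a case where an argument temporary is genuinely required, which is the kind of call -check arg_temp_created is meant to flag. A minimal sketch, assuming ifx or ifort with the flag quoted above (the program force_temp and the subroutine fill are hypothetical names): a row of a column-major array is strided, so passing it to a contiguous dummy forces copy-in/copy-out, while passing a column does not.

```fortran
! Sketch: one call that needs no temporary and one that does.
program force_temp
    implicit none
    integer, allocatable, dimension(:,:) :: foo

    allocate(foo(100,100))
    foo = 0
    ! Column 1 is already contiguous: no temporary expected.
    call fill(foo(:,1))
    ! Row 1 has a stride of 100 elements: the CONTIGUOUS dummy forces
    ! copy-in/copy-out through a temporary.
    call fill(foo(1,:))
    ! Copy-out preserves the result: 100 + 100 increments in total.
    write(*,'(I0)') sum(foo)
contains
    subroutine fill(bar)
        integer, dimension(:), intent(inout), contiguous :: bar
        bar = bar + 1
    end subroutine
end program
```

Built with, e.g., ifx -check arg_temp_created force_temp.f90, only the second call is expected to produce the run-time message. The program prints 200 either way, because copy-out writes the incremented row back into foo (foo(1,1) is incremented twice).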

Barbara_P_Intel
Employee

I tried "-check arg_temp_created" and "-check all" with the reproducer from @csubich. No messages.

Barbara_P_Intel
Employee

@csubich, thank you for reporting this performance issue. I like this small reproducer. I filed a bug report, CMPLRLLVM-48930.

@hakostra1, thanks for pointing out the similar issue. I let the compiler engineers know that the 2 bug reports might be related. I'll let them decide.

JohnNichols
Valued Contributor III

@Barbara_P_Intel, it would seem you have read my book, "The Wisdom of Dealing with Engineers", published by Bantam Press in 1675, rule number 1313: always allow the engineer to decide, otherwise they cry foul to their mothers.

I had this last week when a contractor told me he never made engineering decisions; that was for the engineer. I then asked him if we could do it that way, and he said no. I laughed. Actually a true story.

Engineers are like two-year-olds: you cannot say anything negative or tell them how it is done, because they know already. They were taught by the great professors, and they will follow that bible until a new bible comes along.

When once asked what I did, I laughed and said, "Write new bibles."

John

Barbara_P_Intel
Employee

I just checked the performance with an internal build of the next ifx compiler, version 2024.2. The performance is MUCH improved.

Look for this release in mid-2024.
