<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: performance of passing allocatable array to pure function in Intel® Fortran Compiler</title>
    <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612536#M172869</link>
    <description>&lt;P&gt;"&lt;SPAN&gt;Note excessive white space and indentation can be extremely harmful&lt;/SPAN&gt;"&lt;/P&gt;&lt;P&gt;Back in the day I used to like a few blank lines because they made code more readable. Nowadays, with editors that have syntax highlighting etc., I hate blank lines: I can see less code and have to scroll more. I find they add pretty much nothing to readability. I suspect other opinions exist...&lt;/P&gt;&lt;P&gt;As for indentation, that is great, but if you get beyond the 4th level maybe the code structure needs some thought. Readability and clarity are very important for support and maintenance of code.&lt;/P&gt;</description>
    <pubDate>Fri, 05 Jul 2024 12:43:13 GMT</pubDate>
    <dc:creator>andrew_4619</dc:creator>
    <dc:date>2024-07-05T12:43:13Z</dc:date>
    <item>
      <title>performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612028#M172848</link>
      <description>&lt;P&gt;The following code runs very slowly when the arrays are declared allocatable:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;gt; real , dimension (:,:), allocatable :: p,pp&lt;BR /&gt;&amp;gt; allocate ( p(x,y) , pp(x,y) )&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;allocatable test&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;ifx tttt.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:tttt.exe
-subsystem:console
tttt.obj

&amp;gt;tttt
    Steps done  =          1000
    Steps done  =          2000
    Steps done  =          3000
    Steps done  =          4000
    Steps done  =          5000
    Steps done  =          6000
    Steps done  =          7000
    Steps done  =          8000
    Steps done  =          9000
    Steps done  =         10000
  Code time is     =    89.00400      seconds&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="6"&gt;&lt;STRONG&gt;The code&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;module testing_pure
    implicit none
    
    contains
    
    pure function pure_function ( r,i, j, pi, pj, mi, mj,x,y )
        implicit none
        
        integer , intent ( in )  :: x,y
        real , dimension ( x, y), intent ( in ) :: r
        real , dimension ( x, y )               :: pure_function
        integer ,intent ( in ) :: i, j, pj, mj, pi,mi
        
        
        
        pure_function(i,j) =  r(pi,j) + r(mi,j) + r(i,mj) +  r(i,pj) - 4.0*r(i,j) 
        
        
    end function pure_function
    
    
    
end module testing_pure



program test_pure_function
    use testing_pure
    implicit none
    
    integer , parameter :: x = 100
    integer , parameter :: y = 100 
    integer             :: counter,i,j,pi,mi,pj,mj  
    integer             :: rate, stop_count, start_count
    real                :: computed_time
    
    
    !real, dimension (x,y) :: p,pp    
    real , dimension (:,:), allocatable :: p,pp 
    allocate ( p(x,y) , pp(x,y) )
    
    call random_number (p)
    
    ! ==================
    
    
    
    
    call system_clock (count=start_count , count_rate=rate)
    
    
    do counter = 1, 10000
        
        
        do j = 1, y
            do i =1, x 
                
                
                pj = j + 1
                mj = j - 1
                pi = i + 1
                mi = i - 1
                
                if ( mi == 0 ) mi = x
                if ( pi == ( x + 1 ) ) pi = 1
                if ( mj == 0 ) mj = y
                if ( pj == ( y + 1 ) ) pj = 1
                
                
                
                !some computation
                
                
                pp   = pure_function ( p , i , j, pi, pj, mi, mj, x, y )
                
                
                
            end do
        end do
        

        if ( mod( counter, 1000 ) .eq. 0 ) print *, '   Steps done  =  ', counter
        
        
    end do 
    
    
    call system_clock (count=stop_count)
    
    computed_time = real(max(stop_count - start_count , 1_8 )) /real(rate)
    
    
    
    !=========
    print*, ' Code time is     = ',  computed_time ,' seconds'
    
    
    
    
end program test_pure_function &lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;without allocatable&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;gt; real, dimension (x,y) :: p,pp&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;it runs very fast&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;ifx tttt.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:tttt.exe
-subsystem:console
tttt.obj

&amp;gt;tttt
    Steps done  =          1000
    Steps done  =          2000
    Steps done  =          3000
    Steps done  =          4000
    Steps done  =          5000
    Steps done  =          6000
    Steps done  =          7000
    Steps done  =          8000
    Steps done  =          9000
    Steps done  =         10000
  Code time is     =   6.1999999E-02  seconds&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="7" color="#1E2EB8"&gt;&lt;STRONG&gt;with gfortran compiler&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;there is no change in performance i.e.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;with allocatable&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;gt;gfortran tttt.f90 -o t

&amp;gt;t
    Steps done  =          1000
    Steps done  =          2000
    Steps done  =          3000
    Steps done  =          4000
    Steps done  =          5000
    Steps done  =          6000
    Steps done  =          7000
    Steps done  =          8000
    Steps done  =          9000
    Steps done  =         10000
  Code time is     =   0.968999982      seconds&lt;/LI-CODE&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;without allocatable&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;LI-CODE lang="markup"&gt;&amp;gt;gfortran tttt.f90 -o t

&amp;gt;t
    Steps done  =          1000
    Steps done  =          2000
    Steps done  =          3000
    Steps done  =          4000
    Steps done  =          5000
    Steps done  =          6000
    Steps done  =          7000
    Steps done  =          8000
    Steps done  =          9000
    Steps done  =         10000
  Code time is     =    1.07799995      seconds&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So what makes it slow with allocatable arrays ?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jul 2024 02:14:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612028#M172848</guid>
      <dc:creator>Fortran10</dc:creator>
      <dc:date>2024-07-04T02:14:28Z</dc:date>
    </item>
    <item>
      <title>Re: performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612161#M172851</link>
      <description>&lt;P&gt;Are the allocates within the timed loop? Repeated allocate/deallocate will always take some time. The load times may be different, but you are not timing that. The saving with allocate is that you can grab exactly the memory you need rather than a best-guess maximum of what you think might be needed.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jul 2024 08:20:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612161#M172851</guid>
      <dc:creator>andrew_4619</dc:creator>
      <dc:date>2024-07-04T08:20:40Z</dc:date>
    </item>
    <item>
      <title>Re: performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612173#M172852</link>
      <description>&lt;P&gt;The arrays are already allocated before entering the timed loop. I do not know if the allocatable attribute remains in effect for the whole run.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Also, does that mean the Intel compiler &lt;STRONG&gt;(ifx)&lt;/STRONG&gt; handles allocatable arrays differently than &lt;STRONG&gt;gfortran&lt;/STRONG&gt;? I do not see any performance issues with gfortran. If that is the case, how does one write portable code with allocatables?&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jul 2024 08:34:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612173#M172852</guid>
      <dc:creator>Fortran10</dc:creator>
      <dc:date>2024-07-04T08:34:20Z</dc:date>
    </item>
    <item>
      <title>Re: performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612182#M172853</link>
      <description>&lt;P&gt;Do you use the heap arrays option? Allocated arrays are always on the heap.&amp;nbsp; Stack is usually faster, but you are limited by stack size.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Something clearly goes wrong with the first test case! However, having now looked at your code, I will note that a smart optimiser could ditch your inner test loop, as it does nothing other than set counter to the loop exit value, if I have read that correctly.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jul 2024 08:57:24 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612182#M172853</guid>
      <dc:creator>andrew_4619</dc:creator>
      <dc:date>2024-07-04T08:57:24Z</dc:date>
    </item>
    <item>
      <title>Re: performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612198#M172854</link>
      <description>&lt;P&gt;I have a newer version of the previous code.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Modification&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I added a pure subroutine, which is now called from the timed loop in place of the function.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="6"&gt;&lt;STRONG&gt;The code (version 2)&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;
!     ifx allocatable array test
!
!
!    version 2 :
!                pure subroutine is used here.


module testing_pure
    implicit none
    
    contains
    
    function pure_function ( r,i, j, pi, pj, mi, mj,x,y )
        implicit none
        
        integer , intent ( in )  :: x,y
        real , dimension ( x, y), intent ( in ) :: r
        real , dimension ( x, y )               :: pure_function
        integer ,intent ( in ) :: i, j, pj, mj, pi,mi
        
        
        
        pure_function(i,j) =  r(pi,j) + r(mi,j) + r(i,mj) +  r(i,pj) - 4.0*r(i,j) 
        
        
    end function pure_function
    
    
    
    pure subroutine pure_test ( rr, r,i, j, pi, pj, mi, mj,x,y )
        implicit none
        
        integer , intent ( in )  :: x,y
        real , dimension ( x, y), intent ( in ) :: r
        real , dimension ( x, y ), intent (out) :: rr
        integer ,intent ( in ) :: i, j, pj, mj, pi,mi
        
        
        rr(i,j) =  r(pi,j) + r(mi,j) + r(i,mj) +  r(i,pj) - 4.0*r(i,j) 
        
        
    end  subroutine pure_test
    
end module testing_pure



program test_pure_function
    use testing_pure
    implicit none
    
    integer , parameter :: x = 100
    integer , parameter :: y = 100 
    integer             :: counter,i,j,pi,mi,pj,mj  
    integer             :: rate, stop_count, start_count
    real                :: computed_time
    
    
  !     real, dimension (x,y) :: p,pp    
    real , dimension (:,:), allocatable :: p,pp 
    allocate ( p(x,y) , pp(x,y) )
    
    call random_number (p)
    
    
    
    ! ==================
    
    
    
    
    call system_clock (count=start_count , count_rate=rate)
    
    
    do counter = 1, 10000
        
        
        do j = 1, y
            do i =1, x 
                
                
                pj = j + 1
                mj = j - 1
                pi = i + 1
                mi = i - 1
                
                if ( mi == 0 ) mi = x
                if ( pi == ( x + 1 ) ) pi = 1
                if ( mj == 0 ) mj = y
                if ( pj == ( y + 1 ) ) pj = 1
                
                
                
                !some computation
                
                
!                pp   = pure_function ( p , i , j, pi, pj, mi, mj, x, y )
                
                call pure_test ( pp, p,i, j, pi, pj, mi, mj,x,y )
                
                
                
            end do
        end do
        
        
        if ( mod( counter, 1000 ) .eq. 0 ) print *, '   Steps done  =  ', counter
        
        
    end do 
    
    
    call system_clock (count=stop_count)
    
    computed_time = real(max(stop_count - start_count , 1_8 )) /real(rate)
    
    
    
    !=========
    print*, ' Code time is     = ',  computed_time ,' seconds'
    
    
    

end program test_pure_function &lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="6"&gt;&lt;STRONG&gt;test 1&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="4"&gt;I ran the test as before:&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;&amp;gt;ifx tttt1.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.32.31332.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:tttt1.exe
-subsystem:console
tttt1.obj

&amp;gt;tttt1
    Steps done  =          1000
    Steps done  =          2000
    Steps done  =          3000
    Steps done  =          4000
    Steps done  =          5000
    Steps done  =          6000
    Steps done  =          7000
    Steps done  =          8000
    Steps done  =          9000
    Steps done  =         10000
  Code time is     =   0.1090000      seconds&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="3"&gt;Calling the &lt;STRONG&gt;pure subroutine&lt;/STRONG&gt; with the same allocatable arrays shows no performance loss, so the slowdown seems to be specific to the &lt;STRONG&gt;function&lt;/STRONG&gt; version.&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jul 2024 09:58:52 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612198#M172854</guid>
      <dc:creator>Fortran10</dc:creator>
      <dc:date>2024-07-04T09:58:52Z</dc:date>
    </item>
    <item>
      <title>Re: performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612229#M172855</link>
      <description>&lt;P&gt;OK, I ran your second test, and indeed the function and subroutine versions have run times that differ by roughly 1000x!!&lt;/P&gt;&lt;P&gt;Just to be sure, I added a couple of lines at the end (outside the timer) to print a random element of the pp array. That made no difference, so this is not some code-elimination optimisation anomaly.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Dear compiler team, there is indeed something very BAD happening!&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jul 2024 11:12:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612229#M172855</guid>
      <dc:creator>andrew_4619</dc:creator>
      <dc:date>2024-07-04T11:12:48Z</dc:date>
    </item>
    <item>
      <title>Re: performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612265#M172856</link>
      <description>&lt;P&gt;I used VTune on the allocatable version:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Andrew_Smith_0-1720100658379.png" style="width: 400px;"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/56593i484BD8AF75C3EFCC/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="Andrew_Smith_0-1720100658379.png" alt="Andrew_Smith_0-1720100658379.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;It does look like massive amounts of unnecessary memory copying. It's worrying that an allocatable array can cause this much performance loss. I expect many of us will be watching for a resolution.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jul 2024 13:46:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612265#M172856</guid>
      <dc:creator>Andrew_Smith</dc:creator>
      <dc:date>2024-07-04T13:46:08Z</dc:date>
    </item>
    <item>
      <title>Re: performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612277#M172857</link>
      <description>&lt;P&gt;Hmmm, that last post made me look harder at the example. The function returns a 100x100 real array but assigns to only a single element within it, which doesn't seem to be a good thing to do. Nonetheless, it should be making 10000 assignments on function return. The loc of the array in the function is not the same as the loc of the pp array in the caller, as one would expect.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jul 2024 15:33:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612277#M172857</guid>
      <dc:creator>andrew_4619</dc:creator>
      <dc:date>2024-07-04T15:33:48Z</dc:date>
    </item>
    <item>
      <title>Re: performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612305#M172860</link>
      <description>&lt;P&gt;With the pure function coding style, the compiler, while compiling the function, has no way of determining whether the result of the function is possibly an alias of an input argument. Therefore, it copies the array. With the call (subroutine) coding style, it is the programmer's responsibility not to pass aliased arguments. I do not know whether the Fortran Specification states anything about a function result aliasing an input argument; if it does not, the compiler may presume aliasing is possible, thus requiring the copying of the array (use of an array temporary).&lt;/P&gt;&lt;P&gt;This is presumption on my part.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Note, in your first code example (using function), add ", intent(out)"&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp; &amp;nbsp; real , dimension ( x, y ), intent (out) :: pure_function&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;See if this makes a difference.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jul 2024 19:04:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612305#M172860</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2024-07-04T19:04:17Z</dc:date>
    </item>
    <item>
      <title>Re: performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612358#M172862</link>
      <description>&lt;P&gt;I tried it and received an &lt;FONT color="#FF6600"&gt;&lt;STRONG&gt;error message&lt;/STRONG&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;intel&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;ifx tttt.f90
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

tttt.f90(21): error #6413: This global name is invalid in this context.   [PURE_FUNCTION]
        real , dimension ( x, y ), intent ( out) :: pure_function
----------------------------------------------------^
compilation aborted for tttt.f90 (code 1)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;gfortran&lt;/STRONG&gt;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;gfortran tttt.f90
tttt.f90:16:4:

   16 |     function pure_function ( r,i, j, pi, pj, mi, mj,x,y )
      |    1
Error: Symbol at (1) is not a DUMMY variable
tttt.f90:38:9:

   38 |     use testing_pure
      |         1
Fatal Error: Cannot open module file 'testing_pure.mod' for reading at (1): No such file or directory
compilation terminated.&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jul 2024 00:41:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612358#M172862</guid>
      <dc:creator>Fortran10</dc:creator>
      <dc:date>2024-07-05T00:41:07Z</dc:date>
    </item>
    <item>
      <title>Re: performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612395#M172863</link>
      <description>&lt;P&gt;&lt;a href="https://community.intel.com/t5/user/viewprofilepage/user-id/267609"&gt;@Fortran10&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;I suggest the following:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;If your work has any $$ attached to it (profit, funding, school research, etc.) and it's reliant on Intel Fortran in any way, procure Intel Premier Support and follow up on this case via Intel Online Service Center for detailed engagement on various aspects around this issue.&lt;/LI&gt;&lt;LI&gt;Look into using&amp;nbsp;&lt;A href="https://godbolt.org/" target="_blank" rel="noopener"&gt;Compiler Explorer (godbolt.org)&lt;/A&gt;&amp;nbsp;to try various compilers and to analyze each compiler's response by studying the generated assembly output.&lt;/LI&gt;&lt;LI&gt;Note excessive white space and indentation can be extremely harmful, especially to any colleagues who have to work on your code.&lt;/LI&gt;&lt;LI&gt;Note many Fortran compilers now rely on C/C++ infrastructure (frontend, lowering to backend, etc.), and compiler developers tend to be proficient in those languages but perhaps not as much in Fortran.&amp;nbsp; Yet the Fortran standard is somewhat weak when it comes to FUNCTION subprograms, specifically around the subject of &lt;STRONG&gt;&lt;A href="https://en.wikipedia.org/wiki/Copy_elision" target="_self"&gt;Copy Elision&lt;/A&gt;&lt;/STRONG&gt;&amp;nbsp;and &lt;STRONG&gt;&lt;A href="https://en.wikipedia.org/wiki/Copy_elision#Return_value_optimization" target="_self"&gt;Return Value Optimization&lt;/A&gt;&lt;/STRONG&gt; for function results.&amp;nbsp; This can have a huge impact with function results that involve a lot of data, and it can be a critical factor in poor performance with Fortran compilers, especially because of special semantics in Fortran (RHS evaluation, allocation on assignment, etc.) that can come across as quite "foreign" to Fortran compiler writers and where the standard offers no guidance that can help with performance.&amp;nbsp; Suboptimal copying can come into play.&lt;/LI&gt;&lt;LI&gt;Given #4, consider using SUBROUTINE subprograms when objects with a lot of data are involved; that is, be careful with the data copy burden.&lt;/LI&gt;&lt;LI&gt;Be careful with instrumentation of unit tests in Fortran to do "profiling": avoid the risk of the compiler optimizing away everything, and avoid measuring any IO.&amp;nbsp; Either can prove misleading.&amp;nbsp; Note a really smart compiler would have optimized away your entire code, done no computations, and shown a time of 0 seconds in all cases, because it would have recognized that the computations do not affect any subsequent code instructions.&amp;nbsp; In the variant below, a subsequent PRINT statement that references pp(42,43) can help prevent a processor from doing so.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;In the meantime, consider reviewing a more simple-minded variant of your code and whether there are any ideas in it:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="fortran"&gt;module m

contains
    
   pure function update( r_pi_j, r_mi_j, r_i_mj, r_i_pj, r_i_j ) result( r )

      ! Argument list
      real, intent(in) :: r_pi_j
      real, intent(in) :: r_mi_j
      real, intent(in) :: r_i_mj
      real, intent(in) :: r_i_pj
      real, intent(in) :: r_i_j
      ! Function result
      real :: r

      r =  r_pi_j + r_mi_j + r_i_mj +  r_i_pj - 4.0*r_i_j 
       
   end function

end module

program test

   use, intrinsic :: iso_fortran_env, only : I8 =&amp;gt; int64
   use m
    
   integer, parameter :: R8 = selected_real_kind( p=12 )
   integer , parameter :: x = 100
   integer , parameter :: y = 100 
   real(R8) :: t1, t2, t
   integer :: counter, i, j, pi, mi, pj, mj  
    
   real , dimension (:,:), allocatable :: p, pp
   allocate ( p(x,y) , pp(x,y) )
    
   ! ==================
   t = 0.0
   do counter = 1, 10000
        
      call random_number( p )
      call cpu_t( t1 ) 

      do j = 1, y
         do i = 1, x 
                
            pj = j + 1
            mj = j - 1
            pi = i + 1
            mi = i - 1
            
            if ( mi == 0 ) mi = x
            if ( pi == ( x + 1 ) ) pi = 1
            if ( mj == 0 ) mj = y
            if ( pj == ( y + 1 ) ) pj = 1
            
            !some computation
            pp(i,j) = update( p(pi,j), p(mi,j), p(i,mj), p(i,pj), p(i,j) )

         end do
      end do
      call cpu_t( t2 )
      t = t + (t2 - t1)

      if ( mod( counter, 1000 ) == 0 ) print *, '   Steps done  =  ', counter, '; pp(42,43): ', pp(42,43)

   end do
    
   ! ==================
   print *, ' Code time is     = ', t ,' seconds'

contains

   subroutine cpu_t( time )

      ! Argument list
      real(R8), intent(inout) :: time

      ! Local variables
      integer(I8) :: tick
      integer(I8) :: rate

      call system_clock (tick, rate)

      time = real(tick, kind=kind(time) ) / real(rate, kind=kind(time) )

      return

   end subroutine
   
end program &lt;/LI-CODE&gt;&lt;LI-CODE lang="fortran"&gt;C:\temp&amp;gt;ifx /free /standard-semantics p.f
Intel(R) Fortran Compiler for applications running on Intel(R) 64, Version 2024.2.0 Build 20240602
Copyright (C) 1985-2024 Intel Corporation. All rights reserved.

Microsoft (R) Incremental Linker Version 14.36.32537.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\temp&amp;gt;p.exe
    Steps done  =   1000 ; pp(42,43):  -0.3779147
    Steps done  =   2000 ; pp(42,43):  1.6536832E-02
    Steps done  =   3000 ; pp(42,43):  1.551260
    Steps done  =   4000 ; pp(42,43):  0.2452765
    Steps done  =   5000 ; pp(42,43):  0.6278389
    Steps done  =   6000 ; pp(42,43):  -1.001162
    Steps done  =   7000 ; pp(42,43):  1.457952
    Steps done  =   8000 ; pp(42,43):  -0.3233254
    Steps done  =   9000 ; pp(42,43):  -0.7149973
    Steps done  =   10000 ; pp(42,43):  1.124510
  Code time is     =  8.800101280212402E-002  seconds

C:\temp&amp;gt;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jul 2024 03:47:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612395#M172863</guid>
      <dc:creator>FortranFan</dc:creator>
      <dc:date>2024-07-05T03:47:32Z</dc:date>
    </item>
    <item>
      <title>Re: performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612536#M172869</link>
      <description>&lt;P&gt;"&lt;SPAN&gt;Note excessive white space and indentation can be extremely harmful&lt;/SPAN&gt;"&lt;/P&gt;&lt;P&gt;Back in the day I used to like a few blank lines because they made code more readable. Nowadays, with editors that have syntax highlighting etc., I hate blank lines: I can see less code and have to scroll more. I find they add pretty much nothing to readability. I suspect other opinions exist...&lt;/P&gt;&lt;P&gt;As for indentation, that is great, but if you get beyond the 4th level maybe the code structure needs some thought. Readability and clarity are very important for support and maintenance of code.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jul 2024 12:43:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612536#M172869</guid>
      <dc:creator>andrew_4619</dc:creator>
      <dc:date>2024-07-05T12:43:13Z</dc:date>
    </item>
    <item>
      <title>Re: performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612816#M172885</link>
      <description>&lt;OL&gt;&lt;LI&gt;There is no $$ attached at the moment. Of course, it would be the best situation if there were!&lt;/LI&gt;&lt;LI&gt;Thanks for sharing the link. This looks interesting; I had never heard of it before.&lt;/LI&gt;&lt;LI&gt;This depends on the editor, I think. Some editors render spaces quite narrow; some fonts make them very wide. Also, with this forum markup the preview looks different, I feel. So my style is still evolving, and I can't decide which way I should keep writing my code.&lt;/LI&gt;&lt;LI&gt;I can't say much about this; it is something I have no knowledge of.&lt;/LI&gt;&lt;LI&gt;This is the more relevant point, since Fortran is supposed to translate mathematical formulas, so it is quite natural to evaluate a mathematical function. The intuitive approach is to use FUNCTION subprograms. However, my experience (less than five years) shows that using a function is not always the right way to solve a problem. The problem may be solvable that way, but at the expense of performance (another concern, and perhaps the main one in some situations).&amp;nbsp;&lt;/LI&gt;&lt;LI&gt;This I/O insight is nice.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;The code variant is useful. This is very tricky when dealing with whole array operations, scalar - array or with index array (my original code). Working with mixed-arrays (whole array and index array) requires some good experience otherwise I feel it is hard to debug.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 08 Jul 2024 01:24:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612816#M172885</guid>
      <dc:creator>Fortran10</dc:creator>
      <dc:date>2024-07-08T01:24:44Z</dc:date>
    </item>
    <item>
      <title>Re: performance of passing allocatable array to pure function</title>
      <link>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612999#M172890</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://community.intel.com/t5/user/viewprofilepage/user-id/267609"&gt;@Fortran10&lt;/a&gt;&amp;nbsp;wrote:&lt;P&gt;..&lt;/P&gt;&lt;P&gt;The code variant is useful. This is very tricky when dealing with whole array operations, scalar - array or with index array (my original code). Working with mixed-arrays (whole array and index array) requires some good experience otherwise I feel it is hard to debug.&amp;nbsp;&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;a href="https://community.intel.com/t5/user/viewprofilepage/user-id/267609"&gt;@Fortran10&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;First, in case you have not done so, also look into the ELEMENTAL prefix for Fortran subprograms and the facilities it provides:&lt;/P&gt;&lt;P&gt;&lt;A href="https://fortran-lang.org/learn/best_practices/element_operations/" target="_blank" rel="noopener"&gt;Element-wise Operations on Arrays — Fortran Programming Language (fortran-lang.org)&lt;/A&gt;&lt;/P&gt;&lt;P&gt;I suggest you follow up further at the Fortran-lang.org community sites:&lt;/P&gt;&lt;P&gt;&lt;A href="https://fortran-lang.discourse.group/" target="_blank" rel="noopener"&gt;Fortran Discourse - Fortran open source community (fortran-lang.discourse.group)&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://fortran-lang.org/learn/" target="_blank" rel="noopener"&gt;Learn — Fortran Programming Language (fortran-lang.org)&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Re: "This is very tricky when dealing with whole array operations, scalar - array or with index array (my original code). Working with mixed-arrays (whole array and index array) requires some good experience otherwise I feel it is hard to debug," you will find good guidance from many other Fortran practitioners at the Discourse and Learn links above.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Jul 2024 12:57:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-Fortran-Compiler/performance-of-passing-allocatable-array-to-pure-function/m-p/1612999#M172890</guid>
      <dc:creator>FortranFan</dc:creator>
      <dc:date>2024-07-08T12:57:33Z</dc:date>
    </item>
  </channel>
</rss>

