Solved: How to get fast operations on a derived type? A comparison test shows optimization does nothing

Li_L_ · 04-08-2017

module constants
    integer,parameter:: ip = 4
    integer,parameter:: rp = 8
end module constants

module vector_
use constants
implicit none

    private
    public:: vector
    public:: operator(+),operator(*)

    type::  vector
        real(rp),allocatable,dimension(:):: vc
    contains
        generic::           init    =>  init_ar
        procedure,private:: init_ar    
    end type vector

!---------------------------------------------
    interface operator(+)
        procedure::  vplus
    end interface

    interface operator(*)
        procedure::  svproduct
        procedure::  vsproduct
    end interface
    
!-----
contains

    !&&
    pure subroutine init_ar(this,ar)
    class(vector),intent(out)::         this
    real(rp),dimension(:),intent(in)::  ar
        this%vc = ar
    end subroutine init_ar
    
!----operator
    elemental type(vector) function vplus(lhs,rhs) result(vvp)
    type(vector),intent(in)::   lhs,rhs
        vvp%vc = lhs%vc + rhs%vc
    end function vplus
    
    !--
    elemental function svproduct(lhs,rhs) result(vr)
    real(rp),intent(in)::               lhs
    type(vector),intent(in)::           rhs
    type(vector)::                      vr
        vr%vc = lhs * rhs%vc
    end function svproduct
    
    !--
    elemental function vsproduct(lhs,rhs) result(vr)
    type(vector),intent(in)::           lhs
    real(rp),intent(in)::               rhs
    type(vector)::                      vr
        vr%vc = rhs * lhs%vc
    end function vsproduct
    
end module vector_

program test
use constants
use vector_
implicit none
integer(ip)::   i,n,j
real(rp)::      t1,t2,t3,t4,t5,t6,t7
type(vector)::  p1
real(rp),dimension(:),allocatable:: p2
real(rp),dimension(100):: p3


    p3 = 1.0001d0
    p2 = p3
    call p1%init(p2)
    
    n = 1e7
    
    !1
    call CPU_TIME(t1)   !2.375
    do i=1,n
        p1 = p1 + 2.d0 * p1
    enddo
    
    !2
    call CPU_TIME(t2)   !0.297
    do i=1,n
        p2 = p2 + 2.d0 * p2
    enddo
    
    !3
    call CPU_TIME(t3)   !2.5
    do i=1,n
        call op(p2)
    enddo
    
    !4
    call CPU_TIME(t4)   !2.531
    do i=1,n
        p3 = p3 + 2.d0 * p3
    enddo
    
    !5
    call CPU_TIME(t5)   !2.515
    do i=1,n
        do j=1,100
            p3(j) = p3(j) + 2.d0 * p3(j)
        enddo
    enddo
    
    !6
    call CPU_TIME(t6)   !0.234
    do i=1,n
        call op(p3)
    enddo
    
    call CPU_TIME(t7)
    
    print*, '1',t2 - t1
    print*, '2',t3 - t2
    print*, '3',t4 - t3
    print*, '4',t5 - t4
    print*, '5',t6 - t5
    print*, '6',t7 - t6

contains

    pure subroutine op(s)
    real(rp),dimension(:),intent(inout):: s
        s = s + 2.d0 * s
    end subroutine op
    
end program test

Here I test operations on arrays, using three kinds of array:

1. the derived type vector, which is really just an array

2. an allocatable array whose size is not known at compile time

3. an array of fixed size

Six procedures are then tested, all of which compute (s = s + 2.d0 * s).

The time cost differs between the procedures.

With O2, the times are: proc1 (2.375 s), proc2 (0.297 s), proc3 (2.5 s), proc4 (2.531 s), proc5 (2.515 s), proc6 (0.234 s).

With O3, proc1, proc2 and proc6 are unchanged, while proc3, proc4 and proc5 drop to around 1.25 s.

So my question: is it possible to get the speed of proc6 for a derived type with overloaded operators?

How can it be done?

Steve_Lionel · 06-23-2017

I have no idea, other than to make two observations:

1. Each time a new, significant language feature was added, it took time for compilers to learn how to optimize it well. Consider array operations vs. DO loops.

2. Any time you defer information to run-time, you lose performance. KIND type parameters are fine - those are always compile-time. But LEN parameters have been nothing but trouble for compiler implementors.
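
For readers who have not used PDTs, here is a minimal sketch of the KIND versus LEN distinction drawn above. The names pvec, k, n and scale_add are made up for illustration and do not come from this thread; only rp = 8 is borrowed from the original post.

```fortran
module pdt_sketch
implicit none

    integer,parameter:: rp = 8      ! same kind value as in the original post

    ! hypothetical PDT with one KIND and one LEN parameter
    type:: pvec(k,n)
        integer,kind:: k = rp       ! KIND parameter: must be a compile-time constant
        integer,len::  n            ! LEN parameter: may only be known at run time
        real(k),dimension(n):: vc   ! component shape depends on the LEN parameter
    end type pvec

contains

    ! the kind is fixed in the declaration, the length is assumed from the actual argument
    pure subroutine scale_add(v)
    type(pvec(rp,*)),intent(inout):: v
        v%vc = v%vc + 2.0_rp * v%vc
    end subroutine scale_add

end module pdt_sketch
```

A caller could declare type(pvec(rp,100)):: a and call scale_add(a). The compiler has to pick the arithmetic for k when it compiles scale_add, while n is read from the object at run time.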

My advice would be to file a report with Intel and ask that the performance degradation be investigated. Maybe it's something simple, but don't get your hopes up too much.

FortranFan · 07-19-2017

Steve Lionel (Ret.) wrote:

..Any time you defer information to run-time, you lose performance. KIND type parameters are fine - those are always compile-time. But LEN parameters have been nothing but trouble for compiler implementors. ..

Dear Intel Fortran team,

As an ordinary user of Fortran working in industry, and as a customer looking to convince our management to renew team licenses for Intel Parallel Studio, please, please take note:

Parameterized derived types (PDTs) are an extremely important and valuable feature for us in the Fortran standard, in both their incarnations, i.e., with KIND type parameters as well as LEN type parameters. Over two and a half years ago, I communicated on this forum what we think the benefits are: see Quote #8 in this thread https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/542412 .
As a customer, can I expect Intel to represent our interests too at all the Fortran standard committee discussions? Can we expect representatives from the Intel Fortran team at the standard body meetings to remain plugged into discussions at the Intel Forums on Fortran, at least at some reasonable frequency, to be generally aware of the feedback and the requests that customers of Intel, the actual users of Intel Fortran, are trying to convey here, and to take those up at Fortran standards meetings?
Also, is it possible for the Intel representative to give feedback to the Fortran standards committee on the value of PDTs as described in the above thread? It does not seem to have happened back in March 2015, for some on the committee are still severely understating or dismissing entirely the value of PDTs, particularly the LEN type aspect, with no regard for the opinions of actual practitioners of Fortran, as evident on standard document sites such as this one: http://www.nag.com/sc22wg5/docs.html - see document N2126. Can we count on the Intel Fortran team to rectify the situation and redress the incorrect and premature opinions of some on the standards committee on PDTs?
While it is fully understandable that "Any time you defer information to run-time, you lose performance" and Intel Fortran customers are largely appreciative and accepting of this, it is entirely unacceptable that two very similar types, both with information deferred to run-time, but one with a type component that has the ALLOCATABLE attribute whereas the other has an equivalent component but with a LEN type parameter, should have performance characteristics that are so different with the PDT one being so much poorer than the other. That is a serious issue which has been evident in this thread as well as during other communications. It is not just a bug, it is akin to a serious design flaw that a product manager of Intel Fortran with any sense of PRODUCT STEWARDSHIP should follow up on their own and take the matter up internally with the entire team and work toward product improvement. Is it possible for us customers to expect this of Intel?
During the course of our investigation of PDT capabilities in Intel Fortran, I had submitted several incidents such as the one in thread: https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/594441. Will it be possible for someone at Intel Fortran to provide a summary of the status of open incidents involving PDTs?
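
For concreteness, the "two very similar types" in the comparison above might look like the sketch below. This is an assumed illustration, not code from this thread or from the linked incidents; the names avec, lvec, op_avec and op_lvec are hypothetical.

```fortran
module two_vectors
implicit none

    integer,parameter:: rp = 8

    ! variant 1: the size is carried by the allocatable component
    type:: avec
        real(rp),allocatable,dimension(:):: vc
    end type avec

    ! variant 2: the size is carried by a LEN type parameter
    type:: lvec(n)
        integer,len:: n
        real(rp),dimension(n):: vc
    end type lvec

contains

    ! same update as in the first post; assumes v%vc is already allocated
    pure subroutine op_avec(v)
    type(avec),intent(inout):: v
        v%vc = v%vc + 2.0_rp * v%vc
    end subroutine op_avec

    ! identical arithmetic on the PDT variant, with the length assumed from the caller
    pure subroutine op_lvec(v)
    type(lvec(*)),intent(inout):: v
        v%vc = v%vc + 2.0_rp * v%vc
    end subroutine op_lvec

end module two_vectors
```

In both variants the array size is only known at run time, which is why a large timing gap between the two would be surprising. Timing op_avec against op_lvec in a loop like the one in the first post would be one way to check a given compiler version.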

Thank you very much,

LRaim · 07-20-2017

I am a daily Fortran user working in the industry, developing commercial software for chemical engineering.
When dynamic allocation was not available, I created this Fortran feature by means of assembler routines.
I am also using C++ for the Windows interface.
It is very difficult for me to understand the 'extreme importance' of PDTs (parameterized derived types) in Fortran, and I have never used such a feature.

Regards

Li_L_ · 07-20-2017

Luigi R. wrote:

I am a daily Fortran user working in the industry, developing commercial software for chemical engineering.
When dynamic allocation was not available, I created this Fortran feature by means of assembler routines.
I am also using C++ for the Windows interface.
It is very difficult for me to understand the 'extreme importance' of PDTs (parameterized derived types) in Fortran, and I have never used such a feature.

Regards

I think that if PDTs can offer a fast way to operate on derived types, they are important.

Right now I am practicing by writing a polynomial type: a derived type wrapping an array, and a very fundamental unit in functional space.

I want to encapsulate this concept cleanly, so that I don't have to mentally convert an array into a polynomial when I do a projection in that space.

I can actually do it, but the speed drops.

You say you can do it at a low level. I think that approach goes against the way programming languages are developing. Modern Fortran should embrace good encapsulation, as long as it doesn't cost speed.

DataScientist · 11-04-2021

Any updates on the performance of PDT? I just checked some of the benchmarks posted here and nothing seems to have changed in Intel OneAPI 2021. Thanks.