Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Efficiency of derived type data and type bound procedures

An_N_1
New Contributor I
488 Views

I posted a topic earlier with questions about "OOP and efficiency" and got a lot of valuable advice!

I ran some tests, and I am not sure about some of the results.

So I am pasting brief test code here. Thanks.

Cases 1-4 do the same calculation in a loop in different ways, yet their time costs differ considerably.

Compiled with Intel Visual Fortran using the default Release optimization options.

The cases were run one at a time; that is, while running case 1, the code for cases 2-4 was commented out.

Approximate average time costs on my laptop:

case1: 0.72s

case2: 0.41s

case3: 0.57s

case4: 0.72s

I think the difference should be attributed to vectorization. My questions are:

1) It seems that when a derived type is used, the loop is not vectorized. Why? If so, doesn't OOP lose a lot of efficiency?

2) Why does case 3 take less time than cases 1 and 4?

 

module mdl1
    type :: typ1
        real(8) :: m, mm
    contains
        procedure, pass   :: p
        procedure, nopass :: pp
    end type

    type(typ1), allocatable :: t1(:)

contains

    ! Type-bound procedure with a passed-object dummy argument (case 3).
    subroutine p(t1)
        class(typ1) :: t1
        t1%m  = t1%m  * t1%m + t1%m * 4
        t1%mm = t1%mm ** 5
        t1%mm = t1%mm * t1%mm
        t1%m  = t1%m  + t1%mm
    end subroutine

    ! NOPASS type-bound procedure that indexes the module array (case 4).
    subroutine pp(i)
        integer :: i
        t1(i)%m  = t1(i)%m  * t1(i)%m + t1(i)%m * 4
        t1(i)%mm = t1(i)%mm ** 5
        t1(i)%mm = t1(i)%mm * t1(i)%mm
        t1(i)%m  = t1(i)%m  + t1(i)%mm
    end subroutine
end module
    
program console1
    use mdl1
    implicit none
    integer :: i, j
    integer, parameter :: N = 50000000, K = 100
    real :: time1, time2, time(K)
    real(8) :: m(N), mm(N)

    allocate (t1(N))

    do j = 1, K

        ! Reinitialize the data before each timed pass.
        do i = 1, N
            m(i)     = j
            mm(i)    = j
            t1(i)%m  = j
            t1(i)%mm = j
        end do

        call CPU_TIME(time1)

        ! Case 1: loop directly over derived-type components  (~0.72 s)
        do i = 1, N
            t1(i)%m  = t1(i)%m  * t1(i)%m + t1(i)%m * 4
            t1(i)%mm = t1(i)%mm ** 5
            t1(i)%mm = t1(i)%mm * t1(i)%mm
            t1(i)%m  = t1(i)%m  + t1(i)%mm
        end do

        ! Case 2: loop over plain arrays                      (~0.41 s)
        do i = 1, N
            m(i)  = m(i)  * m(i) + m(i) * 4
            mm(i) = mm(i) ** 5
            mm(i) = mm(i) * mm(i)
            m(i)  = m(i)  + mm(i)
        end do

        ! Case 3: call the PASS type-bound procedure          (~0.57 s)
        do i = 1, N
            call t1(i) % p
        end do

        ! Case 4: call the NOPASS type-bound procedure        (~0.72 s)
        do i = 1, N
            call t1 % pp (i)
        end do

        call CPU_TIME(time2)

        time(j) = time2 - time1
        print *, m(1), mm(10000), t1(10000)%m, t1(1)%mm
        write(10,*) time(j)

    end do

    write(10,*) sum(time)/size(time)

end program

 

2 Replies
TimP
Honored Contributor III

In your example with the derived type, you are processing data with stride 2, so there may be twice as much data moved as in your stride-1 comparison case. How efficiently this is done may depend on your /arch: selection. In case the more difficult case can be vectorized, the opt-report should help show the differences.
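To make the stride point concrete: t1(:) interleaves m and mm in memory (an array-of-structures layout), so each component is accessed with stride 2. A structure-of-arrays layout keeps each component contiguous, so the loop can vectorize the same way as the plain-array case 2 while keeping a type-bound interface. A minimal sketch (the type and procedure names here are illustrative, not from the original code):

```fortran
! Sketch only: structure-of-arrays variant of typ1.  Each component
! is a contiguous array, so every access in the loop is unit stride.
module mdl1_soa
    implicit none
    type :: typ1_soa
        real(8), allocatable :: m(:), mm(:)
    contains
        procedure, pass :: p_all
    end type
contains
    subroutine p_all(t)
        class(typ1_soa) :: t
        integer :: i
        do i = 1, size(t%m)        ! unit-stride accesses, vectorizable
            t%m(i)  = t%m(i)  * t%m(i) + t%m(i) * 4
            t%mm(i) = t%mm(i) ** 5
            t%mm(i) = t%mm(i) * t%mm(i)
            t%m(i)  = t%m(i)  + t%mm(i)
        end do
    end subroutine
end module
```

The trade-off is one virtual dispatch per whole-array call instead of one per element, which also removes the per-element call overhead seen in cases 3 and 4.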

An_N_1
New Contributor I

I have looked at the optimization report: the type-bound procedure loops are all not vectorized, and it says 'vectorization is possible but seems inefficient'.

I also tried adding directives such as VECTOR ALWAYS and FORCEINLINE to the loops and to the type-bound subroutines.
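Concretely, what I added was roughly this (directive placement as I understood it from the Intel documentation; illustrative only):

```fortran
! Illustrative only: Intel Fortran directives asking the compiler to
! inline the type-bound call and to vectorize the loop even when its
! own heuristic judges vectorization inefficient.
!DIR$ FORCEINLINE
!DIR$ VECTOR ALWAYS
do i = 1, N
    call t1(i) % p
end do
```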

The ordering of the time costs stays the same, and VECTOR ALWAYS reduces them, but what I want to know is:

why is vectorization not performed by default when calling type-bound procedures? And if the calculation is intensive, will this slow the program down much? I do not think adding VECTOR ALWAYS manually is good practice.
