Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Comparing two types of matmul function

alsoran
Beginner
Calling matmul directly on array components of a derived type, as in matmul(a(i).b,c(j).d) or matmul(a(i)%b,c(j)%d), is extremely slow.
But a wrapper function that I wrote myself, mt(a(i).b,c(j).d), is much faster than the original matmul function, even though it only calls matmul internally.
[fortran]
!###################################################
PROGRAM matmul_in_derive
implicit none
integer:: N=1000,i
type:: mm
  real(8),allocatable:: mat1(:,:),mat2(:,:),mat3(:,:)
endtype
type(mm):: mvp(2)
real:: t1,t2
!###########################
interface
  function mt(x,y)
    implicit none
    real(8),allocatable:: x(:,:),y(:,:)
    real(8),allocatable:: mt(:,:)
  end function
end interface
!#############################
allocate(mvp(1).mat1(N,N),mvp(1).mat2(N,N),mvp(1).mat3(N,N))
allocate(mvp(2).mat1(N,N),mvp(2).mat2(N,N),mvp(2).mat3(N,N))
call random_number(mvp(1).mat1)
call random_number(mvp(1).mat2)
mvp(2).mat1 = mvp(1).mat1
mvp(2).mat2 = mvp(1).mat2
!###################################
t1 = secnds(0.0)
mvp(1).mat3 = matmul(mvp(1).mat1,mvp(1).mat2)
t1 = secnds(t1)
print*, 'matmul in Serial program: ',t1,' s'

t2 = secnds(0.0)
mvp(2).mat3 = mt(mvp(2).mat1,mvp(2).mat2)
t2 = secnds(t2)
print*, 'overloaded matmul in Serial program: ',t2,' s'
END PROGRAM

function mt(x,y)
  implicit none
  real(8),allocatable:: x(:,:),y(:,:)
  real(8),allocatable:: mt(:,:)
  allocate(mt(size(x,1),size(y,2)))
  mt = matmul(x,y)
end function
!#####################################################
[/fortran]
I compiled it with:
ifort /O3 note.f90 /link /STACK:50000000

I got the result:
matmul in Serial program: 7.008594 s
overloaded matmul in Serial program: 0.4877813 s

I cannot explain this result.
Thanks ^_^


5 Replies
TimP
Honored Contributor III
If the compiler allocated plain temporary arrays as arguments to matmul, this would be an example where that is a useful transformation. I guess you didn't set options implying /Qopt-matmul; to implement that, the temporary arrays would be required in both cases (but you would be confused if you used the legacy secnds timer).
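
As a minimal sketch of the transformation described above, the two cases can be compared by copying the components into plain temporary arrays before calling matmul, and by timing with the standard system_clock intrinsic instead of secnds. The names a, b, c, the size n and the tolerance are assumptions for illustration, not taken from the original post, and the timings will depend on compiler and options.

```fortran
program time_matmul_with_temporaries
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: n = 500              ! hypothetical problem size

  type :: mm
    real(dp), allocatable :: mat1(:,:), mat2(:,:), mat3(:,:)
  end type mm

  type(mm) :: mvp
  real(dp), allocatable :: a(:,:), b(:,:), c(:,:)   ! plain temporaries (assumed names)
  integer(i8) :: c0, c1, rate

  allocate(mvp%mat1(n,n), mvp%mat2(n,n), mvp%mat3(n,n))
  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(mvp%mat1)
  call random_number(mvp%mat2)

  ! Case 1: matmul applied directly to the derived-type components.
  call system_clock(c0, rate)
  mvp%mat3 = matmul(mvp%mat1, mvp%mat2)
  call system_clock(c1)
  print *, 'matmul on components:  ', real(c1 - c0, dp) / real(rate, dp), ' s'

  ! Case 2: copy the components into plain arrays first, then multiply.
  call system_clock(c0)
  a = mvp%mat1
  b = mvp%mat2
  c = matmul(a, b)
  call system_clock(c1)
  print *, 'matmul on temporaries: ', real(c1 - c0, dp) / real(rate, dp), ' s'

  ! Both paths should give the same product up to rounding.
  if (maxval(abs(c - mvp%mat3)) > 1.0e-9_dp * n) error stop 'results differ'
end program time_matmul_with_temporaries
```

system_clock with a 64-bit count is used here because it reports elapsed wall-clock ticks at the resolution given by rate; the copy time is deliberately included in case 2 so the comparison stays fair.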
alsoran
Beginner
Yes, I didn't set /Qopt-matmul. But I don't understand "both cases": which two cases do you mean?
jimdempseyatthecove
Honored Contributor III
In your function mt, you allocate the return array. This array was already allocated outside the call to the function. This should not necessarily be a cause for the discrepancy, but it adds a variable to the equation.

In particular, by allocating within the mt function, the compiler optimization can see that the target array cannot also be one of the input arrays. See what happens when you comment out the allocate within the mt function (reusing memory already allocated prior to call).

If this is the cause of (or should I say solution to) the problem, then you have a lead on how to tune your code. (assist the compiler in identifying that the output array does not overlap the input arrays).
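
One way to sketch that idea, assuming a subroutine form is acceptable: pass the already-allocated output array as an intent(out) dummy argument. Under the Fortran argument-association rules, a dummy argument that is defined inside the procedure must not overlap another dummy argument, so the compiler may assume z does not alias x or y. The names mt_wrapper, mt_sub and the size n are hypothetical, and whether this changes the timing on a given compiler is not established in this thread.

```fortran
module mt_wrapper
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical subroutine variant of the poster's mt wrapper.
  ! z is supplied by the caller, so no allocation happens here.
  subroutine mt_sub(x, y, z)
    real(dp), intent(in)  :: x(:,:), y(:,:)
    real(dp), intent(out) :: z(:,:)
    z = matmul(x, y)
  end subroutine mt_sub
end module mt_wrapper

program use_mt_sub
  use mt_wrapper
  implicit none
  integer, parameter :: n = 200               ! hypothetical problem size

  type :: mm
    real(dp), allocatable :: mat1(:,:), mat2(:,:), mat3(:,:)
  end type mm

  type(mm) :: mvp(2)

  allocate(mvp(1)%mat1(n,n), mvp(1)%mat2(n,n), mvp(1)%mat3(n,n))
  call random_number(mvp(1)%mat1)
  call random_number(mvp(1)%mat2)

  ! Reuse the memory allocated above instead of allocating inside the wrapper.
  call mt_sub(mvp(1)%mat1, mvp(1)%mat2, mvp(1)%mat3)

  print *, 'max deviation from intrinsic matmul: ', &
           maxval(abs(mvp(1)%mat3 - matmul(mvp(1)%mat1, mvp(1)%mat2)))
end program use_mt_sub
```

Placing the wrapper in a module also gives it an explicit interface automatically, so the hand-written interface block from the original program is not needed in this variant.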

Jim Dempsey
alsoran
Beginner
Thank you for your suggestion. In the mt function, removing the allocate leads to an "access violation" error. If Fortran 2003 semantics are enabled (with /standard-semantics), the allocate can be commented out, but the discrepancy between mt and matmul in the code above is still there.
Thanks again!
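
For reference, a minimal sketch of why the allocate becomes optional under Fortran 2003 semantics: an unallocated allocatable array on the left-hand side of an intrinsic assignment is allocated automatically to the shape of the right-hand side. Without those semantics the function result is still unallocated when it is assigned, which is consistent with the access violation reported above. The names realloc_lhs_demo, mt_noalloc and the small array sizes are assumptions for illustration.

```fortran
program realloc_lhs_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), allocatable :: r(:,:)
  real(dp) :: x(3,2), y(2,4)

  x = 1.0_dp
  y = 2.0_dp

  r = mt_noalloc(x, y)
  print *, 'shape of result: ', shape(r)     ! a 3 x 4 result
  print *, 'first element:   ', r(1,1)       ! 1*2 + 1*2 = 4

contains

  ! Hypothetical variant of mt without the explicit allocate.
  function mt_noalloc(x, y) result(res)
    real(dp), intent(in)  :: x(:,:), y(:,:)
    real(dp), allocatable :: res(:,:)
    ! Under Fortran 2003 semantics this assignment allocates res
    ! with shape (size(x,1), size(y,2)) before storing the product.
    res = matmul(x, y)
  end function mt_noalloc

end program realloc_lhs_demo
```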
yeg001
Beginner
I tested the code on Red Hat 6.0 with ifort 12.0.5.220 Build 20110719 and only the O3 option. The result is:

matmul in Serial program: 2.391108 s
overloaded matmul in Serial program: 3.051880 s

It seems normal here. It is interesting that the behavior is so abnormal on Windows.