Calls of the form matmul(a(i).b, c(j).d) or matmul(a(i)%b, c(j)%d), where the arguments are components of derived-type array elements, are extremely slow.
But a wrapper function I wrote, mt(a(i).b, c(j).d), which simply calls matmul internally, is much faster than the original matmul function.
[fortran]!###################################################
PROGRAM matmul_in_derive
implicit none
integer:: N=1000,i
type:: mm
real(8),allocatable:: mat1(:,:),mat2(:,:),mat3(:,:)
endtype
type(mm):: mvp(2)
real:: t1,t2
!###########################
interface
function mt(x,y)
implicit none
real(8),allocatable:: x(:,:),y(:,:)
real(8),allocatable:: mt(:,:)
end function
end interface
!#############################
allocate(mvp(1).mat1(N,N),mvp(1).mat2(N,N),mvp(1).mat3(N,N))
allocate(mvp(2).mat1(N,N),mvp(2).mat2(N,N),mvp(2).mat3(N,N))
call random_number(mvp(1).mat1)
call random_number(mvp(1).mat2)
mvp(2).mat1 = mvp(1).mat1
mvp(2).mat2 = mvp(1).mat2
!###################################
t1 = secnds(0.0)
mvp(1).mat3 = matmul(mvp(1).mat1,mvp(1).mat2)
t1 = secnds(t1)
print*, 'matmul in Serial program: ',t1,' s'
t2 = secnds(0.0)
mvp(2).mat3 = mt(mvp(2).mat1,mvp(2).mat2)
t2 = secnds(t2)
print*, 'overloaded matmul in Serial program: ',t2,' s'
END PROGRAM
function mt(x,y)
implicit none
real(8),allocatable:: x(:,:),y(:,:)
real(8),allocatable:: mt(:,:)
allocate(mt(size(x,1),size(y,2)))
mt = matmul(x,y)
end function
!#####################################################[/fortran]
I compiled it with:
ifort /O3 note.f90 /link /STACK:50000000
and got this result:
matmul in Serial program: 7.008594 s
overloaded matmul in Serial program: 0.4877813 s
I cannot explain the result.
Thanks ^_^
5 Replies
If the compiler allocated plain temporary arrays as arguments to matmul, this would be an example where that is a useful transformation. I guess you didn't set options implying /Qopt-matmul; to implement that, the temporary arrays would be required in both cases (but you would be confused if you used the legacy secnds timer).
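As a point of comparison for the timing remark above, here is a minimal sketch that times matmul on plain allocatable arrays with the standard system_clock intrinsic instead of secnds. The program name, the kind parameters dp and i8, and the matrix size are assumptions chosen for illustration, not taken from the original post.

```fortran
program time_matmul
  implicit none
  ! Assumed kinds: double precision reals and 64-bit clock counters.
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: n = 500            ! illustrative size only
  real(dp), allocatable :: a(:,:), b(:,:), c(:,:)
  integer(i8) :: count0, count1, count_rate
  real(dp) :: elapsed

  allocate(a(n,n), b(n,n), c(n,n))
  call random_number(a)
  call random_number(b)

  ! system_clock reports wall-clock ticks; count_rate is ticks per second.
  call system_clock(count=count0, count_rate=count_rate)
  c = matmul(a, b)
  call system_clock(count=count1)
  elapsed = real(count1 - count0, dp) / real(count_rate, dp)

  print *, 'matmul on plain arrays: ', elapsed, ' s'
  ! Print a value derived from c so the product cannot be optimized away.
  print *, 'checksum: ', sum(c)
end program time_matmul
```

Running the same measurement around the derived-type call and around the wrapper call would show whether the gap is in matmul itself or in how the arguments are passed.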
Yes, I didn't set /Qopt-matmul. But I don't understand "both cases": which two cases do you mean?
In your function mt, you allocate the return array. This array was already allocated outside the call to the function. This is not necessarily the cause of the discrepancy, but it adds a variable to the equation.
In particular, by allocating within the mt function, the compiler optimization can see that the target array cannot also be one of the input arrays. See what happens when you comment out the allocate within the mt function (reusing memory already allocated prior to call).
If this is the cause of (or should I say solution to) the problem, then you have a lead on how to tune your code. (assist the compiler in identifying that the output array does not overlap the input arrays).
Jim Dempsey
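One possible way to give the compiler that information is sketched below: a wrapper whose inputs are intent(in) and whose result is a fresh automatic array, so the output cannot alias either input. The module mm_helpers and the function mt_plain are hypothetical names introduced for this example, and whether this actually changes the timing on a given compiler version is untested here.

```fortran
module mm_helpers
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical wrapper: read-only inputs, result is a new automatic array.
  function mt_plain(x, y) result(z)
    real(dp), intent(in) :: x(:,:), y(:,:)
    real(dp) :: z(size(x,1), size(y,2))
    z = matmul(x, y)
  end function mt_plain
end module mm_helpers

program matmul_wrapper_demo
  use mm_helpers
  implicit none
  type :: mm
    real(dp), allocatable :: mat1(:,:), mat2(:,:), mat3(:,:)
  end type mm
  integer, parameter :: n = 200            ! kept small for the stack
  type(mm) :: mvp(2)
  integer :: i

  do i = 1, 2
    allocate(mvp(i)%mat1(n,n), mvp(i)%mat2(n,n), mvp(i)%mat3(n,n))
  end do
  call random_number(mvp(1)%mat1)
  call random_number(mvp(1)%mat2)
  mvp(2)%mat1 = mvp(1)%mat1
  mvp(2)%mat2 = mvp(1)%mat2

  ! Direct call on derived-type components versus the wrapper.
  mvp(1)%mat3 = matmul(mvp(1)%mat1, mvp(1)%mat2)
  mvp(2)%mat3 = mt_plain(mvp(2)%mat1, mvp(2)%mat2)

  print *, 'max abs difference: ', maxval(abs(mvp(1)%mat3 - mvp(2)%mat3))
end program matmul_wrapper_demo
```

Putting the wrapper in a module gives it an explicit interface automatically, so no interface block is needed in the main program.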
Thank you for your suggestion. In the mt function, removing the allocate leads to an "access violation" error. With the F2003 standard enabled (by /standard-semantics), the allocate can be commented out, but the discrepancy between mt and matmul in the above code is still there.
Thanks again!
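The behavior described above matches the Fortran 2003 rule that assigning to an unallocated allocatable array allocates it to the shape of the right-hand side. A minimal sketch of that rule follows; the program name and array sizes are made up for illustration. With ifort of that era the rule has to be switched on, for example with /standard-semantics as mentioned above or, to my knowledge, /assume:realloc_lhs; without it the assignment to c is invalid and can crash.

```fortran
program realloc_lhs_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), allocatable :: a(:,:), b(:,:), c(:,:)

  allocate(a(3,4), b(4,2))
  a = 1.0_dp
  b = 2.0_dp

  ! c is deliberately left unallocated. Under Fortran 2003 semantics the
  ! assignment allocates it to the shape of matmul(a, b).
  c = matmul(a, b)

  print *, 'allocated(c): ', allocated(c)
  print *, 'shape(c):     ', shape(c)
end program realloc_lhs_demo
```

The same rule applies to the allocatable function result mt inside the function, which is why the explicit allocate there becomes optional once Fortran 2003 semantics are enabled.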
I tested the code on Red Hat 6.0 with ifort 12.0.5.220 Build 20110719 and only the O3 option. The result is:
matmul in Serial program: 2.391108 s
overloaded matmul in Serial program: 3.051880 s
It seems normal there. It is interesting that the behavior is so abnormal on Windows.