Intel® Fortran Compiler

matmul speed test in 4 situations

alsoran
Beginner
352 Views
Test result:

matmul(fixed,fixed) costs 0.4524 sec
matmul(alloc,alloc) costs 0.5460 sec
matmul(mytype.fixed,mytype.fixed) costs 0.4680 sec
matmul(mytype.alloc,mytype.alloc) costs 26.6762 sec

Can you tell me why matmul on the allocatable matrices inside the custom derived type takes so much longer than the other three cases?

[fortran]MODULE MOD
implicit none
    integer,parameter:: N=1000
    type:: myt
        real(8):: fixed_mat1(N,N)
        real(8):: fixed_mat2(N,N)
        real(8),allocatable:: alloc_mat1(:,:)
        real(8),allocatable:: alloc_mat2(:,:)
    endtype myt
    real(8):: fixed_mat1(N,N)
    real(8):: fixed_mat2(N,N)
    real(8),allocatable:: alloc_mat1(:,:)
    real(8),allocatable:: alloc_mat2(:,:)
    real(8):: resu_mat(N,N)
ENDMODULE MOD

!!##########main############################
PROGRAM MAIN
USE MOD
implicit none
    type(myt):: mytype
    real(4):: t1,t2
!!#####matrix elements initialize######################    
    call random_number(fixed_mat1)
    call random_number(fixed_mat2)
    mytype.fixed_mat1 = fixed_mat1
    mytype.fixed_mat2 = fixed_mat2
    allocate(alloc_mat1,source=fixed_mat1)
    allocate(alloc_mat2,source=fixed_mat2)
    allocate(mytype.alloc_mat1,source=fixed_mat1)
    allocate(mytype.alloc_mat2,source=fixed_mat2)
!!#####measure the time and compare them####################
    call cpu_time(t1)
        resu_mat = matmul(fixed_mat1,fixed_mat2)
    call cpu_time(t2)
    write(*,'(A,F8.4,A)') 'matmul(fixed,fixed) costs',t2-t1,' sec'
!--------------------------------------------------------------------   
    call cpu_time(t1)
        resu_mat = matmul(alloc_mat1,alloc_mat2)
    call cpu_time(t2)
    write(*,'(A,F8.4,A)') 'matmul(alloc,alloc) costs',t2-t1,' sec'
!--------------------------------------------------------------------    
    call cpu_time(t1)
        resu_mat = matmul(mytype.fixed_mat1,mytype.fixed_mat2)
    call cpu_time(t2)
    write(*,'(A,F8.4,A)') 'matmul(mytype.fixed,mytype.fixed) costs',t2-t1,' sec'
!----------------------------------------------------------------------    
    call cpu_time(t1)
        resu_mat = matmul(mytype.alloc_mat1,mytype.alloc_mat2)
    call cpu_time(t2)
    write(*,'(A,F8.4,A)') 'matmul(mytype.alloc,mytype.alloc) costs',t2-t1,' sec'
    
ENDPROGRAM[/fortran]

0 Kudos
7 Replies
TimP
Honored Contributor III
On my early Core i7 desktop, using the current ifort, I see no consistent increase in time for your last case once the stack limit is adjusted. I have 6GB RAM with an unsupported combination of DIMM types, presumably running effectively as DDR3-1066. If you have only a small amount of RAM, maybe you should deallocate the arrays when you are done with them.
Do you have a reason for using non-standard syntax? It runs faster for me when the VAX/VMS structure notation (the "." component separator) is changed to standard syntax ("%"), except that the last case speeds up only when running with an increased stack allocation (not with /heap-arrays).
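For reference, a minimal sketch of the last case with the dot notation replaced by the standard `%` component selector. The names `mod_std` and `main_std` and the reduced size `n = 200` are assumptions made for illustration; they are not from the original post, and the timing it prints will depend on compiler, options, and machine.

```fortran
! Sketch only: the 4th case rewritten with standard '%' component access.
! n is reduced to 200 (an assumption) so the run stays short.
module mod_std
    implicit none
    integer, parameter :: n = 200
    type :: myt
        real(8), allocatable :: alloc_mat1(:,:)
        real(8), allocatable :: alloc_mat2(:,:)
    end type myt
end module mod_std

program main_std
    use mod_std
    implicit none
    type(myt) :: mytype
    real(8), allocatable :: resu_mat(:,:)   ! allocatable, so it does not live on the stack
    real(4) :: t1, t2

    allocate(mytype%alloc_mat1(n,n), mytype%alloc_mat2(n,n), resu_mat(n,n))
    call random_number(mytype%alloc_mat1)
    call random_number(mytype%alloc_mat2)

    ! Time matmul on the allocatable components of the derived type.
    call cpu_time(t1)
    resu_mat = matmul(mytype%alloc_mat1, mytype%alloc_mat2)
    call cpu_time(t2)
    write(*,'(A,F8.4,A)') 'matmul(mytype%alloc,mytype%alloc) costs', t2-t1, ' sec'

    deallocate(mytype%alloc_mat1, mytype%alloc_mat2, resu_mat)
end program main_std
```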
IDZ_A_Intel
Employee

In addition to TimP's suggestion of deallocation, reverse the order in which you allocate and test the various combinations. This should eliminate virtual memory paging issues (assuming your type can be fully resident in RAM as opposed to in the page file).

Jim Dempsey
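A rough sketch of what the two suggestions could look like together: the derived-type case is run first, and each test allocates its operands just before timing and frees them right after, so only one pair of matrices is resident at a time. The program name `dealloc_sketch` and the reduced size `n = 200` are illustrative assumptions, not part of the original test.

```fortran
! Sketch only: reversed test order plus explicit deallocation after each test.
! n = 200 and all names here are illustrative assumptions.
program dealloc_sketch
    implicit none
    integer, parameter :: n = 200
    type :: myt
        real(8), allocatable :: alloc_mat1(:,:)
        real(8), allocatable :: alloc_mat2(:,:)
    end type myt
    type(myt) :: mytype
    real(8), allocatable :: alloc_mat1(:,:), alloc_mat2(:,:), resu_mat(:,:)
    real(4) :: t1, t2

    allocate(resu_mat(n,n))

    ! Derived-type case first (reversed relative to the original order).
    allocate(mytype%alloc_mat1(n,n), mytype%alloc_mat2(n,n))
    call random_number(mytype%alloc_mat1)
    call random_number(mytype%alloc_mat2)
    call cpu_time(t1)
    resu_mat = matmul(mytype%alloc_mat1, mytype%alloc_mat2)
    call cpu_time(t2)
    write(*,'(A,F8.4,A)') 'matmul(mytype%alloc,mytype%alloc) costs', t2-t1, ' sec'
    deallocate(mytype%alloc_mat1, mytype%alloc_mat2)   ! free before the next test

    ! Plain allocatable case second.
    allocate(alloc_mat1(n,n), alloc_mat2(n,n))
    call random_number(alloc_mat1)
    call random_number(alloc_mat2)
    call cpu_time(t1)
    resu_mat = matmul(alloc_mat1, alloc_mat2)
    call cpu_time(t2)
    write(*,'(A,F8.4,A)') 'matmul(alloc,alloc) costs', t2-t1, ' sec'
    deallocate(alloc_mat1, alloc_mat2, resu_mat)
end program dealloc_sketch
```

If the slow case stays slow regardless of its position in the run, paging is probably not the explanation.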

TimP
Honored Contributor III
For what it's worth, I note that it runs faster yet with one of the options which replace matmul by MKL. For example,
gfortran -O3 -fexternal-blas -L/opt/xeon/composer_xe_2011_sp1.8.273/mkl/lib/intel64/ ar1.f90 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential
(using e.g. gfortran 4.7), but the gfortran 4.5 versions available for Windows are failing, besides not being compatible with MKL.
I thought there should be an equivalent ifort option.
Steven_L_Intel1
Employee
There is such an option: /Qopt-matmul
alsoran
Beginner
When I then switched "Interprocedural Optimization" to "Multi-file" (/Qipo) or "Single-file" (/Qip), the last case ran faster than before, and there was no large difference among the other 3 cases.
However, when I change the 4th case from a scalar derived-type variable to an array of the derived type, for instance:
mytype.alloc_mat1 ---> mytype(1).alloc_mat1, mytype.alloc_mat2 ---> mytype(1).alloc_mat2, ...
then it slows down again, and I have no idea why.
PS: all the tests were run with /O3 optimization.
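One experiment that might help narrow this down is sketched below: it times matmul directly on the components of an array element (written with the standard `%` selector) and then on plain allocatable copies of the same data, which corresponds to the fast "alloc,alloc" case above. The program name `array_of_type_sketch`, the local arrays `a`, `b`, `r1`, `r2`, and the reduced size `n = 200` are assumptions for illustration. Whether the copy actually helps is not something this thread establishes.

```fortran
! Sketch only: array-of-derived-type operands versus plain allocatable copies.
! All names and n = 200 are illustrative assumptions.
program array_of_type_sketch
    implicit none
    integer, parameter :: n = 200
    type :: myt
        real(8), allocatable :: alloc_mat1(:,:)
        real(8), allocatable :: alloc_mat2(:,:)
    end type myt
    type(myt) :: mytype(1)
    real(8), allocatable :: a(:,:), b(:,:), r1(:,:), r2(:,:)
    real(4) :: t1, t2

    allocate(mytype(1)%alloc_mat1(n,n), mytype(1)%alloc_mat2(n,n))
    allocate(r1(n,n), r2(n,n))
    call random_number(mytype(1)%alloc_mat1)
    call random_number(mytype(1)%alloc_mat2)

    ! Case A: operands are components of an array element.
    call cpu_time(t1)
    r1 = matmul(mytype(1)%alloc_mat1, mytype(1)%alloc_mat2)
    call cpu_time(t2)
    write(*,'(A,F8.4,A)') 'matmul on mytype(1) components costs', t2-t1, ' sec'

    ! Case B: copy into plain allocatable arrays first, then multiply.
    allocate(a, source=mytype(1)%alloc_mat1)
    allocate(b, source=mytype(1)%alloc_mat2)
    call cpu_time(t1)
    r2 = matmul(a, b)
    call cpu_time(t2)
    write(*,'(A,F8.4,A)') 'matmul on local copies costs', t2-t1, ' sec'

    ! Both paths multiply the same data, so the results should match closely.
    write(*,'(A,ES10.3)') 'max |difference| =', maxval(abs(r1 - r2))
end program array_of_type_sketch
```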
William_Gray
Beginner

you said that the CALCULATION TIME is different (for each case).

i was just wondering, is the "ANSWER" (stored in the matrix RESU_MAT) also different ?

the reason i ask this is :-

years ago, i was using "Visual Fortran" software ("Digital Visual Fortran", i think). it had the MATMUL intrinsic function. anyway, although i "think" all of my fortran code was correct, i would sometimes get incorrect results -- which seemed to be caused by the results calculated by MATMUL. but, at the time, i think the fortran software had a few bugs. anyway, ever since then, i tend to write my own matrix manipulation code (e.g. matrix multiplication) -- just to be safe.

of course, all of this happened long before Intel took over the reins (of "Visual Fortran"). so, i'm sure the (Intel Visual Fortran) version of MATMUL that you are using works correctly :)

TimP
Honored Contributor III
Matmul results would change slightly (within the bounds of roundoff error) when you enable or disable /Qopt-matmul, possibly also when you change -O optimization level.
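A minimal sketch of how one could check this: compare the MATMUL result against a hand-written triple loop and look at the largest absolute difference. The program name `check_matmul`, the size `n = 100`, and the tolerance `n * epsilon * max|c|` are assumptions chosen for illustration; the tolerance is a rough heuristic for inputs in [0,1), not a rigorous error bound.

```fortran
! Sketch only: compare intrinsic MATMUL with an explicit loop.
! n = 100 and the tolerance formula are illustrative assumptions.
program check_matmul
    implicit none
    integer, parameter :: n = 100
    real(8) :: a(n,n), b(n,n), c_ref(n,n), c_mm(n,n)
    real(8) :: max_diff, tol
    integer :: i, j, k

    call random_number(a)
    call random_number(b)

    ! Result from the intrinsic.
    c_mm = matmul(a, b)

    ! Reference result from an explicit loop (column-major friendly order).
    c_ref = 0.0d0
    do j = 1, n
        do k = 1, n
            do i = 1, n
                c_ref(i,j) = c_ref(i,j) + a(i,k) * b(k,j)
            end do
        end do
    end do

    max_diff = maxval(abs(c_mm - c_ref))
    tol = n * epsilon(1.0d0) * maxval(abs(c_ref))   ! heuristic roundoff allowance

    write(*,'(A,ES10.3)') 'max |matmul - loop| =', max_diff
    write(*,'(A,ES10.3)') 'tolerance used      =', tol
    if (max_diff > tol) then
        write(*,'(A)') 'results differ by more than the chosen tolerance'
    else
        write(*,'(A)') 'results agree within the chosen tolerance'
    end if
end program check_matmul
```

Building this once with and once without /Qopt-matmul, or at different optimization levels, shows how large the differences are in practice.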