I am calling dger on a subset of a two-dimensional matrix, and I call it in two different ways. The second way has an extra set of brackets (parentheses) around the section bounds of the matrix argument A, and it runs roughly 4 times slower than the way without the brackets.
Whether I compile this with
ifort -heap-arrays -O3 -mkl=sequential blas9F.f90 -o blas9test
or
ifort -heap-arrays -O3 blas9F.f90 -o blas9test -lblas
the second version runs much slower than the first, which doesn't make sense to me. When I compile with gfortran and -lblas, the two versions take the same time to run, as expected. That time is much faster than the slow version built with the Intel compiler and slightly slower than the fast version built with the Intel compiler. What is the cause of this? Note that if NBR = 1 then the two versions take the same amount of time to run. My theory is that a temporary copy is being made: the brackets somehow confuse the compiler into thinking that a copy is necessary, even though the leading submatrix is always used if NBR = 0.
This is my first time posting here, so let me know if I have broken the rules in some way.
program blas9F
  USE ISO_C_BINDING
  implicit none
  REAL(C_DOUBLE), DIMENSION(:,:), ALLOCATABLE :: MAT1
  REAL(C_DOUBLE), DIMENSION(:,:), ALLOCATABLE :: MAT2
  ! NOTE: LP64 BLAS builds expect default INTEGER arguments; C_LONG is
  ! 64-bit on most Linux targets, so default INTEGER may be the safer kind.
  INTEGER(C_LONG) :: NROW
  INTEGER(C_LONG) :: NCOL
  INTEGER(C_LONG) :: FIRSTCOLUMNOFSQUARE
  INTEGER(C_LONG) :: NBR
  INTEGER(C_LONG) :: NUMBEROFROWSAWAYFROMBOTTOM
  INTEGER(C_LONG) :: CURRENTROW
  INTEGER(C_LONG) :: CURRENTCOLUMN
  INTEGER(C_LONG) :: I
  REAL :: STARTTIME, FINISHTIME
  REAL :: TIMETAKENVERSION3 = 0
  REAL :: TIMETAKENVERSION4 = 0
  REAL(C_DOUBLE) :: BLASALPHA = 2.0
  REAL(C_DOUBLE) :: BLASBETA = 2.0
  external dger
  NROW = 3001
  NCOL = 4001
  NBR = 0
  FIRSTCOLUMNOFSQUARE = NCOL - NROW + 1
  ALLOCATE(MAT1(NROW,NCOL))
  ALLOCATE(MAT2(NROW,NCOL))
  MAT1 = 1
  DO I = 1,NROW
    MAT1(I,FIRSTCOLUMNOFSQUARE+I-1) = 10*NCOL !diagonally dominant matrix
  END DO
  MAT2 = 1
  DO I = 1,NROW
    MAT2(I,FIRSTCOLUMNOFSQUARE+I-1) = 10*NCOL !diagonally dominant matrix
  END DO
  ! Version 3: section bounds of A written without brackets
  DO NUMBEROFROWSAWAYFROMBOTTOM=0,NROW-1
    CURRENTROW = NROW - NUMBEROFROWSAWAYFROMBOTTOM
    CURRENTCOLUMN = NCOL - NUMBEROFROWSAWAYFROMBOTTOM
    IF(CURRENTROW .NE. NBR) THEN
      call cpu_time(STARTTIME)
      call dger(CURRENTROW-1-NBR, CURRENTROW-1-NBR, BLASALPHA, &
                MAT1(1+NBR:CURRENTROW-1,CURRENTCOLUMN), 1, &
                MAT1(CURRENTROW,FIRSTCOLUMNOFSQUARE+NBR:CURRENTCOLUMN-1), 1, &
                MAT1(1+NBR:NROW,FIRSTCOLUMNOFSQUARE+NBR:CURRENTCOLUMN-1), &
                NROW - NBR)
      call cpu_time(FINISHTIME)
      TIMETAKENVERSION3 = TIMETAKENVERSION3 + FINISHTIME - STARTTIME
    END IF
  END DO
  ! Version 4: identical call, but the section bounds of A are in brackets
  DO NUMBEROFROWSAWAYFROMBOTTOM=0,NROW-1
    CURRENTROW = NROW - NUMBEROFROWSAWAYFROMBOTTOM
    CURRENTCOLUMN = NCOL - NUMBEROFROWSAWAYFROMBOTTOM
    IF(CURRENTROW .NE. NBR) THEN
      call cpu_time(STARTTIME)
      call dger(CURRENTROW-1-NBR, CURRENTROW-1-NBR, BLASALPHA, &
                MAT2(1+NBR:CURRENTROW-1,CURRENTCOLUMN), 1, &
                MAT2(CURRENTROW,FIRSTCOLUMNOFSQUARE+NBR:CURRENTCOLUMN-1), 1, &
                MAT2((1+NBR):NROW,(FIRSTCOLUMNOFSQUARE+NBR):(CURRENTCOLUMN-1)), &
                NROW - NBR)
      call cpu_time(FINISHTIME)
      TIMETAKENVERSION4 = TIMETAKENVERSION4 + FINISHTIME - STARTTIME
    END IF
  END DO
  PRINT *, "TIMETAKENVERSION3"
  PRINT *, TIMETAKENVERSION3
  PRINT *, "TIMETAKENVERSION4"
  PRINT *, TIMETAKENVERSION4
end program blas9F
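
In case it helps anyone look into the temporary-copy theory, here is a minimal sketch of how it could be checked. It is an illustration under assumptions, not a diagnosis: it assumes an LP64 BLAS (default INTEGER arguments), and the program name, the small sizes, and the hand-written interface block are mine, not taken from any library module. Call (a) passes array sections in the same way as the program above. Call (b) passes the first element of each operand together with the true leading dimension (and the row stride for y), which is the usual BLAS idiom and should not need a contiguous temporary. Building with ifort's -check arg_temp_created (a run-time check) or gfortran's -Warray-temporaries (a compile-time warning) should show which calls create temporaries.

```fortran
program dger_first_element
  implicit none
  ! Hand-written interface for the reference BLAS routine DGER.
  ! Assumption: LP64 build, i.e. default INTEGER arguments.
  interface
    subroutine dger(m, n, alpha, x, incx, y, incy, a, lda)
      integer, intent(in) :: m, n, incx, incy, lda
      double precision, intent(in) :: alpha, x(*), y(*)
      double precision, intent(inout) :: a(lda, *)
    end subroutine dger
  end interface

  ! Small illustrative sizes; nbr = 1 makes the A section non-contiguous.
  integer, parameter :: nrow = 6, ncol = 8, nbr = 1
  double precision, parameter :: alpha = 2.0d0
  integer :: first, currow, curcol, m, i, j
  double precision :: a1(nrow, ncol), a2(nrow, ncol)

  first = ncol - nrow + 1
  do j = 1, ncol
    do i = 1, nrow
      a1(i, j) = dble(i + 10*j)
    end do
  end do
  a2 = a1

  ! One step of the loop above: bottom row, last column.
  currow = nrow
  curcol = ncol
  m = currow - 1 - nbr

  ! (a) Array sections: the compiler may copy y and A into contiguous
  !     temporaries, so LDA is the row count of the section.
  call dger(m, m, alpha, &
            a1(1+nbr:currow-1, curcol), 1, &
            a1(currow, first+nbr:curcol-1), 1, &
            a1(1+nbr:nrow, first+nbr:curcol-1), nrow - nbr)

  ! (b) First elements: x is a column (stride 1), y is a row (stride nrow),
  !     and A keeps the leading dimension of the full array.
  call dger(m, m, alpha, &
            a2(1+nbr, curcol), 1, &
            a2(currow, first+nbr), nrow, &
            a2(1+nbr, first+nbr), nrow)

  print *, 'max abs difference between (a) and (b):', maxval(abs(a1 - a2))
end program dger_first_element
```

Both calls perform the same rank-1 update, so the printed difference should be zero. If form (b) in the timing loops above removes the slowdown, that would support the idea that the slow version is spending its time copying the section in and out.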