Mazur__Luke

Beginner


01-30-2018
07:10 PM


Subsetting question regarding DGER

I am calling dger on a subset of a two-dimensional matrix, and I call it in two different ways. One way has an extra set of brackets in the row subsetting of the matrix A, and it runs about 4 times slower than the way without the brackets.

Whether I compile this with

ifort -heap-arrays -O3 -mkl=sequential blas9F.f90 -o blas9test

or

ifort -heap-arrays -O3 blas9F.f90 -o blas9test -lblas

the second version runs much slower than the first, which doesn't make sense to me. When I compile it using gfortran with -lblas, the two versions take the same time to run, as expected. That time is much faster than the slow version built with the Intel compiler, and slightly slower than the fast version built with the Intel compiler. What is the cause of this? Note that if NBR = 1, the two versions take the same amount of time to run. My theory is that this has something to do with a temporary copy being made, and that the brackets somehow confuse the compiler into thinking that a copy is necessary, even though the leading submatrix is always used if NBR = 0.

This is my first time posting here, so let me know if I have broken the rules in some way.

program blas9F

  USE ISO_C_BINDING
  implicit none

  REAL(C_DOUBLE), DIMENSION(:,:), ALLOCATABLE :: MAT1
  REAL(C_DOUBLE), DIMENSION(:,:), ALLOCATABLE :: MAT2
  INTEGER(C_LONG) :: NROW
  INTEGER(C_LONG) :: NCOL
  INTEGER(C_LONG) :: FIRSTCOLUMNOFSQUARE
  INTEGER(C_LONG) :: NBR
  INTEGER(C_LONG) :: NUMBEROFROWSAWAYFROMBOTTOM
  INTEGER(C_LONG) :: CURRENTROW
  INTEGER(C_LONG) :: CURRENTCOLUMN
  INTEGER(C_LONG) :: I
  REAL :: STARTTIME, FINISHTIME
  REAL :: TIMETAKENVERSION3 = 0
  REAL :: TIMETAKENVERSION4 = 0
  REAL(C_DOUBLE) :: BLASALPHA = 2.0
  REAL(C_DOUBLE) :: BLASBETA = 2.0

  external dger

  NROW = 3001
  NCOL = 4001
  NBR = 0
  FIRSTCOLUMNOFSQUARE = NCOL - NROW + 1

  ALLOCATE(MAT1(NROW,NCOL))
  ALLOCATE(MAT2(NROW,NCOL))

  MAT1 = 1
  DO I = 1,NROW
    MAT1(I,FIRSTCOLUMNOFSQUARE+I-1) = 10*NCOL !diagonally dominant matrix
  END DO

  MAT2 = 1
  DO I = 1,NROW
    MAT2(I,FIRSTCOLUMNOFSQUARE+I-1) = 10*NCOL !diagonally dominant matrix
  END DO

  DO NUMBEROFROWSAWAYFROMBOTTOM=0,NROW-1
    CURRENTROW = NROW - NUMBEROFROWSAWAYFROMBOTTOM
    CURRENTCOLUMN = NCOL - NUMBEROFROWSAWAYFROMBOTTOM
    IF(CURRENTROW .NE. NBR) THEN
      call cpu_time(STARTTIME)
      call dger(CURRENTROW-1-NBR, &
        CURRENTROW-1-NBR, BLASALPHA, &
        MAT1(1+NBR:CURRENTROW-1,CURRENTCOLUMN), 1, &
        MAT1(CURRENTROW,FIRSTCOLUMNOFSQUARE+NBR:CURRENTCOLUMN-1), 1, &
        MAT1(1+NBR:NROW,FIRSTCOLUMNOFSQUARE:CURRENTCOLUMN-1), NROW - NBR)
      call cpu_time(FINISHTIME)
      TIMETAKENVERSION3 = TIMETAKENVERSION3 + FINISHTIME - STARTTIME
    END IF
  END DO

  DO NUMBEROFROWSAWAYFROMBOTTOM=0,NROW-1
    CURRENTROW = NROW - NUMBEROFROWSAWAYFROMBOTTOM
    CURRENTCOLUMN = NCOL - NUMBEROFROWSAWAYFROMBOTTOM
    IF(CURRENTROW .NE. NBR) THEN
      call cpu_time(STARTTIME)
      call dger(CURRENTROW-1-NBR, CURRENTROW-1-NBR, BLASALPHA, &
        MAT2(1+NBR:CURRENTROW-1,CURRENTCOLUMN), 1, &
        MAT2(CURRENTROW,FIRSTCOLUMNOFSQUARE+NBR:CURRENTCOLUMN-1), 1, &
        MAT2((1+NBR):NROW,(FIRSTCOLUMNOFSQUARE+NBR):(CURRENTCOLUMN-1)), &
        NROW - NBR)
      call cpu_time(FINISHTIME)
      TIMETAKENVERSION4 = TIMETAKENVERSION4 + FINISHTIME - STARTTIME
    END IF
  END DO

  PRINT *, "TIMETAKENVERSION3"
  PRINT *, TIMETAKENVERSION3
  PRINT *, "TIMETAKENVERSION4"
  PRINT *, TIMETAKENVERSION4

end
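
One possible way to test the temporary-copy theory is to call dger without any array sections, passing the first element of each operand together with the stride or leading dimension of the parent array (the classic Fortran 77 sequence-association idiom). The sketch below is not taken from the program above: the program name, the variable names and the small matrix sizes are made up for illustration, it assumes NBR = 0, and it assumes a BLAS library with default-kind integer arguments is linked. It only checks that the two calling styles give the same result.

```fortran
program dger_first_element
  ! Illustrative sketch: section-based vs. first-element dger calls (NBR = 0).
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nrow = 6, ncol = 8   ! hypothetical small sizes
  real(dp), allocatable :: a1(:,:), a2(:,:)
  real(dp) :: alpha
  integer :: firstcol, currow, curcol, m
  external dger

  allocate(a1(nrow,ncol), a2(nrow,ncol))
  call random_number(a1)
  a2 = a1                       ! identical starting data for both calls
  alpha = 2.0_dp

  firstcol = ncol - nrow + 1    ! first column of the square block
  currow   = nrow               ! bottom row, as in the first loop iteration
  curcol   = ncol
  m        = currow - 1         ! size of the rank-1 update

  ! Style 1: array sections. The row section for y is strided in memory,
  ! so the compiler has to pack it into a temporary to honour incy = 1.
  call dger(m, m, alpha, &
            a1(1:currow-1, curcol), 1, &
            a1(currow, firstcol:curcol-1), 1, &
            a1(1:nrow, firstcol:curcol-1), nrow)

  ! Style 2: first elements only. y is walked along the row with
  ! incy = nrow, and A uses the parent leading dimension nrow,
  ! so no section (and no temporary) is involved.
  call dger(m, m, alpha, &
            a2(1, curcol), 1, &
            a2(currow, firstcol), nrow, &
            a2(1, firstcol), nrow)

  print *, 'max abs difference:', maxval(abs(a1 - a2))
end program dger_first_element
```

If the first-element form removes the slowdown in the full benchmark, that would point toward argument copying. To see the copies directly, ifort can report them at run time with -check arg_temp_created, and gfortran can warn about them at compile time with -Warray-temporaries.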
