Hello,
I encountered a problem using the ScaLAPACK routine PDGEMM under specific conditions. After investigation, I managed to reproduce the problem with a simple test program. In my case, the error occurs with MKL versions 16 to 18: I tested the 17, 18.0 and 18.3 releases on our local cluster, linking with intelmpi and libmkl_blacs_intelmpi_lp64; versions 16 and 17 on our national cluster Curie, linking with bullxmpi and libmkl_blacs_openmpi_lp64; and, locally, the latest release of MKL 18, linking with openmpi and libmkl_blacs_openmpi_lp64. Linking with a manually compiled ScaLAPACK 2.0.2 resolves the problem. Linking with MKL versions 14 and 15 also runs fine.
The test consists of multiplying two matrices with all coefficients set to 1, then checking the result. For a 2 by 2 processor grid (i.e. mpirun -n 4), the resulting matrix can be wrong. Increasing the grid size corrects the problem. The error is silent: the code runs and terminates normally. Apart from MKL, the test code is standalone; it is pasted below and attached.
Cordially,
Ivan Duchemin.
!
! The program tests pdgemm matrix x matrix multiplication under fixed conditions
! on a square processor grid provided by the user.
!
! The product tested is:
!
!   C = A * B
!
! with A being a 8160 x 8160 matrix with all coeffs set to 1
! and B being a 8160 x 19140 matrix with all coeffs set to 1
! The result expected is thus all coeffs of C equal to 8160
!
PROGRAM TEST

  ! Parameters
  INTEGER         , PARAMETER :: M=8160, N =19140, K=8160, DLEN_=9
  INTEGER         , PARAMETER :: CSRC=1, RSRC=1
  DOUBLE PRECISION, PARAMETER :: ONE=1.0D+0, ZERO=0.0D+0

  ! work variables
  INTEGER :: ICTXT
  INTEGER :: IAM
  INTEGER :: NPROCS
  INTEGER :: NPROW
  INTEGER :: NPCOL
  INTEGER :: MYROW
  INTEGER :: MYCOL
  INTEGER :: DESCA(9)
  INTEGER :: DESCB(9)
  INTEGER :: DESCC(9)
  INTEGER :: M_A
  INTEGER :: N_A
  INTEGER :: M_B
  INTEGER :: N_B
  INTEGER :: M_C
  INTEGER :: N_C
  INTEGER :: MB_A
  INTEGER :: NB_A
  INTEGER :: MB_B
  INTEGER :: NB_B
  INTEGER :: MB_C
  INTEGER :: NB_C
  DOUBLE PRECISION, ALLOCATABLE, DIMENSION(:,:) :: A
  DOUBLE PRECISION, ALLOCATABLE, DIMENSION(:,:) :: B
  DOUBLE PRECISION, ALLOCATABLE, DIMENSION(:,:) :: C

  ! Get starting information
  CALL BLACS_PINFO( IAM, NPROCS )

  ! try setting square grid
  NPROW = sqrt(REAL(NPROCS,kind=8))
  NPCOL = sqrt(REAL(NPROCS,kind=8))
  if ( NPROW*NPCOL .ne. NPROCS ) then
    print *,"please provide a square number of procs"
    stop 1
  end if

  ! Define process grid
  CALL BLACS_GET( -1, 0, ICTXT )
  CALL BLACS_GRIDINIT( ICTXT, 'R', NPROW, NPCOL )
  CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )

  ! set A matrix dimensions
  M_A = M
  N_A = K

  ! set B matrix dimensions
  M_B = K
  N_B = N

  ! set C matrix dimensions
  M_C = M
  N_C = N

  ! set blocking factors for A matrix
  MB_A = M_A/NPROW
  NB_A = N_A/NPCOL

  ! set blocking factors for B matrix
  MB_B = M_B/NPROW
  NB_B = 32

  ! set blocking factors for C matrix
  MB_C = M_C/NPROW
  NB_C = 32

  ! get A local dimensions
  MLOC_A = NUMROC( M_A, MB_A, MYROW, 0, NPROW )
  NLOC_A = NUMROC( N_A, NB_A, MYCOL, 0, NPCOL )

  ! get B local dimensions
  MLOC_B = NUMROC( M_B, MB_B, MYROW, 0, NPROW )
  NLOC_B = NUMROC( N_B, NB_B, MYCOL, 0, NPCOL )

  ! get C local dimensions
  MLOC_C = NUMROC( M_C, MB_C, MYROW, 0, NPROW )
  NLOC_C = NUMROC( N_C, NB_C, MYCOL, 0, NPCOL )

  ! Initialize the array descriptor for the matrix A, B and C
  CALL DESCINIT( DESCA, M_A, N_A, MB_A, NB_A, 0, 0, ICTXT, max(MLOC_A,1), INFO )
  CALL DESCINIT( DESCB, M_B, N_B, MB_B, NB_B, 0, 0, ICTXT, max(MLOC_B,1), INFO )
  CALL DESCINIT( DESCC, M_C, N_C, MB_C, NB_C, 0, 0, ICTXT, max(MLOC_C,1), INFO )

  ! print grid infos
  ! NOTE: PROC is neither declared nor assigned (implicit REAL);
  !       IPROC or IAM was probably intended
  do IPROC=0,NPROCS-1
    if ( IPROC .eq. IAM ) then
      print *,""
      print *,"-------------------------"
      print *,"PROC, MYROW, MYCOL :",PROC,MYROW,MYCOL
      print *,"MLOC_A, NLOC_A :",MLOC_A,NLOC_A
      print *,"MLOC_B, NLOC_B :",MLOC_B,NLOC_B
      print *,"MLOC_C, NLOC_C :",MLOC_C,NLOC_C
      print *,"DESCA :",DESCA
      print *,"DESCB :",DESCB
      print *,"DESCC :",DESCC
      print *,"-------------------------"
      print *,""
    end if
    CALL SLEEP(2)
  end do

  ! allocate and set matrices
  ALLOCATE( A(MLOC_A,NLOC_A) )
  ALLOCATE( B(MLOC_B,NLOC_B) )
  ALLOCATE( C(MLOC_C,NLOC_C) )

  ! init A matrix
  do j=1,NLOC_A
    do i=1,MLOC_A
      A(i,j)=ONE
    end do
  end do

  ! init B matrix
  do j=1,NLOC_B
    do i=1,MLOC_B
      B(i,j)=ONE
    end do
  end do

  ! compute A * B
  CALL PDGEMM('N', 'N',          &
       &      M, N, K,           &
       &      ONE,               &
       &      A, 1, 1, DESCA,    &
       &      B, 1, 1, DESCB,    &
       &      ZERO,              &
       &      C, 1, 1, DESCC )

  ! check result
  do j=1,NLOC_C
    do i=1,MLOC_C
      if ( abs(C(i,j)-K) .gt. 1.0D-8 ) then
        print *,"Error: result differs from exact"
        print *,"C(",i,",",j,")=",C(i,j)
        print *,"expected ",K
        print *,"TEST FAILED!"
        stop 2
      end if
    end do
  end do

  ! inform that everything is ok
  if ( IAM .eq. 0 ) then
    print *,"TEST PASSED!"
  end if

  ! terminate
  CALL BLACS_GRIDEXIT( ICTXT )
  CALL BLACS_EXIT( 0 )

END PROGRAM
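The local array sizes that the test program requests from NUMROC can be checked by hand. The sketch below is a plain Python transcription of the block-cyclic counting done by the reference ScaLAPACK NUMROC, applied to the 2 by 2 grid of the failing case. It assumes MKL's NUMROC returns the same values as the reference routine, and `local_shapes` is a hypothetical helper name used only for this illustration.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows or columns of a block-cyclic matrix owned by one process."""
    # Distance of this process from the process holding the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    # Number of complete blocks in the global dimension.
    nblocks = n // nb
    # Every process owns at least this many complete blocks.
    local = (nblocks // nprocs) * nb
    # Complete blocks left over after the even split.
    extrablks = nblocks % nprocs
    if mydist < extrablks:
        local += nb          # one extra complete block
    elif mydist == extrablks:
        local += n % nb      # the trailing partial block, if any
    return local


# Global sizes and blocking factors as set in the test program.
M, N, K = 8160, 19140, 8160
NPROW = NPCOL = 2
MB_A, NB_A = M // NPROW, K // NPCOL
MB_B, NB_B = K // NPROW, 32
MB_C, NB_C = M // NPROW, 32


def local_shapes(myrow, mycol):
    """Local (rows, cols) of A, B and C on process (myrow, mycol)."""
    return {
        "A": (numroc(M, MB_A, myrow, 0, NPROW), numroc(K, NB_A, mycol, 0, NPCOL)),
        "B": (numroc(K, MB_B, myrow, 0, NPROW), numroc(N, NB_B, mycol, 0, NPCOL)),
        "C": (numroc(M, MB_C, myrow, 0, NPROW), numroc(N, NB_C, mycol, 0, NPCOL)),
    }


for myrow in range(NPROW):
    for mycol in range(NPCOL):
        print(myrow, mycol, local_shapes(myrow, mycol))
```

With a column block size of 32, the 19140 columns of B and C split into 598 complete blocks plus a trailing block of 4 columns, which lands on process column 0.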
Thanks for the case, Ivan.
We managed to reproduce the problem with 4 processes, but the test passed with 16 or 25. The problem has been escalated, and we will notify you of further updates on this case.
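The thread does not state what triggers the failure, so the following is only a sketch of how the test program's data layout changes between the process counts mentioned (4, 16 and 25). It repeats the blocking-factor arithmetic from the reproducer in Python; `layout` is a hypothetical helper name, and nothing here identifies the cause of the bug.

```python
# Global sizes from the test program.
M, N, K = 8160, 19140, 8160
NB_FIXED = 32  # column block size used for B and C in the test program


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def layout(nprow):
    """Blocking factors and block counts for a square nprow x nprow grid."""
    npcol = nprow
    mb_a = M // nprow   # row block size of A (and of C)
    nb_a = K // npcol   # column block size of A
    mb_b = K // nprow   # row block size of B
    return {
        "nprocs": nprow * npcol,
        "MB_A": mb_a,
        "NB_A": nb_a,
        "MB_B": mb_b,
        "row_blocks_A": ceil_div(M, mb_a),
        "col_blocks_A": ceil_div(K, nb_a),
        "col_blocks_C": ceil_div(N, NB_FIXED),
    }


for nprow in (2, 4, 5):
    print(layout(nprow))
```

In all three cases A is cut into exactly one block per process in each dimension, while B and C always have 599 column blocks of width 32 (the last one partial). Only the block sizes of A and the row block sizes of B and C shrink as the grid grows.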
Ivan, the fix for this problem is available in MKL 2019 u1. This update is available for download. Please check how it works on your side and let us know.
The attached log file shows that the test passed (MKL_VERBOSE=1).
Gennady, did you intend the attachment in #4 to be seen only by privileged Intel personnel? Perhaps you selected "Sable Falls" without intending to?
Thanks mecej4, it was a technical problem with this forum.
Here is the log I captured:
-------------------------
PROC, MYROW, MYCOL : 0.0000000E+00 0 0
MLOC_A, NLOC_A : 4080 4080
MLOC_B, NLOC_B : 4080 9572
MLOC_C, NLOC_C : 4080 9572
DESCA : 1 0 8160 8160 4080 4080
0 0 4080
DESCB : 1 0 8160 19140 4080 32
0 0 4080
DESCC : 1 0 8160 19140 4080 32
0 0 4080
-------------------------
PROC, MYROW, MYCOL : 0.0000000E+00 0 1
MLOC_A, NLOC_A : 4080 4080
MLOC_B, NLOC_B : 4080 9568
MLOC_C, NLOC_C : 4080 9568
DESCA : 1 0 8160 8160 4080 4080
0 0 4080
DESCB : 1 0 8160 19140 4080 32
0 0 4080
DESCC : 1 0 8160 19140 4080 32
0 0 4080
-------------------------
PROC, MYROW, MYCOL : 0.0000000E+00 1 0
MLOC_A, NLOC_A : 4080 4080
MLOC_B, NLOC_B : 4080 9572
MLOC_C, NLOC_C : 4080 9572
DESCA : 1 0 8160 8160 4080 4080
0 0 4080
DESCB : 1 0 8160 19140 4080 32
0 0 4080
DESCC : 1 0 8160 19140 4080 32
0 0 4080
-------------------------
PROC, MYROW, MYCOL : 0.0000000E+00 1 1
MLOC_A, NLOC_A : 4080 4080
MLOC_B, NLOC_B : 4080 9568
MLOC_C, NLOC_C : 4080 9568
DESCA : 1 0 8160 8160 4080 4080
0 0 4080
DESCB : 1 0 8160 19140 4080 32
0 0 4080
DESCC : 1 0 8160 19140 4080 32
0 0 4080
-------------------------
MKL_VERBOSE Intel(R) MKL 2019.0 Update 1 Product build 20180928 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions (Intel(R) AVX) enabled processors, Lnx 2.80GHz lp64 intel_thread
MKL_VERBOSE DGEMM(N,N,512,9568,4080,0x1250650,0x2b4f6b1de080,512,0x2b4f6e1af080,4080,0x1250658,0x2b4f587dd280,4080) 421.40ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:5
...................
MKL_VERBOSE DGEMM(N,N,496,9572,4080,0x1250650,0x2b798943b080,496,0x2b798b41c080,4080,0x14f76c8,0x2b7975a51280,4080) 394.36ms CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:5
TEST PASSED!
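The MKL_VERBOSE lines in the log above record the local DGEMM calls issued underneath PDGEMM, with the m, n, k sizes as the third to fifth arguments and the elapsed time after the closing parenthesis. A minimal sketch for pulling those fields out of such a line is given below. The regular expression is written against the two lines shown in this log only; the handling of time units other than `ms` is an assumption, and the rate uses the conventional 2*m*n*k operation count for a matrix product.

```python
import re

# Pattern written against the DGEMM lines in the log above.
VERBOSE_DGEMM = re.compile(
    r"MKL_VERBOSE DGEMM\((\w),(\w),(\d+),(\d+),(\d+),[^)]*\)\s+([\d.]+)(ns|us|ms|s)\b"
)

# Assumed unit suffixes; only "ms" appears in the log above.
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def parse_dgemm(line):
    """Return sizes, time and rate for one MKL_VERBOSE DGEMM line, else None."""
    match = VERBOSE_DGEMM.search(line)
    if match is None:
        return None
    transa, transb, m, n, k, value, unit = match.groups()
    m, n, k = int(m), int(n), int(k)
    seconds = float(value) * UNIT_SECONDS[unit]
    return {
        "transa": transa,
        "transb": transb,
        "m": m,
        "n": n,
        "k": k,
        "seconds": seconds,
        "gflops": 2.0 * m * n * k / seconds / 1e9,
    }


# The two DGEMM lines from the log above.
LINES = [
    "MKL_VERBOSE DGEMM(N,N,512,9568,4080,0x1250650,0x2b4f6b1de080,512,"
    "0x2b4f6e1af080,4080,0x1250658,0x2b4f587dd280,4080) 421.40ms "
    "CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:5",
    "MKL_VERBOSE DGEMM(N,N,496,9572,4080,0x1250650,0x2b798943b080,496,"
    "0x2b798b41c080,4080,0x14f76c8,0x2b7975a51280,4080) 394.36ms "
    "CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:5",
]

for line in LINES:
    print(parse_dgemm(line))
```

The n and k values in these calls (9568 or 9572, and 4080) match the local column and row counts printed for B and C earlier in the log.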
