I ran into the following problem when using the Intel Fortran 2018 Update 1 compiler. I implemented a block algorithm to compute an out-of-place triangular matrix-matrix product C := alpha * A * B + beta * C, where A is an upper triangular matrix. Since the matrix-matrix product has great potential for parallelization, I implemented it using OpenMP tasks and task dependencies, ending up with the following code:

SUBROUTINE DTRMM3(M,N,ALPHA,A,LDA,B,LDB,BETA,C,LDC)
    USE OMP_LIB
    IMPLICIT NONE
    DOUBLE PRECISION ALPHA,BETA
    INTEGER LDA,LDB,LDC,M,N
    DOUBLE PRECISION A(LDA,*),B(LDB,*),C(LDC,*)
    EXTERNAL DGEMM, DTRMM
    INTRINSIC MIN
    INTEGER K,KB,L,LB,J,JB
    ! .. Parameters ..
    DOUBLE PRECISION DONE,DZERO
    PARAMETER (DONE=1.0D+0,DZERO=0.0D+0)
    INTEGER NB
    PARAMETER(NB=256)
    ! .. Local Work...
    DOUBLE PRECISION TMP(NB,NB)

    ! Quick return if there is nothing to compute
    IF (M.EQ.0 .OR. N.EQ.0) RETURN

    ! ALPHA = 0: only scale C by BETA, no tasks involved
    IF (ALPHA.EQ.DZERO) THEN
        DO J = 1,N
            !$omp simd safelen(64)
            DO K = 1,M
                C(K,J) = BETA * C(K,J)
            END DO
            !$omp end simd
        END DO
        RETURN
    END IF

    ! Loop over the NB x NB blocks C(K:K+KB-1,L:L+LB-1) of C
    DO L = 1,N,NB
        LB = MIN(NB,N - L + 1)
        DO K = 1,M,NB
            KB = MIN(NB, M - K + 1)

            ! Task 1: scale the current block of C by BETA
            !$omp task firstprivate(K,KB,L,LB) depend(inout: C(K:K+KB-1,L:L+LB-1)) shared(C,BETA)
            C(K:K+KB-1, L:L+LB-1) = BETA * C(K:K+KB-1,L:L+LB-1)
            !$omp end task

            ! Task 2: accumulate ALPHA * A(K,J) * B(J,L) for all block columns J >= K
            DO J = K, M, NB
                JB = MIN(NB, M - J + 1)
                !$omp task firstprivate(K,KB,L,LB, J, JB) private(TMP) &
                !$omp& depend(in:A(K:K+KB-1,J:J+JB-1), B(J:J+JB-1,L:L+LB-1)) depend(inout: C(K:K+KB-1,L:L+LB-1)) &
                !$omp& shared(ALPHA,A,B,LDA,LDB,LDC) default(none)
                IF ( K .EQ. J ) THEN
                    ! Diagonal block: triangular product via DTRMM on a private copy of B
                    TMP(1:KB,1:LB) = B(K:K+KB-1,L:L+LB-1)
                    CALL DTRMM("L","U","N","U", KB, LB, ALPHA, A(K,K), LDA, TMP, NB)
                    C(K:K+KB-1, L:L+LB-1) = C(K:K+KB-1,L:L+LB-1) + TMP(1:KB,1:LB)
                ELSE
                    ! Off-diagonal block: general product via DGEMM, accumulated into C
                    CALL DGEMM("N", "N", KB, LB, JB, ALPHA, A(K,J), LDA, B(J,L), LDB, DONE, C(K,L),LDC)
                END IF
                !$omp end task
            END DO
        END DO
    END DO
    RETURN
END SUBROUTINE
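For readers who want to check the task structure, the blockwise update that the loops above appear to implement can be written out as follows. This is only a sketch: the block indices k, j (row blocks of size NB over M) and l (column blocks of size NB over N) and the symbol \hat{A}_{kk} are notation introduced here for illustration, and \hat{A}_{kk} is assumed to be the diagonal block interpreted as unit upper triangular, since DTRMM is called with DIAG = "U".

```latex
% One update per block C_{kl}: first scaled by beta (task 1),
% then one accumulation per block column j >= k (task 2).
C_{kl} \;\leftarrow\; \beta\, C_{kl}
  \;+\; \alpha \Bigl( \underbrace{\hat{A}_{kk}\, B_{kl}}_{\text{DTRMM, } j = k}
  \;+\; \sum_{j > k} \underbrace{A_{kj}\, B_{jl}}_{\text{DGEMM}} \Bigr)
```

All terms for a fixed (k, l) read and write the same block C_{kl}, which is why each task declares an inout dependence on that block and the tasks for one block run in the order they were created.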

and execute it using:

!$omp parallel
!$omp master
CALL DTRMM3(M, N, ALPHA, A, LDA, B, LDB, BETA, C2, LDC)
!$omp end master
!$omp taskwait
!$omp end parallel

The attached file contains the whole example.

I compiled the code using

ifort -xHost -O3 dtrmm3_test.f90 -qopenmp -mkl -g

and executing it on a 16-core Xeon Silver 4110 leads to a segmentation fault:

./a.out
512 786 0.00000000D+00 0.00000000D+00 0.00000000D+00 T
512 786 0.00000000D+00 0.10000000D+01 0.00000000D+00 T
512 786 0.00000000D+00 0.20000000D+01 0.00000000D+00 T
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred

The first three lines show that the ALPHA = 0.0 path works; the crash occurs only when the task-based part of the algorithm is executed.

Using GCC 7.3 and Netlib BLAS, everything works without any error.

OS: CentOS 7.4, Intel Fortran 2018 Update 1, MKL 2018 Update 1
