Intel® Fortran Compiler

Segmentation Fault with OpenMP Tasks in Subroutines (Intel Fortran 2018 Update 1)

Martin_K_7

I ran into the following problem with the Intel Fortran 2018 Update 1 compiler. I implemented a blocked algorithm to compute an out-of-place triangular matrix-matrix product C := alpha * A * B + beta * C, where A is an upper triangular matrix. Since the matrix-matrix product has great potential for parallelization, I parallelized it with OpenMP tasks and task dependencies and ended up with the following code:

SUBROUTINE DTRMM3(M,N,ALPHA,A,LDA,B,LDB,BETA,C,LDC)
    USE OMP_LIB
    IMPLICIT NONE
    DOUBLE PRECISION ALPHA,BETA
    INTEGER LDA,LDB,LDC,M,N
    DOUBLE PRECISION A(LDA,*),B(LDB,*),C(LDC,*)
    EXTERNAL DGEMM, DTRMM
    INTRINSIC MIN
    INTEGER K,KB,L,LB,J,JB
    !     .. Parameters ..
    DOUBLE PRECISION DONE,DZERO
    PARAMETER (DONE=1.0D+0,DZERO=0.0D+0)
    INTEGER NB
    PARAMETER(NB=256)
    !     .. Local Work...
    DOUBLE PRECISION TMP(NB,NB)

    IF (M.EQ.0 .OR. N.EQ.0) RETURN

    IF (ALPHA.EQ.DZERO) THEN
        DO J = 1,N
            !$omp simd safelen(64)
            DO K = 1,M
                C(K,J) = BETA * C(K,J)
            END DO
            !$omp end simd
        END DO
        RETURN
    END IF

    DO L = 1,N,NB
        LB = MIN(NB,N - L + 1)
        DO K = 1,M,NB
            KB = MIN(NB, M - K + 1)
            !$omp task firstprivate(K,KB,L,LB) depend(inout: C(K:K+KB-1,L:L+LB-1)) shared(C,BETA)
            C(K:K+KB-1, L:L+LB-1) = BETA * C(K:K+KB-1,L:L+LB-1)
            !$omp end task
            DO J = K, M, NB
                JB = MIN(NB, M - J + 1)
                !$omp task firstprivate(K,KB,L,LB, J, JB) private(TMP) &
                !$omp& depend(in:A(K:K+KB-1,J:J+JB-1), B(J:J+JB-1,L:L+LB-1)) depend(inout: C(K:K+KB-1,L:L+LB-1)) &
                !$omp& shared(ALPHA,A,B,LDA,LDB,LDC) default(none)
                IF ( K .EQ. J ) THEN
                    TMP(1:KB,1:LB) = B(K:K+KB-1,L:L+LB-1)
                    CALL DTRMM("L","U","N","U", KB, LB, ALPHA, A(K,K), LDA, TMP, NB)
                    C(K:K+KB-1, L:L+LB-1) = C(K:K+KB-1,L:L+LB-1) + TMP(1:KB,1:LB)
                ELSE
                    CALL DGEMM("N", "N", KB, LB, JB, ALPHA, A(K,J), LDA, B(J,L), LDB, DONE, C(K,L),LDC)
                END IF
                !$omp end task
            END DO

        END DO
    END DO
    RETURN
END SUBROUTINE


I call the subroutine from a parallel region like this:

    !$omp parallel
    !$omp master
    CALL DTRMM3(M, N, ALPHA, A, LDA, B, LDB, BETA, C2, LDC)
    !$omp end master
    !$omp taskwait
    !$omp end parallel

 

The attached file contains the whole example.

I compiled the code using

 ifort -xHost -O3 dtrmm3_test.f90  -qopenmp -mkl -g

and executing it on a 16-core Xeon Silver 4110 leads to a segmentation fault:

./a.out 
   512   786     0.00000000D+00   0.00000000D+00   0.00000000D+00  T
   512   786     0.00000000D+00   0.10000000D+01   0.00000000D+00  T
   512   786     0.00000000D+00   0.20000000D+01   0.00000000D+00  T
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred
forrtl: severe (174): SIGSEGV, segmentation fault occurred

The first three output lines show that the ALPHA = 0.0 path works; the crash occurs only when the task-based part of the algorithm runs.

Using GCC 7.3 and Netlib BLAS, everything works without errors.
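
To rule out a numerical problem in the blocked algorithm itself, the result can be compared against an unblocked, serial reference. The following is only a minimal sketch and not part of the attached example: the name DTRMM3_REF is hypothetical, and it assumes that A is treated as upper triangular with an implicit unit diagonal, as the DIAG="U" argument in the DTRMM call above suggests.

```fortran
! Unblocked serial reference for C := alpha*A*B + beta*C.
! Assumptions (hypothetical helper, not from the attached file):
!   - A is M-by-M upper triangular with an implicit unit diagonal
!     (mirrors DIAG="U" in the DTRMM call of DTRMM3);
!   - entries of A on and below the diagonal are never read.
SUBROUTINE DTRMM3_REF(M, N, ALPHA, A, LDA, B, LDB, BETA, C, LDC)
    IMPLICIT NONE
    INTEGER, INTENT(IN) :: M, N, LDA, LDB, LDC
    DOUBLE PRECISION, INTENT(IN) :: ALPHA, BETA
    DOUBLE PRECISION, INTENT(IN) :: A(LDA,*), B(LDB,*)
    DOUBLE PRECISION, INTENT(INOUT) :: C(LDC,*)
    INTEGER :: I, J, P
    DOUBLE PRECISION :: S

    DO J = 1, N
        DO I = 1, M
            ! Unit diagonal: the A(I,I) term contributes 1 * B(I,J).
            S = B(I,J)
            ! Add the strictly upper part of row I of A.
            DO P = I + 1, M
                S = S + A(I,P) * B(P,J)
            END DO
            C(I,J) = ALPHA * S + BETA * C(I,J)
        END DO
    END DO
END SUBROUTINE DTRMM3_REF
```

Calling DTRMM3_REF on a copy of C with the same arguments as DTRMM3 and comparing the two results element by element (up to rounding) separates a wrong result from the crash in the tasking code.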

OS: CentOS 7.4, Intel Fortran 2018 Update 1, MKL 2018 Update 1
