Intel® Fortran Compiler

Segmentation fault while using libblas95.so (self-built, dynamic) in MKL

MinGo_Jing

My ifort compiler version is 15.0.0, from composer_xe_2015.6.233, on Linux.

I built libblas95.so from the MKL sources, with two changes to the published makefile (MKL/interfaces/blas95/makefile):

1. Added -fPIC when building %.o from %.f90;

2. Added the -fPIC -shared flags when generating libblas95.so.

 

My source code (there may be some typos, since I can't paste it directly):

 
MODULE mod_local_mkl_proc

USE blas95, only : dot, nrm2
USE f95_precision

IMPLICIT NONE
PUBLIC calc_dihedral

INTERFACE calc_dihedral
  MODULE PROCEDURE calc_dihedral_4
END INTERFACE

INTERFACE local_cross_product
  MODULE PROCEDURE local_vv_cp_4
END INTERFACE

CONTAINS

  SUBROUTINE local_vv_cp_4(cros, lv, rv)
    REAL(4) :: cros(3)
    REAL(4), INTENT(in) :: lv(3), rv(3)

    cros(1) = lv(2) * rv(3) - lv(3)*rv(2)
    cros(2) = lv(3) * rv(1) - lv(1)*rv(3)
    cros(3) = lv(1) * rv(2) - lv(2)*rv(1)

  END SUBROUTINE 

  SUBROUTINE calc_dihedral_4(dihedral, coor1, coor2, coor3, coor4)
    REAL(4), INTENT(inout) :: dihedral
    REAL(4), INTENT(in) :: coor1(3), coor2(3), coor3(3), coor4(3)

    ! locals
    REAL(4) :: delt12(3), delt23(3), delt34(3), len23, uni23(3), &
      d12, d23, norm_v1(3), norm_v2(3), orien_v(3)

    ! init
    orien_v = (/1.0, 1.0, 1.0/)
    delt23 = (/1.0, 1.0, 1.0/)

    ! (call to math_dihedral_4 omitted here; the error occurs without it too)
    ! 

    delt12 = coor2 - coor1
    delt23 = coor3 - coor2
    delt34 = coor4 - coor3

    !
    ! ATTENTION here:
    ! without this print statement, the segmentation fault occurs.
    ! 
    PRINT *, delt23(1), delt23(2), delt23(3)

    len23 = nrm2(delt23)
    uni23 = delt23 / len23
    d12 = dot(delt12, delt23)
    d23 = dot(delt34, delt23)
    norm_v1 = delt12 - uni23 * d12 /len23
    norm_v2 = delt34 - uni23 * d23 /len23
    CALL local_cross_product(orien_v, norm_v1, norm_v2)
    orien_v = orien_v * delt23
  
    IF (orien_v(1) >= 0. .AND. orien_v(2) >= 0. .AND. orien_v(3) >= 0.) THEN
      dihedral = -dihedral
    END IF
    
  END SUBROUTINE 

END MODULE

PROGRAM main

USE mod_local_mkl_proc, ONLY: calc_dihedral
IMPLICIT NONE

  ! locals
  REAL(4) :: dih, coor1(3), coor2(3), coor3(3), coor4(3)

  ! init
  coor1 = (/0., 1., 0./)
  coor2 = (/0., 0., 0./)
  coor3 = (/1., 0., 0./)
  coor4 = (/1., 1., 1./)
  dih = 0
  PRINT *, "init"

  ! process
  CALL calc_dihedral(dih, coor1, coor2, coor3, coor4)
  PRINT *, dih
  PRINT *, "calc_dihedral done."

  ! end 
  PRINT *, "done."

END PROGRAM

 

The options used to build the .o files are below:

ifort -g -fPIC -shared -check bounds -gcc-name=gcc -I{MKL_INC_FLAGS} -O3 -c *.f90

The binary link flags:

ifort -o test.bin $^ -Wl,--start-group $(LIB_INC_FLAGS) -lblas95 -lmkl_avx2 -lmkl_intel_lp64 \
-lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -Wl,--end-group

 

 

Error output from gdb after run:

Program received signal SIGSEGV, Segmentation fault.

#0 0x00007fffef790606 in LInc1_X16_Loop2gas_1 () from /opt/intel/.../mkl/lib/intel64/libmkl_avx.so

#1 0x00007fffffffc320 in ?? ()

#2 0x0000000000000000 in ?? ()

 

Running ldd test.bin shows something strange:

test.bin does NOT depend on libmkl_avx.so, only on libmkl_avx2.so, yet the fault reported above is inside libmkl_avx.so.

 

When I delete the PRINT statement in the subroutine calc_dihedral_4, everything works, which really confuses me.

What is the real problem with my code? Could anyone help?

MinGo_Jing

Can anyone help?

Steve_Lionel

You should probably ask in the MKL forum, but I'll comment that you're three years out of date on the product. Can you try a newer compiler and MKL? That removing the PRINT statement changes the behavior might indicate a compiler bug or might be a bug in your program referencing uninitialized storage.
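
One way to separate those two possibilities, offered only as a sketch and not taken from the original post, is to run the same vector arithmetic with the standard intrinsics DOT_PRODUCT and SQRT in place of the blas95 wrappers dot and nrm2. The program name check_without_blas95 is hypothetical, and the coordinates are the ones from the posted main program. If this version runs cleanly with the same compiler options while the MKL version faults, that would point toward the self-built libblas95.so or the link line rather than the arithmetic itself.

```fortran
PROGRAM check_without_blas95
  ! Hypothetical reduced test: same inputs as the posted program,
  ! but no MKL/blas95 dependency at all.
  IMPLICIT NONE

  REAL(4) :: coor1(3), coor2(3), coor3(3), coor4(3)
  REAL(4) :: delt12(3), delt23(3), delt34(3), len23, d12, d23

  coor1 = (/0., 1., 0./)
  coor2 = (/0., 0., 0./)
  coor3 = (/1., 0., 0./)
  coor4 = (/1., 1., 1./)

  delt12 = coor2 - coor1
  delt23 = coor3 - coor2
  delt34 = coor4 - coor3

  ! Intrinsic stand-in for nrm2(delt23): Euclidean length of the vector.
  len23 = SQRT(DOT_PRODUCT(delt23, delt23))

  ! Intrinsic stand-ins for dot(delt12, delt23) and dot(delt34, delt23).
  d12 = DOT_PRODUCT(delt12, delt23)
  d23 = DOT_PRODUCT(delt34, delt23)

  PRINT *, len23, d12, d23
END PROGRAM check_without_blas95
```

For these inputs the three printed values should be 1, 0 and 0 (the exact formatting depends on the compiler).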

MinGo_Jing

Steve Lionel (Ret.) wrote:

You should probably ask in the MKL forum, but I'll comment that you're three years out of date on the product. Can you try a newer compiler and MKL? That removing the PRINT statement changes the behavior might indicate a compiler bug or might be a bug in your program referencing uninitialized storage.

Thanks Steve, I'll try a newer version later.
