Solved: Re: Segfault in dgesvx

aaron2 · 07-27-2020

I'm getting a segfault in the routine dgesvx using MKL. Here is a minimal working example. dgesv works, but dgesvx, the version that also estimates the condition number of the matrix, does not. I'm compiling with

ifort dgesvx_tester.f90 -L/opt/intel/composer2020/mkl/lib/intel64 -lmkl_core -lmkl_intel_lp64 -lmkl_sequential -lpthread

and the program segfaults at the dgesvx call, while the dgesv call before it works. Any help would be appreciated. Sample output:

 dgesv solution: 
     2.1213203    -0.7071068     3.0000000     4.0000000     5.0000000
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
a.out              000000000040594A  Unknown               Unknown  Unknown
libpthread-2.27.s  00007FDB72F2A8A0  Unknown               Unknown  Unknown
libmkl_core.so     00007FDB75B4DD7A  mkl_lapack_dgesvx     Unknown  Unknown
libmkl_intel_lp64  00007FDB74AE369D  DGESVX                Unknown  Unknown
a.out              0000000000404292  Unknown               Unknown  Unknown
a.out              0000000000403002  Unknown               Unknown  Unknown
libc-2.27.so       00007FDB727AAB97  __libc_start_main     Unknown  Unknown
a.out              0000000000402EEA  Unknown               Unknown  Unknown

aaron2 · 07-27-2020

For reference, compiling with debug options (-O0 -debug all -debug-parameters all -debug pubnames -debug variable-locations -debug extended -fvar-tracking -CB -check stack -check uninit -traceback) shows that the offending line in my program (line 83 of main.f90) is the call to dgesvx:

 dgesv solution: 
     2.1213203    -0.7071068     3.0000000     4.0000000     5.0000000
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
a.out              000000000040DB7A  Unknown               Unknown  Unknown
libpthread-2.27.s  00007FF9DDDAD8A0  Unknown               Unknown  Unknown
libmkl_core.so     00007FF9E09D0D7A  mkl_lapack_dgesvx     Unknown  Unknown
libmkl_intel_lp64  00007FF9DF96669D  DGESVX                Unknown  Unknown
a.out              0000000000406656  MAIN__                     83  main.f90
a.out              0000000000403002  Unknown               Unknown  Unknown
libc-2.27.so       00007FF9DD429B97  __libc_start_main     Unknown  Unknown
a.out              0000000000402EEA  Unknown               Unknown  Unknown

Gennady_F_Intel · 07-27-2020

Before checking the case: in the example you shared, I see you linked against MKL 2018, but in the description above you mentioned MKL 2020. Did you really try v. 2020?

! Test program for dgesvx
! compile command used:
!ifort dgesvx_tester.f90 -L/opt/intel/composer2018/mkl/lib/intel64 -lmkl_core -lmkl_intel_lp64 -lmkl_sequential -lpthread
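Which MKL the executable actually loads can also be checked from inside the program. A minimal sketch, assuming MKL's documented support routine mkl_get_version_string (Fortran form: one CHARACTER*198 buffer argument); the module mkl_version_check and the subroutine print_mkl_version are hypothetical names. With the shared MKL libraries, this should report the library resolved at run time, which is not necessarily the one named in the -L path.

```fortran
! Hypothetical helper module: report which MKL build is actually linked.
module mkl_version_check
  implicit none
contains
  subroutine print_mkl_version(buf)
    ! 198 characters, the buffer size used in Intel's documented example
    character(len=198), intent(out) :: buf
    external :: mkl_get_version_string   ! MKL support routine (assumed available)

    call mkl_get_version_string(buf)
    write (*, '(a)') trim(buf)
  end subroutine print_mkl_version
end module mkl_version_check
```

Compile and link it with the same MKL link line as the test program, then call print_mkl_version once at startup.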

mecej4 · 07-27-2020

You have chosen the hard way to start using Lapack and BLAS through MKL: using implicit interfaces by declaring the Lapack routines as EXTERNAL, and attempting to pass the large number of secondary arguments (placeholders for optional arguments, work arrays, etc.) that these interfaces require. It often takes quite a bit of time to check that all input arguments have correct values, that work arrays are large enough, and that the output codes are tested for values that signify errors and failures, with appropriate action taken.

I suggest that, instead, you use the Lapack95 interfaces, at least for your first attempts. Once the code is working, and you wish to use non-default values or gain more control over what the Lapack routines do, you may switch over to the F77 Lapack calls, but even then you can benefit from interface checking (by adding INCLUDE 'mkl.fi' or the equivalent USE statements).

Here is simplified code for your test problem using LAPACK95:

program xgesvx
  use lapack95
  implicit none
  integer,  parameter   :: dp = selected_real_kind(15,300)

  real(dp), allocatable :: m1(:,:)
  real(dp), allocatable :: v1(:), v2(:)
  real(dp) :: r1, rcond
  integer  :: sz, ii, info

  sz = 5

  ! matrix and vector vars    
  allocate( v1(sz) )
  allocate( v2(sz) )
  allocate( m1(sz,sz) )

  ! set the matrix and the inhomogeneous vector
  m1 = 0.0_dp
  do ii = 1, sz
      v1(ii) = real(ii,dp)
      m1(ii,ii) = 1.0_dp
  end do

  ! set upper 2x2 to Hadamard
  r1 = 1.0_dp / sqrt(2.0_dp)
  m1(1,1) = r1
  m1(1,2) = r1
  m1(2,1) = r1
  m1(2,2) = -r1
  ! calculate v2 = m1.v1
  v2=matmul(m1,v1)
  ! solve
  call gesvx(m1,v2,v1,info=info,rcond=rcond)
  print *,'info from gesvx = ',info, ' and rcond = ',rcond
  print '(1x,A,5F10.5)','v1 = ',v1

  deallocate( v1 )
  deallocate( v2 )
  deallocate( m1 )

  stop
end program xgesvx

On Windows, I obtained the output:

Q:\lang\mkl>xgesvx
 info from gesvx =            0  and rcond =   0.500000000000000
 v1 =    1.00000   2.00000   3.00000   4.00000   5.00000
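The reported rcond of 0.5 can be checked by hand. A short derivation, assuming dgesvx reports the reciprocal of the 1-norm condition number (the norm the reference LAPACK DGESVX uses for TRANS = 'N') and that the estimator is exact for this small matrix:

```latex
A = \begin{pmatrix} H & 0 \\ 0 & I_3 \end{pmatrix}, \qquad
H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
H^{2} = I_2 \;\Rightarrow\; A^{-1} = A

\|A\|_1 = \max_j \sum_i |a_{ij}| = \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}} = \sqrt{2},
\qquad \|A^{-1}\|_1 = \|A\|_1 = \sqrt{2}

\operatorname{rcond} = \frac{1}{\|A\|_1 \, \|A^{-1}\|_1} = \frac{1}{\sqrt{2} \cdot \sqrt{2}} = 0.5
```

A condition number of 2 means the system is very well conditioned, consistent with v1 = 1, ..., 5 being recovered to all printed digits.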

mecej4 · 07-27-2020

Here is code in which I use the much-easier-to-use Lapack95 interface:

program xgesvx
  use lapack95
  implicit none
  integer,  parameter   :: dp = selected_real_kind(15,300)

  real(dp), allocatable :: m1(:,:)
  real(dp), allocatable :: v1(:), v2(:)
  real(dp) :: r1, rcond
  integer  :: sz, ii, info

  sz = 5

  ! matrix and vector vars    
  allocate( v1(sz) )
  allocate( v2(sz) )
  allocate( m1(sz,sz) )

  ! set the matrix and the inhomogeneous vector
  m1 = 0.0_dp
  do ii = 1, sz
      v1(ii) = real(ii,dp)
      m1(ii,ii) = 1.0_dp
  end do

  ! set upper 2x2 to Hadamard
  r1 = 1.0_dp / sqrt(2.0_dp)
  m1(1,1) = r1
  m1(1,2) = r1
  m1(2,1) = r1
  m1(2,2) = -r1
  ! calculate v2 = m1.v1
  v2=matmul(m1,v1)
  ! solve
  call gesvx(m1,v2,v1,info=info,rcond=rcond)
  print *,'info from gesvx = ',info, ' and rcond = ',rcond
  print '(1x,A,5F10.5)','v1 = ',v1

  deallocate( v1 )
  deallocate( v2 )
  deallocate( m1 )

  stop
end program xgesvx

The output, obtained on Windows using the current MKL version (LP64):

 info from gesvx =            0  and rcond =   0.500000000000000
 v1 =    1.00000   2.00000   3.00000   4.00000   5.00000

NOTE: A few minutes ago, I had posted a more detailed reply with the same code but with more explanations and recommendations. The new Intel forums software, in its wisdom, decided to delete my post when I attempted to edit my reply in order to improve a couple of sentences.

No reason was given for why my post was deleted, and no obvious way exists to appeal the high-handed decision to delete. This is a disincentive to post to these forums in the future.

aaron2 · 07-28-2020

The problem was passing a constant to the EQUED argument, which requires a variable because dgesvx writes to it on output.
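A minimal sketch of the fix, assuming the Hadamard test problem from mecej4's LAPACK95 example (the OP's own program is not shown in this thread). It calls the F77 routine dgesvx with EQUED declared as a CHARACTER(len=1) variable and sizes the work arrays as the LAPACK documentation for DGESVX specifies (WORK(4*N), IWORK(N), R and C of length N, FERR and BERR of length NRHS). The module gesvx_demo and the subroutine solve_gesvx are hypothetical names.

```fortran
! Hypothetical helper module: solve A*x = b (one right-hand side) with DGESVX.
module gesvx_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  subroutine solve_gesvx(n, a, b, x, rcond, equed, info)
    integer,          intent(in)    :: n
    real(dp),         intent(inout) :: a(n,n), b(n)
    real(dp),         intent(out)   :: x(n), rcond
    character(len=1), intent(out)   :: equed   ! must be a variable: DGESVX stores into it
    integer,          intent(out)   :: info
    external :: dgesvx                          ! F77 LAPACK routine, implicit interface

    ! Workspace sized per the DGESVX documentation (NRHS = 1)
    real(dp) :: af(n,n), r(n), c(n), ferr(1), berr(1), work(4*n)
    integer  :: ipiv(n), iwork(n)

    ! FACT = 'N': DGESVX factors A itself; TRANS = 'N': solve A*x = b.
    ! Passing the literal 'N' as the 10th argument (EQUED) is what crashed:
    ! DGESVX sets EQUED when FACT = 'N' or 'E', and a literal constant is
    ! typically placed in read-only memory, so the store faults.
    call dgesvx('N', 'N', n, 1, a, n, af, n, ipiv, equed, r, c, &
                b, n, x, n, rcond, ferr, berr, work, iwork, info)
  end subroutine solve_gesvx
end module gesvx_demo
```

After the call, check INFO: a value i between 1 and N means U(i,i) is exactly zero and no solution was computed, while INFO = N+1 means the matrix is singular to working precision (RCOND below machine epsilon) and the returned solution may be inaccurate.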

mecej4 · 07-28-2020

And now, about ten hours after they disappeared from this forum last night, both of my deleted posts have reappeared, with no comment or explanation.

Strange forum behavior!

Gennady_F_Intel · 07-28-2020

mecej4, we are very sorry. This happened because of the migration to the AEM engine; the support team is working to further improve the stability of the forums.

mecej4 · 07-28-2020

Gennady,

The compiler could have caught the error (passing a constant character string in place of a character variable) if the interfaces in mkl.fi contained INTENT attributes for the arguments and the OP had INCLUDEd mkl.fi in his code.

The following files (in Parallel Studio XE 2020 U2) do have INTENT for some of the routines that they cover:

mkl_cluster_sparse_solver.fi
mkl_dss.fi
mkl_pardiso.fi
mkl_sparse_handle.fi
mkl_vsl_subroutine.fi

There are no INTENT clauses in

mkl_blas.fi
mkl_lapack.fi
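To illustrate the point about INTENT, here is a hand-written explicit interface for DGESVX that does carry INTENT attributes. This is a sketch under assumptions: the module name dgesvx_intents is hypothetical, the intents are my reading of the argument descriptions in the LAPACK documentation, and it is not what mkl_lapack.fi contains. Fortran requires an actual argument associated with an INTENT(OUT) or INTENT(INOUT) dummy to be definable, so with this interface in scope a literal in the EQUED position should be rejected at compile time instead of crashing at run time.

```fortran
! Hypothetical module: explicit interface for the F77 LAPACK routine DGESVX
! with INTENT attributes added (the MKL 2020 mkl_lapack.fi declares none).
module dgesvx_intents
  implicit none
  interface
    subroutine dgesvx(fact, trans, n, nrhs, a, lda, af, ldaf, ipiv, equed, &
                      r, c, b, ldb, x, ldx, rcond, ferr, berr, work, iwork, info)
      implicit none
      character(len=1), intent(in)    :: fact, trans
      integer,          intent(in)    :: n, nrhs, lda, ldaf, ldb, ldx
      double precision, intent(inout) :: a(lda,*), af(ldaf,*)
      integer,          intent(inout) :: ipiv(*)
      ! EQUED is input when FACT = 'F' and output otherwise, hence INOUT
      character(len=1), intent(inout) :: equed
      double precision, intent(inout) :: r(*), c(*), b(ldb,*)
      double precision, intent(out)   :: x(ldx,*), rcond, ferr(*), berr(*), work(*)
      integer,          intent(out)   :: iwork(*), info
    end subroutine dgesvx
  end interface
end module dgesvx_intents
```

With use dgesvx_intents in scope, a call such as call dgesvx('N', 'N', n, 1, a, n, af, n, ipiv, 'N', r, c, ...) should be flagged by compilers such as ifort and gfortran, while passing a character(len=1) variable for EQUED compiles. Using it together with INCLUDE 'mkl.fi' in the same scope would likely conflict, since both would declare an interface for dgesvx.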

aaron2 · 07-28-2020

I have seen similar behavior when trying to post questions to these forums; they are heavily moderated. Thanks again for your solution. I will probably end up using the Lapack95 variant in my implementation because of its ease of use.

Gennady_F_Intel · 07-28-2020

Thanks for the update; that resolves the case:

>set MKL_VERBOSE=1

>dgesvx_tester.exe
MKL_VERBOSE Intel(R) MKL 2020.0 Update 2 Product build 20200624 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Win 2.60GHz cdecl intel_thread
MKL_VERBOSE DGESV(5,1,000002044145BF20,5,000002044145FFE0,0000020441487FD0,5,0) 194.56us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:2
dgesv solution:
2.1213203 -0.7071068 3.0000000 4.0000000 5.0000000
MKL_VERBOSE DGESVX(N,N,5,1,000002044145BF20,5,000002044145BE40,5,000002044145FFE0,N,0000020441487F70,0000020441487F40,0000020441487FD0,5,0000020441487FA0,5,00000008CD8FFDA0,0000020441487F10,0000020441487EE0,0000020441453F60,000002044145FFC0,0) 126.49us CNR:OFF Dyn:1 FastMM:1 TID:0 NThr:2
dgesvx solution: (rcond= 0.7322E-01)
-1.4142136 3.4142136 3.0000000 4.0000000 5.0000000