Dear Community, I am having trouble with segmentation faults that occur when I run self-built Fortran routines from Python.
I wrap these routines for use from Python via f2py3.
The routines solve square linear systems that have a unique solution, and they are multithreaded via OpenMP.
When I run them on linear systems of high dimension, Python crashes with an error message such as:
Segmentation fault: 11
So I could simply assume that my code is bad and does not work at all. But, as I have already mentioned, these issues do not occur for all linear systems.
Apart from the fact that it works for small linear systems and fails for large ones, there are two important things I have noticed regarding the OS and the hardware. The "limit value" at which the error appears is around 520 on macOS on a 2012 13-inch MacBook Pro with a Core i5-3210M and 16 GB of symmetrically installed memory.
The "limit value" is around 370 on Linux (Kubuntu 19.10, kernel 5.3) on a ThinkPad P43s with a Core i7-8565U and 24 GB of asymmetrically installed memory. The next strange thing is that I have found a partial workaround which works on the MacBook Pro with macOS but not on the ThinkPad with Linux: I built adapted subroutines for each dimension up to 2500, both as multiple files and as one huge file.
Of course, I did not edit thousands of subroutines manually; this was done by a Python script, and the numerical results are correct whenever I do not get an error message. The only difference in that source code is that the dimension is not an input argument, of course.
Perhaps someone has had similar issues and knows how to fix them, or at least knows why this happens, because I am still a beginner at programming.
Below are an example of my source code and the command used to build it. The adapted versions are similar, of course. Don't be surprised by the commented-out "write" statements; they were used for debugging.
f2py3 --fcompiler=intelem --f90flags='-qopenmp' -c -m linsolve linsolve.f90 -liomp5
subroutine linsolve (SoLEQ, N, sol_vector)
    use omp_lib
    implicit none
    integer(kind=8), intent(in) :: N
    real(kind=8), intent(in) :: SoLEQ(:,:)
    integer(kind=8) :: i, j, q, n0, k0, m0, sdvl = 10, threadnumber = 2
    integer(kind=8) :: ln_amount, clm_amount, dim0, DIM
    real(kind=8) :: av0, tol0
    integer(kind=8) :: id0(N,N), id1(N,N), id2(N+1,N+1), bcrw(N,1), bccw(1,N+1)
    real(kind=8) :: rwch(N,N+1), rwch1(N,N+1), eq_ref(1,N+1)
    real(kind=8) :: sbth(N,N+1), dim0_ar(N,N+1), dim0_clm(N,1)
    real(kind=8), intent(out) :: sol_vector(N+1, 1)
    ! call omp_set_num_threads(threadnumber)
    ln_amount = size(SoLEQ(:, 1))
    clm_amount = size(SoLEQ(1, :))
    DIM = clm_amount - 1
    !
    av0 = sum(abs(SoLEQ)) / (clm_amount * ln_amount)
    tol0 = 1e-09 * DIM
    ! write(*,*) "SoLEQ is:"
    ! do i = 1, ln_amount
    !     write(*,*) SoLEQ(i, :)
    ! end do
    id0(:,:) = 0
    id2(:,:) = 0
    do i = 1, DIM, 1
        id0(i, i) = 1
        id2(i+1, i) = 1
    end do
    id2(1, clm_amount) = 1
    ! rwch = matmul(SoLEQ, id2)
    rwch(:,:) = 0.0
    !$omp parallel private( n0, m0, k0 )
    !$omp do schedule( static, sdvl )
    do n0 = 1, N, 1
        do m0 = 1, N+1, 1
            do k0 = 1, N+1, 1
                rwch(n0, m0) = rwch(n0, m0) + SoLEQ(n0, k0) * id2(k0, m0)
            end do
        end do
    end do
    !$omp end do
    !$omp end parallel
    bcrw(:,1) = 1
    bccw(1,:) = 1
    ! write(*,*) "rwch is:"
    ! do i = 1, ln_amount, 1
    !     write(*,*) rwch(i, :)
    ! end do
    do dim0 = 1, DIM, 1
        ! write(*,*) "dim0 =", dim0
        ! ln_amount = size(SoLEQ(:, 1))
        ! where ((abs(rwch)) >= tol0)
        !     rwch = rwch
        ! elsewhere
        !     rwch = 0.0
        ! end where
        !$omp parallel private( i, j )
        !$omp do schedule( static, sdvl )
        do i = 1, ln_amount, 1
            do j = 1, clm_amount, 1
                if (((abs(rwch(i, j))) >= tol0)) then
                    rwch(i, j) = rwch(i, j)
                else
                    rwch(i, j) = 0.0
                end if
            end do
        end do
        !$omp end do
        !$omp end parallel
        do q = dim0, DIM, 1
            if ((rwch(q, dim0)) /= 0.0) exit
        end do
        ! write(*,*) "q =", q
        ! write(*,*) "It's", rwch(q, dim0)
        id1 = id0
        id1(dim0, dim0) = 0
        id1(q, q) = 0
        id1(dim0, q) = 1
        id1(q, dim0) = 1
        ! do i = 1, DIM, 1
        !     write(*,*) id1(i, :)
        !     write(*,*) ""
        ! end do
        ! rwch = matmul(id1, rwch)
        rwch1 = rwch
        rwch(:,:) = 0.0
        !$omp parallel private( n0, m0, k0 )
        !$omp do schedule( static, sdvl )
        do n0 = 1, N, 1
            do m0 = 1, N+1, 1
                do k0 = 1, N, 1
                    rwch(n0, m0) = rwch(n0, m0) + id1(n0, k0) * rwch1(k0, m0)
                end do
            end do
        end do
        !$omp end do
        !$omp end parallel
        !$omp parallel private( i )
        !$omp do schedule( static, sdvl )
        do i = 1, clm_amount, 1
            eq_ref(1, i) = rwch(dim0, i) / rwch(dim0, dim0)
            ! write(*,*) rwch(dim0, i), "/", rwch(dim0, dim0), "=", eq_ref(1, i)
        end do
        !$omp end do
        !$omp end parallel
        ! write(*,*) "eq_ref is:"
        ! write(*,*) eq_ref(1, :)
        ! sbth = matmul(bcrw, eq_ref)
        sbth(:,:) = 0.0
        !$omp parallel private( n0, m0 )
        !$omp do schedule( static, sdvl )
        do n0 = 1, N, 1
            do m0 = 1, N+1, 1
                sbth(n0, m0) = eq_ref(1, m0)
            end do
        end do
        !$omp end do
        !$omp end parallel
        sbth(dim0, :) = 0
        ! write(*,*) "subtrahend is:"
        ! do i = 1, ln_amount, 1
        !     write(*,*) sbth(i, :)
        ! end do
        !$omp parallel private( i )
        !$omp do schedule( static, sdvl )
        do i = 1, ln_amount, 1
            dim0_clm(i, 1) = rwch(i, dim0)
        end do
        !$omp end do
        !$omp end parallel
        ! dim0_ar = matmul(dim0_clm, bccw)
        dim0_ar(:,:) = 0.0
        !$omp parallel private( n0, m0 )
        !$omp do schedule( static, sdvl )
        do n0 = 1, N, 1
            do m0 = 1, N+1, 1
                dim0_ar(n0, m0) = dim0_clm(n0, 1)
            end do
        end do
        !$omp end do
        !$omp end parallel
        ! write(*,*) "dim0_ar is:"
        ! do i = 1, ln_amount, 1
        !     write(*,*) dim0_ar(i, :)
        ! end do
        ! sbth = sbth * dim0_ar
        !$omp parallel private( i, j )
        !$omp do schedule( static, sdvl )
        do i = 1, ln_amount, 1
            do j = 1, clm_amount, 1
                sbth(i, j) = sbth(i, j) * dim0_ar(i, j)
            end do
        end do
        !$omp end do
        !$omp end parallel
        ! write(*,*) "subtrahend is now:"
        ! do i = 1, ln_amount, 1
        !     write(*,*) sbth(i, :)
        ! end do
        ! rwch = rwch - sbth
        !$omp parallel private( i, j )
        !$omp do schedule( static, sdvl )
        do i = 1, ln_amount, 1
            do j = 1, clm_amount, 1
                rwch(i, j) = rwch(i, j) - sbth(i, j)
            end do
        end do
        !$omp end do
        !$omp end parallel
        ! write(*,*) "rwch is now:"
        ! do i = 1, ln_amount, 1
        !     write(*,*) rwch(i, :)
        ! end do
    end do
    sol_vector(1, 1) = 0.0
    write(*,*) "********* ********* ********* ********* ********* *********"
    ! write(*,*) "Solution vector is:"
    write(*,*) sol_vector(1, 1)
    do j = 1, DIM, 1
        sol_vector(j+1, 1) = rwch(j, DIM+1) / rwch(j, j)
        ! write(*,*) sol_vector(j+1, 1)
    end do
end subroutine linsolve
I'd guess you have a stack overflow, which is made worse by using OpenMP. You can use a shell command to increase the stack limit (command varies by shell), and you may also need to set the OMP_STACKSIZE environment variable to a higher value. It will take some experimentation to find the best choices.
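For example, in bash or zsh something along these lines could be tried before launching Python (an untested sketch: the ulimit value and OMP_STACKSIZE are just starting points to experiment with, and run_solver.py stands for whatever script imports the wrapped module):

ulimit -s unlimited          # raise the main-thread stack limit ('unlimited' may be rejected on macOS; try a large value such as 'ulimit -s 65532' there)
export OMP_STACKSIZE=512M    # give each OpenMP worker thread a larger stack
python3 run_solver.py        # hypothetical driver script that imports and calls the wrapped linsolve module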
You can also try adding the Fortran compiler switch -heap-arrays so that automatic and temporary arrays are allocated on the heap. This should not affect the performance too much, seeing that the heap allocations will only occur at the (serial) entry of the subroutine (as opposed to on entry to, or use within, each parallel region).
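For instance, the switch could be passed through the f2py3 command from the original post roughly like this (untested sketch, assuming the same Intel compiler setup):

f2py3 --fcompiler=intelem --f90flags='-qopenmp -heap-arrays' -c -m linsolve linsolve.f90 -liomp5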
Jim Dempsey