Dear Community, I am having trouble with segmentation faults that occur when I run self-built Fortran routines from Python.
I wrap these routines for use from Python via f2py3.
The routines solve square linear systems that have a unique solution, and they are multithreaded via OpenMP.
When I run them on linear systems of high dimension, Python crashes with an error message such as:
Segmentation fault: 11
So I could simply assume that my code is bad and does not work at all. But, as I have already mentioned, these issues do not occur for all linear systems.
Apart from the fact that it works for small linear systems and fails for large ones, there are two important things I have noticed regarding the OS and the hardware. The "limit value" at which the error appears is around 520 on macOS on a 2012 13-inch MacBook Pro with a Core i5-3210M and 16 GB of symmetrically installed memory.
The "limit value" is around 370 on Linux (Kubuntu 19.10, kernel 5.3) on a ThinkPad P43s with a Core i7-8565U and 24 GB of asymmetrically installed memory. The next strange thing is that I have found a partial workaround which works on the MacBook Pro with macOS but not on the ThinkPad with Linux: I built adapted subroutines for each dimension up to 2500, both as multiple files and as one huge file.
Of course, I did not edit thousands of subroutines manually; this was done by a Python script, and the numerical results are correct whenever I do not get an error message. The only difference in that source code is that the dimension is not an input argument, of course.
Perhaps someone has had similar issues and knows how to fix them, or at least knows why this happens, because I am still a beginner at programming.
Below are an example of my source code and the command used to build it. The adapted versions are similar, of course. Don't be surprised by the commented-out "write" statements; they were used for debugging.
f2py3 --fcompiler=intelem --f90flags='-qopenmp' -c -m linsolve linsolve.f90 -liomp5
subroutine linsolve (SoLEQ, N, sol_vector)
    use omp_lib
    implicit none
    integer(kind=8), intent(in) :: N
    real(kind=8), intent(in) :: SoLEQ(:,:)
    integer(kind=8) :: i, j, q, n0, k0, m0, sdvl = 10, threadnumber = 2
    integer(kind=8) :: ln_amount, clm_amount, dim0, DIM
    real(kind=8) :: av0, tol0
    integer(kind=8) :: id0(N,N), id1(N,N), id2(N+1,N+1), bcrw(N,1), bccw(1,N+1)
    real(kind=8) :: rwch(N,N+1), rwch1(N,N+1), eq_ref(1,N+1)
    real(kind=8) :: sbth(N,N+1), dim0_ar(N,N+1), dim0_clm(N,1)
    real(kind=8), intent(out) :: sol_vector(N+1, 1)
    ! call omp_set_num_threads(threadnumber)
    ln_amount = size(SoLEQ(:, 1))
    clm_amount = size(SoLEQ(1, :))
    DIM = clm_amount - 1
    !
    av0 = sum(abs(SoLEQ)) / (clm_amount * ln_amount)
    tol0 = 1e-09 * DIM
    ! write(*,*) "SoLEQ is:"
    ! do i = 1, ln_amount
    !     write(*,*) SoLEQ(i, :)
    ! end do
    id0(:,:) = 0
    id2(:,:) = 0
    do i = 1, DIM, 1
        id0(i, i) = 1
        id2(i+1, i) = 1
    end do
    id2(1, clm_amount) = 1
    ! rwch = matmul(SoLEQ, id2)
    rwch(:,:) = 0.0
    !$omp parallel private( n0, m0, k0 )
    !$omp do schedule( static, sdvl )
    do n0 = 1, N, 1
        do m0 = 1, N+1, 1
            do k0 = 1, N+1, 1
                rwch(n0, m0) = rwch(n0, m0) + SoLEQ(n0, k0) * id2(k0, m0)
            end do
        end do
    end do
    !$omp end do
    !$omp end parallel
    bcrw(:,1) = 1
    bccw(1,:) = 1
    ! write(*,*) "rwch is:"
    ! do i = 1, ln_amount, 1
    !     write(*,*) rwch(i, :)
    ! end do
    do dim0 = 1, DIM, 1
        ! write(*,*) "dim0 =", dim0
        ! ln_amount = size(SoLEQ(:, 1))
        ! where ((abs(rwch)) >= tol0)
        !     rwch = rwch
        ! elsewhere
        !     rwch = 0.0
        ! end where
        !$omp parallel private( i, j )
        !$omp do schedule( static, sdvl )
        do i = 1, ln_amount, 1
            do j = 1, clm_amount, 1
                if (((abs(rwch(i, j))) >= tol0)) then
                    rwch(i, j) = rwch(i, j)
                else
                    rwch(i, j) = 0.0
                end if
            end do
        end do
        !$omp end do
        !$omp end parallel
        do q = dim0, DIM, 1
            if ((rwch(q, dim0)) /= 0.0) exit
        end do
        ! write(*,*) "q =", q
        ! write(*,*) "It's", rwch(q, dim0)
        id1 = id0
        id1(dim0, dim0) = 0
        id1(q, q) = 0
        id1(dim0, q) = 1
        id1(q, dim0) = 1
        ! do i = 1, DIM, 1
        !     write(*,*) id1(i, :)
        !     write(*,*) ""
        ! end do
        ! rwch = matmul(id1, rwch)
        rwch1 = rwch
        rwch(:,:) = 0.0
        !$omp parallel private( n0, m0, k0 )
        !$omp do schedule( static, sdvl )
        do n0 = 1, N, 1
            do m0 = 1, N+1, 1
                do k0 = 1, N, 1
                    rwch(n0, m0) = rwch(n0, m0) + id1(n0, k0) * rwch1(k0, m0)
                end do
            end do
        end do
        !$omp end do
        !$omp end parallel
        !$omp parallel private( i )
        !$omp do schedule( static, sdvl )
        do i = 1, clm_amount, 1
            eq_ref(1, i) = rwch(dim0, i) / rwch(dim0, dim0)
            ! write(*,*) rwch(dim0, i), "/", rwch(dim0, dim0), "=", eq_ref(1, i)
        end do
        !$omp end do
        !$omp end parallel
        ! write(*,*) "eq_ref is:"
        ! write(*,*) eq_ref(1, :)
        ! sbth = matmul(bcrw, eq_ref)
        sbth(:,:) = 0.0
        !$omp parallel private( n0, m0 )
        !$omp do schedule( static, sdvl )
        do n0 = 1, N, 1
            do m0 = 1, N+1, 1
                sbth(n0, m0) = eq_ref(1, m0)
            end do
        end do
        !$omp end do
        !$omp end parallel
        sbth(dim0, :) = 0
        ! write(*,*) "subtrahend is:"
        ! do i = 1, ln_amount, 1
        !     write(*,*) sbth(i, :)
        ! end do
        !$omp parallel private( i )
        !$omp do schedule( static, sdvl )
        do i = 1, ln_amount, 1
            dim0_clm(i, 1) = rwch(i, dim0)
        end do
        !$omp end do
        !$omp end parallel
        ! dim0_ar = matmul(dim0_clm, bccw)
        dim0_ar(:,:) = 0.0
        !$omp parallel private( n0, m0 )
        !$omp do schedule( static, sdvl )
        do n0 = 1, N, 1
            do m0 = 1, N+1, 1
                dim0_ar(n0, m0) = dim0_clm(n0, 1)
            end do
        end do
        !$omp end do
        !$omp end parallel
        ! write(*,*) "dim0_ar is:"
        ! do i = 1, ln_amount, 1
        !     write(*,*) dim0_ar(i, :)
        ! end do
        ! sbth = sbth * dim0_ar
        !$omp parallel private( i, j )
        !$omp do schedule( static, sdvl )
        do i = 1, ln_amount, 1
            do j = 1, clm_amount, 1
                sbth(i, j) = sbth(i, j) * dim0_ar(i, j)
            end do
        end do
        !$omp end do
        !$omp end parallel
        ! write(*,*) "subtrahend is now:"
        ! do i = 1, ln_amount, 1
        !     write(*,*) sbth(i, :)
        ! end do
        ! rwch = rwch - sbth
        !$omp parallel private( i, j )
        !$omp do schedule( static, sdvl )
        do i = 1, ln_amount, 1
            do j = 1, clm_amount, 1
                rwch(i, j) = rwch(i, j) - sbth(i, j)
            end do
        end do
        !$omp end do
        !$omp end parallel
        ! write(*,*) "rwch is now:"
        ! do i = 1, ln_amount, 1
        !     write(*,*) rwch(i, :)
        ! end do
    end do
    sol_vector(1, 1) = 0.0
    write(*,*) "********* ********* ********* ********* ********* *********"
    ! write(*,*) "Solution vector is:"
    write(*,*) sol_vector(1, 1)
    do j = 1, DIM, 1
        sol_vector(j+1, 1) = rwch(j, DIM+1) / rwch(j, j)
        ! write(*,*) sol_vector(j+1, 1)
    end do
end subroutine linsolve
I'd guess you have a stack overflow, which is made worse by using OpenMP. You can use a shell command to increase the stack limit (command varies by shell), and you may also need to set the OMP_STACKSIZE environment variable to a higher value. It will take some experimentation to find the best choices.
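For example, in bash or zsh something along these lines could be tried before launching Python (an untested sketch: the ulimit value and OMP_STACKSIZE are just starting points to experiment with, and run_solver.py stands for whatever script imports the wrapped module):

ulimit -s unlimited          # raise the main-thread stack limit ('unlimited' may be rejected on macOS; try a large value such as 'ulimit -s 65532' there)
export OMP_STACKSIZE=512M    # give each OpenMP worker thread a larger stack
python3 run_solver.py        # hypothetical driver script that imports and calls the wrapped linsolve module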
You can also try adding the Fortran compiler switch -heap-arrays so that automatic and temporary arrays are allocated on the heap. This should not affect the performance too much, seeing that the heap allocations will only occur at the (serial) entry of the subroutine (as opposed to on entry to, or use within, each parallel region).
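For instance, the switch could be passed through the f2py3 command from the original post roughly like this (untested sketch, assuming the same Intel compiler setup):

f2py3 --fcompiler=intelem --f90flags='-qopenmp -heap-arrays' -c -m linsolve linsolve.f90 -liomp5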
Jim Dempsey