I recently tried to add a print statement to a program that I wrote in February.
The program solves regular linear systems via what I call iterative linear projection, but that doesn't matter here.
A print statement shouldn't change the program's stability.
However, when executing the binary I immediately get a segmentation fault when working with arrays larger than approximately 400x400. I therefore removed the print statement and, contrary to what I had expected, I now get segmentation faults whether the print statement is there or not.
When the program does run, it computes arithmetically correct results, as I have verified.
This is really strange behavior, because I last edited the sources in February and I still have some "old" binaries from that time. These "old" binaries still work without any faults, and with arithmetically correct results, up to array sizes that are limited only by my memory.
At the end of July I completely reinstalled my computer, which runs Ubuntu 20.04, because of an SSD replacement. Given these circumstances, I assume that something is deeply broken with my compiler.
I compile the source via »ifort -o geometric_linsolve_main geometric_linsolve_main.f90«
Here is my source code:
program linsolve_mainprogram
  implicit none
  integer(8) :: N
  real(8) :: sttm_0, edtm_0, mainprogramtime
  real(8), allocatable :: SoLEQ(:,:)
  real(8), allocatable :: sv_0(:,:)
  open(1, file='/mnt/ramdisk0/N.unf', form='unformatted')
  read(1) N
  close(1)
  allocate(SoLEQ(N, N+1))
  allocate(sv_0(N+1, 1))
  open(100, file='/mnt/ramdisk0/SoLEQ.unf', form='unformatted')
  read(100) SoLEQ
  close(100)
  call cpu_time(sttm_0)
  call linsolve_subroutine(SoLEQ, N, sv_0)
  call cpu_time(edtm_0)
  open(10, file='/mnt/ramdisk0/sol_vector.unf', form='unformatted')
  write(10) sv_0
  close(10)
  mainprogramtime = edtm_0 - sttm_0
  open(2, file='/mnt/ramdisk0/fortrantime.unf', form='unformatted')
  write(2) mainprogramtime
  close(2)
end program linsolve_mainprogram

subroutine linsolve_subroutine (SoLEQ, N, sv_0)
  implicit none
  integer(8), intent(in) :: N
  real(8), intent(in) :: SoLEQ(N, N+1)
  integer(8) :: i, j, k, q
  integer(8) :: dim_0
  real(8), parameter :: alpha = 1.0, beta = 0.0
  real(8) :: av_0, tol_0, sttm_0, edtm_0, con_tmp
  real(8) :: cfm_0(N, N), cnc_0(N, 1), cfl_0(1, N)
  real(8) :: pos_0(N, N+1), pv_0(N, 1), vec_0(N, N+1)
  real(8) :: scpd_0(1, 1), scpd_1(1, N+1), lvr_0(1, N+1)
  real(8), intent(out) :: sv_0(N+1, 1)
  tol_0 = 1e-09 * N
  av_0 = sum(abs(SoLEQ)) / (N**2 + N)
  call random_seed()
  call random_number(pos_0)
  write(*,*) "N =", N
  do i = 1, N, 1
    cnc_0(i, 1) = SoLEQ(i, 1)
    pos_0(i, 1) = pos_0(i, 1) - 0.5
    do j = 1, N, 1
      cfm_0(i, j) = SoLEQ(i, j+1)
      pos_0(i, j+1) = pos_0(i, j+1) - 0.5
    end do
  end do
  do dim_0 = 1, N, 1
    write(*,*) "dim0 =", dim_0
    con_tmp = cnc_0(dim_0, 1)
    do j = 1, N, 1
      cfl_0(1, j) = cfm_0(dim_0, j)
    end do
    pv_0(:, 1) = pos_0(:, N+2 - dim_0)
    do j = 1, N+1 - dim_0, 1
      vec_0(:, j) = pos_0(:, j) - pv_0(:, 1)
    end do
    scpd_0 = matmul(cfl_0, pv_0)
    scpd_1 = matmul(cfl_0, vec_0)
    do j = 1, N+1 - dim_0, 1
      lvr_0(1, j) = (con_tmp - scpd_0(1, 1)) / scpd_1(1, j)
    end do
    do j = 1, N+1 - dim_0, 1
      pos_0(:, j) = pv_0(:, 1) + lvr_0(1, j) * vec_0(:, j)
    end do
  end do
  sv_0(1, 1) = 0.0
  do i = 1, N, 1
    sv_0(i+1, 1) = pos_0(i, 1)
  end do
end subroutine linsolve_subroutine
Does anyone have knowledge about segmentation faults? Which programming practices help to avoid them? Why do they occur only when handling huge arrays?
Such behaviour is normally caused by a bug in your code. For example, you go past an array limit and clobber (corrupt) some other data. That other data might not be important: it may have been used already, or it may get initialised later and thus corrected. Making any change, e.g. adding a print statement, alters the layout of your code, and you now clobber a different thing. Thus the bug can appear and disappear, but in reality it is always there.
My recommendation is to always have all compile-time and run-time checks enabled when debugging, and also to use standards checking (/stand). Fix anything those checks report before looking harder; often that fixes the problem.
Thank you for your helpful ideas.
Thinking that there was a bug in my code, I tried to rewrite it completely, which was feasible because there isn't much of it. I intended to write, compile and execute it step by step in order to figure out when the bug occurs. First, I wrote only the entire I/O main program and the specification section of the subroutine, compiled it and tested it.
Fortunately, this "empty program", which didn't compute anything and only declared variables and arrays and allocated memory for them, did not cause any errors.
So I began examining the first real computation, which simply maps the array of random floats in [0, 1) generated by „call random_number({array})“ onto an array of random floats in [-0.5, 0.5).
This little double loop was sufficient to trigger the issue. I felt a little overwhelmed and powerless, because every computation I tried that accessed huge arrays led to a segmentation fault, no matter how simple and trivial it was.
But this morning I remembered that I had already had this issue, in January if I remember correctly, though the exact date doesn't matter. There is a simple compiler option called „-heap-arrays“ which I had used back then, early in the year.
As far as I know, this happens because ifort stores allocatable arrays on the stack by default. Because the stack has a fixed size, the kernel can't give the array enough space, and when the program accesses addresses outside the regions it is allowed to access, the kernel interrupts it. That's what we normally call a segmentation fault.
If anybody has something to add or to correct concerning my assumption, he or she is welcome to teach me, of course.
Of course, it still works pretty well.
Using the -heap-arrays flag puts statically defined local arrays (automatic arrays and temporaries) on the heap instead of the stack and avoids the issues you described. Allocated variables are always stored on the heap by default, if I'm not wrong. You wrote it the other way round. However, in your subroutine you defined a lot of N, NxN and Nx(N+1) arrays statically. These have probably blown your stack.
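To make the distinction concrete, here is a minimal sketch that declares the same array both ways. The names `stack_vs_heap`, `work`, `a_auto` and `a_heap` are made up for illustration and do not come from the original program. With ifort's defaults, the explicit-shape local `a_auto` is an automatic array and would normally be placed on the stack, while `a_heap` is placed on the heap. The size 200 is chosen deliberately small so that the sketch should run under a default stack limit.

```fortran
program stack_vs_heap
  implicit none
  call work(200)
contains
  subroutine work(n)
    integer, intent(in) :: n
    ! Automatic array: its size is only known at run time, and ifort
    ! places it on the stack unless -heap-arrays is given.
    real(8) :: a_auto(n, n)
    ! Allocatable array: storage is obtained from the heap.
    real(8), allocatable :: a_heap(:, :)

    allocate (a_heap(n, n))
    a_auto = 1.0d0
    a_heap = 2.0d0
    write (*,*) 'sum of both arrays =', sum(a_auto) + sum(a_heap)
    ! a_heap is deallocated automatically when the subroutine returns.
  end subroutine work
end program stack_vs_heap
```

Raising the argument of `work` far enough should eventually make the automatic array exceed the stack limit, while the allocatable one is limited only by available memory.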
Anyway, I would encourage you to use modules and thereby get explicit interfaces to your subroutines. Furthermore, allocating the big arrays in the subroutine avoids the need for -heap-arrays and lets you control the use of heap and stack better. Alternatively, you can add a size threshold for when the heap shall be used (-heap-arrays size, where size is an integer value representing the size of the arrays in kilobytes; arrays smaller than size are put on the stack). If needed, you could play around with the stack size in the linker settings (Windows is different from GNU/Linux). I've seen code where the limit had been set to a very high value (LINKER -> Stack reserve size = 999999999 or /STACK:999999999 on Windows).
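A rough sketch of that structure, under stated assumptions: the module name `linsolve_mod` and the routine `column_abs_sums` are hypothetical stand-ins, and the routine does not implement the solver from the original post. It only shows a subroutine placed in a module (which gives callers an explicit interface) that allocates its large work array on the heap and reports allocation failure through `stat`.

```fortran
module linsolve_mod
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  private
  public :: column_abs_sums
contains
  ! Hypothetical stand-in for a solver routine: sums |a| over each column.
  subroutine column_abs_sums(a, n, s, stat)
    integer(int64), intent(in)  :: n
    real(real64),   intent(in)  :: a(n, n+1)
    real(real64),   intent(out) :: s(n+1)
    integer,        intent(out) :: stat
    ! Large work array is allocatable, so it lives on the heap.
    real(real64), allocatable   :: work(:, :)

    allocate (work(n, n+1), stat=stat)
    if (stat /= 0) return          ! let the caller handle the failure
    work = abs(a)
    s = sum(work, dim=1)
  end subroutine column_abs_sums
end module linsolve_mod

program demo_explicit_interface
  use, intrinsic :: iso_fortran_env, only: int64, real64
  use linsolve_mod, only: column_abs_sums
  implicit none
  integer(int64), parameter :: n = 3
  real(real64) :: a(n, n+1), s(n+1)
  integer :: stat

  a = -1.0_real64
  call column_abs_sums(a, n, s, stat)
  if (stat /= 0) error stop 'allocation of work array failed'
  write (*,*) s
end program demo_explicit_interface
```

Because the routine is in a module, the compiler can check the argument list at every call site, which an external subroutine does not allow.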
Happy coding, Johannes
Here are a couple of recommendations.
Add the STATUS = 'OLD' clause to the OPEN statements for existing files. Without that, if you did not set the correct path to the files in the OPEN statement, an empty new file will be created, and the subsequent READ statements on that file will fail.
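As an illustration, here is a minimal sketch of opening N.unf this way. The `iostat`/`iomsg` checks, `action='read'` and the use of `newunit` are additions for the example and were not part of the recommendation above; the program name `open_existing` is made up.

```fortran
program open_existing
  implicit none
  integer :: u, ios
  integer(8) :: n
  character(len=256) :: msg

  ! status='old' makes the OPEN fail if the file does not exist,
  ! instead of silently creating an empty one.
  open (newunit=u, file='/mnt/ramdisk0/N.unf', form='unformatted', &
        status='old', action='read', iostat=ios, iomsg=msg)
  if (ios /= 0) then
    write (*,*) 'could not open N.unf: ', trim(msg)
    stop 1
  end if

  read (u, iostat=ios, iomsg=msg) n
  if (ios /= 0) then
    write (*,*) 'could not read N: ', trim(msg)
    stop 1
  end if
  close (u)

  write (*,*) 'N =', n
end program open_existing
```

With a wrong path, this version should stop at the OPEN with a message rather than fail later in the READ.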
If you want more help, please provide all the source and data files, zipped together and attached to your reply.
Too often, the compiler is suspected to be responsible when the user or some other software is at fault.
>>>But, when executing the binary I immediately get a segmentation fault when working with arrays approximately greater than 400x400.>>>
The default user-mode stack size per thread on Linux is ~8 MiB. As pointed out by the others, you statically allocated a lot of arrays of primitive type real(8). The total size of those arrays is not known, but you mentioned that arrays larger than 400x400 double-precision elements (1,280,000 bytes each) caused a segmentation fault, probably by touching some kind of guard page or by partly or completely overflowing the previous frame.
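For a rough feel of the numbers, the following sketch adds up the nine local arrays declared in linsolve_subroutine (cfm_0, cnc_0, cfl_0, pos_0, pv_0, vec_0, scpd_0, scpd_1, lvr_0) and compares the total with an assumed 8 MiB limit. The program name `stack_estimate` and the limit are assumptions for illustration; the actual limit on a given machine may differ, and compiler-generated temporaries are not counted.

```fortran
program stack_estimate
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  ! Assumed default stack limit of 8 MiB; check your own system's setting.
  integer(int64), parameter :: stack_limit = 8_int64 * 1024_int64 * 1024_int64
  integer(int64) :: n, elems, bytes

  do n = 100_int64, 800_int64, 100_int64
    ! cfm_0: n*n; pos_0 and vec_0: 2*n*(n+1); cnc_0, cfl_0, pv_0: 3*n;
    ! scpd_1 and lvr_0: 2*(n+1); scpd_0: 1
    elems = n*n + 2*n*(n+1) + 3*n + 2*(n+1) + 1
    bytes = 8_int64 * elems          ! real(8) takes 8 bytes per element
    write (*, '(a,i5,a,i12,a,l2)') 'N =', n, '  bytes =', bytes, &
          '  exceeds 8 MiB:', bytes > stack_limit
  end do
end program stack_estimate
```

By this count the declared arrays alone need 3,862,424 bytes at N = 400 and cross 8 MiB between N = 500 and N = 600. Temporaries that the compiler may create for expressions such as abs(SoLEQ) or the matmul results would come on top of that, which might explain a lower threshold in practice.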
Large automatic arrays can be a problem. It appears that there is the potential to overflow the available stack, so rather than some automatic arrays, I would recommend using ALLOCATABLE arrays.
I also don't understand why you have two files, N.unf and SoLEQ.unf, as they are not independent.
Finally, I would recommend some reporting during the reading stage for SoLEQ.unf, and possibly reducing the size of the unformatted records (you have indicated that N is an 8-byte integer). The following is an alternative approach to this phase of your program, which assumes you can modify the creation of the file SoLEQ.unf.
! ...
  integer :: i, iostat, stat
  integer(8) :: mem
  open (100, file='/mnt/ramdisk0/SoLEQ.unf', form='unformatted')
  read (100, iostat=iostat) N
  if (iostat /= 0) then
    write (*,*) ' FILE ERROR : problem reading N, iostat=',iostat
    stop
  end if
  write (*,*) 'problem size N =',n
!
  ! check each allocation separately so that a failure is not overwritten
  allocate (sv_0(N+1, 1), stat=stat)
  if (stat/=0) then
    write (*,*) ' ALLOCATE ERROR :: could not allocate sv_0(N+1, 1) : stat=',stat
    stop
  end if
  allocate (SoLEQ(N, N+1), stat=stat)
  if (stat/=0) then
    write (*,*) ' ALLOCATE ERROR :: could not allocate SoLEQ(N, N+1) : stat=',stat
    stop
  end if
!
  ! one record per column keeps the unformatted records small
  do i = 1,N+1
    read(100, iostat=iostat) SoLEQ(:,I)
    if (iostat /= 0) then
      write (*,*) ' FILE ERROR : problem reading SoLEQ, iostat=',iostat
      stop
    end if
  end do
  write (*,*) 'SoLEQ recovered from /mnt/ramdisk0/SoLEQ.unf'
!
  close(100)
!
  mem = ( N*(N+1)*3   &   ! 11 declared arrays bytes
        + N*N    *1   &
        + (N+1)  *3   &
        + N      *3   &
        + 1      *1 ) * 8
  write (*,*) 'Expected memory demand =',mem,' bytes'
! ...
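The reading code above expects a file whose first record holds N and whose following N+1 records each hold one column of SoLEQ. A minimal sketch of a matching writer might look like this; the program name `write_soleq`, the file name `SoLEQ_demo.unf`, the size N = 3 and the random placeholder data are all assumptions for illustration, since the thread does not show how the original file is created.

```fortran
program write_soleq
  implicit none
  integer(8), parameter :: n = 3          ! small demo size (assumption)
  real(8) :: soleq(n, n+1)
  integer(8) :: i
  integer :: u, ios

  call random_number(soleq)               ! placeholder data only

  open (newunit=u, file='SoLEQ_demo.unf', form='unformatted', &
        status='replace', action='write', iostat=ios)
  if (ios /= 0) stop 'cannot create SoLEQ_demo.unf'

  write (u) n                             ! record 1: problem size
  do i = 1, n + 1
    write (u) soleq(:, i)                 ! one record per column
  end do
  close (u)
end program write_soleq
```

Keeping N in the same file as the matrix means the two can no longer get out of step, which is the point of the objection to separate N.unf and SoLEQ.unf files.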