Hi,
I'm running a computational fluid dynamics MPI code. Originally the grid size is around 2000x1000; when I change it to 4000x1000, the code aborts at an early stage of the run.
I tried to compile using the debug option and it works fine.
I then tried to pinpoint the location at which the abort occurs in the release version.
I inserted:
call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *, "x"
where x runs from 0 to 10, at different locations.
Strangely, the error occurs right after the code enters a subroutine:
program ....
...
  call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *, "0"
  call initial
  call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *, "2"
...
end program

subroutine initial
  integer :: i,j,ierr
  call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *, "1"
  ....
end subroutine initial
So it prints "0" and then aborts. If it ran correctly, it should print "0", "1", "2".
The strange thing is that it does not even print "1". I am only entering a subroutine; there is no allocation of data whatsoever. Why does the code abort here?
How should I debug? In the release version, I'm using "-O3 -r8 -w95 -c -save -ipo"
In the debug version, I'm using "-g -debug all -check all -implicitnone -warn unused -fp-stack-check -heap-arrays -ftrapuv -check pointers -check bounds -r8 -w95 -c -O0 -save"
> There is no allocation of data whatsoever. Why does the code abort here?
Do you have any local variables, specifically arrays or structures of appreciable size, in that subroutine? If so, you may be running out of stack space. What, if any, error messages do you see when the program aborts?
There is a compiler option to have local arrays allocated on the heap, instead of the stack. There is a linker (Windows) or shell (Linux) option to set a specified maximum stack allocation.
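A minimal sketch (the names and sizes are made up, not from the original code) of the kind of local array meant here, assuming Intel Fortran. An automatic array like tmp is placed on the stack unless -heap-arrays is used, so merely entering the routine can overflow the stack once the grid is enlarged, before the first print statement ever executes. Note that the debug command line quoted above already contains -heap-arrays, while the release line does not.

program stack_demo
  implicit none
  call work(4000, 1000)            ! hypothetical sizes matching the larger grid
contains
  subroutine work(nx, ny)
    integer, intent(in) :: nx, ny
    real(8) :: tmp(nx, ny)         ! automatic array: on the stack by default (~32 MB here)
    tmp = 0.0d0
    print *, 'entered work, tmp(1,1) =', tmp(1,1)
  end subroutine work
end program stack_demo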
On Linux, the stack limit is not a linker option; it is a per-process setting, subject to kernel configuration limits.
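For example, a minimal sketch assuming bash and Open MPI (which the library paths in the later post suggest); -np 16 and ./a.out are placeholders for your own job size and executable:

ulimit -s                # show the current soft stack limit (often 8192 KB)
ulimit -s unlimited      # raise it for this shell and the processes it starts
mpirun -np 16 ./a.out    # launch the CFD code with the raised limit

Ranks started on other nodes inherit the limit of the shell that starts them there, so the ulimit command may need to go into the remote shell startup file (e.g. ~/.bashrc) or otherwise be applied on every node, and the soft limit can never exceed the hard limit set by the kernel/administrator.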
Thanks for the reply, mecej4.
As you can see below:
subroutine initial
  integer :: i,j,ierr
  call MPI_Barrier(MPI_COMM_WORLD,ierr); if (myid==0) print *, "1"
  call gen_xy_uv
only the integers i, j, ierr are declared. Before even getting to gen_xy_uv, "1" should be printed out.
However, it isn't. I also use MPI_Barrier(MPI_COMM_WORLD,ierr) to ensure that at least all procs are synchronized before moving forward.
I also got the messages below; I wonder if they are relevant.
[n12-52:08178] mca: base: component_find: unable to open /opt/openmpi-1.5.3/lib/openmpi/mca_ess_tm: /opt/openmpi-1.5.3/lib/openmpi/mca_ess_tm.so: cannot open shared object file: Text file busy (ignored)
[n12-52:08178] mca: base: component_find: unable to open /opt/openmpi-1.5.3/lib/openmpi/mca_plm_rsh: /opt/openmpi-1.5.3/lib/openmpi/mca_plm_rsh.so: cannot open shared object file: Text file busy (ignored)
[n12-52:08178] mca: base: component_find: unable to open /opt/openmpi-1.5.3/lib/openmpi/mca_iof_orted: /opt/openmpi-1.5.3/lib/openmpi/mca_iof_orted.so: cannot open shared object file: Text file busy (ignored)
Thanks