Hi all,
I'm trying to compile and run a Fortran code with coarrays. The compilation seems OK, but when I run it I get the following error:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 721 RUNNING AT 41ca5757f459
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 722 RUNNING AT 41ca5757f459
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
I've also tried a hello world code with coarrays, but the result is the same; it doesn't seem to be a memory error in my code. Is it possible that I have to load other modules?
These are my loaded modules:
Currently Loaded Modulefiles:
1) mpi/2021.6.0 2) tbb/latest 3) compiler-rt/latest 4) oclfpga/latest 5) compiler/2022.1.0 6) init_opencl/2022.1.0 7) vtune/2022.2.0
and this is the list of the available modules from the oneAPI modulefiles:
--------------------------------------------------------------- /opt/intel/oneapi/modulefiles ---------------------------------------------------------------
advisor/2022.1.0 compiler-rt32/latest dev-utilities/2021.6.0 dnnl/latest inspector/2022.1.0 mkl/latest vpl/2022.1.0
advisor/latest compiler/2022.1.0 dev-utilities/latest dpl/2021.7.0 inspector/latest mkl32/2022.1.0 vpl/latest
ccl/2021.6.0 compiler/latest dnnl-cpu-gomp/2022.1.0 dpl/latest intel_ipp_intel64/2021.6.0 mkl32/latest vtune/2022.2.0
ccl/latest compiler32/2022.1.0 dnnl-cpu-gomp/latest icc/2022.1.0 intel_ipp_intel64/latest mpi/2021.6.0 vtune/latest
clck/2021.6.0 compiler32/latest dnnl-cpu-iomp/2022.1.0 icc/latest intel_ippcp_intel64/2021.6.0 mpi/latest
clck/latest dal/2021.6.0 dnnl-cpu-iomp/latest icc32/2022.1.0 intel_ippcp_intel64/latest oclfpga/2022.1.0
compiler-rt/2022.1.0 dal/latest dnnl-cpu-tbb/2022.1.0 icc32/latest itac/2021.6.0 oclfpga/latest
compiler-rt/latest debugger/2021.6.0 dnnl-cpu-tbb/latest init_opencl/2022.1.0 itac/latest tbb/2021.6.0
compiler-rt32/2022.1.0 debugger/latest dnnl/2022.1.0 init_opencl/latest mkl/2022.1.0 tbb/latest
Thanks in advance,
best regards
Suggestions/Questions:
1) Does your program (and "Hello") run standalone (without MPI) on the development system?
2) Does your program run via MPI with 1 rank on the development system?
3) Does your program run via MPI with 2 ranks on the development system?
4) Does your program (and "Hello") run standalone (without MPI) on a remote system via an MPI launch from the development system?
The above will help isolate where the problem occurs.
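For example, using the standard Intel MPI launcher (the executable name here is just a placeholder), tests 2 and 3 would be something like:
mpirun -n 1 ./your_app
mpirun -n 2 ./your_app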
Also, what is your command-line launch method (mpirun, mpiexec, hydra, etc.)?
Jim Dempsey
Hello Jim, and thanks for your help.
My command line launch method for the code is:
ifort -O3 -coarray=shared -coarray-num-images=6 mod_bathy_field.f90 bathy.f90 `nf-config --fflags --flibs` -o pippo
The standalone version without coarrays does not compile, since the code is parallelized via coarrays; if I try to compile without the -coarray flag, I get the following errors:
mod_bathy_field.f90(21): error #8516: Coarray declarations are not allowed when the -coarray compiler option is not specified. [BFRAME]
real, allocatable,dimension(:,:)::aframe,bframe[:]
-----------------------------------------^
mod_bathy_field.f90(59): error #8516: Coarray declarations are not allowed when the -coarray compiler option is not specified. [TEDGE]
real,allocatable, dimension(:,:),codimension[:]::tedge,sedge
-------------------------------------------------^
mod_bathy_field.f90(59): error #8516: Coarray declarations are not allowed when the -coarray compiler option is not specified. [SEDGE]
real,allocatable, dimension(:,:),codimension[:]::tedge,sedge
-------------------------------------------------------^
mod_bathy_field.f90(191): error #8516: Coarray declarations are not allowed when the -coarray compiler option is not specified. [BT]
real, allocatable,dimension(:,:),codimension[:] :: bt
---------------------------------------------------^
mod_bathy_field.f90(68): error #8510: If an allocate-coarray-spec is specified then the allocate-object must be an allocatable coarray. [TEDGE]
allocate(tedge(nlons,2)[*],sedge(nlats,2)[*])
---------^
mod_bathy_field.f90(68): error #8510: If an allocate-coarray-spec is specified then the allocate-object must be an allocatable coarray. [SEDGE]
allocate(tedge(nlons,2)[*],sedge(nlats,2)[*])
---------------------------^
mod_bathy_field.f90(75): error #8363: An entity is not a coarray. [SEDGE]
self%gemo%z(0,1:nlats)=sedge(1:nlats,1)[jj]
-----------------------^
mod_bathy_field.f90(83): error #8363: An entity is not a coarray. [SEDGE]
self%gemo%z(nlons+1,1:nlats)=sedge(1:nlats,1)[jj]
-----------------------------^
mod_bathy_field.f90(91): error #8363: An entity is not a coarray. [TEDGE]
self%gemo%z(1:nlons,0)=tedge(1:nlons,1)[jj]
-----------------------^
mod_bathy_field.f90(99): error #8363: An entity is not a coarray. [TEDGE]
self%gemo%z(1:nlons,nlats+1)=tedge(1:nlons,2)[jj]
-----------------------------^
mod_bathy_field.f90(204): error #8510: If an allocate-coarray-spec is specified then the allocate-object must be an allocatable coarray. [BT]
allocate(bt(nlon,nlat)[*])
---------^
mod_bathy_field.f90(213): error #8363: An entity is not a coarray. [BT]
bt(i,j)[1]=bt(i,j)
---------^
compilation aborted for mod_bathy_field.f90 (code 1)
For the "hello", the compilation of the code in standalone mode goes ok, but if I try to compile with the -coarray flag I've the same error.
I hope I have been close enough,
thank you
Alessandro
What happens when you compile with -coarray=shared -coarray-num-images=1?
A segmentation fault indicates an access (either code or data) to a virtual memory page (either resident or in the page file) that is .NOT. part of your virtual address space. It may also occur when writing to a read-only page .OR. reading data from an execute-only page.
Try inserting diagnostic prints:
program
...
print *,"debug 1" ! 1st statement of program
...
print *,"debug 2" ! 1st statement following first coarray-esk statement.
...
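For reference, a complete hello program with such diagnostics might look like this minimal sketch (the actual hello.f90 used later in this thread may differ):
program hello
implicit none
print *, "debug 1" ! 1st statement of the program
print *, "Hello from image ", this_image(), " of ", num_images()
print *, "debug 2" ! placed after the first coarray-related statement
print *, "Goodbye from image ", this_image(), " of ", num_images()
end program hello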
Jim Dempsey
I've inserted diagnostics at two different points of the program, the first one immediately after the beginning of the code.
If I run with -coarray=shared -coarray-num-images=1 I get the same error.
Is it possible that the problem is related to the environment configuration?
I have these modules loaded at the moment:
Currently Loaded Modulefiles:
1) intel_ipp_intel64/2021.6.0 3) compiler-rt/2022.1.0 5) compiler/2022.1.0 7) mkl/2022.1.0 9) dev-utilities/2021.6.0
2) tbb/2021.6.0 4) oclfpga/2022.1.0 6) mpi/2021.6.0 8) itac/2021.6.0
and the available modules are:
root@9d9bc7f4fe75:/home/data# module avail
--------------------------------------------------------------- /opt/intel/oneapi/modulefiles ---------------------------------------------------------------
advisor/2022.1.0 compiler/2022.1.0 dnnl-cpu-gomp/2022.1.0 icc/2022.1.0 intel_ippcp_intel64/2021.6.0 oclfpga/2022.1.0
ccl/2021.6.0 compiler32/2022.1.0 dnnl-cpu-iomp/2022.1.0 icc32/2022.1.0 itac/2021.6.0 tbb/2021.6.0
clck/2021.6.0 dal/2021.6.0 dnnl-cpu-tbb/2022.1.0 init_opencl/2022.1.0 mkl/2022.1.0 vpl/2022.1.0
compiler-rt/2022.1.0 debugger/2021.6.0 dnnl/2022.1.0 inspector/2022.1.0 mkl32/2022.1.0 vtune/2022.2.0
compiler-rt32/2022.1.0 dev-utilities/2021.6.0 dpl/2021.7.0 intel_ipp_intel64/2021.6.0 mpi/2021.6.0
-------------------------------------------------------------- /usr/share/modules/modulefiles ---------------------------------------------------------------
I have configured the whole system in a container to create a trial environment before moving everything to a physical machine; could this be the problem?
Thank you in advance!
Run this test:
Console Fortran app
program HelloWorld
print *, "Hello"
end program HelloWorld
Then, from the command line .AND. environment you use to test your app, issue:
mpiexec ./HelloWorld (or whatever and wherever name you call your executable)
This should run (assuming MPI is in LD_LIBRARY_PATH as set up by the MPI environment script) with the default number of processes, 1 per core. If this does not run, then either the environment is not set correctly .OR. there is an installation failure. Coarrays use MPI.
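For example (assuming the source file is named HelloWorld.f90 and the oneAPI environment script has been sourced):
ifort HelloWorld.f90 -o HelloWorld
mpiexec ./HelloWorld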
Jim Dempsey
Hello Jim,
I've used a code with diagnostics as you suggested; the result is:
root@8c6937a9880a:/home/data# ifort hello.f90 -o a.out
root@8c6937a9880a:/home/data# mpiexec ./a.out
debug 1
Hello from image 1 of 1
debug 2
Goodbye from image 1 of 1
debug 1
Hello from image 1 of 1
debug 2
Goodbye from image 1 of 1
debug 1
Hello from image 1 of 1
debug 2
Goodbye from image 1 of 1
debug 1
Hello from image 1 of 1
debug 2
Goodbye from image 1 of 1
debug 1
Hello from image 1 of 1
debug 2
Goodbye from image 1 of 1
debug 1
Hello from image 1 of 1
debug 2
Goodbye from image 1 of 1
But if I compile with the -coarray flag and run it, the result is:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 6 PID 883 RUNNING AT 8c6937a9880a
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 7 PID 886 RUNNING AT 8c6937a9880a
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 8 PID 889 RUNNING AT 8c6937a9880a
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 9 PID 892 RUNNING AT 8c6937a9880a
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 10 PID 895 RUNNING AT 8c6937a9880a
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 11 PID 898 RUNNING AT 8c6937a9880a
= KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================
Where is the mistake? I'm going crazy with this problem.
Thanks for your help, best regards
Let me guess:
Your run command line for the coarray version was:
mpiexec ./a.out
Note, coarray programs contain a shell that performs the mpiexec.
The proper way to launch this coarray program is:
./a.out
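(Side note: with Intel Fortran the number of images of a -coarray=shared build can also be changed at run time via the FOR_COARRAY_NUM_IMAGES environment variable, e.g. FOR_COARRAY_NUM_IMAGES=4 ./a.out.)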
Jim Dempsey
If I run with ./a.out I get:
root@f9f64a14d624:/home/data# ifort hello.f90 -o a.out
root@f9f64a14d624:/home/data# ./a.out
debug 1
Hello from image 1 of 1
debug 2
Goodbye from image 1 of 1
while if I run with mpiexec I get:
root@f9f64a14d624:/home/data# mpiexec ./a.out
debug 1
Hello from image 1 of 1
debug 2
Goodbye from image 1 of 1
debug 1
Hello from image 1 of 1
debug 2
Goodbye from image 1 of 1
debug 1
Hello from image 1 of 1
debug 2
Goodbye from image 1 of 1
debug 1
Hello from image 1 of 1
debug 2
Goodbye from image 1 of 1
debug 1
Hello from image 1 of 1
debug 2
Goodbye from image 1 of 1
debug 1
Hello from image 1 of 1
debug 2
Goodbye from image 1 of 1
Is it possible there is a missing library or something like that?
Is there another way to run Fortran code with coarrays on a Linux system?
Thanks in advance.
Best regards
>>Is it possible there is a missing library or something like that?
Possibly. Or you may have an incorrect LD_LIBRARY_PATH setting that results in an incompatible version being loaded.
>>Is there another way to run Fortran code with coarrays on a Linux system?
Try making a shell main PROGRAM that calls the former PROGRAM source as a subroutine (in a separate source file). Compile all files except the new shell program with -O3 into a static library. Then compile the shell program with -O0 (and the other coarray options as before), now adding the new static library. If this runs, then suspect something with the LD_LIBRARY_PATH setting or some other corruption. Also, check whether multiple (conflicting) Intel library paths are in LD_LIBRARY_PATH in the release environment.
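A minimal sketch of that layout (the names former_main, shell_main and libbathy are only placeholders; former_main.f90 would be bathy.f90 with its PROGRAM turned into a SUBROUTINE):
! former_main.f90 -- the old PROGRAM body, now a subroutine
subroutine former_main()
! original program statements go here
print *, "inside former main on image ", this_image()
end subroutine former_main

! shell_main.f90 -- new thin main program
program shell_main
call former_main()
end program shell_main

built roughly as (the module still needs -coarray because it declares coarrays):
ifort -O3 -coarray=shared -c mod_bathy_field.f90 former_main.f90
ar rcs libbathy.a mod_bathy_field.o former_main.o
ifort -O0 -coarray=shared -coarray-num-images=6 shell_main.f90 libbathy.a `nf-config --fflags --flibs` -o pippo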
Jim Dempsey
In my cluster environment, where the program runs without problems, I have to load some modules before compiling and running...
The modules are:
export HDF5_USE_FILE_LOCKING=FALSE
module load intel/oneAPI/2021.2/all-tools
module load netcdff/serial/4.5.4
module load intel/oneAPI/2021.2/hpc-sdk
But in my oneAPI installation I don't have any of these modules in the modulefiles...
These are the only modules I have in my oneAPI modulefiles:
advisor compiler compiler32 dev-utilities dnnl-cpu-iomp icc inspector itac mpi vpl
ccl compiler-rt dal dnnl dnnl-cpu-tbb icc32 intel_ipp_intel64 mkl oclfpga vtune
clck compiler-rt32 debugger dnnl-cpu-gomp dpl init_opencl intel_ippcp_intel64 mkl32 tbb
any idea?
Why did you wait to say the program runs on one system but not the other?
Did you install the oneAPI HPC kit on the oneAPI installation system that is having the problem? IOW, is the MPI installed on that system the oneAPI version of MPI?
Jim Dempsey
No, the first system is a cluster with an old version of oneAPI; the system I'm trying to bring up is a workstation with Ubuntu.
Do you think the different version of oneAPI could be the cause? Why doesn't any other Fortran code give me the same error?
The problem seems to be in the coarray "installation": all other Fortran code compiles and runs OK, but nothing works with coarrays...
Does the application built with the ifx compiler suffer the same issues?
Can you install the older version of oneAPI on the problem system? (may require uninstall of the newer oneAPI).
If that works, you might then be able to determine if the issue resides with the compiler or with a library. Then attempt a mix.
Jim Dempsey
Well, I've downgraded the oneAPI hpckit to 2021.2.0-2997, and now when I run the test on my hello.f90 I get:
root@b002fe43068c:/home/data# ifort -coarray=shared hello.f90 -o a.out
root@b002fe43068c:/home/data# ./a.out
debug 1
Hello from image 1 of 12
debug 1
Hello from image 2 of 12
debug 1
Hello from image 3 of 12
debug 1
Hello from image 4 of 12
debug 1
Hello from image 5 of 12
debug 1
debug 1
Hello from image 7 of 12
debug 1
Hello from image 8 of 12
debug 1
Hello from image 9 of 12
debug 1
Hello from image 10 of 12
debug 1
Hello from image 11 of 12
debug 1
Hello from image 12 of 12
Hello from image 6 of 12
debug 2
Goodbye from image 2 of 12
debug 2
debug 2
Goodbye from image 4 of 12
debug 2
Goodbye from image 7 of 12
debug 2
Goodbye from image 10 of 12
debug 2
Goodbye from image 1 of 12
debug 2
Goodbye from image 5 of 12
debug 2
Goodbye from image 6 of 12
debug 2
Goodbye from image 9 of 12
debug 2
Goodbye from image 11 of 12
Goodbye from image 3 of 12
debug 2
Goodbye from image 8 of 12
debug 2
Goodbye from image 12 of 12
Now I have to recompile the netCDF library with the ifort and icc compilers to test my Fortran code, but at the moment this seems to be the best result; the problem appears to be due to the oneAPI version or a bad installation of the newest version. After the installation and tests I'll let you know if everything is OK.
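For reference, rebuilding netCDF-Fortran against the Intel compilers usually follows a pattern like this (the install prefix is just a placeholder, and the netCDF-C library must already be installed under it):
CC=icc FC=ifort CPPFLAGS=-I/opt/netcdf/include LDFLAGS=-L/opt/netcdf/lib ./configure --prefix=/opt/netcdf
make
make install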
Thanks, best regards.
OK, I've compiled all the requirements, and now:
root@7e44fbb70643:/home/data# ifort -O3 -coarray=shared -coarray-num-images=6 mod_bathy_field.f90 bathy.f90 `nf-config --fflags --flibs` -o pippon
root@7e44fbb70643:/home/data# ./pippon
Find neighbor!
Find neighbor!
Find neighbor!
Find neighbor!
Find neighbor!
Find neighbor!
4 next left --> 6
5 next left --> 1
6 next left --> 1
2 next left --> 4
1 next left --> 3
3 next left --> 5
5 next left --> 1 passed
4 next left --> 6 passed
2 next left --> 4 passed
6 next left --> 1 passed
1 next left --> 3 passed
3 next left --> 5 passed
2 next right --> 6
4 next right --> 2
5 next right --> 3
6 next right --> 4
1 next right --> 6
3 next right --> 1
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000
2 next top --> 1
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000
4 next top --> 3
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000
6 next top --> 5
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000
1 next top --> 1
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000
5 next top --> 5
-1710.2000 -1724.3999 -1732.0000 -1734.3999 -1731.8000 -1722.2000 -1717.3999 -1719.5999 -1730.0000 -1747.0000 -1769.0000 -1795.0000 -1824.3999 -1856.2000 -1887.3999 -1916.0000 -1939.5999 -1939.5999 -1953.3999 -1973.5999
-1994.0000
3 next top --> 3
All seems to work properly.
Thank you all for your help!
Best regards
A new problem:
I've compiled the whole program with no errors, using the following command:
ifort -O3 -traceback -coarray=shared -coarray-num-images=6 mod_bathy_field.f90 bathy.f90 `nf-config --fflags --flibs` -o pippon
but if I try to run it, the program freezes after a few seconds. To run it I used the ./pippon command.
The -traceback option points me to line 42 of the main program:
[mpiexec@14a3e02b6a26] Press Ctrl-C again to force abort
forrtl: error (69): process interrupted (SIGINT)
In coarray image 6
Image PC Routine Line Source
pippon 000000000042970B Unknown Unknown Unknown
libc.so.6 00007FF498DF9520 Unknown Unknown Unknown
libc.so.6 00007FF498EBFCAB __sched_yield Unknown Unknown
libmpi.so.12.0.0 00007FF4939FA8F4 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FF493F8EB7E Unknown Unknown Unknown
libmpi.so.12.0.0 00007FF493B7E0A7 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FF493B7991E Unknown Unknown Unknown
libmpi.so.12.0.0 00007FF493940A7C Unknown Unknown Unknown
libmpi.so.12.0.0 00007FF493940388 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FF4939B32F8 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FF493985BE9 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FF493965780 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FF493A67495 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FF493944444 MPI_Barrier Unknown Unknown
libicaf.so 00007FF498FE9263 for_rtl_ICAF_BARR Unknown Unknown
pippon 0000000000413794 MAIN__ 42 bathy.f90
pippon 000000000040B9E2 Unknown Unknown Unknown
libc.so.6 00007FF498DE0D90 Unknown Unknown Unknown
libc.so.6 00007FF498DE0E40 __libc_start_main Unknown Unknown
pippon 000000000040B8E5 Unknown Unknown Unknown
forrtl: error (69): process interrupted (SIGINT)
In coarray image 2
Image PC Routine Line Source
pippon 000000000042970B Unknown Unknown Unknown
libc.so.6 00007FAF3B1A1520 Unknown Unknown Unknown
libc.so.6 00007FAF3B284F9A epoll_wait Unknown Unknown
libtcp-fi.so 00007FAEA5008241 Unknown Unknown Unknown
libtcp-fi.so 00007FAEA500EB3E Unknown Unknown Unknown
librxm-fi.so 00007FAEA440B6CC Unknown Unknown Unknown
librxm-fi.so 00007FAEA4416BD9 Unknown Unknown Unknown
librxm-fi.so 00007FAEA4416CE9 Unknown Unknown Unknown
librxm-fi.so 00007FAEA4431A3D Unknown Unknown Unknown
librxm-fi.so 00007FAEA44319C7 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FAF3622B3FE Unknown Unknown Unknown
libmpi.so.12.0.0 00007FAF35DFA7A1 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FAF3638EB7E Unknown Unknown Unknown
libmpi.so.12.0.0 00007FAF35F7E0A7 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FAF35F7991E Unknown Unknown Unknown
libmpi.so.12.0.0 00007FAF35D40A7C Unknown Unknown Unknown
libmpi.so.12.0.0 00007FAF35D40388 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FAF35DB32F8 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FAF35D85BE9 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FAF35D65780 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FAF35E67495 Unknown Unknown Unknown
libmpi.so.12.0.0 00007FAF35D44444 MPI_Barrier Unknown Unknown
libicaf.so 00007FAF3B391263 for_rtl_ICAF_BARR Unknown Unknown
pippon 0000000000413794 MAIN__ 42 bathy.f90
pippon 000000000040B9E2 Unknown Unknown Unknown
libc.so.6 00007FAF3B188D90 Unknown Unknown Unknown
libc.so.6 00007FAF3B188E40 __libc_start_main Unknown Unknown
pippon 000000000040B8E5 Unknown Unknown Unknown
Line 42 is a sync all, but if I comment out that line the error moves to line 43, where there is a call to a subroutine.
Is there any other information I can supply to help with this problem?
Best regards
What is the interface to that subroutine on line 43?
IOW, if the calling parameters are not specified or are incorrect, or if the calling API is incorrect, an image may abend and thus bring down the other images. And the traceback above shows those of the other images, not the one that first abended.
What is the stack trace dump when you comment out line 42?
Another peculiarity of the above stack trace is that both __sched_yield and epoll_wait appear to be calling down to somewhere in pippon?
Does your program use SIGNALQQ to specify an interrupt signal handler?
Jim Dempsey
Hello Jim,
the dump after commenting out the sync all at line 42 is:
forrtl: error (69): process interrupted (SIGINT)
In coarray image 2
Image PC Routine Line Source
pippon 000000000042A79B Unknown Unknown Unknown
libc.so.6 00007F2897396520 Unknown Unknown Unknown
libc.so.6 00007F289745CCAB __sched_yield Unknown Unknown
libmpi.so.12.0.0 00007F2891FFA8F4 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F289258EB7E Unknown Unknown Unknown
libmpi.so.12.0.0 00007F2892171F43 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F2891F45432 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F2891F447C2 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F28920668F4 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F28920DD654 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F28920CB8AB Unknown Unknown Unknown
libmpi.so.12.0.0 00007F28920B83BE Unknown Unknown Unknown
libmpi.so.12.0.0 00007F2892046D82 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F28920659E2 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F289259CA0C MPI_Win_create Unknown Unknown
libicaf.so 00007F2897592431 for_rtl_ICAF_COIN Unknown Unknown
pippon 000000000040C214 mod_bathy_field_m 207 mod_bathy_field.f90
pippon 0000000000413A20 MAIN__ 43 bathy.f90
pippon 000000000040B9E2 Unknown Unknown Unknown
libc.so.6 00007F289737DD90 Unknown Unknown Unknown
libc.so.6 00007F289737DE40 __libc_start_main Unknown Unknown
pippon 000000000040B8E5 Unknown Unknown Unknown
forrtl: error (69): process interrupted (SIGINT)
In coarray image 6
Image PC Routine Line Source
pippon 000000000042A79B Unknown Unknown Unknown
libc.so.6 00007F78D4C2A520 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CFBCF46D Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CF7FA996 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CFD8EB7E Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CF97E0A7 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CF97991E Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CFC94C4B Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CFC93094 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CF866B1A Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CF8DD654 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CF8CB8AB Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CF8B83BE Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CF846D82 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CF8659E2 Unknown Unknown Unknown
libmpi.so.12.0.0 00007F78CFD9CA0C MPI_Win_create Unknown Unknown
libicaf.so 00007F78D4E26431 for_rtl_ICAF_COIN Unknown Unknown
pippon 000000000040C214 mod_bathy_field_m 207 mod_bathy_field.f90
pippon 0000000000413A20 MAIN__ 43 bathy.f90
pippon 000000000040B9E2 Unknown Unknown Unknown
libc.so.6 00007F78D4C11D90 Unknown Unknown Unknown
libc.so.6 00007F78D4C11E40 __libc_start_main Unknown Unknown
pippon 000000000040B8E5 Unknown Unknown Unknown
Line 43 in the dump is the call to the subroutine in question.
After commenting out the sync all, the process seems to go on a little further; I can tell because it prints blocks that it didn't print before, but always and only for 2 of the images.
exit from halo
1 inner test------------
exit from halo
2 inner test------------
exit from halo
3 inner test------------
exit from halo
4 inner test------------
exit from halo
5 inner test------------
exit from halo
6 inner test------------
exit from init
002497 3.3750 -2850.2000 -2847.8000 -2852.2000 -2846.2000 -2853.8000
002498 3.4167 -2851.2000 -2851.8000 -2850.8000 -2849.2000 -2853.8000
002499 3.4583 -2848.8000 -2851.8000 -2846.8000 -2846.2000 -2850.2000
002500 3.5000 -2834.2000 -2835.8000 0.0000 -2829.8000 -2839.2000
exit from init
-------------------------------------------
6 /home/data/bat/D4.mnt 23.2490 6731 37.2740 24.9990 7200 40.0010 479.929
6 write_bat test -----------------------
6 inner test------------
6 bbox= 974 1287 1 235
bbox exp= 23.2500 36.2917 30.1875 39.9375
-------------------------------------------
2 /home/data/bat/B4.mnt -16.2510 9480 3.5010 24.9990 7200 40.0010 479.949
2 write_bat test -----------------------
2 inner test------------
2 bbox= 26 500 1 235
bbox exp= -16.2500 3.5000 30.1875 39.9375
subroutine write_bat(medb,outf)
use iso_fortran_env, only: stdout => output_unit, &
stderr => error_unit
class (Sbathy),intent(in)::medb
character(len=*), intent(in)::outf
character*40,parameter :: odir="/home/data/bathy_img/"
character*12, parameter ::ofile='pippo1.bat'
real, allocatable, dimension(:,:),codimension[:] :: bt
integer ::i,j,ic, nlat,nlon
integer, dimension(4)::ib
print *, this_image(), 'write_bat test -----------------------'
nlat=medb%gmed%nlat
nlon=medb%gmed%nlon
ib= medb%get_inner()
print*, this_image(), 'bbox= ',ib
flush(stdout)
write(*,'(a10,4(f8.4,2x))') 'bbox exp= ',medb%gmed%lon(ib(1)),medb%gmed%lon(ib(2)),&
medb%gmed%lat(ib(3)),medb%gmed%lat(ib(4))
!if (this_image()==2) then
!print*,this_image(), medb%gemo%box%p0%plat, medb%gemo%box%p1%plat ,medb%gmed%box%p0%plat
!end if
allocate(bt(nlon,nlat)[*])
!local bathymetry
bt=medb%gmed%z(1:nlon,1:nlat)
print*, 'ciao 1'
sync all
print*, 'ciao 2'
do ic=2,num_images()
if (this_image().eq.ic) then
do j=ib(3),ib(4)
do i=ib(1),ib(2)
bt(i,j)[1]=bt(i,j)
print*, 'ciao 3'
end do
end do
end if
sync all
end do
open(stdout,file=outf,status='unknown',access='stream')
if(this_image()==1) write(stdout) bt(:,:)
close(stdout)
!open(stdout,file=trim(odir)//trim(ofile),status='unknown',access='stream')
!if (this_image()==2) write(stdout) medb%gmed%z
!close(stdout)
end subroutine write_bat
Could you attach both mod_bathy_field.f90 and bathy.f90 so we can try your code?
Hello Ron,
I hope this helps; thank you very much for your patience and help.
Best regards.