I have a program that runs fine when compiled with ifort v9.0 and g++ v3.4.5 and using mpich 1.2.7, but after I upgraded to ifort v9.1.039 (Build 20060927Z) it crashes with:
p1_23035: p4_error: interrupt SIGSEGV: 11
rm_l_1_23052: (0.791327) net_send: could not write to fd=5, errno = 32
p1_23035: (0.803600) net_send: could not write to fd=5, errno = 32
It is a mix of Fortran code compiled with ifort and C code compiled with g++, linked with g++. I have the Fortran modules in mpich compiled with v9.1.
It is on a Linux system:
uname -a
Linux crater 2.4.21-47.ELsmp #1 SMP Wed Jul 5 20:30:47 EDT 2006 i686 athlon i386 GNU/Linux
mpich was built with:
export FC=ifort
export F90=ifort
export CXX=/usr/local/bin/g++
export CC=/usr/local/bin/gcc
export RSHCOMMAND=ssh
export CFLAGS="-03 -static"
configure --with-flibname=mpichf90 --enable-f90 --enable-f90modules --prefix=/chome/mjcrawf/mpich/mpich1.2.7
I have also tried linking with icc, using the flags recommended in the Release Notes for ifort v9.1.039, and also with -i-dynamic. I have also tried the "-I/usr/include/nptl -L/usr/lib/nptl" recommendation.
The code is linked with:
g++ -g -fexceptions -fpic -DMPI -O3 -Dlinux -DANSI -I/chome/mjcrawf/mpich/1.2.7-ifc/include -I/chome/mjcrawf/mpich/1.2.7-ifc/include/f90base -L/opt/intel/fc/9.1.032/lib... -L/chome/mjcrawf/mpich/1.2.7-ifc/lib -lmpichf90 -lmpich -lifport -lifcoremt -lpthread -o its.x (I left out the list of 200+ object files).
I have the environment variable P4_GLOBMEMSIZE set to 110000000.
I am now out of ideas. Any suggestions?
No, I am not writing to unit 5. The gotcha is that it works with ifort 9.0 but not with ifort 9.1. I would expect it to crash the same way with both compilers if a write of mine were causing the anomaly.
After configuring a parallel process debugger (TotalView) and adding some write statements to default output, I found that the anomaly occurs in an MPI_Bcast of an integer. Which call it crashes on varies slightly; there are 100+ of them in a row. The crash location moves as I debug or add write statements.
The fd=5 in the error message must be a device (or something) that MPICH sets up for inter-process communication.
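A side note on reading the net_send lines: on Linux, errno 32 is EPIPE ("Broken pipe"), and an fd is an operating-system file descriptor, not a Fortran unit number. The following is only a minimal sketch, written in Python for brevity and not taken from the program above (the function name `errno_after_peer_exit` is hypothetical). It shows how a write to a socket whose peer has gone away produces that errno. Assuming MPICH's p4 device talks over sockets, this would be consistent with the net_send messages being a side effect of the process that received the SIGSEGV rather than the root cause.

```python
import errno
import os
import socket


def errno_after_peer_exit():
    """Write to a socket whose peer has closed and return the resulting errno."""
    # A connected pair of local stream sockets stands in for two processes.
    ours, theirs = socket.socketpair()
    theirs.close()  # simulate the peer process dying
    try:
        ours.send(b"x")
    except OSError as exc:
        # Python ignores SIGPIPE by default, so the failure surfaces as an
        # exception carrying the errno instead of killing the process.
        return exc.errno
    finally:
        ours.close()
    return 0  # the write unexpectedly succeeded


if __name__ == "__main__":
    code = errno_after_peer_exit()
    print(code, errno.errorcode.get(code), os.strerror(code))
```

On a Linux machine this should report errno 32 (EPIPE), the same number that appears in the net_send messages.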