Intel® Fortran Compiler

HOMME run-time failure on Altix, compiled with Intel 10.1

notahoo

Hi,

I have compiled HOMME on an SGI Altix with Intel 10.1. Compilation succeeds, but the program fails at run time. I have posted the relevant part of the batch output and the gdb trace below. I would appreciate any insight into resolving the error.

HOMME is written in Fortran, MPI, some C, and NetCDF for I/O.

MPI: --------stack traceback-------
Internal Error: Can't read/write file "/dev/mmtimer", (errno = 22)
Internal Error: Can't read/write file "/dev/sgi_fetchop", (errno = 22)
MPI: Intel Debugger for applications running on IA-64, Version 10.1-32 , Build 20070829
MPI: Attaching to program: /homme/benchmark/preqx, process 23903
MPI: [New Thread 2305843009318652880 (LWP 23903)]
MPI:
MPI: #0 0xa000000000010621
MPI: #1 0x20000000062076f0 in __waitpid () in /lib/libc-2.4.so
MPI: #2 0x20000000001c6570 in MPI_SGI_stacktraceback () in /usr/lib/libmpi.so
MPI: #3 0x20000000001c7c20 in slave_sig_handler () in /usr/lib/libmpi.so
MPI: #4 0xa0000000000107e0
MPI: #5 0x4000000000183e60 in SCHEDULE_MOD::setcycle (schedule=, cycle= {...}, edge=) at schedule_mod.F90:1017
MPI: #6 0x4000000000182c60 in SCHEDULE_MOD::genedgesched (partnumber=, lschedule= {...}, metavertex=) at schedule_mod.F90:225
MPI: #7 0x40000000001e07e0 in PREQ_INIT_MOD::preq_init (edge2dv=, edge1= {...}, edge2=, edge3= {...}, edge3p1= {...}, edge4=, red= {...}, par= {...}, timer= {...}) at ../src/preq_init_mod.F90:256
MPI: #8 0x4000000000004ab0 in prim_main () at ../src/prim_main.F90:67

MPI: -----stack traceback ends-----
MPI: /homme/benchmark/preqx, Rank 2, Process 23903: Dumping core on signal SIGSEGV(11) into directory /homme/benchmark/little-endian
MPI: MPI_COMM_WORLD rank 2 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal 11

GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "ia64-suse-linux"...
Using host libthread_db library "/lib/libthread_db.so.1".

warning: .dynamic section for "/lib/libc.so.6.1" is not at the expected address (wrong library or version mismatch?)
Reading symbols from /usr/lib/libmpi.so...done.
Loaded symbols for /usr/lib/libmpi.so
Reading symbols from /opt/intel/mkl/10.0.3.020/lib/64/libmkl_gf_ilp64.so...done.
Loaded symbols for /opt/intel/mkl/10.0.3.020/lib/64/libmkl_gf_ilp64.so
Reading symbols from /opt/intel/mkl/10.0.3.020/lib/64/libmkl_core.so...done.
Loaded symbols for /opt/intel/mkl/10.0.3.020/lib/64/libmkl_core.so
Reading symbols from /opt/intel/mkl/10.0.3.020/lib/64/libmkl_sequential.so...done.
Loaded symbols for /opt/intel/mkl/10.0.3.020/lib/64/libmkl_sequential.so
Reading symbols from /opt/intel/fc/10.1.008/lib/libimf.so.6...done.
Loaded symbols for /opt/intel/fc/10.1.008/lib/libimf.so.6
Reading symbols from /lib/libm.so.6.1...done.
Loaded symbols for /lib/libm.so.6.1
Reading symbols from /lib/libc.so.6.1...done.
Loaded symbols for /lib/libc.so.6.1
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libunwind.so.7...done.
Loaded symbols for /lib/libunwind.so.7
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /lib/ld-linux-ia64.so.2...done.
Loaded symbols for /lib/ld-linux-ia64.so.2
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /usr/lib/libbitmask.so...done.
Loaded symbols for /usr/lib/libbitmask.so
Reading symbols from /usr/lib/libcpuset.so...done.
Loaded symbols for /usr/lib/libcpuset.so
Reading symbols from /usr/lib/libxpmem.so...done.
Loaded symbols for /usr/lib/libxpmem.so
Core was generated by `../preqx'.
Program terminated with signal 11, Segmentation fault.
#0 0x4000000000183e60 in schedule_mod_mp_setcycle_ ()
(gdb) bt
#0 0x4000000000183e60 in schedule_mod_mp_setcycle_ ()
#1 0x4000000000182c60 in schedule_mod_mp_genedgesched_ ()
#2 0x40000000001e07e0 in preq_init_mod_mp_preq_init_ ()
#3 0x4000000000004ab0 in prim_main () at ../src/prim_main.F90:67
#4 0x4000000000004790 in main ()
(gdb)

Alexander_Semenov__I

Dear notahoo,

First of all, it seems you posted in the wrong forum. This request belongs in the Fortran compiler forum.

From what you wrote, the problem appears to be with the Intel Fortran compiler. According to your trace, the SIGSEGV is raised in the following part of the HOMME code (schedule_mod.F90):

1015 #ifndef _PREDICT
1016     if(il .gt. 0) then
1017         elem(il)%desc%getmapV(face) = Edge%edgeptrV(i) + Cycle%ptrV - 1   <- SEGV is here !!
1018         elem(il)%desc%getmapP(face) = Edge%edgeptrP(i) + Cycle%ptrP - 1
1019     endif
1020 #endif

It looks like a compiler issue. Please send this request to the Intel Fortran Compiler for Linux and Mac OS* X forum (http://software.intel.com/en-us/forums/, section "Intel Software Development Products").

You can also try the following as possible workarounds: 1) remove -fno-alias if you are using it; 2) if you use -O3 optimization level, decrease it to -O2.
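
Before treating this as a compiler bug, it may also be worth ruling out a source-level cause at line 1017, for example a getmapV array that is not associated or a face index outside its bounds. The following is only a minimal sketch under stated assumptions: the type desc_t, the routine set_getmap, the array size of 4, and the assumption that getmapV is a pointer array are all hypothetical and not taken from HOMME. Only the names getmapV and face come from the quoted line.

```fortran
! Hypothetical sketch: guard an assignment like the one at schedule_mod.F90:1017.
! desc_t, set_getmap and the array size are illustrative assumptions, not HOMME code.
program check_getmap
  implicit none

  type desc_t
     integer, pointer :: getmapV(:) => null()
  end type desc_t

  type(desc_t) :: desc
  integer :: face, val

  face = 3
  val  = 42

  allocate(desc%getmapV(4))
  call set_getmap(desc, face, val)
  print *, desc%getmapV(face)
  deallocate(desc%getmapV)

contains

  ! Assign v to d%getmapV(f), stopping with a message instead of a SIGSEGV
  ! when the pointer is not associated or the index is out of range.
  subroutine set_getmap(d, f, v)
    type(desc_t), intent(inout) :: d
    integer, intent(in) :: f, v

    if (.not. associated(d%getmapV)) then
       print *, 'getmapV is not associated'
       stop 1
    end if
    if (f < lbound(d%getmapV, 1) .or. f > ubound(d%getmapV, 1)) then
       print *, 'face out of bounds: ', f
       stop 1
    end if
    d%getmapV(f) = v
  end subroutine set_getmap

end program check_getmap
```

If a guard like this fires in the real code, the fault is more likely in the data passed to setcycle than in the generated code. Rebuilding with the compiler's run-time bounds checking (ifort's -check bounds) gives a similar check without source changes.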

Thanks,
Alexander Semenov

notahoo

Thank you for your reply. I did not have -fno-alias or -O3 in my compilation. I have sent the error to the Intel compiler forum as you suggested.
