Intel® Fortran Compiler

idbc and fortran = friends or foes?

mcguiganj
Hey everyone,

I'm having trouble debugging a Fortran program using idbc and I was hoping for some advice on how to approach the problem.

A little background: I'm using the molecular dynamics suite AMBER 10 (specifically sander and pmemd), but it keeps crashing with a SIGSEGV when I run one specific input file (otherwise it works okay). This leads me to believe there is a bug in sander/pmemd.

My first step was to recompile sander with ifort using the -g flag. After this I ran sander inside idbc like so:

[john@arabinose one]$ idbc ~/amber-intel/amber10/exe/sander
Intel Debugger for applications running on Intel 64, Version 11.1, Build [1.2097.2.295]
------------------
object file name: /home/john/amber-intel/amber10/exe/sander

<-- snip -->

Reading symbols from /home/john/amber-intel/amber10/bin/sander...done.


Next step was to run sander with the input in question:

(idb) run -O -i minwat.in -o minwat.out -p alpha_ara_ome_tip4p.top -c alpha_ara_ome_tip4p.crd -r minwat.rst -ref alpha_ara_ome_tip4p.crd
[New Thread 47658539109376 (LWP 4419)]
[New Thread 47658539109376 (LWP 4419)]
Starting program: /home/john/amber-intel/amber10/bin/sander
Program received signal SIGSEGV
ewald_force (crd=, numatoms=-220440305, iac=, ico=(...), nntypes=1676387787, charge=(...), cn1=(...), cn2=(...), asol=(...), bsol=(...), eelt=0, epol=0, frc=(...), x=(...), ix=(...), ipairs=(...), xr=(...), virvsene=0, pol=(...), qm_pot_only=.FALSE., cn114=(...), cn214=(...)) at /home/john/amber-intel/amber10/src/sander/_ew_force.f:947
Warning: Source file '/home/john/amber-intel/amber10/src/sander/_ew_force.f' more recent than executable file '/home/john/amber-intel/amber10/bin/sander'.
947 if ( mpoltype == 0 )then


Oh noes! Backtrace..

(idb) backtrace
#0 0x00000000004ff036 in ewald_force (crd=, numatoms=-220440305, iac=, ico=(...), nntypes=1676387787, charge=(...), cn1=(...), cn2=(...), asol=(...), bsol=(...), eelt=0, epol=0, frc=(...), x=(...), ix=(...), ipairs=(...), xr=(...), virvsene=0, pol=(...), qm_pot_only=.FALSE., cn114=(...), cn214=(...)) at /home/john/amber-intel/amber10/src/sander/_ew_force.f:947
#1 0x00000000006e8494 in force (xx=, ix=(...), ih=, ipairs=(...), x=(...), f=(...), ener=(...), vir=(...), fs=(...), rborn=(...), reff=(...), onereff=(...), qsetup=.FALSE., do_list_update=.TRUE., .tmp.IH.len_V$65b=4) at /home/john/amber-intel/amber10/src/sander/_force.f:1044
#2 0x00000000004bac95 in runmin (xx=, ix=(...), ih=, ipairs=(...), x=(...), fg=(...), w=(...), ib=(...), jb=(...), conp=(...), winv=(...), igrp=(...), skips=(...), ene=(...), carrms=6.9532069177360673e-310, qsetup=.FALSE., .tmp.IH.len_V$5fd=4) at /home/john/amber-intel/amber10/src/sander/_runmin.f:665
#3 0x00000000004ab638 in sander () at /home/john/amber-intel/amber10/src/sander/_sander.f:1294
#4 0x00000000004a755c in multisander () at /home/john/amber-intel/amber10/src/sander/_multisander.f:291

numatoms=-220440305 .. whaa?

This is about as far as I've gotten, I'm trying to tell idbc to watch numatoms to see where/how it gets set to -220440305.. I've tried loading sander into idbc and entering "watch numatoms" but I get this:

No symbol "numatoms" in current context.
Error: no value for numatoms
Warning: Watchpoint not set.


Okay, so that's not the way... maybe if I set a break point on nb_adjust_ and then run watch numatoms it will work...

but it says:

Warning: Watchpoint not set.

... So this is where I'm stuck. I've skimmed through the Intel Debugger command reference (http://cache-www.intel.com/cd/00/00/40/60/406036_406036.pdf) but I can't find anything that works to watch a variable. What I would like is to see when numatoms is initialized, read, changed, etc., so I can work out why and where things go bad.

So, suggestions?

Thanks, John

Follow-up: after submitting this I tried to set a breakpoint on ewald_force:

[john@arabinose one]$ idbc ~/amber-intel/amber10/exe/sander
Intel Debugger for applications running on Intel 64, Version 11.1, Build [1.2097.2.295]
------------------
object file name: /home/john/amber-intel/amber10/exe/sander

<--- snip --->

Reading symbols from /home/john/amber-intel/amber10/bin/sander...done.
(idb) watch ewald_force
Warning: Watchpoint not set.
(idb) break ewald_force
Breakpoint 1 at 0x4fbc1f: file /home/john/amber-intel/amber10/src/sander/_ew_force.f, line 653.
(idb) watch numatoms
No symbol "numatoms" in current context.
Error: no value for numatoms
Warning: Watchpoint not set.
(idb) run -O -i minwat.in -o minwat.out -p alpha_ara_ome_tip4p.top -c alpha_ara_ome_tip4p.crd -r minwat.rst -ref alpha_ara_ome_tip4p.crd
[New Thread 47673592351744 (LWP 4696)]
[New Thread 47673592351744 (LWP 4696)]
Starting program: /home/john/amber-intel/amber10/bin/sander

Breakpoint 1, ewald_force (crd=(...), numatoms=4071, iac=(...), ico=(...), nntypes=9, charge=(...), cn1=(...), cn2=(...), asol=(...), bsol=(...), eelt=1.3339772437713657e-322, epol=0, frc=(...), x=(...), ix=(...), ipairs=(...), xr=(...), virvsene=0, pol=(...), qm_pot_only=.FALSE., cn114=(...), cn214=(...)) at /home/john/amber-intel/amber10/src/sander/_ew_force.f:653
Warning: Source file '/home/john/amber-intel/amber10/src/sander/_ew_force.f' more recent than executable file '/home/john/amber-intel/amber10/bin/sander'.
653
(idb)

Okay, so numatoms is 4071 at this breakpoint.. interesting.. if I try to call watch numatoms it seems to work:

(idb) watch numatoms
Watchpoint 2: numatoms

Yay, but when I try to continue running by typing run (this probably isn't right..) I get this:

(idb) run
Program exited normally.
[New Thread 47962434202624 (LWP 4724)]
[New Thread 47962434202624 (LWP 4724)]
Starting program: /home/john/amber-intel/amber10/bin/sander
Old value = 0
New value = 4071

Breakpoint 1, ewald_force (crd=(...), numatoms=4071, iac=(...), ico=(...), nntypes=9, charge=(...), cn1=(...), cn2=(...), asol=(...), bsol=(...), eelt=1.3339772437713657e-322, epol=0, frc=(...), x=(...), ix=(...), ipairs=(...), xr=(...), virvsene=0, pol=(...), qm_pot_only=.FALSE., cn114=(...), cn214=(...)) at /home/john/amber-intel/amber10/src/sander/_ew_force.f:653
Warning: Source file '/home/john/amber-intel/amber10/src/sander/_ew_force.f' more recent than executable file '/home/john/amber-intel/amber10/bin/sander'.
653

Program exited normally? huh? Glad to see numatoms went from 0 to 4071 but how does it get to -220440305?

Any suggestions on how to track this pesky variable?
Martyn_C_Intel
Hi,
You might try using the GUI version of the debugger. Open an evaluation window and then use the symbol browser to add numatoms to it. If the problem was one of how to specify the scope of numatoms, this may work around that.
What sort of variable is numatoms and how is it declared (e.g. in a module, in a common block, locally, ...)? Depending on circumstances, you may need to specify the module name, e.g.
module_name%%variable_name.

The debugger also has a Debug > Signal Handling menu entry where you can specify which OS signals to stop on. This may not help much, but it would stop the debugger as soon as the trap occurs and may thus give you a more complete snapshot than just the stack trace.

One or two general debugging thoughts. Are you building at -O0 or -O2? (-g changes the default to -O0 unless you specify -O2 or -O3 explicitly.) If you are building at -O2, have you specified -debug extended as well as -g? It helps the debugger find the correct values and locations of variables in optimized code, when they may be held in registers, for example.
Do you set the maximum stack limit to 'unlimited', e.g. with ulimit -s unlimited? (The compiler may allocate local or temporary variables on the stack, and a seg fault may occur if the stack limit is exceeded. The default limit is small, ~10 MB or less, on many distributions. I doubt that's the explanation here, but best to be sure.)
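As a rough sketch of the stack-limit check, something like the following could be run in the same shell that launches sander. It assumes bash; the messages are only illustrative, and the attempt to raise the limit may be refused if the hard limit on your system is lower.

```shell
#!/bin/bash
# Show the current soft stack limit (in kB, or the word "unlimited").
current=$(ulimit -s)
echo "current stack limit: $current"

# Try to raise it for this shell and anything started from it.
# This can fail if the hard limit does not allow it.
if ulimit -s unlimited 2>/dev/null; then
    echo "stack limit now: $(ulimit -s)"
else
    echo "could not raise stack limit (hard limit is $(ulimit -Hs))"
fi
```

The new limit only applies to this shell and its children, so the debugger (or sander itself) has to be started from the same shell afterwards.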
Have you tried adding any -check options, such as -check bounds?
There's a useful feature -gen-interfaces -warn interfaces that will check calling sequences at compile time. Getting the type of an argument wrong can sometimes lead to hard-to-debug runtime errors.
Is this version of Amber using either OpenMP or MPI?
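To make the compiler suggestions above concrete, here is a minimal sketch of a debug build using only the flags already mentioned. The file names prog.f90 and prog_debug are hypothetical placeholders. AMBER has its own build configuration, so in practice these flags would go wherever that configuration sets the Fortran compiler options. The script assumes bash and only invokes ifort if it is on the PATH and the source file exists; otherwise it just prints the command.

```shell
#!/bin/bash
# Hypothetical file names for illustration; override via the environment.
SRC=${SRC:-prog.f90}
EXE=${EXE:-prog_debug}

# Flags from the suggestions above:
#   -g -O0                             debug symbols, no optimization
#   -check bounds                      run-time array bounds checking
#   -gen-interfaces -warn interfaces   compile-time check of calling sequences
FFLAGS=(-g -O0 -check bounds -gen-interfaces -warn interfaces)

if command -v ifort >/dev/null 2>&1 && [ -f "$SRC" ]; then
    ifort "${FFLAGS[@]}" -o "$EXE" "$SRC"
else
    echo "would run: ifort ${FFLAGS[*]} -o $EXE $SRC"
fi
```

If you do need to keep -O2, the same sketch applies with -O0 replaced by -O2 -debug extended.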

Incidentally, to continue after a breakpoint, you need cont (idb mode) or continue (gdb mode); run just restarts your program from the beginning.