I work with cluster/MPI code. Most of my debugging has been done with a set of scripts I built on top of gdb. 90% or more of what I do involves using this as a simple segfault locator: the program being debugged just runs until it crashes, then I trace back & examine variables to try to find out why.
So one thing that's important to me is to be able to examine any variable, even if the compiler has somehow optimized it away. It's really annoying to try to look at something you suspect might be the cause of a problem, only to be told that there's no such variable name.
Thanks James, that's great feedback.
I'll forward the information regarding the GUI, the documentation and the xmm registers to the engineering team.
I'll also take a closer look at the CodeView debugger to learn something more about its interface design.
In the meantime, IDB is being enhanced to better support debugging of optimized code (including some initial support for register variables, split-lifetime variables and inlined functions). You're likely to see this in the 8.1 final release. You're also likely to see these areas being enhanced (and some probable additions along the lines of semantic stepping through optimized code) in future releases.
In addition, the MPI support in IDB lets you control application instances spread across nodes in a cluster from a single debugger session. It uses an aggregation network to concentrate the program output in a way that allows for a nice clean command line experience (while giving you access to possible variations in that output). It does so with near linear performance on large cluster configurations (tested up to 2500 nodes).
If you have an opportunity, please give it a try and let me know what you think.
I recommend TotalView as a debugging tool; it is a really good tool for debugging MPI programs. It has a GUI and many other advanced features.
When debugging an MPI program, the most important thing is to locate the problem; that is the key.
I too believe Etnus TotalView to be the Gold Standard in debugger technology in general, and for debugging cluster parallel applications in particular.
There are actually several good tools available for debugging clustered codes. For instance, the Intel Cluster Tools (i.e. Intel Trace Collector and Intel Trace Analyzer) can be used together as an effective debugger on clustered codes. Both IDB and Streamline DDT are also viable debugger options. IDB is provided without cost on the Intel Compiler kits and provides a nice clean command line interface (with an aggregation network for concentrating program and debugger output) for clustered codes. Streamline DDT is a premium debugger that provides more graphical features while coming in at a lower cost per seat than Etnus TotalView.
There are actually several good tools available for debugging clustered codes. For instance, the Intel Cluster Tools (i.e. Intel Trace Collector and Intel Trace Analyzer) can be used together as an effective debugger on clustered codes.
Actually, the Intel tools that you mention here are performance analyzers. They can be used, to some small degree, as debuggers if you are unsure where messages originated or where they went. However, that activity would require such painstaking steps that I would recommend using 'printf' to trace message traffic instead. TotalView's ability to track message queues is really quite useful for this type of debugging.
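For anyone who wants to try the 'printf' route, here is a minimal sketch in C of what such message tracing might look like. The helper names `format_msg_trace` and `trace_msg` are made up for illustration, and the rank, peer, tag and count values are assumed to be supplied by the caller around its own send and receive calls; none of this is part of MPI or of any of the debuggers discussed here.

```c
#include <stdio.h>

/* Hypothetical helper: format one trace line for a message event into buf.
 * 'dir' is a caller-chosen label such as "send" or "recv".
 * Returns the value of snprintf (characters needed, excluding the NUL). */
int format_msg_trace(char *buf, size_t len, int rank, const char *dir,
                     int peer, int tag, int count)
{
    return snprintf(buf, len, "[rank %d] %s peer=%d tag=%d count=%d",
                    rank, dir, peer, tag, count);
}

/* Hypothetical helper: write the trace line to stderr and flush at once,
 * so the line is not lost in a buffer if the process crashes right after. */
void trace_msg(int rank, const char *dir, int peer, int tag, int count)
{
    char line[128];
    format_msg_trace(line, sizeof line, rank, dir, peer, tag, count);
    fprintf(stderr, "%s\n", line);
    fflush(stderr);
}
```

Calling something like `trace_msg(rank, "send", dest, tag, count)` just before each send, and the matching "recv" call after each receive, gives a per-rank log that can be compared across processes to see which messages were never matched.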
I've tried IDB to track down a problem I was having with an MPI code (though the error was not related to MPI, as far as I could tell). I started by launching four separate instances of the debugger and then attaching to the MPI processes at an artificial pause inserted into the code for just this purpose. I did not have much luck. Your message seemed to imply that there could be some easier method for using IDB in such a situation. If there is, could you briefly share what that usage would be? Do all the processes need to be on the same node or can they be spread out across a cluster?
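For reference, the "artificial pause" approach described above is often written roughly like the sketch below. This is only an illustration under some assumptions: a POSIX system (for `getpid`, `gethostname` and `sleep`), and the names `debugger_attached` and `wait_for_debugger` are hypothetical, not taken from the poster's code or from any debugger's API.

```c
#include <stdio.h>
#include <unistd.h>

/* Flag the debugger flips to release the process. It is volatile so the
 * compiler re-reads it on every pass through the loop. */
volatile int debugger_attached = 0;

/* Hypothetical helper: print where this process is running, then wait up to
 * max_seconds for a debugger to set debugger_attached.
 * Returns 1 if the flag was set, 0 if the wait timed out. */
int wait_for_debugger(int max_seconds)
{
    char host[256];
    if (gethostname(host, sizeof host) != 0) {
        snprintf(host, sizeof host, "unknown-host");
    }
    host[sizeof host - 1] = '\0';  /* gethostname may not terminate on truncation */

    fprintf(stderr, "PID %ld on %s waiting for debugger attach\n",
            (long)getpid(), host);
    fflush(stderr);

    for (int waited = 0; !debugger_attached && waited < max_seconds; ++waited) {
        sleep(1);
    }
    return debugger_attached ? 1 : 0;
}
```

Each process would call `wait_for_debugger` early in `main`. After attaching to the printed PID on the printed host with gdb, `set var debugger_attached = 1` followed by `continue` lets that process proceed. The timeout is there so that an unattended run does not hang forever.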
mpirun -dbg=idb -np N [other mpich options] application [application arguments] [--idbopt idb options]