Intel® MPI Library

problems with Intel Trace Collector

Anonymous78
Hi all,
I've installed Intel Trace Collector 6 on Red Hat Linux, and I use Intel Fortran 9 and MPICH 2.
After compiling the sample program in intel/mpi/1.0.2/test with
mpif90 test.f90 -c
and linking with

mpif90 test.o -L{VT_ROOT}/lib -lVT -ldwarf -lelf -lnsl -lm -lpthread -o ftest
I get the following error message:
aborting job:
Fatal error in MPI_Comm_dup: Invalid communicator, error stack:
MPI_Comm_dup(171): MPI_Comm_dup(comm=0x5b, new_comm=0xbfffc250) failed
MPI_Comm_dup(93): Invalid communicator
rank 0 in job 31 {host_name}_33927 caused collective abort of all ranks
exit status of rank 0: return code 13
The program runs OK if I don't use ITC.
If I use Intel MPI 1, then there is no error message,
but there is no output from the program either.
Any help is much appreciated.
Paolo
ClayB

Paolo -

Are you running on an Itanium 2 system? If so, you need to add a "-lvtunwind" flag after the "-lVT" in the linking step. This is noted on page 7 (Chapter 3) of the User's Guide for the 5.0 version.
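A minimal sketch of that link step, assuming the library list from the original post and that `uname -m` reports `ia64` on Itanium systems. The function name `vt_link_flags` and its arguments are hypothetical, introduced only for illustration; check the User's Guide for the exact flags your ITC version expects.

```shell
# Hypothetical helper: print the ITC link flags for a given machine type.
# $1: machine type as reported by `uname -m` ("ia64" on Itanium systems)
# $2: ITC installation root (the VT_ROOT directory)
# The library list is copied from the link line in the original post.
vt_link_flags() {
    flags="-L$2/lib -lVT"
    # On Itanium, -lvtunwind goes directly after -lVT.
    if [ "$1" = "ia64" ]; then
        flags="$flags -lvtunwind"
    fi
    echo "$flags -ldwarf -lelf -lnsl -lm -lpthread"
}
```

It could be used as `mpif90 test.o $(vt_link_flags "$(uname -m)" "$VT_ROOT") -o ftest`, so the same command works on both IA-32 and Itanium hosts.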

Otherwise, can you run the application by itself, without the Trace Collector, using MPICH 2 (or Intel MPI)? That is, is the problem only when you try to run using Trace Collector or is there something going wrong at a more basic level with MPI on your system?

--clay
Anonymous78
Clay,
I'm using IA-32 (16 dual-Xeon processor cluster).
No, I don't think the problem is with the program.
This is the 'hello world' example, and it runs with
either MPICH2 or IMPI1, without ITC.
I also tried other, more complex cases, and they fail
only when I use ITC.
I guess my problem
must be with the installation of ITC 6, although
I followed the instructions in the user guide.
Are there any checks I can run to test the
installation?
Should I try v.5?
Thank you
Paolo
ClayB

Paolo -

I agree, there seems to be some problem with the installation of ITC. Are the libraries visible (or loaded) on each node of the cluster? If you run 'ldd' on the binary, where will the application be looking for the shared library objects?

Can you create a statically linked version of the app and run this on the cluster nodes? Will the program run if you restrict the processes to the node that you installed ITC on?
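The 'ldd' check can be sketched as a small filter, assuming the usual `name => not found` form that ldd prints for a library it cannot resolve. The function name `missing_libs` is hypothetical.

```shell
# Hypothetical helper: read ldd output on stdin and print the name of
# every shared library that the dynamic loader could not resolve.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

Running `ldd ./ftest | missing_libs` on each node in turn should print nothing if every shared library the binary needs is visible there; any name it prints is a library that node cannot find.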

If none of the above works and you have the libraries available on the cluster nodes, you should report the error to the Intel Premier Support site.

--clay

Anonymous78

Clay,
I've installed ITC on an NFS file system. If I do 'which VTserver' on the master or any of the nodes, I get ~/libraries/itc/bin/VTserver.
If I run ldd on the app, I get:
libnsl.so.1 => /lib/libnsl.so.1 (0xb75c9000)
libimf.so => /home/paolo/intel/fc/9.0/lib/libimf.so (0xb73ed000)
libm.so.6 => /lib/tls/libm.so.6 (0xb73cb000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0xb73bb000)
librt.so.1 => /lib/tls/librt.so.1 (0xb73a7000)
libc.so.6 => /lib/tls/libc.so.6 (0xb7270000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xb75eb000)
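The listing above contains no entry for libVT or for an MPI library. One possible reading, offered as an assumption rather than something the thread confirms, is that both were linked from static archives, in which case their visibility on the nodes at run time would not be the issue. A sketch for checking whether a given library shows up in ldd output at all (the function name `lists_lib` is hypothetical):

```shell
# Hypothetical helper: read ldd output on stdin and report whether any
# listed library name starts with the prefix given in $1.
lists_lib() {
    if grep -q "^[[:space:]]*$1"; then
        echo "dynamic: $1"
    else
        echo "not listed: $1"
    fi
}
```

For example, `ldd ./ftest | lists_lib libVT` reports "not listed: libVT" for output like the listing above.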

I've built a statically linked version of the app, but it does not run on the nodes.

Paolo