Jérôme
Beginner

Critical problem with VTune

Hello,

A non-commercial version of VTune (version 9.1, revision 8) is installed on an HPC machine (with four Xeon X7460 processors). The OS is Debian Squeeze (testing) with a 2.6.32-5-amd64 kernel. We recently ran into problems.

One user uses it to analyze some programs. These programs sometimes segfault (I don't know whether that is relevant). In any case, since he started his experiments, the system has been misbehaving.

Symptoms:
  • the user running the experiments gets an error message like "Sockets error. Could not connect to remote machine 127.0.0.1. Error No. (111)." He is the only user who gets this error; other users can still use VTune.
  • once this error has occurred, there is a critical side effect: the ps and htop programs hang. When they hang, Ctrl+C has no effect, and the only way out is to close the terminal. When I try to reboot the machine, the system itself cannot kill these processes; I have to use the magic SysRq keys to kill them.
I understand that these symptoms are not very precise, but I cannot find any relevant messages in /var/log/.
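Processes that ignore Ctrl+C and cannot be killed even at shutdown are often stuck in uninterruptible sleep (state D), although whether that is what happens here is only an assumption. As a rough sketch, the following Python script lists such processes by reading /proc/<pid>/stat directly instead of going through ps. The helper names parse_stat and uninterruptible_processes are made up for this illustration.

```python
import os


def parse_stat(line):
    """Parse the start of a /proc/<pid>/stat line into (pid, comm, state).

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so we split on the LAST closing parenthesis.
    """
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen].strip())
    comm = line[lparen + 1:rparen]
    state = line[rparen + 2]  # single state letter after ") "
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return [(pid, comm), ...] for processes currently in state D."""
    if not os.path.isdir(proc_root):
        return []
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited meanwhile, or unexpected format
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

If the hung ps and htop processes show up in this list, that would suggest they are blocked inside the kernel, which would point toward a kernel-side component rather than the profiled application.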

I created this thread as a last resort. The workaround for now is to uninstall VTune and not use it on this machine.

Jerome
TimP
Black Belt

It seems that this application attempts to access other machines, in a manner which you haven't specified. If it were by MPI, for example, it might mean that the hosts file is not correct, or the application doesn't have the right to ssh, if that is the chosen mode. It doesn't look like something which could be under the control of VTune, unless possibly to the extent that your system is set up to deny remote access to the VTune user (who should have different privilege from what the user normally has). If the application can be profiled with access limited to the same node, that might be more appropriate.
Jérôme
Beginner

TimP wrote: "It seems that this application attempts to access other machines, in a manner which you haven't specified."

Not at all. The problem occurs with purely shared-memory programs. We also reproduced it with vtunedemo and with ls:
[bash]vtl activity -d 30 -c sampling -c callgraph -master sampling -app /bin/ls -moi /bin/ls run[/bash]
I understand why you think this problem is network related, but I am confident it is not.
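For reference, on Linux error number 111 is ECONNREFUSED ("Connection refused"): the connection attempt reached the host, here the loopback address 127.0.0.1, but nothing was accepting connections on the target port. A minimal Python sketch to decode the number and probe a local port is given below. The port number is a placeholder, since nothing in this thread says which port VTune tries to reach, and the helper names are invented for the example.

```python
import errno
import os
import socket


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def probe(host, port, timeout=2.0):
    """Attempt a TCP connect; return 0 on success, else the errno value."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port))


if __name__ == "__main__":
    # Decode the number from the VTune error message.
    print(describe_errno(111))

    # HYPOTHETICAL_PORT is a placeholder, not a documented VTune port.
    HYPOTHETICAL_PORT = 50000
    code = probe("127.0.0.1", HYPOTHETICAL_PORT)
    print("connected" if code == 0 else describe_errno(code))
```

A refused connection on 127.0.0.1 is consistent with a local helper process not running or not listening, rather than with a problem reaching another machine.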