vtune cannot connect to mic

Hello,

I'm trying to use Vtune on a host with Xeon Phi cards. I have Intel Xeon Phi coprocessor (native) as the target system and 0(mic0) as the card number in the amplxe-gui. When I try to do an analysis, vtune complains:

ssh: Could not resolve hostname 0(mic0): Name or service not known

(see attached screen shot for vtune version info and the error message)

I can do 'ssh mic0' without any problem though, and mic0 is correctly set up in /etc/hosts. I'm a member of the vtune group, and the sep and vtsspp drivers are loaded on the mic and on the host (on the host, pax is loaded as well).

Any suggestions about what could be wrong?

0 Kudos
11 Replies
Employee

Hi Hans-Christian S.:

Are you performing the ssh and VTune Amplifier functionality both as root?  Or, both as non-root?

0 Kudos

I'm working with a normal user account, non root.

Right now I have the feeling that the vtune GUI mixes up the content of the drop down box "0(mic0)" with the hostname of the mic "mic0". As I mentioned before, I can do passwordless ssh to the mic0 with the same user name as on the host. I don't know, however, if vtune is attempting to do this, or if it wants to connect with another username, or if the error is completely misleading and this has nothing to do with ssh.
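To illustrate the suspected mix-up: the drop-down entry "0(mic0)" combines a card number and a host name, and only the part in parentheses is something ssh can resolve. The following is a minimal sketch of how such a label could be split; `parse_card_label` is a hypothetical helper written for this note, not anything VTune provides.

```shell
#!/bin/bash
# Hypothetical helper (not part of VTune): split a GUI label such as
# "0(mic0)" into the card index and the host name.
parse_card_label() {
    local label=$1
    local index=${label%%\(*}    # text before the first "("
    local host=${label#*\(}      # text after the first "("
    host=${host%\)}              # drop the trailing ")"
    printf '%s %s\n' "$index" "$host"
}

parse_card_label '0(mic0)'       # prints: 0 mic0
```

Passing the whole label to ssh fails with exactly the "Could not resolve hostname" error quoted in the first post, while the extracted host name (mic0) resolves normally.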

Thanks for taking the time to look at this,

HC

 

0 Kudos

Hello,

It would be helpful if you could provide output from the following command:

<vtune_install_dir>/bin64/sep -version -mic

on the machine with the mic card, where passwordless access works for your account.

Thank you, Regards, Dmitry

0 Kudos

Here it is:

[l_stadler_h@merlinx01 ~]$ /nfs/opt/intel/intel-15/vtune_amplifier_xe/bin64/sep -version -mic
Sampling Enabling Product version: 3.15 (private) built by patbbinn on Jan 30 2015 02:39:55
SEP User Mode Version: 3.15.5
mic 0 (merlinx01-mic0.psi.ch): SEP driver version 3.15.5
mic 1 (merlinx01-mic1.psi.ch): SEP driver version 3.15.5

and the relevant section of /etc/hosts

172.31.1.1      merlinx01-mic0.psi.ch mic0 #Generated-by-micctrl
172.31.2.1      merlinx01-mic1.psi.ch mic1 #Generated-by-micctrl

I can do passwordless ssh with all these names for mic card number 0 and 1.
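For anyone comparing their own setup, here is a small sketch that lists the names assigned to each card by the micctrl-generated entries. It assumes the entries carry the `#Generated-by-micctrl` marker shown above; `mic_hosts` is a name made up for this example.

```shell
#!/bin/bash
# Print "address fqdn alias" for every hosts-file line that micctrl generated.
# Reads hosts-file content on stdin.
mic_hosts() {
    awk '/#Generated-by-micctrl/ { print $1, $2, $3 }'
}

# Example with the two entries quoted in this thread:
printf '%s\n' \
    '172.31.1.1      merlinx01-mic0.psi.ch mic0 #Generated-by-micctrl' \
    '172.31.2.1      merlinx01-mic1.psi.ch mic1 #Generated-by-micctrl' | mic_hosts
```

On a real host the function would be fed from the file itself, for example `mic_hosts < /etc/hosts`.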

Thanks, HC

 

0 Kudos
Employee

First of all, you need to check whether sep is working on the mic.

Do "# ssh mic0 lsmod | grep sep"
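If the check needs to go into a script, the grep can be replaced by something that returns a clean exit status. This is only a sketch: `module_loaded` is a hypothetical helper, and it assumes the driver name on the card starts with "sep" (the log later in this thread shows `sep3_15`).

```shell
#!/bin/bash
# Succeed if lsmod-style output on stdin lists a module whose name
# starts with the given prefix; fail otherwise.
module_loaded() {
    awk -v prefix="$1" 'index($1, prefix) == 1 { found = 1 } END { exit !found }'
}

# Typical use (commented out because it needs a coprocessor):
#   ssh mic0 lsmod | module_loaded sep && echo "sep is loaded on mic0"
```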

Sometimes the VTune components are not installed properly on the mic. In that case, run the following under vtune_amplifier_xe_2015/bin64/k1om/:

 

# ./sep_micboot_install.sh 

SEP configuration files have been successfully installed in the configuration directory.
Please run  "service mpss restart" to start the SEP service.
# service mpss restart

Then use the following to verify:

# amplxe-cl -collect advanced-hotspots --target-system=mic-native:0 -search-dir=parent-path-of-bin-src -- path-mic/program

 

 

0 Kudos
Employee

Hi Hans-Christian.

We need some logs from you to triage the issue.

Could you share them with me?

Instructions:

  • create a directory in your home directory (for example, logs)
  • cd to ~/logs
  • export MSNGR_DEBUG=1
  • export EXCHANGE_DEBUG=1
  • export AMPLXE_LOG_DIR=~/logs
  • amplxe-cl -target-system=mic-native:0 -c advanced-hotspots -- path-mic/program
  • send everything from the logs directory
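The steps above can be wrapped in a small function so the environment is set up the same way each time. This is a sketch, not an official procedure: `setup_vtune_debug_env` and the archive name are made up here, while the three variables and the amplxe-cl command are the ones listed above.

```shell
#!/bin/bash
# Create the log directory and export the debug variables listed above.
# The default directory (~/logs) follows the example in the instructions.
setup_vtune_debug_env() {
    local log_dir=${1:-"$HOME/logs"}
    mkdir -p "$log_dir" || return 1
    export MSNGR_DEBUG=1
    export EXCHANGE_DEBUG=1
    export AMPLXE_LOG_DIR=$log_dir
}

# Typical use (commented out because it needs VTune and a coprocessor):
#   setup_vtune_debug_env
#   cd "$AMPLXE_LOG_DIR"
#   amplxe-cl -target-system=mic-native:0 -c advanced-hotspots -- path-mic/program
#   tar czf ~/vtune-logs.tar.gz -C "$HOME" logs
```

Because the variables are exported into the current shell, later collections from the same shell will also run with debug logging until the variables are unset or a new console is opened.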

Thanks, Kirill

0 Kudos

Peter: the sep module is running on the mic cards

Kirill: Starting vtune in this way works, but it gets stuck and doesn't react to CTRL-C. I waited a long time (much longer than the program should take to do its work) between CTRL-C presses. I had to kill -9 the amplxe-runss process to make it stop.

[l_stadler_h@merlinx01 ~]$ export MSNGR_DEBUG=1
[l_stadler_h@merlinx01 ~]$ export EXCHANGE_DEBUG=1
[l_stadler_h@merlinx01 ~]$ export AMPLXE_LOG_DIR=~/logs
[l_stadler_h@merlinx01 ~]$ amplxe-cl -target-system=mic-native:0 -c advanced-hotspots -- /nfs/home/l_stadler_h/matrix-test/matrix-test-mic
amplxe: Using target: mic-native:0
amplxe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: amplxe-cl -r /nfs/home/l_stadler_h/r000ah -command stop.

^Camplxe: CTRL-C signal is received.
^Camplxe: Error: Cannot handle the given command due to an internal error.

The tarred version of the logs directory is attached. I saw something like "architecture not supported" in the logs, so here is the platform information:

[l_stadler_h@merlinx01 logs]$ lsb_release -d
Description:    CentOS Linux release 7.0.1406 (Core)
[l_stadler_h@merlinx01 logs]$ uname -a
Linux merlinx01.psi.ch 3.10.0-123.20.1.el7.x86_64 #1 SMP Thu Jan 29 18:05:33 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

and on the mic cards we have MPSS 3.4

Thanks for taking the time to look at this,

HC

 

0 Kudos
Employee

>$ amplxe-cl -target-system=mic-native:0 -c advanced-hotspots -- /nfs/home/l_stadler_h/matrix-test/matrix-test-mic
>amplxe: Using target: mic-native:0
>amplxe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: amplxe-cl -r /nfs/home/l_stadler_h/r000ah -command stop.

>^Camplxe: CTRL-C signal is received.
>^Camplxe: Error: Cannot handle the given command due to an internal error.

If this is a long-running application, could you try the following, which avoids the need for CTRL-C?

amplxe-cl -target-system=mic-native:0 -c advanced-hotspots -duration 30 -- /nfs/home/l_stadler_h/matrix-test/matrix-test-mic

0 Kudos
Employee

Hi Hans-Christian.

Is it possible to run a clean collection now?

I mean,

  • open a new console (without the amplxe debug environment variables)
  • reinstall vtune on the card with bin64/k1om/micboot_install.sh
  • restart the mpss service
  • run the collection with a duration (as Peter suggested)
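As a rough sketch, the sequence could be written out like this for review before running anything. `print_clean_collection_steps` is a hypothetical helper that only prints the commands; the install directory and program path are placeholders, and clearing the variables with `unset` is assumed to be equivalent to opening a new console.

```shell
#!/bin/bash
# Print (not run) the clean-collection sequence.
# Arguments: VTune install directory, program to profile (both placeholders).
print_clean_collection_steps() {
    local vtune_dir=$1 program=$2
    cat <<EOF
# as root:
$vtune_dir/bin64/k1om/micboot_install.sh
service mpss restart
# as the normal user, with the debug variables cleared:
unset MSNGR_DEBUG EXCHANGE_DEBUG AMPLXE_LOG_DIR
amplxe-cl -target-system=mic-native:0 -c advanced-hotspots -duration 30 -- $program
EOF
}

print_clean_collection_steps '<vtune_install_dir>' 'path-mic/program'
```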

Thanks, Kirill

0 Kudos

Hurray, yes vtune seems to work now, thanks a lot!

To make the test program produce output I also had to write a script that sets LD_LIBRARY_PATH (this was not the reason for vtune getting stuck before though, because vtune now also works if I don't set LD_LIBRARY_PATH, although it doesn't collect much useful stuff in that case ;-).

I included a log of the activities and hope this looks like it should.

Thanks so much, HC

Reinstalling sep:

[root@merlinx01 vtune_amplifier_xe]# bin64/k1om/micboot_install.sh
SEP configuration files have been successfully installed in the configuration directory.
Please run  "service mpss restart" to start the SEP service.
itt successfully installed.
Please restart mpss service.
_amplxe_vtune_amplifier_xe_2015.2.0.393444 successfully installed.

Enabling prebuild driver installation.
vtsspp driver is ready to install.
vtsspp driver successfully installed for group "500".

Restart Intel Manycore Platform Software Stack (MPSS) to complete installation.
  sudo service mpss restart
[root@merlinx01 vtune_amplifier_xe]# service mpss restart
Restarting mpss (via systemctl):                           [  OK  ]


Run natively via ssh (time in seconds):

[l_stadler_h@merlinx01 matrix-test]$ cat micrun.sh
#!/bin/bash

export LD_LIBRARY_PATH=/nfs/opt/intel/intel-15/lib/mic:/nfs/opt/intel/intel-15/composerxe/lib/mic:/nfs/opt/intel/intel-15/composerxe/mkl/lib/mic

exec $@
[l_stadler_h@merlinx01 matrix-test]$ ssh mic0 /nfs/home/l_stadler_h/matrix-test/micrun.sh  /nfs/home/l_stadler_h/matrix-test/matrix-test-mic 800 10000 s
single precision
initializing ...
dimensions: 800x800
loop: 10000
cleanup ...
time: 1.32397
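A side note on the wrapper: `exec $@` works here because none of the arguments contain spaces. A more defensive variant would use `exec "$@"`. The sketch below, with made-up function names, shows the difference in how the two forms pass arguments on.

```shell
#!/bin/bash
# Show how "$@" and unquoted $@ differ when an argument contains a space.
count_args() { echo "$#"; }
forward_quoted()   { count_args "$@"; }   # keeps each argument intact
forward_unquoted() { count_args $@; }     # splits arguments on whitespace again

forward_quoted   "my program" 800   # prints 2
forward_unquoted "my program" 800   # prints 3
```

With the quoted form, a program path or parameter containing whitespace reaches the profiled binary unchanged.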

Run vtune:

[l_stadler_h@merlinx01 matrix-test]$ source /nfs/opt/intel/intel-15/bin/compilervars.sh intel64
[l_stadler_h@merlinx01 matrix-test]$ source /nfs/opt/intel/intel-15/vtune_amplifier_xe/amplxe-vars.sh
Copyright (C) 2009-2014 Intel Corporation. All rights reserved.
Intel(R) VTune(TM) Amplifier XE 2015 (build 393444)
[l_stadler_h@merlinx01 matrix-test]$ amplxe-cl -target-system=mic-native:0 -c advanced-hotspots -duration 30 -follow-child -- /nfs/home/l_stadler_h/matrix-test/micrun.sh  /nfs/home/l_stadler_h/matrix-test/matrix-test-mic 800 10000 s
amplxe: Using target: mic-native:0
amplxe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: amplxe-cl -r /nfs/home/l_stadler_h/matrix-test/r001ah -command stop.
single precision
initializing ...
dimensions: 800x800
loop: 10000
cleanup ...
time: 2.96683
amplxe: Collection stopped.
amplxe: Using result path `/nfs/home/l_stadler_h/matrix-test/r001ah'
amplxe: Executing actions 16 % Resolving module symbols                        
amplxe: Locating file `dma_module.ko' on the remote system
amplxe: Locating file `/usr/lib64/libcoi_device.so.0' on the remote system
amplxe: Locating file `/lib64/libcrypto.so.1.0.0' on the remote system
amplxe: Locating file `/var/volatile/tmp/coi_procs/1/5724/amplxe-michelper' on the remote system
amplxe: Locating file `/usr/sbin/sshd' on the remote system
amplxe: Locating file `/usr/lib/debug/lib/modules/2.6.38.8+mpss3.4.2/vmlinux' on the remote system
amplxe: Locating file `/lib64/libc-2.14.90.so' on the remote system
amplxe: Locating file `/nfs/opt/intel/intel-15/composer_xe_2015.2.164/mkl/lib/mic/libmkl_core.so' on the remote system
amplxe: Warning: Cannot locate file `dma_module.ko'.
amplxe: Executing actions 16 % Resolving information for `dma_module'          
amplxe: Locating file `/usr/lib64/libittnotify.so' on the remote system
amplxe: Executing actions 16 % Resolving information for `libcoi_device.so.0'  
amplxe: Warning: Cannot locate debugging symbols for file `/tmp/amplxe-tmp-l_stadler_h/modules.mic-native_0/libcoi_device.so.0/e73b2c18d6393b4790ab9261d05b3ff2/libcoi_device.so.0'.
amplxe: Locating file `sep3_15.ko' on the remote system
amplxe: Executing actions 16 % Resolving information for `libcrypto.so.1.0.0'  
amplxe: Warning: Cannot locate debugging symbols for file `/tmp/amplxe-tmp-l_stadler_h/modules.mic-native_0/libcrypto.so.1.0.0/2fc14f872d9ca6c49501676046029554/libcrypto.so.1.0.0'.
amplxe: Locating file `micscif.ko' on the remote system
amplxe: Warning: Cannot locate file `/var/volatile/tmp/coi_procs/1/5724/amplxe-michelper'.
amplxe: Executing actions 17 % Resolving information for dangling locations    
amplxe: Locating file `/usr/lib64/libstdc++.so.6.0.16' on the remote system
amplxe: Executing actions 17 % Resolving information for `sshd'                
amplxe: Warning: Cannot locate debugging symbols for file `/tmp/amplxe-tmp-l_stadler_h/modules.mic-native_0/sshd/4354ab1d5cd53de91c85d8abe4000899/sshd'.
amplxe: Locating file `/lib64/ld-2.14.90.so' on the remote system
amplxe: Locating file `/boot/vmlinuz-2.6.38.8+mpss3.4.2' on the remote system
amplxe: Executing actions 17 % Resolving information for `libc-2.14.90.so'     
amplxe: Warning: Cannot locate debugging symbols for file `/tmp/amplxe-tmp-l_stadler_h/modules.mic-native_0/libc-2.14.90.so/45397051311f95599054edd6ac8d616d/libc-2.14.90.so'.
amplxe: Locating file `ringbuffer.ko' on the remote system
amplxe: Executing actions 17 % Resolving information for `libittnotify.so'     
amplxe: Warning: Cannot locate debugging symbols for file `/tmp/amplxe-tmp-l_stadler_h/modules.mic-native_0/libittnotify.so/85c0a1dc81a877c7b3bed6097324ea23/libittnotify.so'.
amplxe: Locating file `intel_micveth.ko' on the remote system
amplxe: Executing actions 17 % Resolving information for `libmkl_core.so'      
amplxe: Warning: Cannot locate file `sep3_15.ko'.
amplxe: Executing actions 17 % Resolving information for `sep3_15'             
amplxe: Locating file `/nfs/home/l_stadler_h/matrix-test/matrix-test-mic' on the remote system
amplxe: Executing actions 18 % Resolving information for `sep3_15'             
amplxe: Locating file `/nfs/opt/intel/intel-15/composer_xe_2015.2.164/mkl/lib/mic/libmkl_intel_thread.so' on the remote system
amplxe: Warning: Cannot locate file `micscif.ko'.
amplxe: Executing actions 18 % Resolving information for `micscif'             
amplxe: Locating file `/lib64/libpthread-2.14.90.so' on the remote system
amplxe: Executing actions 18 % Resolving information for `libstdc++.so.6.0.16'
amplxe: Warning: Cannot locate debugging symbols for file `/tmp/amplxe-tmp-l_stadler_h/modules.mic-native_0/libstdc++.so.6.0.16/29fe0f4c123c3fb8399e4468d2195b83/libstdc++.so.6.0.16'.
amplxe: Locating file `/nfs/opt/intel/intel-15/composer_xe_2015.2.164/compiler/lib/mic/libiomp5.so' on the remote system
amplxe: Executing actions 18 % Resolving information for `ld-2.14.90.so'       
amplxe: Warning: Cannot locate debugging symbols for file `/tmp/amplxe-tmp-l_stadler_h/modules.mic-native_0/ld-2.14.90.so/5152c3600c8161c8d9c8d152b5ad591e/ld-2.14.90.so'.
amplxe: Locating file `vmlinux-2.6.38.8+mpss3.4.2' on the remote system
amplxe: Warning: Cannot locate file `ringbuffer.ko'.
amplxe: Executing actions 18 % Resolving information for `ringbuffer'          
amplxe: Warning: Cannot locate file `intel_micveth.ko'.
amplxe: Executing actions 19 % Resolving information for `libpthread-2.14.90.so
amplxe: Warning: Cannot locate debugging symbols for file `/tmp/amplxe-tmp-l_stadler_h/modules.mic-native_0/libpthread-2.14.90.so/8c5545cd5a19b4dfe119b65a04d9e90c/libpthread-2.14.90.so'.
amplxe: Executing actions 19 % Resolving information for `libiomp5.so'         
amplxe: Warning: Cannot locate debugging symbols for file `/tmp/amplxe-tmp-l_stadler_h/modules.mic-native_0/libiomp5.so/208d54001130595334266c322d4e2449/libiomp5.so'.
amplxe: Locating file `/boot/vmlinuz' on the remote system
amplxe: Warning: Cannot locate file `/usr/lib/debug/lib/modules/2.6.38.8+mpss3.4.2/vmlinux'.
amplxe: Executing actions 50 % Generating a report                             

General Exploration Metrics
---------------------------
Parameter             r001ah      
--------------------  ------------
CPU Time              555.326     
Clockticks            687493788696
Instructions Retired  133970437458
CPI Rate              5.132       
Cache Usage           0.0         
Vectorization Usage   0.0         
TLB Usage             0.0         

Collection and Platform Info
----------------------------
Parameter                 r001ah                                                                                                            
------------------------  ------------------------------------------------------------------------------------------------------------------
Application Command Line  /nfs/home/l_stadler_h/matrix-test/micrun.sh "/nfs/home/l_stadler_h/matrix-test/matrix-test-mic" "800" "10000" "s"
User Name                 l_stadler_h                                                                                                       
Operating System          Intel MIC Platform Software Stack (Built by Poky 7.0) 3.4.2 \n \l                                                 
Computer Name             merlinx01-mic0.psi.ch                                                                                             
Result Size               184359000                                                                                                         
Collection start time     09:59:52 10/03/2015 UTC                                                                                           
Collection stop time      09:59:55 10/03/2015 UTC                                                                                           

CPU
---
Parameter          r001ah                    
-----------------  --------------------------
Name               Intel Xeon Phi coprocessor
Frequency          1238000000                
Logical CPU Count  244                       

Summary
-------
Elapsed Time:       3.645  
CPU Time:           555.326
Average CPU Usage:  146.763
CPI Rate:           5.132  

Event summary
-------------
Hardware Event Type    Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample
---------------------  -------------------------  --------------------------------  -----------------
CPU_CLK_UNHALTED                    687493788696                            555284  1238094          
INSTRUCTIONS_EXECUTED               133970437458                            108207  1238094          
amplxe: Executing actions 100 % done                                           
[l_stadler_h@merlinx01 matrix-test]$
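As a sanity check on the report, the derived metrics can be recomputed from the raw counts. The relations used below (CPI as clockticks per instruction, CPU time as clockticks divided by the reported frequency, event count as samples times events per sample) are assumptions about how the summary is derived, but the numbers in this report are consistent with them.

```shell
#!/bin/bash
# Recompute derived metrics from the raw counts in the report above.
clockticks=687493788696
instructions=133970437458
frequency_hz=1238000000
clk_samples=555284
events_per_sample=1238094

# CPI = clockticks per instruction
cpi=$(awk -v c="$clockticks" -v i="$instructions" 'BEGIN { printf "%.3f", c / i }')
# CPU time in seconds = clockticks / clock frequency
cpu_time=$(awk -v c="$clockticks" -v f="$frequency_hz" 'BEGIN { printf "%.3f", c / f }')
# Event count = samples * events per sample
event_count=$(( clk_samples * events_per_sample ))

echo "CPI rate:    $cpi"            # 5.132
echo "CPU time:    $cpu_time s"     # 555.326 s
echo "Event count: $event_count"    # 687493788696
```

The CPU time is far larger than the elapsed time of 3.645 s, presumably because it is accumulated over the 244 logical CPUs of the card.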

0 Kudos
Employee

Sometimes reinstalling VTune, or even restarting the system, solves the problem. We don't know what leftover state was interfering with the use of VTune :-)

0 Kudos