Tonnis_P_
Beginner
361 Views

PMU ret error

I am trying to use VTune with matrix.mic as described. With both amplxe-gui and amplxe-cl -target-system=mic-native:0 I get:

amplxe: Warning: To enable hardware event-base sampling, VTune Amplifier has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
abstract_Reserve_PMU ret error
abstract_Release_PMU ret error
Invalid error code
amplxe: Collection failed.
amplxe: Internal Error

Regards

 Josef

35 Replies
Peter_W_Intel
Employee
211 Views

First of all, check that you are using the latest VTune Amplifier XE 2016 Update 1 together with an MPSS version in the range v3.0 - v3.5.

You can check whether the VTune driver is loaded on the MIC:  ssh mic0 lsmod | grep sep

You can also try sampling on the MIC without your application: amplxe-cl -target-system=mic-native:0 -c advanced-hotspots -d 10

If the VTune driver does not work on the MIC, run sep_mic_install.sh (under vtune/bin64/k1om) and follow the steps.
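
The driver check above can be scripted. This is only a minimal sketch: it assumes the card is reachable over ssh as mic0 and that the sampling driver module name starts with "sep", as in the lsmod check. The helper name sep_loaded is hypothetical.

```shell
#!/bin/sh
# sep_loaded (hypothetical helper): read `lsmod` output on stdin and
# succeed only if some listed module name starts with "sep".
sep_loaded() {
    awk '$1 ~ /^sep/ { found = 1 } END { exit !found }'
}

# Intended use against the card (assumes it is reachable as mic0):
#   ssh mic0 lsmod | sep_loaded && echo "sep driver loaded" || echo "sep driver missing"
```

If the check reports the driver as missing, the sep_mic_install.sh step is the next thing to try.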

 

Tonnis_P_
Beginner
211 Views

Hi,

Sampling without the application does work. I only get the error when I want to start /tmp/matrix.mic. Perhaps I should mention that the MIC and the host do not share files. To execute matrix.mic on the MIC, I copy libiomp5.so to /tmp and change LD_LIBRARY_PATH.
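
For reference, the manual setup looks roughly like the sketch below. The helper name remote_run_cmd and the LIBIOMP placeholder (host path of the MIC build of libiomp5.so) are made up for illustration, and mic0 is assumed to be reachable over ssh.

```shell
#!/bin/sh
# remote_run_cmd LIBDIR BINARY (hypothetical helper): print the command line
# that runs BINARY with LIBDIR prepended to LD_LIBRARY_PATH.
# The single quotes keep ${LD_LIBRARY_PATH...} unexpanded on the host, so it
# is expanded later by the shell on the card.
remote_run_cmd() {
    printf 'LD_LIBRARY_PATH=%s${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH} %s\n' "$1" "$2"
}

# Intended use (LIBIOMP is a placeholder, not a real variable of any tool):
#   scp "$LIBIOMP" matrix.mic mic0:/tmp/
#   ssh mic0 "$(remote_run_cmd /tmp /tmp/matrix.mic)"
```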

Peter_W_Intel
Employee
211 Views

Thank you for this update. It is interesting that this issue is specific to the application. Can you attach matrix.mic (or send it to me by private message)? It seems that you can run matrix.mic without VTune (you might copy all libraries under compiler/lib/mic to /lib64 on the MIC).

By the way, are you using the latest VTune Amplifier XE 2016 Update 1?

Tonnis_P_
Beginner
211 Views

We had VTune XE 2016 without Update 1. Now we have Update 1. The run without matrix.mic that I mentioned yesterday (with -d 10 as the parameter instead) does not work any more; both now return the same errors as with matrix.mic: abstract_Reserve_PMU ret error and abstract_Release_PMU ret error.

lsmod gives sep

Running matrix.mic without vtune on mic0 (LD_LIBRARY_PATH extended for libiomp5.so) works fine.

Dmitry_P_Intel1
Employee
211 Views

Hello,

Could you please reboot the card and clean up /tmp on host and try this again?

And just to check: are you sharing the card with somebody who might be running an analysis there simultaneously?

Thanks & Regards, Dmitry

Peter_W_Intel
Employee
211 Views

@Tonnis P,

Can you share matrix.mic with me? I will try to reproduce this on my side. Thank you.

Tonnis_P_
Beginner
211 Views

matrix.mic.gz follows.

I restarted mic0 this morning, cleared /tmp on host and mic. The card is not shared with anybody.

Peter_W_Intel
Employee
211 Views

Thanks for your matrix.mic, but I cannot reproduce this problem on my side. It generated the expected result.

# amplxe-cl -c advanced-hotspots -target-system=mic-native:0 -data-limit=0 -- /root/matrix.mic

amplxe: Using target: mic-native:0
amplxe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: amplxe-cl -r /home/peter/r002ah -command stop.
amplxe: Warning: To enable hardware event-base sampling, VTune Amplifier has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.
Addr of buf1 = 0x7f239e5d2010
Offs of buf1 = 0x7f239e5d2180
Addr of buf2 = 0x7f2397551010
Offs of buf2 = 0x7f23975511c0
Addr of buf3 = 0x7f23904d0010
Offs of buf3 = 0x7f23904d0100
Addr of buf4 = 0x7f238944f010
Offs of buf4 = 0x7f238944f140
Threads #: 240 OpenMP threads
Matrix size: 3840
Using multiply kernel: multiply1
Execution time = 25.735 seconds
amplxe: Collection stopped.
amplxe: Using result path `/home/peter/r002ah'

...

amplxe: Executing actions 75 % Generating a report                             

General Exploration Metrics
---------------------------
Parameter             r002ah           
--------------------  -----------------
Clockticks            5979546020448.000
Instructions Retired  705676748868     
CPI Rate              8.473            
MUX Reliability       1.000            
Cache Usage           0.0              
Vectorization Usage   0.0              
TLB Usage             0.0              

Collection and Platform Info
----------------------------
Parameter                 r002ah                                                           
------------------------  -----------------------------------------------------------------
Application Command Line  /root/matrix.mic                                                 
User Name                 root                                                             
Operating System          Intel MIC Platform Software Stack (Built by Poky 7.0) 3.2.3 \n \l
Computer Name             prc-mic01-mic0                                                   
Result Size               2014603732                                                       
Collection start time     19:07:09 20/11/2015 UTC                                          
Collection stop time      19:07:37 20/11/2015 UTC                                          

CPU
---
Parameter          r002ah                    
-----------------  --------------------------
Name               Intel Xeon Phi coprocessor
Frequency          1090000000                
Logical CPU Count  244                       

Summary
-------
Elapsed Time:       28.283  
CPU Time:           5485.822
Average CPU Usage:  193.853
CPI Rate:           8.473   

Event summary
-------------
Hardware Event Type    Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample
---------------------  -------------------------  --------------------------------  -----------------
CPU_CLK_UNHALTED                   5979546020448                           5481256  1090908          
INSTRUCTIONS_EXECUTED               705676748868                            646871  1090908          
amplxe: Executing actions 100 % done                                           

The problem seems to be on the MIC side of your setup, but you said "sampling without the application does work. Only when I want to start /tmp/matrix.mic I get the error.".  I have some further questions:

1. Can you sample another simple app on the MIC? (This checks again whether the VTune driver works on the MIC.)

2. Can you check your MPSS version? If it is too old, consider upgrading it. Most users run MPSS v3.2 - v3.5.

3. Set the following variables:

# export AMPLXE_DEBUG=1
# export AMPLXE_LOG_LEVEL=TRACE
# export AMPLXE_LOG_DIR=<dir>

Run the sampling, then create a bug report:

amplxe-feedback -create-bug-report <report archive>

Please send me the result directory, trace logs and report logs. Thank you.

 

 

Tonnis_P_
Beginner
211 Views

MPSS Version is 3.5.2

Here comes the feedback:

Intel(R) VTune(TM) Amplifier XE 2016 (build 434111) feedback tool
Copyright (C) 2009-2015 Intel Corporation. All rights reserved.
Collecting pids tree
Quality Feedback Agent
Version: Quality Feedback Agent for Linux*
Time: Fri Nov 20 07:41:03 2015


This diagnostic script generates detailed system information about your system.
When  you forward  the output of this command to Intel(R) Support, the enclosed
server data will be used solely for the purposes of troubleshooting the problem
you have reported to Intel(R) Support. This information will not be shared with
any other companies or users, nor will it be used for any other purpose.

The script is collecting info. Please wait ...
Getting Hardware Information ...
    * Detecting architecture type
    * CPU info
    * CPU affinity
    * Amount of memory
    * PCI/AGP cards installed
Getting Software Information ...
    * Kernel version and type (UP/SMP)
    * glibc version
    * gcc version
    * icc version
    * NPTL version
    * Detecting Linux* distribution
Warning: The product has not been validated with this platform
It may work, but it is not supported
    * Checking IA32 libraries
    * File system disk space usage
    * Intel(R) Software Products installed
    * List loaded kernel modules
    * Check Linux* kernel sources availability
    * Check if sampling driver can be compiled
    * Environment variables
    * System configuration variables
    * System limits and kernel parameters
    * Permission to directories
    * Detecting GTK* and Motif* versions
    * Current locale
    * SELinux configuration
    * The list of running product processes
    * Query runlevel information for system services
    * Information on IPC facilities
    * Current user credentials
Product specific files ...
    * Product install dir
    * Product version
    * Product Driver Kit version
    * Product licenses
    * Printing bootup messages by dmesg
    * Printing messages from file /var/log/messages
    * Install log
ls: cannot access /tmp/intel.pset.*.log: No such file or directory
Use of uninitialized value $file in -r at /home/schuele/Projekte/vtune/log/2015-11-20-Fri-08-41-03-376219.amplxe-feedback/qfagent.cmd line 398.
Use of uninitialized value $file in concatenation (.) or string at /home/schuele/Projekte/vtune/log/2015-11-20-Fri-08-41-03-376219.amplxe-feedback/qfagent.cmd line 400.
ls: cannot access /tmp/intel.issa.*.log: No such file or directory
Use of uninitialized value $file in -r at /home/schuele/Projekte/vtune/log/2015-11-20-Fri-08-41-03-376219.amplxe-feedback/qfagent.cmd line 398.
Use of uninitialized value $file in concatenation (.) or string at /home/schuele/Projekte/vtune/log/2015-11-20-Fri-08-41-03-376219.amplxe-feedback/qfagent.cmd line 400.
    * Product core files ...

Please attach archive file to your problem report.
Intel(R) Premier Support website: https://premier.intel.com

Peter_W_Intel
Employee
211 Views

@Tonnis P

I checked the release notes: MPSS v3.5.3 is supported, and you already use VTune Amplifier XE 2016 Update 1.

Thanks for your answer, but please also answer questions 1) and 3) from my last post. Please attach the results of 3) as well: the VTune result, the trace logs and the feedback report.

Tonnis_P_
Beginner
211 Views

OK, back again.

My simple hello world produces the same error.

log.zip is attached

Peter_W_Intel
Employee
211 Views

Thanks for the feedback logs, but I also need the trace logs and the VTune result (as I mentioned in point 3).

See my example:

# mkdir trace_log
# export AMPLXE_DEBUG=1
# export AMPLXE_LOG_LEVEL=TRACE
# export AMPLXE_LOG_DIR=./trace_log
# amplxe-cl -c advanced-hotspots -target-system=mic-native:0 -data-limit=0 -- /root/matrix.mic

Zip the "trace_log" directory and attach it, together with the VTune result. Thank you!
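
Packing the log directory could look like the sketch below. Note the assumptions: it uses tar with gzip rather than zip, and the helper name pack_logs is hypothetical.

```shell
#!/bin/sh
# pack_logs DIR (hypothetical helper): archive DIR as DIR.tar.gz next to it.
# Uses tar+gzip as a stand-in for "zip"; any archive format should do.
pack_logs() {
    dir=${1%/}    # drop a trailing slash, if any
    tar -czf "$dir.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")"
}

# Intended use after the collection run:
#   pack_logs ./trace_log    # creates ./trace_log.tar.gz
```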

 

 

Tonnis_P_
Beginner
211 Views

ok, there it is.

Dmitry_P_Intel1
Employee
211 Views

Hello,

One more quick check.

Could you please ssh to the card and do

>cd  /dev

>ls -lna | grep sep

and provide the output?

Thanks & Regards, Dmitry

Tonnis_P_
Beginner
211 Views

Result ls -lna | grep sep on /dev:

drwxrwxrwx    2 0        0              100 Nov 19 11:15 sep3_15

Dmitry_P_Intel1
Employee
211 Views

Thank you, one more check:

<VTune_install_dir_on_host>/bin64/sep -version -mic

And while we are checking why this could happen, could you please try something like:

>amplxe-cl -target-system=mic-native:0 -collect advanced-hotspots -collection-detail=stack-sampling ./your_app_on_card

The point is that collection with stacks uses a different driver, and it would be interesting to see whether that works.

Thanks & Regards, Dmitry

Peter_W_Intel
Employee
211 Views

@Tonnis P

Thanks for your trace log. I have escalated this to the developers to investigate the error message, and I will get back to you as soon as I can.

By the way, one difference between our setups is the MPSS version: mine is v3.3.5, while yours is v3.5.2, which is the latest but is on VTune's supported list. You may want to find another machine to verify this issue again :-)

Tonnis_P_
Beginner
211 Views

sep -version -mic:

mic 0 (node210-mic0): SEP driver version 3.15.5
mic 1 (node210-mic1): SEP driver version 3.15.5

For the amplxe-cl command I get: Unknown option: -collection-detail=stack-sampling

Dmitry_P_Intel1
Employee
211 Views

Hello,

Thank you for additional info.

The option should be "-knob collection-detail=stack-sampling" to be precise.
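
Put together, the corrected command line would look like the sketch below. It only prints the command and does not run it. The helper name stack_sampling_cmd is hypothetical, and the "--" separator before the application is taken from the amplxe-cl invocation shown earlier in this thread.

```shell
#!/bin/sh
# stack_sampling_cmd APP (hypothetical helper): print the amplxe-cl command
# line for an advanced-hotspots collection with stack sampling on mic0.
stack_sampling_cmd() {
    printf 'amplxe-cl -target-system=mic-native:0 -collect advanced-hotspots -knob collection-detail=stack-sampling -- %s\n' "$1"
}

# Intended use:
#   stack_sampling_cmd ./your_app_on_card
```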

Thank you, Regards, Dmitry

Tonnis_P_
Beginner
134 Views

amplxe: Using target: mic-native:0
amplxe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: amplxe-cl -r /home/schuele/r001ah -command stop.
amplxe: Collection stopped.
amplxe: Copying result directory from the device
amplxe: Copying collection logs from the device.
amplxe: Using result path `/home/schuele/r001ah'
amplxe: Executing actions 22 % Resolving information for dangling locations    
amplxe: Locating file `/lib64/libpthread-2.14.1.so' on the remote system
amplxe: Locating file `/usr/lib/debug/lib/modules/2.6.38.8+mpss3.5.2/vmlinux' on the remote system
amplxe: Executing actions 23 % Resolving information for `libpthread-2.14.1.so'
amplxe: Warning: Cannot locate debugging symbols for file `/tmp/amplxe-tmp-schuele/modules.mic-native_0/libpthread-2.14.1.so/c1ac92d0fae7da65246d9534553eb090/libpthread-2.14.1.so'.
amplxe: Executing actions 25 % Resolving information for `libpthread-2.14.1.so'
amplxe: Locating file `/boot/vmlinuz-2.6.38.8+mpss3.5.2' on the remote system
amplxe: Locating file `/boot/vmlinuz' on the remote system
amplxe: Warning: Cannot locate file `/usr/lib/debug/lib/modules/2.6.38.8+mpss3.5.2/vmlinux'.
amplxe: Executing actions 75 % Generating a report                             
General Exploration Metrics
---------------------------
Parameter  r001ah
---------  ------

Collection and Platform Info
----------------------------
Parameter                 r001ah                                                                               
------------------------  -------------------------------------------------------------------------------------
Application Command Line  /tmp/matrix.mic                                                                      
Operating System          2.6.38.8+mpss3.5.2 Intel MIC Platform Software Stack (Built by Poky 7.0) 3.5.2 \n \l

Computer Name             node210-mic0                                                                         
Result Size               2077850                                                                              
Collection start time     07:39:35 02/12/2015 UTC                                                              
Collection stop time      07:39:35 02/12/2015 UTC                                                              

CPU
---
Parameter          r001ah                    
-----------------  --------------------------
Name               Intel Xeon Phi coprocessor
Frequency          1047414056                
Logical CPU Count  240                       

Summary
-------
Elapsed Time:  0.051
CPU Time:           
CPI Rate:           

Event summary
-------------
Hardware Event Type    Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample
---------------------  -------------------------  --------------------------------  -----------------
CPU_CLK_UNHALTED                               0                                 0  20000000         
INSTRUCTIONS_EXECUTED                          0                                 0  20000000         
amplxe: Executing actions 100 % done
