Max_C_
Beginner

Intel VTune cannot collect information and causes a core dump

Hello,
We are using Intel VTune 2015 to profile our application, which runs on the following operating system: 2.6.32-504.1.3.el6.x86_64 Red Hat Enterprise Linux Server release 6.6 (Santiago)
CPU: Intel(R) Xeon(R) E5/E7 v2 processor
Frequency          2800004679
Logical CPU Count  4
----------------------------------------------------------------
I started four instances of ngss.elf, which is our product:
# ps -ef|grep ngss
root       400 31483  0 07:34 pts/0    00:00:00 ./ngss.elf --iomn 294921
root     32508 31483  0 07:34 pts/0    00:00:00 ./ngss.elf --iomn 360459
root     32509 31483  0 07:34 pts/0    00:00:00 ./ngss.elf --iomn 393228
root     32510 31483 36 07:34 pts/0    00:00:47 ./ngss.elf --iomn 425997
----------------------------------------------------------------------
Then I used the following command:
# ./amplxe-cl -collect hotspots -run-pass-thru=-no-altstack -target-pid=32510

----------------------------------------------------------------------
I used the following command to stop VTune, but it didn't work:
#./amplxe-cl -r /opt/intel/vtune_amplifier_xe_2015.1.0.367959/bin64/r007hs -command stop
-----------------------------------------------------------------------
So I pressed CTRL+C there.
The ngss.elf process just became defunct:
root     32510 31483  1 07:34 pts/0    00:00:51 [ngss.elf] <defunct>
Then, boom! After a while, it dumped core:
[3] - Memory fault(coredump)   ./ngss.elf --iomn 425997 &
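
For context on the <defunct> state: ps shows a process that way when it has already terminated but its parent (here the shell, PPID 31483) has not yet collected its exit status with wait(). The "[3] - Memory fault(coredump)" line is the shell's report of how that background job ended, printed once the shell has collected that status. The Linux-only Python sketch below reproduces the state with a throwaway child process; the helper names are made up for the illustration, and it says nothing about what VTune itself did to ngss.elf.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state field of /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so split after the last ')' to reach the state field.
    return data.rsplit(")", 1)[1].split()[0]


def demo_defunct(timeout=2.0):
    """Fork a child that exits at once and look at it before reaping it.

    Returns (state seen before wait(), exit code collected by wait()).
    """
    pid = os.fork()
    if pid == 0:
        os._exit(3)                      # child terminates immediately
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)                 # parent deliberately does not wait() yet
        state = proc_state(pid)
    _, status = os.waitpid(pid, 0)       # reaping removes the zombie entry
    return state, os.WEXITSTATUS(status)


if __name__ == "__main__":
    # 'Z' is the state ps renders as <defunct>; expected on Linux: ('Z', 3)
    print(demo_defunct())
```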

# ps -ef|grep ngss
root       400 31483  0 07:34 pts/0    00:00:02 ./ngss.elf --iomn 294921
root      1468 31483  0 07:48 pts/0    00:00:00 grep ngss
root     32508 31483  0 07:34 pts/0    00:00:02 ./ngss.elf --iomn 360459
root     32509 31483  0 07:34 pts/0    00:00:02 ./ngss.elf --iomn 393228

#######################################################
kill -9 32508
#####################################################
# ./amplxe-cl -collect hotspots -duration 5 -target-pid=32508
amplxe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: amplxe-cl -r /opt/intel/vtune_amplifier_xe_2015.1.0.367959/bin64/r009hs -command stop.

amplxe: Collection detached.
amplxe: Collection stopped.
amplxe: Using result path `/opt/intel/vtune_amplifier_xe_2015.1.0.367959/bin64/r009hs'
amplxe: Executing actions 34 % Precomputing frequently used data
amplxe: Warning: Cannot find data to precompute. Skipping the precomputation step.
amplxe: Executing actions 50 % Generating a report

Collection and Platform Info
----------------------------
Parameter                 r009hs
------------------------  --------------------------------------------------------------------------------
Application Command Line
Operating System          2.6.32-504.1.3.el6.x86_64 Red Hat Enterprise Linux Server release 6.6 (Santiago)
Computer Name             isc01-s00c02h0
Result Size               1380818
Collection start time     08:45:20 04/02/2015 UTC
Collection stop time      08:45:20 04/02/2015 UTC

CPU
---
Parameter          r009hs
-----------------  -----------------------------------
Name               Intel(R) Xeon(R) E5/E7 v2 processor
Logical CPU Count  4

Summary
-------
Elapsed Time:  0.000
amplxe: Executing actions 100 % done
drwx------  6 root root    4096 Feb  4 08:45 r009hs
# ./amplxe-cl -report hotspots
amplxe: Using result path `/opt/intel/vtune_amplifier_xe_2015.1.0.367959/bin64/r009hs'
amplxe: Executing actions 50 % Generating a report

Empty request output.
amplxe: Executing actions 100 % done
amplxe: Error: 0x40000027 (Reporter error)
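
One detail in the log above is worth checking before reading too much into the empty report: PID 32508 is sent kill -9 just before it is passed to -target-pid=32508, and the resulting r009hs result has identical start and stop times and an elapsed time of 0.000. The thread does not establish that this is the cause, but confirming that a target PID is still alive is cheap. A minimal Python sketch (the function name is hypothetical) using the standard trick of sending signal 0, which performs the existence and permission checks without delivering anything:

```python
import os
import sys


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists.

    Signal 0 makes kill() run its checks without sending a signal.
    A zombie (<defunct>) process still counts as existing.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:      # ESRCH: no such process
        return False
    except PermissionError:         # EPERM: exists but belongs to another user
        return True
    return True


if __name__ == "__main__":
    target = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    if pid_is_alive(target):
        print(f"{target} exists; OK to pass to -target-pid")
    else:
        print(f"{target} does not exist; pick a live ngss.elf PID from ps")
```

PIDs are recycled, so this only shows that some process currently has that number; comparing /proc/<pid>/cmdline with the expected ngss.elf command line is a stricter check.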

So, I have three questions:

1. What happens when I press CTRL+C? Does VTune send a signal or some other message to the process?

2. What happened after I ran "kill -9 pid"?

3. Why did "Empty request output" appear?
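
Some background on question 1: CTRL+C is turned into a signal by the terminal, not by VTune. The tty driver sends SIGINT to every process in the terminal's foreground process group. Under job control, each job started with & (the [3] and [4] job numbers in the logs show that ngss.elf was run that way) gets its own process group, so the keystroke itself reaches the amplxe-cl or amplxe-runss command running in the foreground rather than ngss.elf. Whatever then happens to the target is the collector's reaction, which the thread does not document. The Python sketch below reproduces only the delivery rule, with two throwaway children and os.killpg() standing in for the terminal; the function names are hypothetical.

```python
import os
import signal


def _wait_forever():
    """Body of a throwaway child: sleep until a signal ends it."""
    try:
        while True:
            signal.pause()
    finally:
        os._exit(0)              # never fall back into the caller's code


def demo_ctrl_c_delivery():
    """Return (signal that ended the 'foreground' child, 'background' child alive?)."""
    # Children inherit SIG_DFL for SIGINT, so an unhandled SIGINT terminates them.
    previous = signal.signal(signal.SIGINT, signal.SIG_DFL)
    try:
        fg = os.fork()
        if fg == 0:
            _wait_forever()
        os.setpgid(fg, fg)       # process group standing in for the foreground job
        bg = os.fork()
        if bg == 0:
            _wait_forever()
        os.setpgid(bg, bg)       # separate group, like a job started with '&'
    finally:
        signal.signal(signal.SIGINT,
                      previous if previous is not None else signal.SIG_DFL)

    os.killpg(fg, signal.SIGINT)             # what the tty driver does on CTRL+C
    _, status = os.waitpid(fg, 0)
    fg_signal = os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
    bg_alive = os.waitpid(bg, os.WNOHANG) == (0, 0)
    os.kill(bg, signal.SIGTERM)              # clean up the untouched child
    os.waitpid(bg, 0)
    return fg_signal, bg_alive


if __name__ == "__main__":
    print(demo_ctrl_c_delivery())
```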

Thanks

Accepted Solutions
Peter_W_Intel
Employee

This may be a bug in VTune's CTRL+C handler. Could you help by sending us some data? Thank you.

Our developer asked you to do the following:

  1.        export AMPLXE_LOG_LEVEL=TRACE
  2.        reproduce the scenario
  3.        send me the collected result folder and the related logs from /tmp/amplxe-log-<username> (the last 3 folders)
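
Step 3 can be scripted roughly as follows. This is a hedged Python sketch: the /tmp/amplxe-log-<username> location comes from the steps above, but the thread does not describe what is inside it, so the sketch assumes each run leaves its own subfolder and treats "last 3" as the three most recently modified ones. The function names and the amplxe-logs.tar.gz archive name are made up. Steps 1 and 2 (exporting AMPLXE_LOG_LEVEL=TRACE and reproducing the problem) still happen in the shell beforehand.

```python
import os
import pwd
import tarfile


def latest_log_dirs(log_root, count=3):
    """Return the `count` most recently modified subdirectories of log_root."""
    paths = [os.path.join(log_root, name) for name in os.listdir(log_root)]
    dirs = [p for p in paths if os.path.isdir(p)]
    dirs.sort(key=os.path.getmtime)           # oldest first, newest last
    return dirs[-count:]


def bundle_logs(log_root, archive_path, count=3):
    """Pack the newest `count` log folders into a gzip-compressed tar archive."""
    picked = latest_log_dirs(log_root, count)
    with tarfile.open(archive_path, "w:gz") as tar:
        for d in picked:
            tar.add(d, arcname=os.path.basename(d))   # adds the folder recursively
    return picked


if __name__ == "__main__":
    # Look the user up by UID rather than $USER: after a plain `su` to root,
    # $USER can still name the original login user.
    user = pwd.getpwuid(os.getuid()).pw_name
    root = f"/tmp/amplxe-log-{user}"          # assumed path pattern from the steps above
    if os.path.isdir(root):
        for d in bundle_logs(root, "amplxe-logs.tar.gz"):
            print("added", d)
    else:
        print("no log directory at", root)
```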

Another idea is to use advanced-hotspots instead of hotspots, with system-wide profiling, so that performance data for all active processes is collected.

For example:

amplxe-cl -collect advanced-hotspots -r r009ah -duration 30

You can then use amplxe-gui to view the report and filter the data down to the process(es) you are interested in.

8 Replies
Peter_W_Intel
Employee

This might be another bug. Thank you.

You may try the option "-profiling-signal 33" together with "-duration 60" (to avoid the incorrect CTRL+C handler?); please refer to this thread (the bug hasn't been fixed yet... I hope you can get the expected result). For example (note that putting VTune results under vtune/bin64 is not recommended):

amplxe-runss -r r008 --data-limit-mb=500 --stack-stitching --follow-child --itt-config=frame --stackwalk=offline -profiling-signal 33 -duration 30 --type=cpu:counters:nostack --type=cpu:stack --target-pid 32510
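
As its name suggests, -profiling-signal changes which signal number the collector uses for sampling. Before picking a number, it may help to see which signals the target already handles itself. Linux exposes this per process as the SigCgt hex bitmask in /proc/<pid>/status, where bit n-1 set means signal n has a user-space handler. The Python sketch below decodes it; it is a generic Linux check, not a VTune feature, the function name is hypothetical, and it cannot tell you which signal VTune uses by default.

```python
import signal
import sys


def caught_signals(pid="self"):
    """Return the signal numbers for which `pid` has installed a handler.

    Reads SigCgt from /proc/<pid>/status (Linux only). Signal handlers are
    shared by all threads of a process, so the main PID is enough.
    """
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("SigCgt:"):
                mask = int(line.split()[1], 16)
                return {n for n in range(1, 65) if mask & (1 << (n - 1))}
    raise RuntimeError("no SigCgt line found")


if __name__ == "__main__":
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    caught = caught_signals(pid)
    print("caught:", sorted(caught))
    print("SIGPROF caught:", signal.SIGPROF in caught, "| signal 33 caught:", 33 in caught)
```

For example, passing 22902 (the ngss.elf PID used in the next post) as the argument would list the signals that instance has handlers for.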

Max_C_
Beginner

Hi Peter,

Thanks for your reply.

I tried the command you mentioned, but it failed.

I set the duration to 3. Here are the details:

# ps -ef|grep ngss
root     22545 15238  0 05:43 pts/0    00:00:00 ./ngss.elf --iomn 360459
root     22546 15238  0 05:43 pts/0    00:00:00 ./ngss.elf --iomn 393228
root     22547 15238  0 05:43 pts/0    00:00:00 ./ngss.elf --iomn 425997
root     22902 15238  0 05:44 pts/0    00:00:00 ./ngss.elf --iomn 294921

amplxe-runss -r r008 --data-limit-mb=500 --stack-stitching --follow-child --itt-config=frame --stackwalk=offline -profiling-signal 33 -duration 3 --type=cpu:counters:nostack --type=cpu:stack --target-pid 22902

----------------------------------------------------------------------------------------------------------------

I waited for about an hour, but it just didn't quit.

-----------------------------------------------------------------------------------------------------------------

So I ran kill -9 with the PID of ngss.elf.

output:

amplxe: Detached
amplxe: Collection stopped - application return code is 0
[4] + Killed                   ./ngss.elf --iomn 294921 &
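
On the kill -9 step: it sends SIGKILL, which, unlike SIGINT or SIGTERM, cannot be caught, blocked, or ignored, so ngss.elf gets no chance to run any cleanup code. The kernel ends the process and the shell prints "Killed" when it collects the exit status. An attached collector can only notice afterwards that its target is gone, which is consistent with the "Detached" and "Collection stopped" lines above. A small Python sketch of both properties (function names are hypothetical):

```python
import os
import signal


def sigkill_can_be_handled():
    """Try to ignore SIGKILL; the kernel rejects any change to its disposition."""
    try:
        signal.signal(signal.SIGKILL, signal.SIG_IGN)
    except (OSError, ValueError):
        return False
    return True


def kill9_child():
    """Fork a child that ignores SIGTERM, send it SIGKILL, return how it died."""
    pid = os.fork()
    if pid == 0:
        try:
            signal.signal(signal.SIGTERM, signal.SIG_IGN)   # SIGTERM would not stop it
            while True:
                signal.pause()
        finally:
            os._exit(1)          # never return into the parent's code
    os.kill(pid, signal.SIGKILL)                 # what `kill -9 <pid>` sends
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None


if __name__ == "__main__":
    print(sigkill_can_be_handled(), kill9_child())   # expected on Linux: False 9
```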

--------------------------------------------------------------------------------------------------------------------

Then I pressed CTRL+C.

output:

^Camplxe: Collection failed

# amplxe-cl -report summary
amplxe: Using result path `/storage/max/r008'
amplxe: Executing actions 14 % Loading data files
amplxe: Error: Cannot load data file `/storage/max/r008/data.1/systemcollector-isc01-s00c02h0.sc' (File is already loaded).
amplxe: Executing actions 34 % Precomputing frequently used data
amplxe: Warning: Cannot find data to precompute. Skipping the precomputation step.
amplxe: Executing actions 50 % Generating a report

Collection and Platform Info
----------------------------
Parameter              r008
---------------------  --------------------------------------------------------------------------------
Operating System       2.6.32-504.1.3.el6.x86_64 Red Hat Enterprise Linux Server release 6.6 (Santiago)
Computer Name          isc01-s00c02h0
Result Size            1369261
Collection start time  06:01:57 05/02/2015 UTC
Collection stop time   06:01:57 05/02/2015 UTC

CPU
---
Parameter          r008
-----------------  ----
Logical CPU Count  4

Summary
-------
Elapsed Time:  0.000
amplxe: Executing actions 100 % done
<isc01-s00c02h0:root>/storage/max:
# amplxe-cl -report hotspots
amplxe: Using result path `/storage/max/r008'
amplxe: Executing actions 50 % Generating a report

Empty request output.
amplxe: Executing actions 100 % done
amplxe: Error: 0x40000027 (Reporter error)

pstree information:

     `-xinetd-+-in.telnetd---login---ksh---su---ksh-+-amplxe-runss-+-pinbin
              |                                     |              `-6*[{amplxe-runss}]
              |                                     `-2*[ngss.elf---32*[{ngss.elf}]]

-------------------------------------------------------------------------------------------------------------------

I started ngss.elf again and used the following command, but VTune did not quit:

amplxe-runss -r r009 -command=stop

----------------------------------------------------------------------------------------------------------------------------------------

I really don't know what happened to VTune.

Max_C_
Beginner

Hi Peter,

Thanks for your reply.

I downloaded Update 2, and I will do what you mentioned and send you the log files later.

Max_C_
Beginner

Hi Peter,

I did as you mentioned:

export AMPLXE_LOG_LEVEL=TRACE

amplxe-cl -collect advanced-hotspots -r max_test_2_11 -duration 30 --target-pid 7991

Everything is OK!!!

The earlier errors disappeared...

Thanks for your support!

Max_C_
Beginner

I thought about why it succeeded.

This time I ran Advanced Hotspots instead of Basic Hotspots, and I found the following description:

  • Advanced Hotspots: Event-based sampling analysis that monitors all the software executing on your system including the operating system modules. The collector interrupts the processor at the specified sampling interval and collects samples of instruction addresses.

  • Basic Hotspots: Performance analysis based on user-mode sampling and tracing collection. It focuses on a particular target, identifies functions that took the most CPU time to execute, restores the call tree for each function, and shows thread activity.      
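
Basic Hotspots is described above as user-mode sampling. The classic way to sample in user mode on Linux is a CPU-time interval timer that periodically sends the process a signal whose handler records what was running. The Python sketch below shows that generic technique with setitimer(ITIMER_PROF) and SIGPROF; it is not VTune's collector, whose internals the thread does not describe, and the function names are hypothetical. It also makes one possible failure mode concrete: the sampler depends on owning a signal handler inside the profiled process, so an application that installs its own handler for the same signal, or manages signals tightly, could displace it. That is one plausible reading of the "tight connection with the operating system" mentioned below, and presumably why -profiling-signal was suggested earlier, but the thread does not confirm either.

```python
import signal
import time
from collections import Counter


def sample_cpu_time(work, interval=0.001):
    """Run work() while a CPU-time timer samples which function is executing.

    ITIMER_PROF counts user + system CPU time of this process and delivers
    SIGPROF whenever the interval elapses; the handler records the name of
    the interrupted Python function. Must be called from the main thread.
    """
    samples = Counter()

    def on_sigprof(signum, frame):
        samples[frame.f_code.co_name if frame is not None else "<unknown>"] += 1

    previous = signal.signal(signal.SIGPROF, on_sigprof)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        work()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)    # disarm the timer first
        signal.signal(signal.SIGPROF,                 # then restore the old handler
                      previous if previous is not None else signal.SIG_DFL)
    return samples


def busy_loop(seconds=0.3):
    """Burn roughly `seconds` of CPU time."""
    end = time.process_time() + seconds
    n = 0
    while time.process_time() < end:
        n += 1
    return n


if __name__ == "__main__":
    print(sample_cpu_time(busy_loop).most_common(3))
```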

One thing I am sure of is that my binary is tightly coupled to the operating system. Somehow that makes Basic Hotspots fail...

Anyway, I will keep on searching.

Thanks for your reply.

BTW: Our company is considering buying this software. What does "Floating – 1 user" mean?

I want to know how many users can share one floating license.

Peter_W_Intel
Employee

Thanks for this update; everything is OK after you tried Update 2.

Also, thank you for considering buying this software... you are welcome to discuss any technical issues on this forum in the future :-)

Bernard
Black Belt

>>>One thing I can make sure is my binary has tight connection with operation system. Some how make the basic hotspots fail....

Anyway, I will keep on searching.>>>

Do you mean that your application is modifying/hooking some Linux data structures?