Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

No call stack information for some threads

Girish_M_
Beginner

I have two systems with VTune installed, and I am trying to collect hardware events and then generate a report grouped by thread. I use the following two commands:

amplxe-cl -collect general-exploration -knob enable-stack-collection=true -data-limit=0 -d='unlimited' -target-duration-type=long -r vresult -app-working-dir . --search-dir sym:p=. -- ./myapp myarg

amplxe-cl -report hw-events -group-by thread -r vresult >result.tx
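If you run these two steps repeatedly, a small Python wrapper keeps the arguments identical on System A and System B. This is a minimal sketch, not part of VTune: the helper names `collect_argv`, `report_argv` and `collect_and_report` are hypothetical, the flags are copied from the two commands above, and it assumes `amplxe-cl` is on `PATH`. Note that the shell turns `-d='unlimited'` into `-d=unlimited` before `amplxe-cl` ever sees it, so that is what goes into the list.

```python
import subprocess
from typing import List


def collect_argv(result_dir: str, app: List[str]) -> List[str]:
    """Collection command from the post, one argv element per token."""
    return [
        "amplxe-cl", "-collect", "general-exploration",
        "-knob", "enable-stack-collection=true",
        "-data-limit=0",
        "-d=unlimited",  # the shell turns -d='unlimited' into this
        "-target-duration-type=long",
        "-r", result_dir,
        "-app-working-dir", ".",
        "--search-dir", "sym:p=.",
        "--", *app,
    ]


def report_argv(result_dir: str) -> List[str]:
    """Per-thread hardware-event report command from the post."""
    return ["amplxe-cl", "-report", "hw-events", "-group-by", "thread",
            "-r", result_dir]


def collect_and_report(result_dir: str, app: List[str], report_path: str) -> None:
    """Run both steps; no shell is involved, so no quoting surprises."""
    subprocess.run(collect_argv(result_dir, app), check=True)
    with open(report_path, "w") as out:
        # Equivalent of the shell redirection "> report_path".
        subprocess.run(report_argv(result_dir), check=True, stdout=out)
```

For example, `collect_and_report("vresult", ["./myapp", "myarg"], "result.tx")` mirrors the two commands above. Passing a list instead of a shell string means every element reaches `amplxe-cl` exactly as written.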

The two systems are

System A - Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz

System B -  Intel(R) Xeon(R) CPU E5530  @ 2.40GHz

On System A I get information for all threads; for example, if 8 threads were created, all 8 appear. On System B, however, I do not get information for all threads: the generated report lists fewer threads than there should be.

When I do the same thing through the GUI on System B, I see that some threads have no call stack information, and therefore the HW events for these threads are NIL.

I have the debug (dbg) library packages installed as well. I'd appreciate any help. Thanks
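One way to quantify "fewer threads than there should be" is to count the distinct thread rows in the report and compare that with the number of threads the application created. The sketch below assumes the report was exported as CSV (`amplxe-cl` report has a `-format csv` option in the versions I know of; check `amplxe-cl -help report` for yours) and that the thread name is in a column called `Thread`. Both the column name and the helpers `reported_threads` and `missing_thread_count` are assumptions for illustration.

```python
import csv
import io
from typing import Set


def reported_threads(csv_text: str, column: str = "Thread") -> Set[str]:
    """Distinct thread names in a CSV report (the column name is assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Skip rows where the thread column is missing or empty.
    return {row[column] for row in reader if row.get(column)}


def missing_thread_count(csv_text: str, expected: int, column: str = "Thread") -> int:
    """How many of the expected threads do not appear in the report at all."""
    return max(0, expected - len(reported_threads(csv_text, column)))
```

Running this over the System A and System B reports for the same 8-thread run would turn "lesser number of threads" into a concrete count per run.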

1 Solution
Peter_W_Intel
Employee

That is true for your System B (Intel(R) Xeon(R) CPU E5530 @ 2.40GHz), which is a Nehalem-EP processor.

I can reproduce this on my side.

# amplxe-cl -collect general-exploration -knob enable-stack-collection=true -app-working-dir /home/peter/problem_report -- /home/peter/problem_report/primes.ia32
amplxe: Error: Cannot enable advanced capabilities for Hardware Event-based Sampling: problem with the driver (vtss/vtsspp). Check that the driver is running and the driver group is in the current user group list. See "Building and Managing the Sampling Driver" help topic for further details.

# amplxe-cl -collect general-exploration -knob enable-stack-collection=false -app-working-dir /home/peter/problem_report -- /home/peter/problem_report/primes.ia32

With enable-stack-collection=false, the collection works properly.

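Event-based sampling with stack collection can work only on Sandy Bridge or later processors, so you may want to try another, supported processor. (System A's E3-1230 V2 is an Ivy Bridge processor, which is why the same command works there.)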

 


Peter_W_Intel
Employee

Finally, I found the root cause after reviewing your code carefully and using your command to reproduce the problem. Pay attention to the atoi() call.

amplxe-cl -collect general-exploration -knob enable-stack-collection=true -app-working-dir . --search-dir sym:p=. -- ./NDAID "8"

should be changed to:

amplxe-cl -collect general-exploration -knob enable-stack-collection=true -app-working-dir . --search-dir sym:p=. -- ./NDAID 8

The problem has gone away on my side.
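Why the quotes can matter: C's `atoi()` skips leading whitespace, accepts an optional sign, then reads digits, and returns 0 if no digits follow. If the literal characters `"8"` (quotes included) reach the program, for example when a script builds the argument list without a shell to strip them, `atoi` sees `"` first and yields 0. The Python sketch below emulates that behavior with a hypothetical helper `c_atoi` (not a library function; overflow is not modeled) and uses `shlex.split` to show that a POSIX shell would already have removed the quotes when the command is typed at a prompt.

```python
import shlex


def c_atoi(s: str) -> int:
    """Emulate C atoi(): whitespace, optional sign, digits; 0 if no digits."""
    i = 0
    while i < len(s) and s[i] in " \t\n\v\f\r":
        i += 1
    sign = 1
    if i < len(s) and s[i] in "+-":
        sign = -1 if s[i] == "-" else 1
        i += 1
    value = 0
    while i < len(s) and s[i] in "0123456789":
        value = value * 10 + int(s[i])
        i += 1
    return sign * value


# Typed at a shell prompt, the quotes are consumed by the shell:
shell_args = shlex.split('./NDAID "8"')  # ['./NDAID', '8']
# Passed verbatim with the quotes inside the string, they are not:
literal_arg = '"8"'                      # c_atoi(literal_arg) == 0
```

So the fix only changes the thread count if the quotes actually survive into `argv`; printing the parsed value at startup is a cheap way to confirm which case you are in.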

Peter_W_Intel
Employee

I am not satisfied with the result, since the last thread has a tiny workload. If I run it on Red Hat Enterprise Linux Server, the workload on all 8 threads is balanced.

Girish_M_
Beginner

Hi Peter,

I tried changing it; in fact, I had not used "8", which would be erroneous. I have attached a tar file containing a small Python script that reproduces the bug. The command is as follows:

           ./CheckBug.py  application  number_of_threads  number_of_times_to_run

I had also hardcoded these values, which reproduced the bug as well. Can you run this script with number_of_times_to_run = 500 or more on your setup? I have also included the nqueens binary and source. One more thing to try would be an OS other than the Debian-based ones, as you mentioned.
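CheckBug.py itself is not reproduced in this thread, so the following is only a hypothetical sketch of such a repeat-run harness, not Girish's script. It runs one collect-and-count step `runs` times and records which runs reported fewer threads than expected. The `collect_and_count` callable is an assumption: in practice it would run `amplxe-cl` and count the threads in the resulting report. Injecting it keeps the loop itself testable without VTune.

```python
from typing import Callable, List


def repeat_check(collect_and_count: Callable[[int], int],
                 expected_threads: int, runs: int) -> List[int]:
    """Return the indices of runs whose report had fewer threads than expected."""
    failures: List[int] = []
    for run_index in range(runs):
        seen = collect_and_count(run_index)  # threads found in this run's report
        if seen < expected_threads:
            failures.append(run_index)
    return failures
```

Reporting `len(failures) / runs` for, say, 500 runs gives a failure rate that can be compared between Ubuntu and RHEL, which is more informative than a single pass or fail.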

Peter_W_Intel
Employee

You used custom-analysis; I will change it to general-exploration and try later. I will be away for 6 hours.

Peter_W_Intel
Employee

OK. I changed your custom-collection to general-exploration in your script and didn't change anything else. However, I ran it only once on each system, Ubuntu and RHEL:

# ./CheckBug.py threadStats 8 1

Actually, the result from RHEL was excellent: you can see 8 threads in both the hotspots report and the timeline report, and the workloads on the 8 threads are balanced. I can still see the problem in the report from Ubuntu: only 7 threads in the hotspots report, but 8 threads in the timeline report.

As I explained before, it may be caused by the operating system's task scheduling. Ubuntu created 8 threads (see the timeline report), but some threads that started early kept picking up other tasks, which might otherwise have waited for other threads. I don't know the details, but you can see in the Ubuntu timeline report that 8 threads were created; the last thread had no task to run, so only 7 threads appear in the hotspots report.

I think this is not a VTune bug, because it is OS behavior.

I attached the two results.
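The effect Peter describes (a thread that exists in the timeline but never shows up in the hotspots breakdown) can be illustrated without VTune. In this hypothetical Python sketch, worker threads pull tasks from one shared queue. To make the outcome repeatable, the last worker is deliberately started only after the queue has been drained, standing in for a thread the OS happened to schedule late. It is created, but processes zero tasks. Whether nqueens distributes its work through a shared queue like this is an assumption.

```python
import queue
import threading
from collections import Counter


def run_pool(num_threads: int = 8, num_tasks: int = 100) -> Counter:
    """Return how many tasks each worker thread processed."""
    if num_threads < 2:
        raise ValueError("need at least one early worker and one late worker")
    tasks: queue.Queue = queue.Queue()
    for t in range(num_tasks):
        tasks.put(t)
    done: Counter = Counter()
    lock = threading.Lock()

    def worker(name: str) -> None:
        with lock:
            done[name] += 0  # record the thread even if it never gets work
        while True:
            try:
                tasks.get_nowait()
            except queue.Empty:
                return
            sum(i * i for i in range(1000))  # stand-in for real work
            with lock:
                done[name] += 1
            tasks.task_done()

    early = [threading.Thread(target=worker, args=(f"worker-{i}",))
             for i in range(num_threads - 1)]
    for th in early:
        th.start()
    tasks.join()  # the early workers have taken and finished every task
    late = threading.Thread(target=worker, args=(f"worker-{num_threads - 1}",))
    late.start()  # created, like the 8th thread in the timeline, but idle
    for th in early + [late]:
        th.join()
    return done
```

In a real program which thread ends up idle (if any) varies from run to run and between OS schedulers; here it is forced only so the illustration is repeatable.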

 
