I have two systems with VTune installed. I am trying to collect hardware events and then generate a report grouped by thread, using the following two commands:
amplxe-cl -collect general-exploration -knob enable-stack-collection=true -data-limit=0 -d='unlimited' -target-duration-type=long -r vresult -app-working-dir . --search-dir sym:p=. -- ./myapp myarg
amplxe-cl -report hw-events -group-by thread -r vresult >result.tx
The two systems are
System A - Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
System B - Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
On System A I get information for every thread: if 8 threads were created, all 8 appear in the report. On System B, however, the report lists fewer threads than were actually created.
When I do the same thing through the GUI on System B, I see that some threads have no call stack information, and the hardware events for those threads are NIL.
I have the dbg library packages installed as well. I would appreciate any help. Thanks.
That is true for your "System B - Intel(R) Xeon(R) CPU E5530 @ 2.40GHz". This is a Nehalem-EP processor.
I can reproduce this on my side.
# amplxe-cl -collect general-exploration -knob enable-stack-collection=true -app-working-dir /home/peter/problem_report -- /home/peter/problem_report/primes.ia32
amplxe: Error: Cannot enable advanced capabilities for Hardware Event-based Sampling: problem with the driver (vtss/vtsspp). Check that the driver is running and the driver group is in the current user group list. See "Building and Managing the Sampling Driver" help topic for further details.
# amplxe-cl -collect general-exploration -knob enable-stack-collection=false -app-working-dir /home/peter/problem_report -- /home/peter/problem_report/primes.ia32
With stack collection disabled, the collection works properly.
Event-based sampling with stack collection works only on Sandy Bridge processors or later. You may want to try another supported processor.
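Since the error message above points at the sampling driver (vtss/vtsspp), one quick thing to rule out is whether that kernel module is loaded at all. The following is a minimal sketch, assuming a Linux system where loaded modules are listed in /proc/modules; the helper names are hypothetical and not part of VTune. It does not check the driver group membership the message also mentions, and a loaded driver does not make stack collection available on an unsupported processor.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each non-empty line of /proc/modules starts with the module name.
    """
    return {line.split()[0]
            for line in proc_modules_text.splitlines()
            if line.strip()}


def stack_sampling_driver_loaded(proc_modules_text):
    """True if a module named vtss or vtsspp appears in the listing.

    The two names are taken from the error message quoted in this thread.
    """
    return bool(loaded_modules(proc_modules_text) & {"vtss", "vtsspp"})


if __name__ == "__main__":
    try:
        text = Path("/proc/modules").read_text()
    except OSError:
        # Not Linux, or /proc is not readable: nothing to report.
        text = ""
    if stack_sampling_driver_loaded(text):
        print("vtss/vtsspp driver appears to be loaded")
    else:
        print("vtss/vtsspp driver not found in /proc/modules")
```

If the module is missing, the "Building and Managing the Sampling Driver" help topic named in the error message is the place to look next.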
I finally found the root cause after carefully reviewing your code and using your command to reproduce the problem. Pay attention to the atoi() call. The command
amplxe-cl -collect general-exploration -knob enable-stack-collection=true -app-working-dir . --search-dir sym:p=. -- ./NDAID "8"
should be changed to:
amplxe-cl -collect general-exploration -knob enable-stack-collection=true -app-working-dir . --search-dir sym:p=. -- ./NDAID 8
The problem has gone away on my side.
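To illustrate why the atoi() call matters, here is a rough Python emulation of how C's atoi() parses its argument. This is a sketch only: c_atoi is a hypothetical helper, not the code from NDAID, and it ignores overflow. It also assumes the quote characters actually reached the program; a typical POSIX shell strips them before the program sees its arguments, so this only applies when the command is launched in a way that passes them through literally.

```python
def c_atoi(text):
    """Rough emulation of C atoi().

    Skips leading whitespace, reads an optional sign, then consumes as many
    decimal digits as follow. Returns 0 when no digits are found.
    """
    i = 0
    n = len(text)
    # Skip the whitespace characters that C's isspace() accepts.
    while i < n and text[i] in " \t\n\v\f\r":
        i += 1
    sign = 1
    if i < n and text[i] in "+-":
        sign = -1 if text[i] == "-" else 1
        i += 1
    value = 0
    while i < n and text[i] in "0123456789":
        value = value * 10 + (ord(text[i]) - ord("0"))
        i += 1
    return sign * value


if __name__ == "__main__":
    print(c_atoi("8"))      # plain digit string
    print(c_atoi('"8"'))    # string that starts with a literal quote character
```

With a literal leading quote character, parsing stops before any digit is read, so the requested thread count would come out as 0 rather than 8.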
Peter Wang (Intel) wrote:
I finally found the root cause after carefully reviewing your code and using your command to reproduce the problem. Pay attention to the atoi() call. The command
amplxe-cl -collect general-exploration -knob enable-stack-collection=true -app-working-dir . --search-dir sym:p=. -- ./NDAID "8"
should be changed to:
amplxe-cl -collect general-exploration -knob enable-stack-collection=true -app-working-dir . --search-dir sym:p=. -- ./NDAID 8
The problem has gone away on my side.
I am not satisfied with the result, though, since the last thread has only a tiny workload. When I work on a Red Hat Enterprise server, the workload is balanced across all 8 threads.
Hi Peter,
I tried the change; in fact I had not used "8", which would be erroneous. I have attached a tar file containing a small Python script that reproduces the bug. The command is as follows:
./CheckBug.py application number_of_threads number_of_times_to_run
I had hardcoded these values, which also reproduced the bug. Can you run this script with number_of_times_to_run set to at least 500 on your setup? I have also included the nqueens binary and source. One more thing to try would be to check on an OS other than the Debian-based ones, as you mentioned.
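The attached CheckBug.py is not shown in this thread, so the following is only a sketch of what such a repeat-run harness could look like, not the original script. It reuses the amplxe-cl options quoted in the first post; the function names, the trials output directory, and the per-run result directory naming are assumptions made for illustration.

```python
import subprocess
import sys
from pathlib import Path


def collect_cmd(app, n_threads, result_dir):
    """Build the collection command, using the options quoted in this thread."""
    return ["amplxe-cl", "-collect", "general-exploration",
            "-knob", "enable-stack-collection=true",
            "-r", str(result_dir),
            "-app-working-dir", ".",
            "--search-dir", "sym:p=.",
            "--", app, str(n_threads)]


def report_cmd(result_dir):
    """Build the per-thread hardware-event report command."""
    return ["amplxe-cl", "-report", "hw-events",
            "-group-by", "thread", "-r", str(result_dir)]


def run_trials(app, n_threads, n_runs, out_dir="trials"):
    """Run collection plus report n_runs times, saving each report as text.

    Assumes out_dir holds no result directories left over from earlier runs.
    """
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for i in range(n_runs):
        result_dir = out / f"r{i:04d}"
        subprocess.run(collect_cmd(app, n_threads, result_dir), check=True)
        report = subprocess.run(report_cmd(result_dir), check=True,
                                capture_output=True, text=True)
        (out / f"report_{i:04d}.txt").write_text(report.stdout)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: script.py application number_of_threads number_of_times_to_run
    run_trials(sys.argv[1], int(sys.argv[2]), int(sys.argv[3]))
```

Passing the arguments as a list (rather than a single shell string) means no shell quoting is involved, so the thread count reaches the application exactly as given. The saved reports can then be compared across runs to see how often a thread is missing.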
You used custom-analysis; I will change it to general-exploration and try again later. I will be away for 6 hours.
OK. I changed the custom collection to general-exploration in your script and changed nothing else. I ran it only once, on both Ubuntu and RHEL:
# ./CheckBug.py threadStats 8 1
The result from RHEL was excellent: all 8 threads appear in both the hotspots report and the timeline report, and the workload is balanced across the 8 threads. On Ubuntu I can still see the problem: only 7 threads in the hotspots report, but 8 threads in the timeline report.
As I explained before, this may be caused by the operating system's task scheduling. Ubuntu created 8 threads (see the timeline report), but some threads started early and kept picking up tasks that might otherwise have waited for other threads. I don't know the details, but the timeline report from Ubuntu shows that 8 threads were created, while the last thread had no task to run, so only 7 threads appear in the hotspots report.
I think this is not a VTune bug, because it is OS behavior.
I attached two results.
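The scheduling explanation above can be illustrated with a toy model. This is a sketch under strong assumptions: it is not how the Ubuntu or RHEL scheduler works, and it does not reflect how the nqueens program actually distributes work. It only shows that when workers start at staggered times and pull from a shared pool of tasks, late starters can end up with nothing to do. The simulate function and all of its parameters are hypothetical.

```python
import heapq


def simulate(n_workers, n_tasks, task_time, start_gap):
    """Count how many tasks each worker runs in a toy shared-queue model.

    Worker i becomes available at time i * start_gap, and every task takes
    task_time. Each task goes to whichever worker is free earliest, with
    ties broken in favor of the lowest worker id.
    """
    free_at = [(i * start_gap, i) for i in range(n_workers)]
    heapq.heapify(free_at)
    done = [0] * n_workers
    for _ in range(n_tasks):
        t, i = heapq.heappop(free_at)
        done[i] += 1
        # The worker is busy until its current task finishes.
        heapq.heappush(free_at, (t + task_time, i))
    return done


if __name__ == "__main__":
    # All workers start together: the work spreads evenly.
    print(simulate(8, 16, 10, 0))
    # Short tasks and slow thread start-up: late workers find the pool empty.
    print(simulate(8, 7, 1, 10))
```

In this model a worker with zero tasks was still created, which would be consistent with seeing 8 threads in the timeline report but fewer in the hotspots report, although the model cannot confirm that this is what happened on Ubuntu.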