I'm trying to profile a sensitive spot in one of our tools.
I'm using the standalone Intel VTune Amplifier XE 2013 on Windows 7.
I configured the Advanced Hotspots analysis with call counts enabled and left the other settings at their defaults.
I captured a minute's worth of execution and stopped the program.
Amplifier has been finalizing the results for over an hour now, and I'm still counting.
Even if it had completed this step by now, this would not be usable.
Am I overlooking an option that would make it faster?
Does it help if you run "amplxe-cl -finalize -r r00?ah" from cmd instead?
Does it help to use "-knob collection-detail=hotspots-sampling" instead of "stack-sampling" or "stack-and-callcount"?
Does it help to use "-knob sampling-interval=10" instead of the default "1" (ms)?
I realized that stack-and-callcount (through the GUI) is the one that does not work.
Running the command line amplxe-cl -finalize -r r00?ah works fine.
Could you add an option to the GUI so that it calls the command-line tools to process the results?
(Admittedly, that is not as nice as fixing the bug.)
Using stack-and-callcount adds some overhead; you can deselect it. I don't think this is a bug.
In the GUI you can select "Hotspots" as the collection detail level. You can also click the "Command Line..." button to open a dialog and choose "Copy Command Line to Clipboard".
Another tip: add the option "-no-auto-finalize" to skip finalization when collection ends; you can finalize the result later.
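A minimal Python sketch of the workflow suggested above: collect with lighter settings and "-no-auto-finalize", then finalize separately from a command prompt. It only assembles argument lists from the options quoted in this thread; the result directory "r001ah", the PID, and the choice of hotspots-sampling with a 10 ms interval are illustrative assumptions, and it assumes amplxe-cl is on the PATH.

```python
import subprocess

# Assumes amplxe-cl is on PATH (e.g. after running the VTune environment script).
AMPLXE = "amplxe-cl"


def collect_cmd(result_dir, target_pid, detail="hotspots-sampling", interval_ms=10):
    """Advanced Hotspots collection with lighter settings; finalization is skipped."""
    return [
        AMPLXE, "-collect", "advanced-hotspots",
        "-knob", f"collection-detail={detail}",       # instead of stack-and-callcount
        "-knob", f"sampling-interval={interval_ms}",  # ms between samples (default 1)
        "-no-auto-finalize",                          # finalize later, on demand
        "-r", result_dir,
        "-target-pid", str(target_pid),
    ]


def finalize_cmd(result_dir):
    """Finalize a result that was collected with -no-auto-finalize."""
    return [AMPLXE, "-finalize", "-r", result_dir]


def run(cmd):
    """Run one amplxe-cl command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical result directory and PID; print the commands instead of running them.
    print(" ".join(collect_cmd("r001ah", 1234)))
    print(" ".join(finalize_cmd("r001ah")))
```

Keeping collection and finalization as separate commands means a slow finalization can be retried or left running without repeating the collection.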
This most definitely is a bug. Even with a sample interval of 10ms, the GUI never completes finalizing the results if stack+callcounts are enabled. I left it running overnight last night, and the progress bar hadn't moved at all. (I'm running Update 15 on Windows 7.) Even if it does eventually complete processing (which I doubt), the UX here could be greatly improved.
I can do that, but what exactly am I looking for? I can already see the process is burning silicon...
Update: I started the command line amplxe-cl -finalize -r r001ah Friday morning. It hit 28% pretty quickly (~30 minutes) on Friday, but seemed to stall there. I left it running over the weekend, and this morning (almost three days later) it's still stuck on 28%. The process is still maxing out a single CPU, but whether it's actually doing something or not, I have no idea. I guess not...
This is the output from the process:
amplxe: Executing actions 0 %
amplxe: Warning: The result contains a lot of raw data. Finalization may take a long time to complete.
amplxe: Executing actions 0 % Finalizing results
amplxe: Executing actions 0 % Finalizing the result
amplxe: Executing actions 0 % Clearing the database
amplxe: Executing actions 16 % Clearing the database
amplxe: Executing actions 16 % Loading raw data to the database
amplxe: Executing actions 16 % Loading data files
amplxe: Executing actions 16 % Loading 'systemcollector-....s
amplxe: Executing actions 18 % Loading 'systemcollector-....s
amplxe: Executing actions 18 % Loading '7984-4500.vtss' file
amplxe: Executing actions 19 % Loading '7984-4500.vtss' file
amplxe: Executing actions 19 % Loading '7984-6456.vtss' file
amplxe: Executing actions 21 % Loading '7984-6456.vtss' file
amplxe: Executing actions 21 % Loading '7984-6672.vtss' file
amplxe: Executing actions 23 % Loading '7984-6672.vtss' file
amplxe: Executing actions 23 % Loading '7984-7304.vtss' file
amplxe: Executing actions 24 % Loading '7984-7304.vtss' file
amplxe: Executing actions 24 % Loading '7984-7388.vtss' file
amplxe: Executing actions 25 % Loading '7984-7388.vtss' file
amplxe: Executing actions 26 % Loading '7984-7388.vtss' file
amplxe: Executing actions 26 % Loading '7984-7880.vtss' file
amplxe: Executing actions 27 % Loading '7984-7880.vtss' file
amplxe: Executing actions 28 % Loading '7984-7880.vtss' file
amplxe: Executing actions 28 % Loading '7984.vtss' file
If you enable the stack collection and call count options, please don't profile the application for more than one hour (preferably keep it within 15 minutes). For example:
amplxe-cl -c advanced-hotspots -knob collection-detail=stack-and-callcount -d 900 -target-pid 31970
If you profile the application overnight, the finalization stage will be extremely slow (stack and call count collection records much more information). To save finalization time, you can specify the option "-data-limit=500", which limits the raw data collected: data collection stops once it reaches 500 MB. Alternatively, you can use the Pause/Resume API to control data collection.
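As a rough sketch, the time and size limits from the two posts above can be combined in one collection command. This Python helper only builds the argument list; the defaults (900 s, 500 MB, stack-and-callcount) mirror the values quoted in the thread, and the function names are hypothetical.

```python
import subprocess


def bounded_collect_cmd(target_pid, duration_s=900, data_limit_mb=500,
                        detail="stack-and-callcount"):
    """Advanced Hotspots collection bounded both in time and in raw-data size."""
    if duration_s <= 0 or data_limit_mb <= 0:
        raise ValueError("duration_s and data_limit_mb must be positive")
    return [
        "amplxe-cl", "-collect", "advanced-hotspots",
        "-knob", f"collection-detail={detail}",
        "-duration", str(duration_s),        # stop after this many seconds
        f"-data-limit={data_limit_mb}",      # stop once this many MB of raw data exist
        "-target-pid", str(target_pid),      # attach to an already running process
    ]


def run(cmd):
    """Execute the command; check=True raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # PID 31970 is the one from the example command above; substitute your own.
    print(" ".join(bounded_collect_cmd(31970)))
```

Whichever limit is reached first ends the collection, which keeps the amount of raw data (and therefore finalization time) bounded.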
What is the size of r001ah? Is it huge? If so, the "-data-limit" option can be used.
Is there any reason to run overnight? A program's behavior is usually similar throughout a long run. Can you try specifying "-duration 3600"?
The profile time of the application was about 90 seconds. (I started it paused, profiled what I wanted to, and then stopped it.) The resulting profile data is about 1.2GB. It is the finalizing stage of Amplifier XE that I left running overnight (GUI) and over the weekend (command line).
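A back-of-the-envelope check of how a 500 MB "-data-limit" would have interacted with this run, assuming the raw data grew at a roughly constant rate and that "1.2GB" means 1.2 × 1024 MB. Both are assumptions; the real rate depends on thread count and call activity.

```python
MB_PER_GB = 1024  # assumption: "1.2GB" means 1.2 * 1024 MB


def mb_per_second(result_mb, profiled_s):
    """Average raw-data rate, assuming data grows roughly linearly with time."""
    return result_mb / profiled_s


def seconds_until_limit(result_mb, profiled_s, limit_mb):
    """How long collection would run before hitting a -data-limit of limit_mb."""
    return limit_mb / mb_per_second(result_mb, profiled_s)


if __name__ == "__main__":
    result_mb = 1.2 * MB_PER_GB  # about 1.2 GB reported for this run
    profiled_s = 90              # about 90 seconds of profiling
    print(f"{mb_per_second(result_mb, profiled_s):.1f} MB/s")          # 13.7 MB/s
    print(f"{seconds_until_limit(result_mb, profiled_s, 500):.1f} s")  # 36.6 s
```

At roughly 13.7 MB/s, a 500 MB limit would have stopped collection after about 37 seconds, well before the 90-second window ended.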
Oh, so it was the finalization stage that ran overnight. But the profile time was only 90 seconds, which should not generate a 1.2 GB result. Did you insert Pause/Resume API calls in your code, or just operate from the GUI?
Could you please try VTune Amplifier XE 2013 Update 17? It includes improvements specifically for the finalization time of stacks + call counts.
Hope this helps.
Thanks & Regards, Dmitry
>>>I can do that, but what exactly am I looking for? I can already see the process is burning silicon>>>
You can look for the specific thread that is consuming a lot of CPU cycles. It will not solve the issue, of course, but you will probably know at which stage the process is stuck spinning endlessly.
For more in-depth troubleshooting you can use Xperf.
Dmitry is right that you can try Update 17.
There are three methods you can try:
1. Use the command from my post of 06/23/2014 - 15:14 on your running application, with a duration setting (assuming there is no Pause/Resume API in your code).
2. Use the Pause/Resume API to control data collection in your code (see this article).
3. Use method 1, but start collection paused. Then open a command prompt and run "amplxe-cl -command resume -r r00?ah", followed by "amplxe-cl -command stop -r r00?ah", to control data collection.
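Method 3 can be scripted. This is a minimal Python sketch that builds the two "-command" invocations quoted above and runs them around a fixed time window. The fixed sleep is a simplification of resuming and stopping exactly when the interesting phase starts and ends (which is what the Pause/Resume API does from inside the code); the result directory and function names are hypothetical.

```python
import subprocess
import time

# Only the two actions used in this thread; other actions are not assumed here.
SUPPORTED_ACTIONS = ("resume", "stop")


def control_cmd(action, result_dir):
    """Build 'amplxe-cl -command <action> -r <result_dir>' for a running collection."""
    if action not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["amplxe-cl", "-command", action, "-r", result_dir]


def profile_window(result_dir, seconds):
    """Resume a collection that was started paused, wait, then stop it."""
    subprocess.run(control_cmd("resume", result_dir), check=True)
    time.sleep(seconds)  # the region of interest should execute during this window
    subprocess.run(control_cmd("stop", result_dir), check=True)


if __name__ == "__main__":
    # Hypothetical result directory (the thread writes it as r00?ah).
    print(" ".join(control_cmd("resume", "r001ah")))
    print(" ".join(control_cmd("stop", "r001ah")))
```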
I was just using the GUI to control the profiler.
I've actually been trying to get hold of Update 17 for some time now, but the person with the login details to download it is on vacation right now (and the Intel Software Manager claims it's unavailable to me - presumably because it doesn't understand we have floating licences managed by a local server and therefore thinks I'm running an expired trial version). But as soon as I can get it installed, I'll let you know if the issue has been fixed or not.
And by the law of the Internet, I receive an email with a link to the installer for Update 17 moments after posting my last message. :)
Running the command line (amplxe-cl -finalize -r r001ah) as before, it reaches 28% fairly quickly (a few seconds), hangs there for about 30 seconds, and then crashes. Which is new.
Not entirely helpful though. I've sent in the crash report, so hopefully someone at Intel will pick it up soon.
@ James R
If you have Internet access, you can use:
amplxe-feedback -create-bug-report rpt-name
amplxe-feedback -send-crash-report rpt-name
Can you use U17 to profile an application that runs for over 15 minutes, with stack & call count enabled? Try a new profiling session to verify whether things get better:
1. Try advanced-hotspots (stack, call count enabled) with a long-running loop (a simple test) to find out whether the problem persists.
2. Try advanced-hotspots (stack, call count enabled) with your app to find out whether the issue is specific to your application.
We received your crash reports. To triage the problem efficiently, we need the result and the binaries. Could you do the following:
Run finalization on the result directory that crashed:
>amplxe-cl -finalize -r r001ah
This should not crash, since the crash code will not be invoked because of the variable.
Then archive the result together with the binaries so we can triage the crash:
>amplxe-cl -archive -r r001ah
Then send us the result directory so we can triage the issue.
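The two steps above (finalize, then archive) can be chained so that archiving only happens if finalization succeeds. This is a minimal Python sketch; the function name is hypothetical, and the variable mentioned above is not named in this thread, so the sketch does not set it.

```python
import subprocess


def prepare_for_triage(result_dir, runner=subprocess.run):
    """Finalize a result, then archive it together with the binaries.

    runner defaults to subprocess.run; check=True makes a failing step raise
    CalledProcessError, so the archive step is skipped if finalization fails.
    """
    steps = [
        ["amplxe-cl", "-finalize", "-r", result_dir],
        ["amplxe-cl", "-archive", "-r", result_dir],
    ]
    for cmd in steps:
        runner(cmd, check=True)
    return steps
```

Passing a different runner makes it easy to dry-run the sequence (for example, one that only prints each command) before touching a large result directory.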
Thanks & Regards, Dmitry