I am trying to gather some system-wide hardware counters for my application, X seconds after it has started, over a period of Y seconds. I am using the following command line:
amplxe-cl --collect my_custom_conf -target-duration-type=veryshort -duration 30 -no-auto-finalize -no-summary -data-limit=0 -resume-after=20000
and I expect the collection to start after 20s and last for 30s.
I have two questions:
a) I receive the following messages:
amplxe: Warning: Pause command is not supported for managed code profiling. Runtime overhead is still possible. Data size limit may be exceeded.
amplxe: Collection paused.
amplxe: Warning: To enable hardware event-base
amplxe: Collection resumed.
It seems that the first warning suggests that I cannot use the "-resume-after" option. However, the following messages suggest that vtune indeed paused the collection and resumed it at a later time. Is the warning a false alarm?
b) My second and most important question is related to the actual collection period. While I specify "-duration 30", the collection seems to last significantly longer. When I time the command above, it takes 2m16s, while I would expect something much closer to 50s (20s delay + 30s measurement).
Also, in the rXXX.amplxe file in my results directory, I see the following entry:
which confirms the duration of 2m16s.
How can I know how long my profiling really lasts? I need to match the profiling results to the exact execution period of the application.
OS: RHEL 7.0
Vtune version: Intel(R) VTune(TM) Amplifier XE 2015 Update 1 (build 380310) Command Line Tool
Thank you in advance.
Hi Alexandros D.!
Regarding a), the default managed code mode is "auto", thus the warning is issued. While it probably has no impact on your collection, you can try adding the '-mrte-mode native' option to your command line to remove the warning.
Regarding b), have you opened the results in the GUI and viewed the duration on the timeline? I wonder if the collected data was for 30 seconds but it took 2:16 to collect it (i.e., due to overhead).
BTW, how many events do you have defined for your custom configuration? By default, your events will be multiplexed. On short runs with lots of events, this can reduce the accuracy of your results. Please refer to this article for more details (see Complexity 3).
As a sanity check, what kind of results do you get if you replace your custom config with 'advanced-hotspots'?
Thank you for your prompt reply and the useful pointers!
Using 'advanced-hotspots' and leaving all the other parameters the same, collection takes almost the same time as with my custom config (2m15s). I currently have fewer than 30 events in my configuration.
When I open the results with the gui, I see that, indeed, the collected data was for 30 seconds, and I can see the exact start and end timestamps. So, you were right that "the collected data was for 30 seconds but it took 2:16 to collect it", but I'm still not sure I understand what that means. Could you please explain how that works?
For instance, I want measurements from time 20 to time 50, but the collection ends at time 135. The gui indeed reports that the measurements were taken in the timeframe 20-50, so this should mean that all samples (multiplexed or not) were taken during that period and not after. Where did the remaining 85 seconds go? Is it time required for post-processing?
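To make the arithmetic above explicit, here is a minimal shell sketch that computes how much wall-clock time falls outside the requested window. The function name unaccounted_seconds is hypothetical (it is not part of VTune), and it assumes -resume-after is given in milliseconds and -duration in seconds, as the expectations earlier in the thread imply.

```shell
# Hypothetical helper, not part of VTune: seconds of wall-clock time that
# are covered neither by the initial delay nor by the measurement window.
unaccounted_seconds() {
    total_s=$1      # wall-clock time of the whole amplxe-cl run, in seconds
    delay_ms=$2     # value passed to -resume-after (assumed milliseconds)
    duration_s=$3   # value passed to -duration (assumed seconds)
    echo $(( total_s - delay_ms / 1000 - duration_s ))
}

# Figures from this thread: run ends at 135 s, 20000 ms delay, 30 s window.
unaccounted_seconds 135 20000 30
```

The total can be taken from timing the amplxe-cl command itself; the sketch only quantifies the unexplained remainder and says nothing about where that time is spent.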
The only "paused" region is the initial 20s that I specified with -resume-after.
If I rerun without the -resume-after option, I don't get any paused regions at all, but the execution time is still a lot longer than the actual measurement time. It would be interesting to know why the additional time is needed, but if the reason is not clear, I'd just like to know whether this is normal behavior and whether I can trust that the results I see actually refer to the measurement period I request.
I don't have any other explanation, at this time. The best thing to do would be to provide us your zipped up results directory so that we can take a closer look at the details. You can submit an issue at Intel® Premier Support. Premier Support is free to all customers, including eval customers. If you have difficulty using the Premier Support web site, you can send the results to me via private message.
Also, if you can execute the command 'amplxe-feedback -create-bug-report report.zip' and send us the zip file, that will provide various system configuration information to help us troubleshoot the problem.
One thing to notice is that the -target-duration-type=veryshort parameter makes the SAV (sample-after value) 10 times smaller than the default setting, which can increase collection overhead significantly. Could you please try removing the option and running the measurement once more? If it leads to the same results, it would be great to get the result directory for investigation, as David pointed out.
Thanks & Regards, Dmitry
Thanks for the suggestion. Unfortunately, that doesn't seem to be the source of the problem. I also tried to profile for 2mins with -target-duration-type=short, but the overhead seems similar. I sent these results to MrAnderson.
As a sanity check, I tried the exact same command line on an old setup of mine; older hardware, older vtune version (2013 instead of 2015) and this overhead doesn't exist, so it certainly seems that something's going wrong. Unfortunately the differences between the two setups are too many to easily pinpoint the culprits.
You can set the sampling interval directly on the command line, for example:
amplxe-cl -collect-with runss -knob sampling-interval=100 ./matrix1
Note that the default interval is 1000, which means 1000 samples per second; with an interval of 100, it will take 10,000 samples per second.