I'm seeing very inaccurate results from Parallel Amplifier on a large program. As golden results, I'm using VTune in sampling mode with the CPU_CLK_UNHALTED.CORE counter. I also double-checked with internal timers inside the program to make sure the VTune numbers are in the right ballpark.
SETUP
The program runs for about 30 minutes, both on its own and under the Amplifier. It is a 64-bit executable consisting of over 200 DLLs, compiled from a few million lines of code. Early parts of the program exercise one set of DLLs, the middle exercises another, and the end exercises yet another set. The hardware is a quad-core Xeon with 8 GB of RAM. Peak virtual memory (including all the code, not just data) is just under 5 GB.
EXPERIMENT 1
With the default Amplifier settings, I got completely inaccurate data. The top two DLLs flagged by the Amplifier actually take less than 2% of the program's time. The size of the Amplifier data directory was 19 GB.
EXPERIMENT 2
When I increased the maximum size of raw collector data to 1000 MB (from the default of 10 MB), the results came closer. The top DLL was reported at 23%, while VTune puts it at 19%, which is close enough. However, another DLL where almost 25% of the time is spent (again, according to VTune) did not even show up in the Amplifier's list. The size of the Amplifier directory went up to 20 GB.
EXPERIMENT 3
Next, I checked "Enable accurate CPU Time detection" and kept the 1000 MB raw data limit. The numbers got better and more DLLs showed up. However, the top consumer according to VTune got only 2.5% in the Amplifier. The Amplifier directory size was now 49 GB.
I should mention that VTune's data directory for sampling two counters (without calibration) is 241 MB.
QUESTIONS
1. Setting the maximum limit too low seems to cut off data collection. How do we know when this happens? I couldn't find any indicator that the limit was reached.
2. Is there anything else I can do to improve the accuracy of the Amplifier?
Hi Dmitry,
It seems you ran a huge application under Parallel Amplifier. It makes sense that the result of Experiment 1 was inaccurate: with a 10 MB limit for raw collector data per result directory, data collection is SUSPENDED once the 10 MB limit is reached.
Increasing the memory limit for the raw data directory to 1000 MB was a good change; the data collector then only had to suspend a few times (Experiment 2).
Enabling "Enable accurate CPU time detection" was also the right thing to do; it includes post-processing time.
Have you used "Remove raw collector data after result finalization"? It removes the raw data and makes the result files smaller.
A1. Currently the user has no way to know when the data collector is suspended because memory use has reached the configured limit.
A2. You can use "Start data collection paused" and the resume button in the GUI, or "Resume collection after n sec", to collect data only in the code region of interest. If the code region is a small piece, 10 MB of raw data space may be enough.
Regards, Peter
Regarding A1: The Amplifier should somehow tell the user when the maximum raw data size is reached. Otherwise, the user will try to optimize the wrong thing, wasting his/her time.
Regarding A2: does this mean there is no way to correctly profile a huge application in full? Often the first step in optimizing is high-level profiling (as I tried to do here), and only then profiling the slowest code in detail. In this case, by hiding the most critical DLL, the Amplifier would lead me to optimize the wrong thing.
I am quite impressed with the low overhead and ease of use of the Parallel Amplifier. However, I need to know when the Amplifier reaches its limits and starts producing bogus data. Otherwise, how do I know when to trust the tool?
Yes. Most of the time the user doesn't know when to resume or pause, since he or she doesn't know where the code is running...
I appreciate your opinions. It really is hard for the user to select the maximum memory size for raw collector data, because the user doesn't know whether 10 MB, 100 MB, or 1000 MB is appropriate for his or her program.
A mechanism like the VTune Analyzer's "Calibration" would help the user set the maximum memory size for raw collector data. In that case, Parallel Amplifier would run twice, the first time without collecting performance data...
Let's hear inputs from others.
Thanks again for comments.
Regards, Peter
QUESTIONS
1. Setting the maximum limit too low seems to cut off data collection. How do we know when this happens? I couldn't find any indicator that the limit was reached.
2. Is there anything else I can do to improve the accuracy of the Amplifier?
1. Once the raw data size limit is reached, the Amplifier throws a message in the output window signaling that collection was stopped. You can figure out the current data size by looking at the size of the data.0 directory in the results folder. The size of the whole result folder doesn't tell you anything.
2. Checking the switch "Enable the accurate CPU Time detection" is enough to achieve decent accuracy. But you have to be aware of a difference in data collection between VTune and the Amplifier. The Amplifier collects CPU time only, whereas VTune attributes waiting time (thread blocked on synchronization, an IO call, etc.) to the module as well. Switching on "Assign system function time to caller user function..." mitigates the problem for IO, but likely not the others.
2. Checking the switch "Enable the accurate CPU Time detection" is enough to achieve decent accuracy. But you have to be aware of a difference in data collection between VTune and the Amplifier. The Amplifier collects CPU time only, whereas VTune attributes waiting time (thread blocked on synchronization, an IO call, etc.) to the module as well. Switching on "Assign system function time to caller user function..." mitigates the problem for IO, but likely not the others.
2. My program is not IO-intensive, and only a part of it is parallelized. Most of the time is spent hammering the CPU and RAM. I already had "Assign system function time to caller..." turned on.
I did another experiment: I turned off "Assign system function time to caller..." and re-ran the Amplifier. Just by chance, I was able to catch this in the output window:
repositorytpsstpsssrctpssruntoolwindowscswitch_collector.cpp:955 tpss::processEvent: Assertion 'tIt != g_thrd_map->end()' failed.
My program kept going afterwards but no results were presented by the Amplifier.
The Amplifier version I have is Update 1, build 67513.
The crash with debug messages is still there but in a different form (and possibly different place). Now I got the option to submit a fancy crash report, so I did. I also put a note on what I saw in Update 1.
As for devenv.exe, I don't know. If you really need it, I can find out by re-running the experiment and watching Task Manager.
I can't tell if there were any messages in the output window. As soon as the crash happens, the VS window disappears and the crash report window appears. If there is a way to redirect the Amplifier output to a file (and make sure it's flushed all the time), I can try it again.
In order to use the command line collector and be able to open results in the GUI, you need to use ampl-cl.exe (not ampl-runss.exe).
The command line would be something like this:
>path_to/ampl-cl.exe -collect hotspots --no-auto-finalize -r result_dir path_to/your_app.exe
For command line options help, see:
>path_to/ampl-cl.exe -help
Sorry for the confusion.
When I changed the bottom-up view to show functions grouped by module, Visual Studio went a bit crazy. It used all four cores of my CPU for almost 3 hours, kept a very large VM size the whole time (and it's a 32-bit exe, so such a large memory footprint is a bit dangerous), and went through lots of data. Below is the snapshot of Task Manager when I killed devenv.exe (notice the 335 GB of I/O reads). The total size of the results directory is only 82 MB. Looks like the Amplifier has a quadratic (or worse!) loop somewhere.

When I re-ran the collector from Visual Studio, I got the crash while my program was running; I am quite sure of it now. The crash came very soon, whereas the collection from the command line took almost an hour. When run from within Visual Studio, I did NOT see an ampl-cl process, only ampl-runss.
Hi Dmitry,
This is quite new behavior of the tool that we have never observed before. Thanks a lot for your investigation. We would appreciate it if you continue sending crash reports to us.
If you do, contact me privately at "ddenisen at altera dot com". We'll need to establish an NDA but that should not be a big problem.
