Hello,
I recently switched from VTune Amplifier XE 2011 to the 2016 version.
With the new version I see substantially different profiling results (regular Hotspots profiling) compared to the old one.
The 2016 version reports a high spin time (and more effective time) for functions SwitchToFiber and SwitchToThread, while the other functions show far less effective time than before.
The program I'm profiling uses Windows Fibers heavily, so it is not surprising that SwitchToFiber shows up in the profile, but the high spin time is unexpected, since Fibers implement only cooperative multitasking.
Any ideas?
Thanks,
Thomas
Hello Thomas,
Please make a copy of the VTune Amplifier 2011 result and re-resolve the copy with VTune Amplifier 2016.
Do you see the same hotspots and time breakdown as in 2011, or have they changed substantially?
Would it be possible for you to attach both the old 2011 and the new 2016 results here? Alternatively, please submit an Intel Premier Support issue.
Regards, Katya
Hello,
If attaching the results is problematic, could you also do the following: change the "Call Stack Mode" knob on the filter bar to "user/system functions". In this mode VTune does not fold system internals into the calling function, as the default "user functions + 1" mode does. Then check whether the top functions change, to find out what exactly VTune classified as spinning.
We need to take into account that frequent fiber switching, which manipulates fiber state, can itself consume time, as can frequent calls that yield execution to another thread. That time is really fiber/threading overhead, not the "effective" time spent executing user code. VTune's classification of such time could be improved over time, and we need to check whether this is what is happening in your case.
Thanks & Regards, Dmitry
Hello Katya, Dmitry,
Thanks for responding to my question.
Re-resolving the 2011 results in 2016 does not work; I only get an error message.
I played around with the "Call Stack Mode" option, but it does not remove the differences between the results from the two VTune versions.
I created a small example to make the issue reproducible and easier to explain. This is the code:
```cpp
#include <Windows.h>
#include <tchar.h>  // _tmain / _TCHAR

// Busy-wait in its own function so it shows up separately in the profile.
__declspec(noinline) void runBusyLoop()
{
    for (volatile int i = 0; i < 5000; ++i) {}
}

LPVOID main_fiber = 0;

// Second fiber: do some work, then hand control back to the main fiber.
void __stdcall runFiber(void*)
{
    while (true)
    {
        runBusyLoop();
        SwitchToFiber(main_fiber);
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    main_fiber = ConvertThreadToFiberEx(0, 0);
    LPVOID first_fiber = CreateFiberEx(0, 0, 0, &runFiber, 0);
    for (int i = 0; i < 5000000; ++i)
    {
        runBusyLoop();
        SwitchToFiber(first_fiber);
    }
    ConvertFiberToThread();
    return 0;
}
```
The program basically toggles between two Fibers and does some busy-waiting work in each. The busy-waiting is implemented in a separate function so that it can be clearly identified in the profile.
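For readers who want to try the same ping-pong pattern without Windows, here is a rough analogue using POSIX ucontext (getcontext/makecontext/swapcontext) instead of Fibers, assuming Linux with glibc (ucontext was removed from POSIX.1-2008 but glibc still provides it). As with Fibers, control changes hands only at an explicit swapcontext call, so the recorded interleaving is fully deterministic and there is nothing to wait or spin on. It is not the same mechanism (glibc's swapcontext also saves and restores the signal mask with a system call), and it says nothing about how VTune classifies time. The names pingpong, run and worker are hypothetical.

```cpp
#include <ucontext.h>
#include <cassert>
#include <string>
#include <vector>

// Rough POSIX analogue of the Windows reproducer; NOT Windows Fibers.
// All names are hypothetical, chosen to mirror main_fiber / first_fiber.
namespace pingpong {

ucontext_t main_ctx;    // plays the role of main_fiber
ucontext_t worker_ctx;  // plays the role of first_fiber
std::string trace;      // 'M' = main ran, 'W' = worker ran

// Counterpart of runFiber: do a unit of "work", then yield back to main.
void worker() {
    for (;;) {
        trace += 'W';
        swapcontext(&worker_ctx, &main_ctx);  // like SwitchToFiber(main_fiber)
    }
}

// Runs `rounds` main -> worker -> main round trips and returns the trace.
// Each call restarts the worker from the top; a previously suspended
// worker is simply abandoned.
std::string run(int rounds) {
    static std::vector<char> stack(64 * 1024);  // worker's private stack
    trace.clear();
    int rc = getcontext(&worker_ctx);
    assert(rc == 0);
    (void)rc;
    worker_ctx.uc_stack.ss_sp = stack.data();
    worker_ctx.uc_stack.ss_size = stack.size();
    worker_ctx.uc_link = nullptr;  // worker never returns
    makecontext(&worker_ctx, worker, 0);
    for (int i = 0; i < rounds; ++i) {
        trace += 'M';
        swapcontext(&main_ctx, &worker_ctx);  // like SwitchToFiber(first_fiber)
    }
    return trace;
}

}  // namespace pingpong
```

For example, `pingpong::run(3)` returns "MWMWMW": main and worker strictly alternate, mirroring the loops in `_tmain` and `runFiber`.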
These are the results from VTune 2011:
runBusyLoop 91.679s
runFiber 0.499s
wmain 0.319s
VTune 2016:
runBusyLoop 73.755s
runFiber 10.475s (where 8.839s is Spin Time)
wmain 10.288s (where 8.563s is Spin Time)
The results above list only user functions. With "user functions +1", VTune 2016 lists SwitchToThread with 17.4s of Spin Time.
What I find really confusing is the big difference in the absolute time reported for runBusyLoop.
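A quick arithmetic check on the numbers in this post: the time that disappeared from runBusyLoop between the two versions is close to the Spin Time that VTune 2016 reports for runFiber and wmain. That is consistent with, though it does not prove, samples being attributed differently rather than the program doing extra work. The figures are copied from the lists above; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Seconds reported in this thread (Hotspots, user functions only).
struct Timings { double busy, fiber, main_fn; };

constexpr Timings kVTune2011{91.679, 0.499, 0.319};
constexpr Timings kVTune2016{73.755, 10.475, 10.288};
constexpr double kSpinFiber2016 = 8.839;  // Spin Time inside runFiber
constexpr double kSpinMain2016  = 8.563;  // Spin Time inside wmain

// Total user-function time for one collection.
double total(const Timings& t) { return t.busy + t.fiber + t.main_fn; }

// Time that runBusyLoop "lost" between 2011 and 2016.
double busyLoopDrop() { return kVTune2011.busy - kVTune2016.busy; }

// Spin Time that 2016 reports across the two user functions.
double spin2016() { return kSpinFiber2016 + kSpinMain2016; }

// True if the lost busy-loop time and the new Spin Time agree within tolSeconds.
bool dropMatchesSpin(double tolSeconds) {
    assert(tolSeconds >= 0.0);
    return std::fabs(busyLoopDrop() - spin2016()) <= tolSeconds;
}
```

This gives a drop of 17.924 s against 17.402 s of Spin Time, about 0.5 s apart, and the 17.402 s matches the 17.4 s attributed to SwitchToThread in "user functions +1" mode. The totals themselves differ by only about 2 s (92.497 s vs 94.518 s).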
Thanks again for your help,
Thomas
Thomas,
It would be helpful to run a different, driver-based collection method with VTune Amplifier XE 2016 (Advanced Hotspots with stacks) and see what the time distribution looks like in that case. Could you please check this and post the timings?
Thank you, Regards, Dmitry
Dmitry,
These are the numbers for VTune 2016 Advanced Hotspots:
runBusyLoop 88.965s
wmain 0.271s
runFiber 0.252s
Spin time is 0s again for all functions...
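To put the three collections side by side, this sketch computes what fraction of the reported user-function time falls in runBusyLoop in each one. The numbers are copied from this thread; since they come from separate runs, only the rough proportions are meaningful. The names Run, kRuns and busyShare are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Per-function seconds reported in this thread for the same reproducer.
struct Run {
    const char* collector;
    double runBusyLoop, runFiber, wmain;
};

const Run kRuns[] = {
    {"VTune 2011 Hotspots",          91.679,  0.499,  0.319},
    {"VTune 2016 Hotspots",          73.755, 10.475, 10.288},
    {"VTune 2016 Advanced Hotspots", 88.965,  0.252,  0.271},
};

// Fraction of the reported user-function time that lands in runBusyLoop.
double busyShare(const Run& r) {
    double sum = r.runBusyLoop + r.runFiber + r.wmain;
    assert(sum > 0.0);
    return r.runBusyLoop / sum;
}

// Same share, rounded to tenths of a percent (per mille).
long busySharePermille(const Run& r) {
    return std::lround(busyShare(r) * 1000.0);
}
```

This gives about 99.1% (2011 Hotspots), 78.0% (2016 Hotspots) and 99.4% (2016 Advanced Hotspots): the driver-based collection agrees with the 2011 result, while the 2016 Hotspots collection reports roughly a fifth of the time outside runBusyLoop.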
Regards,
Thomas
