Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

High Spin Time detected for Windows Fibers

Thomas_P_3
Beginner

Hello,

I recently switched from VTune Amplifier XE 2011 to the 2016 version.
With the new version I see substantially different profiling results (regular Hotspots profiling) compared to the old one.

The 2016 version reports a high spin time (and more effective time) for functions SwitchToFiber and SwitchToThread, while the other functions show far less effective time than before.
The program I'm profiling uses Windows Fibers heavily, so it is not surprising that SwitchToFiber shows up in the profile. The high spin time is unexpected, though, since fibers implement only cooperative multitasking.

Any ideas?

Thanks,
Thomas
Ekaterina_L_Intel

Hello Thomas,

Please make a copy of the VTune Amplifier 2011 result and re-resolve the copy with VTune Amplifier 2016.

Do you see the same hotspots and time breakdown as in 2011, or do they change substantially?

Would it be possible for you to attach both the old 2011 and the new 2016 results here? Otherwise, please submit an Intel Premier Support issue.

Regards, Katya

Dmitry_P_Intel1
Employee

Hello,

Could you also do the following: change the "Call Stack Mode" knob on the filter bar to "User/system functions". In this mode VTune does not attribute system internals to the calling user function, as it does in the default "User functions + 1" mode. Then check whether the top functions change. If attaching the results is problematic, this will help determine what exactly VTune classified as spinning.

We need to take into account that frequent fiber switching, which manipulates fiber state, can itself consume time. So can frequent calls that yield execution to another thread. That time is really fiber/threading overhead, not the "effective" time spent executing user code. VTune's classification of such time could be improved in future versions, and we need to check whether this is what is happening in your case.

Thanks & Regards, Dmitry

Thomas_P_3
Beginner

Hello Katya, Dmitry,

Thanks for responding to my question.

Re-resolving the 2011 results in 2016 does not work; I only get an error message.
I played around with the "Call Stack Mode" option, but it does not change the differences between the results from the two VTune versions.

I created a small example to make the issue reproducible and easier to explain. This is the code:

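#include <Windows.h>
#include <tchar.h>  // _tmain, _TCHAR

// Busy-waiting work, kept in its own non-inlined function so that it
// appears as a separate entry in the profile.
__declspec(noinline) void runBusyLoop() {
    for (volatile int i = 0; i < 5000; ++i) {}
}

LPVOID main_fiber = 0;

// Second fiber: do some work, then cooperatively yield back to the main fiber.
void __stdcall runFiber(void*) {
    while (true) {
        runBusyLoop();
        SwitchToFiber(main_fiber);
    }
}

int _tmain(int argc, _TCHAR* argv[]) {
    // Turn the current thread into a fiber so that it can switch to other fibers.
    main_fiber = ConvertThreadToFiberEx(0, 0);
    LPVOID first_fiber = CreateFiberEx(0, 0, 0, &runFiber, 0);

    // Toggle between the two fibers: work here, then let runFiber work.
    for (int i = 0; i < 5000000; ++i) {
        runBusyLoop();
        SwitchToFiber(first_fiber);
    }

    DeleteFiber(first_fiber);  // release the second fiber before converting back
    ConvertFiberToThread();
    return 0;
}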

The program basically toggles between two fibers and does some busy-waiting work in each. The busy-waiting is implemented in a separate function so that it can be clearly identified in the profile.

These are the results (time per function):

Function      VTune 2011   VTune 2016   of which Spin Time (2016)
runBusyLoop   91.679s      73.755s
runFiber       0.499s      10.475s      8.839s
wmain          0.319s      10.288s      8.563s

The results above list user functions only. In "User functions + 1" mode, VTune 2016 lists SwitchToThread with 17.4s of Spin Time.

What I find really confusing is the big difference in the absolute time for the function "runBusyLoop".
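Adding up the posted figures suggests that the overall time is roughly the same in both versions. These are partial totals over the three listed functions only, not whole-program time. The difference is that runBusyLoop loses about as much time as VTune 2016 reports as spin time. The spin total also matches the 17.4s shown for SwitchToThread:

```latex
\begin{aligned}
T_{2011} &= 91.679 + 0.499 + 0.319 = 92.497\ \mathrm{s} \\
T_{2016} &= 73.755 + 10.475 + 10.288 = 94.518\ \mathrm{s} \\
S_{2016} &= 8.839 + 8.563 = 17.402\ \mathrm{s} \approx 17.4\ \mathrm{s} \\
T^{\mathrm{runBusyLoop}}_{2011} - T^{\mathrm{runBusyLoop}}_{2016} &= 91.679 - 73.755 = 17.924\ \mathrm{s}
\end{aligned}
```

This is consistent with time being reclassified from runBusyLoop to spinning in the fiber switch, but the numbers alone do not prove it.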

Thanks again for your help,

Thomas
Dmitry_P_Intel1
Employee

Thomas,

It would be helpful to run a different, driver-based collection with VTune Amplifier XE 2016, namely Advanced Hotspots with stacks, and see what the time distribution looks like in that case. Could you please check this and post the timings?

Thank you, Regards, Dmitry
Thomas_P_3
Beginner

Dmitry,

These are the numbers for VTune 2016 Advanced Hotspots:
runBusyLoop  88.965s
wmain         0.271s
runFiber      0.252s

Spin time is 0s again for all functions...
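As a share of the three listed functions (partial totals, rounded to one decimal), the Advanced Hotspots result lines up with VTune 2011 rather than with the 2016 Hotspots run:

```latex
\begin{aligned}
\text{2011 Hotspots:} \quad & \frac{91.679}{91.679 + 0.499 + 0.319} = \frac{91.679}{92.497} \approx 99.1\,\% \\
\text{2016 Hotspots:} \quad & \frac{73.755}{73.755 + 10.475 + 10.288} = \frac{73.755}{94.518} \approx 78.0\,\% \\
\text{2016 Advanced Hotspots:} \quad & \frac{88.965}{88.965 + 0.271 + 0.252} = \frac{88.965}{89.488} \approx 99.4\,\%
\end{aligned}
```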

Regards,
Thomas