Intel® ISA Extensions

AVX transition penalties and OS support

Christian_M_2
Beginner

Hello,

I already have some experience with SSE to AVX transition penalties and have read the following article: http://software.intel.com/sites/default/files/m/d/4/1/d/8/11MC12_Avoiding_2BAVX-SSE_2BTransition_2BPenalties_2Brh_2Bfinal.pdf

It says that only VZEROALL or VZEROUPPER puts the CPU into the safe state where no penalties can occur.

Isn't this a problem with multithreading or multiprocessing? Assume process A is running legacy SSE code, for example normal floating-point operations compiled to scalar SSE. Process B is using AVX and only executes VZEROUPPER at the end of a function.

What if a context switch occurs in the middle of the AVX code? The OS will switch the context, including the YMM registers. But even if the upper halves are all zero, wouldn't the CPU remain in the other state? So context switches might lead to penalties for process A without any influence from the programmer. Or is there something I misunderstood?

This scenario just came to mind and I don't know how one could solve it. Or is there a way for the OS to avoid this problem?
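
For reference, the pattern described above (do the 256-bit work, then clear the upper halves before returning to code that may use legacy SSE) might look like the following sketch. It assumes GCC or Clang on x86; the function names scale and scale_avx are made up for illustration and do not come from the article.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper: multiplies n floats in place using 256-bit AVX.
 * The target attribute lets this one function use AVX even when the
 * rest of the file is compiled without -mavx (GCC/Clang extension). */
__attribute__((target("avx")))
static void scale_avx(float *data, size_t n, float factor)
{
    const __m256 f = _mm256_set1_ps(factor);
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(data + i);
        _mm256_storeu_ps(data + i, _mm256_mul_ps(v, f));
    }

    /* Scalar tail for the remaining elements. */
    for (; i < n; ++i)
        data[i] *= factor;

    /* Emit VZEROUPPER so the caller, which may contain legacy SSE
     * instructions, does not see dirty upper YMM halves. */
    _mm256_zeroupper();
}
#endif

/* Hypothetical entry point: uses AVX when available, plain C otherwise. */
void scale(float *data, size_t n, float factor)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx")) {
        scale_avx(data, n, factor);
        return;
    }
#endif
    for (size_t i = 0; i < n; ++i)
        data[i] *= factor;
}
```

The sketch only covers the part the programmer controls. What happens when the thread is preempted between the first AVX instruction and the VZEROUPPER is the open question here.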

TimP
Honored Contributor III

An AVX-enabled OS is supposed to protect and hide all upper register contents during context switches. So, if the OS supports AVX properly, it can also run "legacy" SSE jobs without interactions with AVX jobs. If you run an OS which doesn't support AVX (Windows XP, Windows 7 without SP1, Red Hat 5 come to mind), your concerns about AVX are well founded.
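
Whether the OS has enabled AVX state management can be checked from user code. The following is a minimal sketch, assuming GCC or Clang on x86; os_supports_avx and read_xcr0 are illustrative names. CPUID leaf 1 reports OSXSAVE (ECX bit 27) and AVX (ECX bit 28), and XGETBV then shows whether the OS has enabled SSE and AVX state in XCR0 (bits 1 and 2).

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read extended control register 0 (XCR0) with XGETBV.
 * Must only be called after OSXSAVE has been confirmed,
 * otherwise the instruction raises an invalid-opcode fault. */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

/* Returns 1 if both the CPU and the OS support AVX, 0 otherwise. */
int os_supports_avx(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    const unsigned int osxsave_bit = 1u << 27; /* OS uses XSAVE/XRSTOR */
    const unsigned int avx_bit     = 1u << 28; /* CPU implements AVX   */
    if ((ecx & osxsave_bit) == 0 || (ecx & avx_bit) == 0)
        return 0;

    /* XCR0 bit 1 = SSE (XMM) state, bit 2 = AVX (upper YMM) state. */
    return (read_xcr0() & 0x6u) == 0x6u;
}
#else
/* Not an x86 target: AVX is not applicable. */
int os_supports_avx(void) { return 0; }
#endif
```

A program would call os_supports_avx() once at startup and fall back to an SSE code path when it returns 0.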

SergeyKostrov
Valued Contributor II
>>...What if context switch occurs in the middle of AVX code?..

It is a highly possible scenario. However, if your AVX code is small (let's assume so) and doesn't do too many calculations, then you could use a synchronization object, for example a Critical Section, to prevent the context switching. Does it make sense?

Bernard
Valued Contributor I

>>>you could use a synchronization object, for example a Critical Section, to prevent the context switching. Does it make sense?>>>

Can you prevent context switching when the scheduler code runs at DPC level, well above the normal thread execution level? Sometimes a hardware event will trigger an ISR, and all normal activity at or below DIRQL will be postponed.

IIRC the floating-point state (including the XMM and YMM register context) will be saved in the KPCR structure.

Christian_M_2
Beginner

OK, let's say the OS supports it; otherwise it does not make sense, as even calculation errors might appear.

But I do not really understand how the OS can hide this. The context switch is transparent to the software, and that applies to the AVX registers as well. But the state of the CPU concerning AVX and SSE must be saved, too. How should the OS do this?

I know small real-time OSes. They, for example, save all registers and the CPU status register on the task's stack and restore them from another task's stack. Then processing time is passed to that task. But the internal AVX or SSE state of the CPU is not in a register, is it? So simply changing register values for the corresponding task might not be enough.

Or can this state also be saved and restored by a special CPU instruction? If so, then I agree that all conditions for a proper context switch are fulfilled.

Bernard
Valued Contributor I

>>>But I do not really understand how OS can hide this? I mean the context switch is transparent for the software and this accounts also for the AVX registers. But the state of the CPU concerning AVX, SSE must be saved, too. How should the OS do this?>>>

Windows saves the processor context, which also includes the SSEn registers, in a special data structure called the "Kernel Processor Control Region" (KPCR). If you know how to work with WinDbg, you can dump this structure with the commands "!pcr" and dt nt!_KPCR "address of pcr", then look for the pointer to the KPRCB structure. IIRC the KPRCB should contain the saved SSE register context. There is also a routine called "KeSaveFloatingPointState", which a driver calls to store the volatile floating-point context.

Bernard
Valued Contributor I

I think the SSE context is saved in the FXSAVE_FORMAT structure.

SergeyKostrov
Valued Contributor II
>>...Can you prevent the context switching when the Scheduler code runs at DPC level way above the normal thread execution level?

It is impossible to answer Yes or No. But if the priority of the thread is raised to Time Critical, all threads with lower priorities will be preempted. On Windows XP, for example, mouse and keyboard input, UI updates and Task Manager are preempted completely. This is not recommended when the calculations or processing take too much time; it should be done only for really critical and small pieces of code.

Bernard
Valued Contributor I

>>>It is impossible to answer Yes or No. But, If a priority of the thread is raised to Time Critical all threads with lower priorities will be preempted.>>>

Yes you are right.

>>>On Windows XP, for example, mouse and keyboard>>>

If you mean a driver routine that services a hardware mouse or keyboard event, that is not the same as a thread's priority. User-mode threads usually run at IRQL = passive level and cannot block an ISR or DPC routine. Only a thread running in kernel mode can raise IRQL above DPC level and preempt the execution of vital system code.

SergeyKostrov
Valued Contributor II
>>...If you mean a driver's routine which is servicing a hardware mouse or keyboard...

I think Yes. Unfortunately, we're moving away from AVX related issues...

Bernard
Valued Contributor I

>>>But the state of the CPU concerning AVX, SSE must be saved, too. How should the OS do this?>>>

By saving the contents of the SSE and AVX registers to a special structure, probably the "FXSAVE_FORMAT" structure.

Bernard
Valued Contributor I

>>>Unfortunately, we're moving away from AVX related issues...>>>

Sometimes, in the heat of discussion, small deviations from the main topic are unavoidable :)

Christian_M_2
Beginner

Thanks for the input.

I looked in the manual you suggested in the other thread. Now I understand it much better.

Modern Intel CPUs are quite complex (yes, my comparison with an 8-bit AVR was not a very good one). All the state, including the SSE, AVX and FPU state (and surely a lot of other things), can be stored. Thus, context switches that save and restore this information along with the registers handle everything correctly.
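
To make the "can be stored" part concrete: the instruction pair for this is XSAVE/XRSTOR, and XSAVE can also be executed from user mode once the OS has enabled it. The sketch below is only an illustration under assumptions (GCC or Clang on x86, a hypothetical function name avx_state_in_use, and a request limited to the x87, SSE and AVX components). It is not a claim about how any particular OS implements its context switch. It saves the state into a buffer and reads the XSTATE_BV field of the XSAVE header, where bit 2 tells whether the upper YMM halves were recorded as being in a non-initial state.

```c
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Returns 1 if XSAVE recorded the AVX (upper YMM) component as in use,
 * 0 if it was recorded as being in its initial state, and -1 if XSAVE
 * is not enabled by the OS. */
int avx_state_in_use(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* CPUID.1:ECX bit 27 (OSXSAVE) must be set before XSAVE is legal. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || (ecx & (1u << 27)) == 0)
        return -1;

    /* Standard-format layout for components 0..2:
     * 512-byte legacy region + 64-byte header + 256 bytes of AVX state
     * = 832 bytes. The area must be 64-byte aligned. */
    _Alignas(64) uint8_t area[1024];
    memset(area, 0, sizeof area);

    /* EDX:EAX is the requested-feature bitmap:
     * bit 0 = x87, bit 1 = SSE, bit 2 = AVX. */
    __asm__ volatile("xsave (%0)"
                     :
                     : "r"(area), "a"(0x7u), "d"(0u)
                     : "memory");

    /* XSTATE_BV is the first 8 bytes of the header at offset 512. */
    uint64_t xstate_bv;
    memcpy(&xstate_bv, area + 512, sizeof xstate_bv);

    return (int)((xstate_bv >> 2) & 1u);
}
#else
/* Not an x86 target: XSAVE does not exist here. */
int avx_state_in_use(void) { return -1; }
#endif
```

Whether the processor tracks the in-use bit precisely is implementation dependent, so the returned value should be treated as informational only.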

Sorry for the late answer, but I was away a few days and had to do some things for university.

Bernard
Valued Contributor I

I would recommend reading the Windows Internals books. You can find a lot of deep technical information there.

Christian_M_2
Beginner

Could you recommend a particular one?

Bernard
Valued Contributor I
>>>Could you recommend anything special or a certain one?>>>

Yes, of course. Please follow this link: http://www.amazon.co.uk/Windows-Internals-PRO-Developer-Mark-Russinovich/dp/0735625301/ref=sr_1_1?s=books&ie=UTF8&qid=1360391204&sr=1-1

This book is filled with advanced technical information about the inner workings of the Windows OS. As expected from Microsoft, you won't find any code there, but the explanations go very deep into the kernel.

Christian_M_2
Beginner

One more question: I have been looking for exact transition penalty figures in cycles. Unfortunately I could not find any, although I have something like 60-80 cycles in mind. Where can I find this information?

Bernard
Valued Contributor I
Do you mean a transition from user mode to kernel mode? Here is a very interesting post: ://forum.osdev.org/viewtopic.php?p=117933#p117933

SergeyKostrov
Valued Contributor II
>>... I have in mind something about 60-80 cycles...

There is an estimate in the article you referenced in the first post of the thread. These numbers, 60-80 cycles, look right.

Christian_M_2
Beginner

iliyapolak,

I think I did not state my question clearly enough. I am talking about the AVX state transition caused by mixing AVX with legacy SSE.

Sergey,

thanks for that. I thought I had read it in one of the manuals and did not think of this article. But the manuals do not mention any concrete cycle counts, is that right?

Christian_M_2
Beginner

But can I find any concrete information about cycle counts for both directions?
