Hello,
I already have some experience with SSE-to-AVX transition penalties and have read the following article: http://software.intel.com/sites/default/files/m/d/4/1/d/8/11MC12_Avoiding_2BAVX-SSE_2BTransition_2BPenalties_2Brh_2Bfinal.pdf
It says that only vzeroall or vzeroupper puts the CPU into the safe state in which no penalties can occur.
Isn't this a problem with multithreading or multiprocessing? Assume process A is running legacy SSE code, for example ordinary floating-point operations using scalar SSE instructions, while process B uses AVX and executes vzeroupper only at the end of a function.
What if a context switch occurs in the middle of the AVX code? The OS will switch the context, including the YMM registers. But even if the upper halves are all zero, wouldn't the CPU remain in the other state? Context switches might then lead to penalties for process A without the programmer having any influence. Or is there something I misunderstood?
This scenario just came to my mind, and I don't know how one could solve it. Or is there a way for the OS to avoid this problem?
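To make the scenario concrete, here is a toy Python model of the transition behaviour as the linked article describes it for CPUs of that generation. It is only a sketch: the state names and the instruction labels ("sse", "avx128", "avx256", "vzeroupper") are made up for illustration, and it counts penalties rather than cycles.

```python
from enum import Enum


class Upper(Enum):
    CLEAN = "clean"  # upper YMM halves are known to be zero
    DIRTY = "dirty"  # a 256-bit AVX instruction has written the upper halves
    SAVED = "saved"  # legacy SSE ran while dirty; hardware preserved the upper halves


def count_penalties(instructions, state=Upper.CLEAN):
    """Return (penalties, final_state) for a sequence of instruction kinds.

    The kinds are hypothetical labels: "sse" (legacy SSE), "avx128",
    "avx256" (VEX-encoded), and "vzeroupper".
    """
    penalties = 0
    for kind in instructions:
        if kind == "vzeroupper":
            # Assumption of this model: vzeroupper always returns to the clean state.
            state = Upper.CLEAN
        elif kind == "sse":
            if state is Upper.DIRTY:
                penalties += 1  # upper halves have to be saved first
                state = Upper.SAVED
        elif kind in ("avx128", "avx256"):
            if state is Upper.SAVED:
                penalties += 1  # upper halves have to be restored first
                state = Upper.DIRTY
            elif kind == "avx256":
                state = Upper.DIRTY
        else:
            raise ValueError(f"unknown instruction kind: {kind}")
    return penalties, state


mixed = ["avx256", "sse", "avx256", "sse"]
guarded = ["avx256", "vzeroupper", "sse", "avx256", "vzeroupper", "sse"]
print(count_penalties(mixed))
print(count_penalties(guarded))
```

In this model the mixed sequence pays a penalty on every switch between the two instruction families, while the sequence that executes vzeroupper before each legacy SSE section pays none. The open question in this post is which of the three states the CPU is in after a context switch.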
An AVX-enabled OS is supposed to save, restore, and hide all upper register contents across context switches. So, if the OS supports AVX properly, it can also run "legacy" SSE jobs without any interaction with AVX jobs. If you run an OS that doesn't support AVX (Windows XP, Windows 7 without SP1, and Red Hat 5 come to mind), your concerns about AVX are well founded.
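Whether the OS manages the AVX state can be checked from user code: the usual test combines the OSXSAVE and AVX bits of CPUID leaf 1 (ECX) with the SSE and AVX bits of XCR0 as read by XGETBV. The sketch below shows only the bit logic in Python. The function name is made up for illustration, and the two register values are assumed to have been read elsewhere (for example by a small native helper), since Python cannot execute CPUID or XGETBV directly.

```python
# Bit positions for CPUID leaf 1 (ECX) and for XCR0.
CPUID1_ECX_OSXSAVE = 1 << 27  # OS has enabled XSAVE/XRSTOR and XGETBV
CPUID1_ECX_AVX = 1 << 28      # CPU implements AVX
XCR0_SSE = 1 << 1             # OS saves and restores the XMM registers
XCR0_AVX = 1 << 2             # OS saves and restores the upper YMM halves


def avx_usable(cpuid1_ecx: int, xcr0: int) -> bool:
    """Return True if both the CPU and the OS support AVX.

    cpuid1_ecx and xcr0 are raw register values supplied by the caller.
    XCR0 may only be read when OSXSAVE is set, so a caller without
    OSXSAVE can pass 0 for xcr0.
    """
    if not cpuid1_ecx & CPUID1_ECX_OSXSAVE:
        return False
    if not cpuid1_ecx & CPUID1_ECX_AVX:
        return False
    needed = XCR0_SSE | XCR0_AVX
    return (xcr0 & needed) == needed


# Example values: a CPU with AVX on an OS that enabled x87, SSE and AVX state.
print(avx_usable(CPUID1_ECX_OSXSAVE | CPUID1_ECX_AVX, 0b111))
```

If the OS has not enabled the AVX bit in XCR0, the check fails even though the CPU itself reports AVX.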
>>>you could use a synchronization object, for example a Critical Section, to prevent the context switching. Does it make sense?>>>
Can you prevent context switching when the scheduler code runs at DPC level, way above the normal thread execution level? Sometimes a hardware event will trigger an ISR, and all normal activity at or below DIRQL will be postponed.
IIRC the floating-point state (including the XMM and YMM register context) will be saved in the KPCR structure.
OK, let's say the OS supports it; otherwise it does not make sense, since even calculation errors might appear.
But I do not really understand how the OS can hide this. I mean, the context switch is transparent to the software, and this also applies to the AVX registers. But the state of the CPU concerning AVX and SSE must be saved, too. How should the OS do this?
I know small real-time operating systems. They, for example, save all registers and the CPU status register on the task's stack and restore them from another task's stack. Then processing time is passed to that task. But the internal AVX or SSE state of the CPU is not in a register, is it? So simply swapping register values for the corresponding task might not be enough.
Or can this state also be saved and restored by a special CPU instruction? If so, then I agree that all conditions for a proper context switch are fulfilled.
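The instructions in question are XSAVE and XRSTOR. The XSAVE area carries a header bitmap (XSTATE_BV) recording which state components were in use, and XRSTOR puts a component back into its initial configuration when its bit is clear. The Python sketch below is a heavily simplified model of that idea, covering only the upper YMM halves. The class and function names are invented for illustration and do not correspond to any OS or hardware interface.

```python
from dataclasses import dataclass, field

AVX_BIT = 1 << 2  # XSTATE_BV bit for the upper YMM halves
NUM_YMM = 16


@dataclass
class Cpu:
    ymm_upper: list = field(default_factory=lambda: [0] * NUM_YMM)
    upper_in_use: bool = False  # simplified stand-in for the hardware's tracking

    def avx256_write(self, reg: int, value: int) -> None:
        self.ymm_upper[reg] = value
        self.upper_in_use = True

    def vzeroupper(self) -> None:
        self.ymm_upper = [0] * NUM_YMM
        self.upper_in_use = False


@dataclass
class SaveArea:
    xstate_bv: int = 0
    ymm_upper: list = field(default_factory=lambda: [0] * NUM_YMM)


def xsave(cpu: Cpu) -> SaveArea:
    """Model of XSAVE: record the upper halves only if they are in use."""
    if cpu.upper_in_use:
        return SaveArea(AVX_BIT, list(cpu.ymm_upper))
    return SaveArea()


def xrstor(cpu: Cpu, area: SaveArea) -> None:
    """Model of XRSTOR: reload the component, or reset it if its bit is clear."""
    if area.xstate_bv & AVX_BIT:
        cpu.ymm_upper = list(area.ymm_upper)
        cpu.upper_in_use = True
    else:
        cpu.ymm_upper = [0] * NUM_YMM
        cpu.upper_in_use = False


def switch(cpu: Cpu, areas: dict, old: str, new: str) -> None:
    """Context switch: save the outgoing task's state, load the incoming one."""
    areas[old] = xsave(cpu)
    xrstor(cpu, areas[new])


cpu = Cpu()
areas = {"A": SaveArea(), "B": SaveArea()}
cpu.avx256_write(0, 0xDEAD)    # task B is in the middle of its AVX code
switch(cpu, areas, "B", "A")   # preempted: the SSE-only task A runs
print(cpu.upper_in_use)
switch(cpu, areas, "A", "B")   # task B resumes where it left off
print(cpu.upper_in_use, hex(cpu.ymm_upper[0]))
```

In this model task A always resumes with the upper halves in their initial state, regardless of what task B was doing when it was preempted, and task B gets its upper halves back when it resumes.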
>>>But I do not really understand how the OS can hide this. I mean, the context switch is transparent to the software, and this also applies to the AVX registers. But the state of the CPU concerning AVX and SSE must be saved, too. How should the OS do this?>>>
Windows saves the processor context, which also includes the SSEn registers, in a special data structure called the "Kernel Processor Region Control Block" (KPRCB). If you know how to work with WinDbg, you can dump this structure with the commands "!pcr" and "dt nt!_KPCR <address of pcr>" and look for the pointer to the KPRCB structure. IIRC the KPRCB should contain the saved SSE register context. There is also a special routine called "KeSaveFloatingPointState", which a driver calls to store the volatile floating-point context.
iliyapolak wrote:
>>>But I do not really understand how the OS can hide this. I mean, the context switch is transparent to the software, and this also applies to the AVX registers. But the state of the CPU concerning AVX and SSE must be saved, too. How should the OS do this?>>>
Windows saves the processor context, which also includes the SSEn registers, in a special data structure called the "Kernel Processor Region Control Block" (KPRCB). If you know how to work with WinDbg, you can dump this structure with the commands "!pcr" and "dt nt!_KPCR <address of pcr>" and look for the pointer to the KPRCB structure. IIRC the KPRCB should contain the saved SSE register context. There is also a special routine called "KeSaveFloatingPointState", which a driver calls to store the volatile floating-point context.
I think that the SSE context is saved in the FXSAVE_FORMAT structure.
>>>It is impossible to answer Yes or No. But if the priority of a thread is raised to Time Critical, all threads with lower priorities will be preempted.>>>
Yes, you are right.
>>>On Windows XP, for example, mouse and keyboard>>>
If you mean a driver routine that services a hardware mouse or keyboard event, that is not the same as a thread's priority. User-mode threads usually run at IRQL = PASSIVE_LEVEL and cannot block an ISR or DPC routine. Only a thread running in kernel mode can raise the IRQL above DPC level and preempt the execution of vital system code.
>>>But the state of the CPU concerning AVX and SSE must be saved, too. How should the OS do this?>>>
By saving the contents of the SSE and AVX registers to a special structure, probably the "FXSAVE_FORMAT" structure.
>>>Unfortunately, we're moving away from AVX related issues...>>>
Sometimes, in the heat of discussion, small deviations from the main topic are unavoidable :)
Thanks for the input,
I looked at the manual you suggested in the other thread. Now I understand it much better.
Modern Intel CPUs are quite complex (yes, my comparison with an 8-bit AVR was not very fortunate). All the state, including the SSE, AVX, and FPU state (and surely a lot of other things), can be stored. Thus, context switches that save and restore this information along with the registers handle everything correctly.
Sorry for the late answer, but I was away for a few days and had to do some things for university.
I would recommend reading the Windows Internals books. You can find a lot of deep technical information there.
Could you recommend a particular one?
One more question: I have been looking for the exact transition penalty in cycles. Unfortunately, I could not find it, although I have something like 60-80 cycles in mind. Where can I find this information?
iliyapolak,
I think I did not state my question clearly enough. I am talking about the AVX state transition caused by mixing AVX with legacy SSE.
Sergey,
thanks for that. I thought I had read it in one of the manuals and did not think of this article. But the manuals do not mention any concrete cycle counts, is that right?
But can I find any concrete information about cycle counts for both directions?
