04-04-2009 06:01 PM
It is a partial register access, but for a long time now Intel microarchitectures have not suffered from partial register stalls as badly as the original P6 did. The front end simply issues a sync uop to collect the value of eax before the uop for the last instruction, as if the code had one more instruction.
04-05-2009 11:59 AM
I should add, though, that we encourage you to avoid writing to the AH/BH/CH/DH registers and then reading any other part of the same register. More generally, the most efficient code avoids any pattern that writes part of a register and then reads the full register.
Use the MOVZX (or MOVSX) instruction if you need to convert an 8- or 16-bit unsigned (or signed) value to a 32/64-bit one.
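For anyone reading this from the C side, here is a rough sketch of what the two instructions correspond to. The helper names are made up for illustration, and the instruction a compiler actually picks depends on the target and optimization settings, so treat this as an analogy rather than a guarantee.

```c
#include <stdint.h>

/* Hypothetical helpers, named for this illustration only. */

/* Unsigned widening: the C analogue of MOVZX (upper bits become 0). */
static inline uint32_t widen_unsigned(uint8_t v)
{
    return (uint32_t)v;
}

/* Signed widening: the C analogue of MOVSX (upper bits copy the sign bit). */
static inline int32_t widen_signed(int8_t v)
{
    return (int32_t)v;
}
```

In both cases the whole 32-bit destination is written, so nothing has to be merged with the previous contents of the register.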
Use explicit shifts if you need to combine data in one register:
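movzx eax, byte ptr [esi+A] ; zero-extend the first byte into all of eax
movzx ebx, byte ptr [esi+B] ; zero-extend the second byte into all of ebx
shl eax, 8 ; move the first byte up to bits 8..15
or eax, ebx ; merge the second byte into bits 0..7
-Max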
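The movzx / shl / or sequence above can be sketched in C roughly as follows. The function name and the offset_a / offset_b parameters are hypothetical stand-ins for the A and B offsets in the assembly; this only shows the data flow and makes no claim about the exact code a given compiler generates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packs two bytes of a buffer into one 32-bit value
   using full-width operations only. */
static inline uint32_t pack_two_bytes(const uint8_t *base,
                                      size_t offset_a, size_t offset_b)
{
    uint32_t high = base[offset_a]; /* zero-extended load, like movzx eax */
    uint32_t low  = base[offset_b]; /* zero-extended load, like movzx ebx */
    return (high << 8) | low;       /* shl eax, 8 followed by or eax, ebx */
}
```

Every step here works on a full 32-bit value, which is the point of the advice: no 8-bit register half is ever written and then read back as part of a wider register.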
04-27-2009 03:13 AM
A partial register stall occurs when you write to part of a register and later read from the whole register or a bigger part of it.
Any use of the high 8-bit registers AH, BH, CH, DH should be avoided because it can cause false dependences and less efficient code. Prevent false dependences by writing to a full register rather than a partial register.
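To make the false dependence concrete, here is a small C model of the two kinds of write. It is only a sketch of the data flow, not of real hardware state, and the function names are invented for this example.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit register value. */

/* Writing only AH: bits 8..15 change and the other bits must be carried
   over from the old value, so the result depends on old_eax. */
static inline uint32_t write_ah(uint32_t old_eax, uint8_t v)
{
    return (old_eax & 0xFFFF00FFu) | ((uint32_t)v << 8);
}

/* Writing the full register (as MOVZX does): the result depends only on v. */
static inline uint32_t write_eax_zero_extended(uint8_t v)
{
    return (uint32_t)v;
}
```

The old_eax parameter in write_ah is the dependence in question: the partial write cannot produce a complete register value until the previous value is known, while the full write needs nothing but its source operand.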
One should be aware of partial stalls whenever there is a mix of different data sizes (8, 16, and 32 bits).
We don't get a stall when reading a partial register after writing to the full register or to a bigger part of it. For example:
mov eax, [mem32]
add bl, al ; No stall
add bh, ah ; No stall
mov cx, ax ; No stall
mov dx, bx ; Stall: bl and bh were written separately above, and bx is a bigger part than either
The easiest way to avoid partial register stalls is to always use full registers and use MOVZX or MOVSX when reading from smaller memory operands.