Using LEA instructions

roysykes · ‎02-16-2005

In his useful paper "Assembly Language Tips & Tricks for the Intel Pentium 4 Processor", Khang Nguyen suggested the following:

Using lea Instructions:

mov edx,ecx
sal edx,3

Faster:

lea edx, [ecx + ecx]
add edx, edx
add edx, edx

While I suppose

XOR EDX,EDX
SHRD EDX,ECX,29

has the same issues as the first fragment, would not

LEA EDX,[ECX*8] (seven bytes)

or, if you have a spare register that already holds zero, say EBX, the denser encoding

LEA EDX,[EBX+ECX*8] (three bytes)

be faster, with the further advantage of changing no flags? /Roy Sykes
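For anyone who wants to convince themselves that the fragments above all produce the same value, here is a minimal sketch in C that models each sequence on 32-bit unsigned integers. The function names are made up for illustration, and this only checks the arithmetic result (ECX times 8, modulo 2^32); it says nothing about flags, encoding size, or timing on any particular processor.

```c
#include <assert.h>
#include <stdint.h>

/* Models: mov edx,ecx / sal edx,3 */
static uint32_t via_shift(uint32_t ecx) {
    return ecx << 3;
}

/* Models: lea edx,[ecx+ecx] / add edx,edx / add edx,edx */
static uint32_t via_lea_add(uint32_t ecx) {
    uint32_t edx = ecx + ecx;   /* 2 * ecx */
    edx += edx;                 /* 4 * ecx */
    edx += edx;                 /* 8 * ecx */
    return edx;
}

/* Models: xor edx,edx / shrd edx,ecx,29
   SHRD shifts EDX right by 29 and fills the vacated high bits
   with the low 29 bits of ECX, which equals ecx << 3 when EDX is 0. */
static uint32_t via_shrd(uint32_t ecx) {
    uint32_t edx = 0;
    edx = (edx >> 29) | (ecx << (32 - 29));
    return edx;
}

/* Models: lea edx,[ebx+ecx*8], with EBX assumed to hold zero */
static uint32_t via_lea_scaled(uint32_t ebx, uint32_t ecx) {
    return ebx + ecx * 8u;
}

/* Hypothetical helper: checks that all four models agree on a few inputs. */
static void self_check(void) {
    const uint32_t samples[] = { 0u, 1u, 5u, 0x12345678u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        uint32_t expected = via_shift(samples[i]);
        assert(via_lea_add(samples[i]) == expected);
        assert(via_shrd(samples[i]) == expected);
        assert(via_lea_scaled(0u, samples[i]) == expected);
    }
}
```

Calling self_check() from a test harness should pass silently. Unsigned arithmetic in C wraps modulo 2^32, which matches what a 32-bit register does when the high bits are shifted out.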

Intel_Software_Netw1 · ‎02-22-2005

We will forward your question to the author and let you know what response we receive.

For those following along, here is a link to the article.

Regards,

Lexi S.

Intel Software Network Support

http://www.intel.com/software


Message Edited by intel.software.network.support on 12-09-2005 10:48 AM

Intel_Software_Netw1 · ‎02-22-2005

The author responded as follows:

The combination of lea and add is faster because it avoids the shift instruction.

Let's look at the statement:

LEA EDX,[ECX*8] (seven bytes)

This operation involves multiplication (shifting), which is known to be slow.

The following statement is even worse:

LEA EDX,[EBX+ECX*8] (three bytes)

This operation also involves multiplication, and it requires another operation to set the register EBX to zero.

Hope this helps!


Regards,

Lexi S.


Message Edited by intel.software.network.support on 12-02-2005 08:50 PM

jim_dempsey · ‎06-14-2005

This is for Lexi

The "LEA EDX,[ECX*8]" instruction, although it appears textually to involve multiplication (*), does not. It does not involve a shift operation either. If either were true, then your processor wizards would need to go back to school. Simple shift-like operations for *1, *2, *4, *8 are so common that they are hardwired into the architecture. All permutations are always present; a multiplexer selects the desired result.

Jim Dempsey
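As a rough software analogy for the description above, the sketch below models address generation in C: all four scaled copies of the index are formed at once, and the 2-bit scale field of the SIB byte (00 = *1, 01 = *2, 10 = *4, 11 = *8) picks one of them. The function names are hypothetical, and this is only an illustration of the "all permutations present, multiplexer selects" idea, not a claim about how any specific processor is actually wired.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the scaling stage: the four candidates stand in
   for fixed wirings of the index shifted by 0, 1, 2 and 3 bit positions.
   The 2-bit scale field acts as the multiplexer select input. */
static uint32_t scaled_index(uint32_t index, unsigned scale_bits) {
    const uint32_t candidates[4] = {
        index,        /* scale field 00: index * 1 */
        index << 1,   /* scale field 01: index * 2 */
        index << 2,   /* scale field 10: index * 4 */
        index << 3    /* scale field 11: index * 8 */
    };
    return candidates[scale_bits & 3u];
}

/* Hypothetical model of the effective address: base + index*scale + disp,
   wrapping modulo 2^32 like a 32-bit address calculation. */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  unsigned scale_bits, uint32_t disp) {
    return base + scaled_index(index, scale_bits) + disp;
}
```

In this model, effective_address(0, ecx, 3, 0) corresponds to LEA EDX,[ECX*8] with a zero displacement, and effective_address(ebx, ecx, 3, 0) corresponds to LEA EDX,[EBX+ECX*8].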