Intel® ISA Extensions

mul instruction latency

tthsqe
Beginner
1,316 Views

Does the multiply instruction really have a latency of 10-15 cycles on the newer Intel processors? I think the Opteron can take the 128-bit product of two qwords in 5 cycles... why the big gap?

3 Replies
tthsqe
Beginner

If I'm not mistaken, Intel processors can do a 32-bit multiply in 4 cycles. Given four of these multipliers and a 128-bit adder, wouldn't it then be easy to do a 64-bit multiply in 4+2 cycles?


              x2  .  x1
    *         y2  .  y1
    -------------------
      [x2*y2].[x1*y1]
    +     [x1*y2]
    +     [x2*y1]
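
The decomposition in the diagram can be sketched in C as follows. This is only an illustration of the arithmetic, not a description of how any actual multiplier is built; the names `u128_parts` and `mul64_schoolbook` are made up for this example. Note that the two cross products overlap both halves of the result, so their carries have to be propagated into the high qword.

```c
#include <stdint.h>

/* Hypothetical helper type: a 128-bit value split into two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_parts;

/* 64x64 -> 128-bit multiply built from four 32x32 -> 64-bit products. */
u128_parts mul64_schoolbook(uint64_t x, uint64_t y)
{
    /* Split each operand into 32-bit halves: x = x2.x1, y = y2.y1. */
    uint64_t x1 = x & 0xFFFFFFFFu, x2 = x >> 32;
    uint64_t y1 = y & 0xFFFFFFFFu, y2 = y >> 32;

    /* The four partial products from the diagram. */
    uint64_t p11 = x1 * y1;   /* weight 2^0  */
    uint64_t p12 = x1 * y2;   /* weight 2^32 */
    uint64_t p21 = x2 * y1;   /* weight 2^32 */
    uint64_t p22 = x2 * y2;   /* weight 2^64 */

    /* Sum the column at weight 2^32. Three 32-bit values cannot
       overflow a 64-bit accumulator. */
    uint64_t mid = (p11 >> 32) + (p12 & 0xFFFFFFFFu) + (p21 & 0xFFFFFFFFu);

    u128_parts r;
    r.lo = (mid << 32) | (p11 & 0xFFFFFFFFu);
    /* High half: p22 plus the upper halves of the cross products
       plus the carry out of the middle column. */
    r.hi = p22 + (p12 >> 32) + (p21 >> 32) + (mid >> 32);
    return r;
}
```

For example, `mul64_schoolbook(UINT64_MAX, UINT64_MAX)` yields `hi = 0xFFFFFFFFFFFFFFFE` and `lo = 1`.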

capens__nicolas
New Contributor I
Quoting tthsqe

Does the multiply instruction really have a latency of 10-15 cycles on the newer Intel processors? I think the Opteron can take the 128-bit product of two qwords in 5 cycles... why the big gap?

Are you sure you're looking at the numbers for the Core architecture(s)? On NetBurst (Pentium 4) mul took 10+ cycles, but on Core it should be around 4 I believe.

The optimization guide with the processor model numbers can be quite confusing...

Max_L
Employee

Hello, I understand you are speaking of the MUL r64 instruction (with RAX as the implicit operand), which produces a 128-bit result in RDX:RAX. It has only 3-cycle latency for the low 64-bit (RAX) part of the result (the same as most other integer multiplies, both scalar and SIMD) and 7-cycle latency for the high 64-bit (RDX) part. This instruction is used in long-precision integer arithmetic, where the latency of the high 64-bit part of the result can be hidden. What is your usage of it?

Thank you,

-Max
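
As a rough illustration of the long-precision case mentioned above, here is a minimal C sketch that multiplies a multi-limb integer by a single 64-bit word. The function name `mul_limbs_by_word` and the little-endian limb layout are assumptions made for this example, and `unsigned __int128` is a GCC/Clang extension rather than standard C. On x86-64 those compilers typically lower the widening multiply to a single MUL (or MULX). The multiplies for different limbs do not depend on one another; only the carry addition forms a chain, which is presumably why an out-of-order core can overlap the longer latency of the high half.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: r[0..n-1] = low n limbs of a[0..n-1] * b.
   Limbs are stored least significant first. Returns the final carry
   limb, i.e. limb n of the full product. */
uint64_t mul_limbs_by_word(uint64_t *r, const uint64_t *a, size_t n, uint64_t b)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 64x64 -> 128-bit product plus the incoming carry.
           The maximum value is 2^128 - 2^64, so this cannot overflow. */
        unsigned __int128 p = (unsigned __int128)a[i] * b + carry;
        r[i] = (uint64_t)p;            /* low half (RAX after MUL)  */
        carry = (uint64_t)(p >> 64);   /* high half (RDX after MUL) */
    }
    return carry;
}
```

Measuring a loop like this against a chain of multiplies in which each one consumes the previous high half would be one way to see the difference between throughput and the high-half latency.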
