tthsqe
Beginner
118 Views

mul instruction latency

Does the multiply instruction really have a latency of 10-15 cycles on the newer Intel processors? I think the Opteron can take the 128-bit product of two qwords in 5 cycles... why the big gap?

3 Replies
tthsqe
Beginner
118 Views

Quoting - tthsqe

Does the multiply instruction really have a latency of 10-15 cycles on the newer Intel processors? I think the Opteron can take the 128-bit product of two qwords in 5 cycles... why the big gap?


If I'm not mistaken, the Intel processors can do a 32-bit multiply in 4 cycles. Given four of these multipliers and a 128-bit adder, wouldn't it then be easy to do a 64-bit multiply in 4+2 cycles?


                  x2 . x1
              *   y2 . y1
  -----------------------
      [x2*y2] . [x1*y1]
  +        [x1*y2]
  +        [x2*y1]
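
To make the diagram concrete, here is a minimal C sketch of the same decomposition: four 32x32 -> 64-bit partial products combined into a 128-bit result. The function name mul64_wide and its hi/lo output parameters are made up for illustration; this models the arithmetic only and says nothing about how the hardware multiplier is actually built.

```c
#include <stdint.h>

/* Hypothetical helper: 64x64 -> 128-bit product assembled from four
   32x32 -> 64-bit partial products, as in the diagram above. */
static void mul64_wide(uint64_t x, uint64_t y, uint64_t *hi, uint64_t *lo)
{
    /* Split each operand into 32-bit halves (x = x2 . x1, y = y2 . y1). */
    uint64_t x1 = x & 0xFFFFFFFFu, x2 = x >> 32;
    uint64_t y1 = y & 0xFFFFFFFFu, y2 = y >> 32;

    /* The four partial products; each fits in 64 bits. */
    uint64_t p11 = x1 * y1;   /* weight 2^0  */
    uint64_t p12 = x1 * y2;   /* weight 2^32 */
    uint64_t p21 = x2 * y1;   /* weight 2^32 */
    uint64_t p22 = x2 * y2;   /* weight 2^64 */

    /* Middle column (bits 32..63): high half of p11 plus the low halves
       of both cross terms. The sum is at most 3 * (2^32 - 1), so it
       cannot overflow 64 bits. */
    uint64_t mid = (p11 >> 32) + (p12 & 0xFFFFFFFFu) + (p21 & 0xFFFFFFFFu);

    *lo = (mid << 32) | (p11 & 0xFFFFFFFFu);
    /* High half: p22, the high halves of the cross terms, and the carry
       out of the middle column. */
    *hi = p22 + (p12 >> 32) + (p21 >> 32) + (mid >> 32);
}
```

Note that the cross terms are offset by 32 bits, so the final addition has a carry chain running through the middle column into the high half.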

capens__nicolas
New Contributor I
118 Views

Quoting tthsqe

Does the multiply instruction really have a latency of 10-15 cycles on the newer Intel processors? I think the Opteron can take the 128-bit product of two qwords in 5 cycles... why the big gap?

Are you sure you're looking at the numbers for the Core architecture(s)? On NetBurst (Pentium 4) mul took 10+ cycles, but on Core it should be around 4, I believe.

The optimization guide with the processor model numbers can be quite confusing...

Maxim_L_Intel1
Employee
118 Views

Hello, I understand you are speaking of the MUL r64, RAX instruction producing a 128-bit result in RDX:RAX. It has only 3-cycle latency for the low 64-bit (RAX) part of the result (the same as for most of the other integer multiplies, scalar and SIMD) and 7-cycle latency for the high 64-bit (RDX) part of the result. This instruction is used in long-precision integer arithmetic, where the latency of the high 64-bit part of the result can be hidden. What is your usage of it?
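
As a rough illustration of that long-precision use, here is a minimal C sketch that multiplies a multi-limb number by a single 64-bit limb. The name mul_limb and the little-endian limb layout are assumptions made for this example, and unsigned __int128 is a GCC/Clang extension, not standard C. On x86-64 such compilers typically lower the widening multiply to a single MUL, though that is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: r[0..n-1] = a[0..n-1] * b, least significant
   limb first. Returns the carry-out (the most significant limb).
   Assumes a compiler that provides unsigned __int128 (GCC or Clang). */
static uint64_t mul_limb(uint64_t *r, const uint64_t *a, size_t n, uint64_t b)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Each widening multiply depends only on a[i] and b, not on the
           result of the previous iteration. */
        unsigned __int128 p = (unsigned __int128)a[i] * b;
        /* Cannot overflow: (2^64 - 1)^2 + (2^64 - 1) < 2^128. */
        p += carry;
        r[i] = (uint64_t)p;            /* low 64 bits  (RAX part) */
        carry = (uint64_t)(p >> 64);   /* high 64 bits (RDX part) */
    }
    return carry;
}
```

In this loop the high half feeds only the carry addition for the next limb, never the next multiply, so successive multiplies can overlap and the longer latency of the high part need not sit on the critical path.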

Thank you,

-Max