tthsqe

Beginner

01-09-2010
10:52 PM

118 Views

mul instruction latency

3 Replies

tthsqe

Beginner

01-10-2010
11:11 AM

Quoting - tthsqe

If I'm not mistaken, Intel processors can do a 32-bit multiply in 4 cycles. Given four of these multipliers and a 128-bit adder, wouldn't it then be easy to do a 64-bit multiply in 4+2 cycles?

              x2 . x1
    *         y2 . y1
    -------------------
      [x2*y2].[x1*y1]
    +     [x1*y2]
    +     [x2*y1]
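
The decomposition above can be sketched in portable C. This is only an illustration of the arithmetic, not a model of how the hardware multiplier is built. The struct `u128_parts` and the function `mul64_schoolbook` are hypothetical names introduced here; `x1`, `x2`, `y1`, `y2` are the 32-bit halves from the diagram.

```c
#include <stdint.h>

/* Hypothetical result type: a 128-bit value as two 64-bit halves. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128_parts;

/* Hypothetical helper: 64x64 -> 128-bit product built from four
 * 32x32 -> 64-bit partial products, as in the diagram above. */
static u128_parts mul64_schoolbook(uint64_t x, uint64_t y)
{
    uint64_t x1 = (uint32_t)x, x2 = x >> 32;   /* low and high halves of x */
    uint64_t y1 = (uint32_t)y, y2 = y >> 32;   /* low and high halves of y */

    uint64_t p11 = x1 * y1;   /* weight 2^0  */
    uint64_t p12 = x1 * y2;   /* weight 2^32 */
    uint64_t p21 = x2 * y1;   /* weight 2^32 */
    uint64_t p22 = x2 * y2;   /* weight 2^64 */

    /* Middle column (bits 32..63): the sum of three 32-bit values
     * cannot overflow a 64-bit accumulator. */
    uint64_t mid = (p11 >> 32) + (uint32_t)p12 + (uint32_t)p21;

    u128_parts r;
    r.lo = (mid << 32) | (uint32_t)p11;
    r.hi = p22 + (p12 >> 32) + (p21 >> 32) + (mid >> 32);
    return r;
}
```

The four multiplications are independent of each other, so they could in principle run in parallel; the additions that follow are what the "+2" in the question refers to.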

capens__nicolas

New Contributor I

01-18-2010
09:17 AM

Quoting tthsqe

Are you sure you're looking at the numbers for the Core architecture(s)? On NetBurst (Pentium 4) mul took 10+ cycles, but on Core it should be around 4, I believe.

The optimization guide with the processor model numbers can be quite confusing...

Maxim_L_Intel1

Employee

02-05-2010
12:47 PM

Hello, I understand you are speaking of the MUL r64 instruction (with RAX as the implicit operand), which produces a 128-bit result in RDX:RAX. It has only 3-cycle latency for the low 64-bit (RAX) part of the result (the same as for most of the other integer multiplies, scalar and SIMD) and 7-cycle latency for the high 64-bit (RDX) part. This instruction is used in long-precision integer arithmetic, where the latency of the high 64-bit part of the result can be hidden. What is your usage of it?

Thank you,

-Max
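
As a rough illustration of the long-precision use case mentioned above, here is a minimal sketch of multiplying a multi-word integer by a single 64-bit word. It assumes a compiler that provides the `unsigned __int128` extension (GCC or Clang on a 64-bit target); the function name `mul_limbs_by_word` and the little-endian limb layout are assumptions made for this example, not something from the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: out[0..n-1] = low n limbs of a[0..n-1] * b,
 * limbs stored least significant first. Returns the final carry limb.
 * Assumes the GCC/Clang `unsigned __int128` extension. */
static uint64_t mul_limbs_by_word(uint64_t *out, const uint64_t *a,
                                  size_t n, uint64_t b)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Full 64x64 -> 128-bit product plus the incoming carry;
         * the sum always fits in 128 bits. */
        unsigned __int128 p = (unsigned __int128)a[i] * b + carry;
        out[i] = (uint64_t)p;           /* low half is stored */
        carry  = (uint64_t)(p >> 64);   /* high half feeds the next limb */
    }
    return carry;
}
```

The products `a[i] * b` do not depend on one another; only the carry additions form a serial chain. That leaves an out-of-order core free to start later multiplies while the high half of an earlier one is still in flight, which is one way the longer high-half latency can be hidden.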
