I'm confused by a passage in the Intel Architecture Optimization Manual about load latencies:
2.2.5.2 L1 DCache - Loads
The common load latency is five cycles. When using a simple addressing mode, base plus offset
that is smaller than 2048, the load latency can be four cycles.
Table 2.12
Data Type / Addressing Mode | Base + Offset > 2048; Base + Index [+ Offset] | Base + Offset < 2048
Integer                     | 5                                             | 4
MMX, SSE, 128-bit AVX       | 6                                             | 5
X87                         | 7                                             | 6
256-bit AVX                 | 7                                             | 7
I'm not sure how to interpret this. Adding some parentheses for clarity, is the faster case ((Base + Offset) < 2048), a condition that user code is unlikely to achieve, or (Base + (Offset < 2048)), something that can often be accommodated?
Hello Nathan,
Sorry to take so long to reply... end/start of year deadlines, etc.
I think this section of the manual is distinguishing between instructions like 'mov edx,[eax+4]' and 'mov edx,[eax+4096]'. The +4 case (displacement == 4) should load in 4 cycles. The +4096 case (displacement == 4096) should load in 5 cycles. In other words, the 2048 limit applies to the offset encoded in the instruction, not to the computed address.
Hope this helps... someone... if it is not too late for you.
Pat
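To make that reading concrete, here is a minimal C sketch that looks up the latency Table 2.12 would predict under Pat's interpretation, where only the displacement is compared against 2048. The names `load_type` and `expected_load_latency` are made up for illustration and are not part of any Intel tool or API. Two assumptions are worth flagging: a displacement of exactly 2048 is treated as the slow case (the table lists neither column for it, and the prose only says "smaller than 2048"), and negative displacements are not modeled because the quoted passage does not mention them.

```c
/* Data types as listed in the rows of Table 2.12. */
enum load_type { LOAD_INTEGER, LOAD_SSE_128, LOAD_X87, LOAD_AVX_256 };

/*
 * Hypothetical helper: returns the load latency in cycles that
 * Table 2.12 predicts, assuming the 2048 limit applies to the
 * displacement alone (Pat's reading), not to base + displacement.
 *
 * has_index    - nonzero if the address uses an index register
 * displacement - unsigned offset encoded in the instruction
 */
static int expected_load_latency(enum load_type type,
                                 int has_index,
                                 unsigned long displacement)
{
    /* Column "Base + Offset > 2048; Base + Index [+ Offset]". */
    static const int slow[] = { 5, 6, 7, 7 };
    /* Column "Base + Offset < 2048". */
    static const int fast[] = { 4, 5, 6, 7 };

    /* Assumption: exactly 2048 falls into the slow case. */
    int simple = !has_index && displacement < 2048;

    return simple ? fast[type] : slow[type];
}
```

With this sketch, `expected_load_latency(LOAD_INTEGER, 0, 4)` corresponds to 'mov edx,[eax+4]' and gives 4, while `expected_load_latency(LOAD_INTEGER, 0, 4096)` corresponds to 'mov edx,[eax+4096]' and gives 5. It only encodes the table; actual latencies would have to be measured on the hardware in question.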