Clarification: Sandy Bridge Load Latency

Nathan_K_3 · 10-28-2013

I'm confused by a passage in the Intel Architecture Optimization Manual about load latencies:

2.2.5.2 L1 DCache - Loads

The common load latency is five cycles. When using a simple addressing mode, base plus offset that is smaller than 2048, the load latency can be four cycles.

Table 2.12

Data Type/Addressing Mode    Base + Offset > 2048;      Base + Offset < 2048
                             Base + Index [+ Offset]
Integer                      5                          4
MMX, SSE, 128-bit AVX        6                          5
X87                          7                          6
256-bit AVX                  7                          7

I'm not sure how to interpret this. Adding some parentheses for clarity: is the faster case ((Base + Offset) < 2048), a condition that user code is unlikely to achieve, or (Base + (Offset < 2048)), something that can often be accommodated?

Patrick_F_Intel1 · 01-24-2014

Hello Nathan,

Sorry to take so long to reply... end/start of year deadlines, etc.

I think this section of the manual is differentiating between instructions like 'mov edx,[eax+4]' and 'mov edx,[eax+4096]'. The +4 case (displacement == 4) should load in 4 cycles. The +4096 case (displacement == 4096) should load in 5 cycles.
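One way to check this reading on real hardware is a pointer-chasing loop, where each load's address is the previous load's result plus a constant displacement, so the time per iteration approximates the load latency. The following is only a rough sketch, not a calibrated benchmark: the names chase_small, chase_large, make_ring and ns_per_load are made up for illustration, the displacements 8 and 4096 are arbitrary picks on either side of 2048, and it assumes the compiler emits a plain base + displacement load for the loop body (worth confirming in the disassembly).

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

#define SMALL_DISP 8      /* displacement < 2048 (illustrative choice) */
#define LARGE_DISP 4096   /* displacement > 2048 (illustrative choice) */

/* Dependent load chain: each address is the previous load's result plus a
   constant displacement. The volatile access forces one real load per
   iteration so the compiler cannot collapse the loop. */
static void *chase_small(void *p, long iters) {
    for (long i = 0; i < iters; i++)
        p = *(void *volatile *)((char *)p + SMALL_DISP);
    return p;
}

static void *chase_large(void *p, long iters) {
    for (long i = 0; i < iters; i++)
        p = *(void *volatile *)((char *)p + LARGE_DISP);
    return p;
}

/* Allocate a block whose slots at both displacements hold the block's own
   start address, so every load in either chain returns the same base. */
static void *make_ring(void) {
    char *base = malloc(LARGE_DISP + sizeof(void *));
    if (base == NULL)
        return NULL;
    *(void **)(base + SMALL_DISP) = base;
    *(void **)(base + LARGE_DISP) = base;
    return base;
}

/* Average CPU time per load in nanoseconds for one of the chase functions.
   clock() is coarse, so this only makes sense for large iteration counts. */
static double ns_per_load(void *(*chase)(void *, long), void *base, long iters) {
    clock_t t0 = clock();
    void *end = chase(base, iters);
    clock_t t1 = clock();
    assert(end == base);  /* the chain must land back on the base pointer */
    return (double)(t1 - t0) / CLOCKS_PER_SEC * 1e9 / (double)iters;
}
```

Calling ns_per_load(chase_small, ring, 100000000L) and ns_per_load(chase_large, ring, 100000000L) on a pointer returned by make_ring(), then multiplying each result by the core clock in GHz, gives an approximate cycles-per-load figure for the two cases. Frequency scaling and timer resolution will blur the numbers, so treat a one-cycle difference as suggestive rather than conclusive.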

Hope this helps... someone... if it is not too late for you.

Pat