Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Clarification: Sandy Bridge Load Latency

Nathan_K_3
Beginner

I'm confused by a passage in the Intel Architecture Optimization Manual about load latencies:

2.2.5.2 L1 DCache - Loads

The common load latency is five cycles. When using a simple addressing mode, base plus offset that is smaller than 2048, the load latency can be four cycles.

Table 2.12

Data Type / Addressing Mode    Base + Offset > 2048;      Base + Offset < 2048
                               Base + Index [+ Offset]
Integer                        5                          4
MMX, SSE, 128-bit AVX          6                          5
X87                            7                          6
256-bit AVX                    7                          7

I'm not sure how to interpret this. Adding some parentheses for clarity: is the faster case ((Base + Offset) < 2048), a condition that user code is unlikely to meet, or (Base + (Offset < 2048)), something that can often be accommodated?

Patrick_F_Intel1
Employee

Hello Nathan,

Sorry to take so long to reply... end/start of year deadlines, etc.

I think this section of the manual is distinguishing between instructions like 'mov edx,[eax+4]' and 'mov edx,[eax+4096]'. The +4 case (displacement == 4) should load in 4 cycles. The +4096 case (displacement == 4096) should load in 5 cycles.
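
To make that reading concrete, here is a minimal C sketch that encodes the quoted table under the interpretation above, i.e. that "Offset" means the displacement encoded in the instruction and not the computed address. The names `load_type` and `expected_load_latency` are hypothetical, made up for illustration. The table does not say what happens at a displacement of exactly 2048, nor how negative displacements are treated, so the sketch assumes an unsigned displacement and counts 2048 as the slow case. It is a model of the documented numbers, not a measurement.

```c
#include <stdbool.h>
#include <stdint.h>

/* Data types, in the row order of the quoted Table 2.12. */
enum load_type {
    LOAD_INTEGER,
    LOAD_MMX_SSE_AVX128,
    LOAD_X87,
    LOAD_AVX256
};

/*
 * Hypothetical helper: returns the load latency (in cycles) that the
 * quoted table lists for a given data type and addressing mode.
 *
 * Assumptions (not confirmed by the manual text quoted in this thread):
 *   - "Offset" is the instruction's displacement, not base + displacement.
 *   - The displacement is treated as unsigned.
 *   - A displacement of exactly 2048 falls in the slow column.
 */
int expected_load_latency(enum load_type type, bool has_index,
                          uint32_t displacement)
{
    /* Column 0: offset >= 2048 or an index register is used.
       Column 1: base + offset with offset < 2048. */
    static const int latency[4][2] = {
        {5, 4}, /* Integer */
        {6, 5}, /* MMX, SSE, 128-bit AVX */
        {7, 6}, /* X87 */
        {7, 7}  /* 256-bit AVX */
    };

    bool fast = !has_index && displacement < 2048;
    return latency[type][fast ? 1 : 0];
}
```

Under these assumptions, `expected_load_latency(LOAD_INTEGER, false, 4)` models 'mov edx,[eax+4]' and yields 4, while `expected_load_latency(LOAD_INTEGER, false, 4096)` models 'mov edx,[eax+4096]' and yields 5. Any form with an index register lands in the slow column regardless of displacement.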

Hope this helps... someone... if it is not too late for you.

Pat
