Software Tuning, Performance Optimization & Platform Monitoring
Discussion around monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform monitoring

Clarification: Sandy Bridge Load Latency

Nathan_K_3

I'm confused by a passage in the Intel Architecture Optimization Manual about load latencies:

2.2.5.2 L1 DCache - Loads

The common load latency is five cycles. When using a simple addressing mode, base plus offset that is smaller than 2048, the load latency can be four cycles.

Table 2.12 (load latency in cycles)

Data Type / Addressing Mode | Base + Offset > 2048; Base + Index [+ Offset] | Base + Offset < 2048
Integer                     | 5 | 4
MMX, SSE, 128-bit AVX       | 6 | 5
X87                         | 7 | 6
256-bit AVX                 | 7 | 7

I'm not sure how to interpret this. Adding some parentheses for clarity: is the faster case ((Base + Offset) < 2048), a condition that user code is unlikely to achieve, or (Base + (Offset < 2048)), something that can often be accommodated?

Patrick_F_Intel1
Employee

Hello Nathan,

Sorry to take so long to reply... end/start of year deadlines, etc.

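I think this section of the manual is distinguishing between instructions like 'mov edx,[eax+4]' and 'mov edx,[eax+4096]'. In the +4 case (displacement == 4) the load should complete in 4 cycles; in the +4096 case (displacement == 4096) it should take 5 cycles. In other words, the 2048 limit applies to the displacement encoded in the instruction, not to the computed address (Base + Offset).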

Hope this helps... someone... if it is not too late for you.

Pat
