Confusing results for L2_DATA_READ_MISS_MEM_FILL from vtune

Alastair_M_
New Contributor I

Dear all,

I am trying to profile the cache performance of an application and noticed something strange in the VTune results.

vpshufd instructions seem to have nonzero counts for L2_DATA_READ_MISS_MEM_FILL even when both the source and destination operands are registers.

Address   Source Line  Assembly                            L2_DATA_READ_MISS_MEM_FILL  CPU_CLK_UNHALTED
0x407afa  367          vpshufd $0x44, %zmm26, %k0, %zmm27  1,600,000                   24,000,036

I noticed this statement about the event in the KNC PMU events reference: "Can include promoted read misses that started as CODE accesses."

Is this likely to be the cause? If so, what does it actually mean?

Best regards,

Alastair

4 Replies
TimP
Honored Contributor III

Did you look for memory data dependencies, including these registers?

McCalpinJohn
Honored Contributor III

I have not looked at this on Xeon Phi, but on most processors there is often a bit of skew between the instruction that caused a performance counter overflow and the instruction identified by the interrupt. I think there is a good chance that the memory reference that incremented the L2_DATA_READ_MISS_MEM_FILL event is one or a few instructions upstream of the VPSHUFD instruction.
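
(Illustration, not part of the original reply.) One rough way to act on this suggestion is to scan the few instructions just before the sampled address for anything with a memory operand. The sketch below assumes AT&T-syntax disassembly text; the `WINDOW` contents (apart from the vpshufd line quoted in the question), the `max_skid` parameter, and the function name are hypothetical and only serve the example. Real skid distances vary by processor and event.

```python
import re

# Matches an AT&T-style memory operand such as (%rax,...) or 64(%rsi).
MEM_OPERAND = re.compile(r"\(%[a-z0-9]+")

# Hypothetical disassembly window in program order: (address, assembly text).
# Only the last line is taken from the thread; the rest are made up.
WINDOW = [
    (0x407AE2, "vgatherdpd (%rax,%zmm5,8), %zmm26{%k1}"),
    (0x407AEC, "vaddpd %zmm1, %zmm2, %zmm3"),
    (0x407AF2, "vmovapd 64(%rsi), %zmm4"),
    (0x407AFA, "vpshufd $0x44, %zmm26, %k0, %zmm27"),
]


def upstream_memory_candidates(window, sampled_addr, max_skid=4):
    """Return memory-referencing instructions among the max_skid
    instructions preceding sampled_addr, nearest first."""
    idx = next(i for i, (addr, _) in enumerate(window) if addr == sampled_addr)
    preceding = window[max(0, idx - max_skid):idx]
    return [(a, t) for a, t in reversed(preceding) if MEM_OPERAND.search(t)]


if __name__ == "__main__":
    for addr, text in upstream_memory_candidates(WINDOW, 0x407AFA):
        print(f"{addr:#x}  {text}")
```

Any instruction this turns up is only a candidate for where the miss actually occurred; confirming it would still need a closer look at the surrounding code.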
 

Alastair_M_
New Contributor I

Tim Prince wrote:

Did you look for memory data dependencies, including these registers?

Hi Tim,

Thanks for your response. The zmm register in question is loaded by an _mm512_mask_i32logather_pd intrinsic.

Does this mean that the L2 miss might originate from there? Would those misses not be attributed to the gather instead?

Best regards,

Alastair

Alastair_M_
New Contributor I

John D. McCalpin wrote:

I have not looked at this on Xeon Phi, but on most processors there is often a bit of skew between the instruction that caused a performance counter overflow and the instruction identified by the interrupt. I think there is a good chance that the memory reference that incremented the L2_DATA_READ_MISS_MEM_FILL event is one or a few instructions upstream of the VPSHUFD instruction.

Hi John,

Thanks for replying. That is a good point; I will go back and look at the code to see if there are any likely candidates. As I mentioned in my other reply, this register is loaded by a gather, so I will check whether that gather is nearby.

Best regards,

Alastair
