Confusing results for L2_DATA_READ_MISS_MEM_FILL from VTune

Alastair_M_
New Contributor I
536 Views

Dear all,

I am trying to profile the cache performance of an application and noticed something strange in the VTune results.

vpshufd instructions seem to have positive values for L2_DATA_READ_MISS_MEM_FILL when the source and destination operands are registers.

Address     Source Line    Assembly                              L2_DATA_READ_MISS_MEM_FILL    CPU_CLK_UNHALTED
0x407afa    367            vpshufd $0x44, %zmm26, %k0, %zmm27    1,600,000                     24,000,036

I noticed this statement about the event in the KNC PMU events reference: "Can include promoted read misses that started as CODE accesses."

Is this likely to be the reason for this?  If so, what does it actually mean?

Best regards,

Alastair

4 Replies
TimP
Honored Contributor III
536 Views

Did you look for memory data dependencies, including these registers?

McCalpinJohn
Honored Contributor III
536 Views

I have not looked at this on Xeon Phi, but on most processors there is often a bit of skew between the instruction that caused a performance counter overflow and the instruction identified by the interrupt.   I think there is a good chance that the memory reference that incremented the L2_DATA_READ_MISS_MEM_FILL counter event is one or a few instructions upstream of the VPSHUFD instruction.
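To make the skid idea concrete, here is a minimal Python sketch of how one might search for upstream candidates. Everything in it is an assumption for illustration: the `Insn` type, the `upstream_memory_candidates` helper, the skid window of 4 instructions, and every listing entry except the vpshufd line are hypothetical, and it treats the listing as straight-line code, ignoring branches.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    address: int
    text: str
    reads_memory: bool  # True if the instruction has a memory source operand


def upstream_memory_candidates(listing, sampled_address, max_skid=4):
    """Return memory-reading instructions at, or up to max_skid
    instructions before, the sampled address (straight-line order assumed)."""
    index = next(i for i, insn in enumerate(listing)
                 if insn.address == sampled_address)
    window = listing[max(0, index - max_skid): index + 1]
    return [insn for insn in window if insn.reads_memory]


# Hypothetical listing: only the vpshufd entry is taken from the thread.
# The other addresses and descriptions are invented placeholders.
listing = [
    Insn(0x407ae0, "register-only op (hypothetical)", False),
    Insn(0x407ae8, "gather load into %zmm26 (hypothetical)", True),
    Insn(0x407af2, "register-only op (hypothetical)", False),
    Insn(0x407afa, "vpshufd $0x44, %zmm26, %k0, %zmm27", False),
]

for insn in upstream_memory_candidates(listing, 0x407afa):
    print(hex(insn.address), insn.text)
```

The window size is a guess, not a documented property of the hardware; the amount of skid depends on the processor and the event being sampled.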
 

Alastair_M_
New Contributor I
536 Views

Tim Prince wrote:

Did you look for memory data dependencies, including these registers?

Hi Tim,

Thanks for your response. The zmm register in question is loaded by an _mm512_mask_i32logather_pd intrinsic.

Does this mean that the L2 miss might originate from there? Would those misses not be attributed to the gather itself?

Best regards,

Alastair

Alastair_M_
New Contributor I
536 Views

John D. McCalpin wrote:

I have not looked at this on Xeon Phi, but on most processors there is often a bit of skew between the instruction that caused a performance counter overflow and the instruction identified by the interrupt.   I think there is a good chance that the memory reference that incremented the L2_DATA_READ_MISS_MEM_FILL counter event is one or a few instructions upstream of the VPSHUFD instruction.

Hi John,

Thanks for replying. That is a good point; I will go back and look at the code to see if there are any likely candidates. As I mentioned in my other reply, this register is loaded by a gather, so I will check whether that is nearby.

Best regards,

Alastair
