
Strange loads and stores analysis


I am trying to optimize my code. I ran a Memory Access analysis and picked the top function when sorted by CPU Time.

I opened the function in the source and assembly views. Looking at the per-line results, I found something strange.

The line that consumes the most time is a simple punpcklbw instruction, and it also shows a large number of loads/stores. I think that is impossible, because the instruction only accesses an xmm register. The code and analysis results are uploaded as an image.

Assembly block 70 corresponds to the "if (left)" branch in the C code.

The "punpcklbw  xmm2, xmm2" instruction does not access memory. So why does this line show so many loads/stores?

Can anyone help me interpret this analysis result? And how can I reduce the time spent in this code block?

This block is the biggest time consumer in the most time-consuming function.






You should take event skid into account.


For the clockticks event, which is used for the CPU Time metric, the skid is usually one instruction (but it can be more in some cases). So the most time-consuming instruction is most likely the one just before "punpcklbw  xmm2, xmm2".
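To make the effect of skid concrete, here is a minimal Python sketch that models a one-instruction skid and then shifts the reported samples back. Only the punpcklbw line comes from this thread; the surrounding load and store instructions, the sample counts, and the helper names are hypothetical, since the original screenshot is not available here.

```python
from collections import Counter

# Hypothetical basic block; only the punpcklbw line is taken from the thread.
BLOCK = [
    "movq      xmm2, qword ptr [rsi]",    # assumed load feeding the unpack
    "punpcklbw xmm2, xmm2",
    "movdqa    xmmword ptr [rdi], xmm2",  # assumed store
]

def attribute_with_skid(true_samples, skid=1):
    """Report each sample `skid` instructions later than where it occurred."""
    reported = Counter()
    last = len(BLOCK) - 1
    for index, count in true_samples.items():
        reported[min(index + skid, last)] += count
    return reported

def shift_back(reported_samples, skid=1):
    """Undo the skid by moving reported samples `skid` instructions earlier."""
    corrected = Counter()
    for index, count in reported_samples.items():
        corrected[max(index - skid, 0)] += count
    return corrected

# Assumed ground truth: the load is the expensive instruction.
true_samples = {0: 900, 1: 50}
reported = attribute_with_skid(true_samples)
corrected = shift_back(reported)

print("reported corrected  instruction")
for i, text in enumerate(BLOCK):
    print(f"{reported[i]:8d} {corrected[i]:9d}  {text}")
```

In this model the profiler shows the 900 samples on punpcklbw even though the load caused them, which is the pattern described in the question. Real skid is not always exactly one instruction, so treat the shift as a reading aid rather than an exact correction.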


Also, the processor has a limited set of 'precise' events that do not suffer from the skid problem.

Memory Access analysis contains several metrics, such as 'Loads', 'Stores', and 'LLC Cache Misses', that are built on precise events and therefore should point exactly to the instruction that caused them.
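As a quick sanity check on the per-instruction Loads/Stores columns, you could list the rows that show memory traffic but have no explicit memory operand. The sketch below is only a rough heuristic under stated assumptions: the rows and counts are made up, the helper names are hypothetical, and it only looks for a bracketed operand in Intel syntax. It will miss implicit memory accesses (push, pop, call, ret, string instructions) and will wrongly treat lea as a memory access.

```python
import re

def has_explicit_memory_operand(instruction: str) -> bool:
    """Return True if the Intel-syntax text contains a bracketed [..] operand."""
    return re.search(r"\[[^\]]*\]", instruction) is not None

def suspicious_rows(rows):
    """Return instructions with nonzero loads/stores but no [..] operand."""
    return [
        text
        for text, loads, stores in rows
        if (loads or stores) and not has_explicit_memory_operand(text)
    ]

# Hypothetical (instruction, loads, stores) rows copied by hand from the
# assembly view; the counts are invented for illustration.
rows = [
    ("movq xmm2, qword ptr [rsi]", 1200, 0),
    ("punpcklbw xmm2, xmm2", 900, 300),
]

flagged = suspicious_rows(rows)
for text in flagged:
    print("check attribution for:", text)
```

Rows flagged this way are the ones worth comparing against their neighbours in the assembly view.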
