Li__Yang
Beginner
39 Views

Strange loads and stores analysis

I am trying to optimize my code. I ran Memory Access analysis and picked the top function by CPU Time.

I opened the function in the source and assembly views. Looking at the per-line results, I found something strange.

The line with the largest CPU time is a simple punpcklbw instruction, and it also shows a large number of loads/stores. I think that is impossible, because the instruction only accesses an xmm register. The code and analysis results are uploaded as an image.

Assembly block 70 corresponds to the "if (left)" branch in the C code.

The "punpcklbw  xmm2, xmm2" instruction does not access memory, so why does this line show so many loads/stores?

Can anyone help me interpret this result? And how can I reduce the time spent in this code block?

This block is the biggest time consumer in the most time-consuming function.

1 Reply
Dmitry_R_Intel1
Employee
39 Views

You should take event skid into account:

https://software.intel.com/en-us/vtune-amplifier-help-hardware-event-skid

For the clockticks event, which is used for the CPU Time metric, the skid is usually one instruction (but it can be more in some cases). So the most time-consuming instruction is most likely the one just before "punpcklbw  xmm2, xmm2".
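
To make the effect of skid concrete, here is a small toy model in Python. It is only a sketch: the instruction sequence and the sample counts are invented for illustration (the real block is only visible in the uploaded image), and `attribute_with_skid` is a hypothetical helper, not part of VTune. It assumes a fixed skid of one instruction, which is the usual case but not a guarantee.

```python
def attribute_with_skid(true_samples, skid=1):
    """Toy model of event skid: shift each instruction's samples `skid` slots later."""
    observed = [0] * len(true_samples)
    last = len(true_samples) - 1
    for i, count in enumerate(true_samples):
        # Samples that would land past the end of the block are clamped
        # to the last instruction in this simplified model.
        observed[min(i + skid, last)] += count
    return observed


# Hypothetical block: instruction text and counts are assumptions for illustration.
instructions = [
    "movq      xmm2, qword ptr [rsi]",  # memory load: the real cost in this toy case
    "punpcklbw xmm2, xmm2",             # register-only instruction
    "paddw     xmm0, xmm2",
]
true_samples = [900, 50, 50]

observed = attribute_with_skid(true_samples)
for text, seen in zip(instructions, observed):
    print(f"{seen:5d}  {text}")
```

In this model the 900 samples caused by the load are reported on the punpcklbw line, which matches the kind of misattribution described above. Reading the profile "one instruction back" is a reasonable first guess, but the actual skid can vary.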

Also, the processor has a limited set of 'precise' events that do not suffer from the skid problem: https://software.intel.com/en-us/vtune-amplifier-help-precise-events

Memory Access analysis includes several metrics, such as 'Loads', 'Stores', and 'LLC Cache Misses', that are built on precise events and should therefore point exactly to the instruction that caused them.