## A newbie question about where the cache miss could end up

I'm teachable and willing to do my homework, but it has been a while since I looked closely at CPU performance.

I'm looking at a little bit of rendering code that steps down a scanline,
checks whether it is done with a pixel,
and then gets the next texture value from dq.

After the texture fetch it looks up the color in a 4096-entry RGBA LUT and
tests if the color is zero (the test of fzsl). This test takes 28% of the total time.
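
To make the lookup step concrete, here is a minimal standalone sketch of how I read it. Only `h_lut_buf`, the `dat<<2` indexing, `_mm_load_ps`, and `_mm_ucomilt_ss` come from the real loop. The table layout (4096 entries of four floats, 16-byte aligned), the helper names `set_entry` and `first_lane_below`, and passing the threshold as a plain float are my assumptions for illustration. As far as I can tell, the intrinsic compares only the lowest float of the loaded entry.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics: _mm_load_ps, _mm_set_ss, _mm_ucomilt_ss

// Assumed layout: 4096 entries, each 4 floats (R, G, B, A).
// That would be 4096 * 16 bytes = 64 KiB of table data.
constexpr int kLutEntries = 4096;

// _mm_load_ps requires a 16-byte-aligned address, so the table is aligned.
alignas(16) float h_lut_buf[kLutEntries * 4];

// Hypothetical helper: fill one entry of the table.
void set_entry(int dat, float r, float g, float b, float a) {
    assert(dat >= 0 && dat < kLutEntries);
    float* e = &h_lut_buf[dat << 2];  // dat<<2 == dat*4, four floats per entry
    e[0] = r; e[1] = g; e[2] = b; e[3] = a;
}

// Hypothetical helper mirroring the lookup and test from the loop.
// Returns 1 when the lowest lane of the entry (R in this layout) is
// less than zsl, otherwise 0.
int first_lane_below(int dat, float zsl) {
    assert(dat >= 0 && dat < kLutEntries);
    __m128 m_zsl = _mm_set_ss(zsl);                    // threshold in lane 0
    __m128 m_lut = _mm_load_ps(&h_lut_buf[dat << 2]);  // the load that may miss in cache
    return _mm_ucomilt_ss(m_lut, m_zsl);               // needs the loaded data; compares lane 0 only
}
```

For example, after `set_entry(7, 0.0f, 0.5f, 0.5f, 1.0f)`, the call `first_lane_below(7, 0.25f)` returns 1, while an entry whose first float is 0.9 returns 0 for the same threshold.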

I'm sure there are many ways this could be improved, but my task now is just to understand it.
I've done several runs, and the numbers at each statement are pretty stable.
So, can the test of fzsl be where the cache misses catch up? That is, 28% of the total time is spent there,
but only 6% where the fetch is initiated and another 8% where the result is used to look up the LUT.
Or are these numbers really more of a statistical-neighborhood, Heisenberg-type measurement, and not specifically about the individual statements?

```
for ( int x=s_x; x!=e_x; x+=n_x )                        // 1.52946     15%
    p += dx;                                             // 0.64849      7%
    int ss = sr[ x ];                                    // 0.0199
    if ( ss==-1 ) continue;                              // 0.18914
    short sz = (ss>> 0) & 0xffff;                        // 0.11956
    short ez = (ss>>16) & 0xffff;                        // 0.00992
    if ( (sz>z) || (ez ...
    int dat = (dq[ p ]) & 0xffff;                        // 0.585804     6%   <<< texture fetch
    int msk_val = dat & 0x0000f000;                      // 0.00993
    __m128 m_lut = _mm_load_ps( &h_lut_buf[ dat<<2 ] );  // 0.75983988   8%   <<< LUT lookup
    int fzsl = _mm_ucomilt_ss( m_lut, m_zsl );
    if ( fzsl )                                          // 2.36337947  28%   <<< test fzsl
        continue;
```

Thank you for any breadcrumbs!

YON