FP Single Precision Packed SIMD operations in DP code

schorscherl · 07-01-2004

Hi,

This is an excerpt from a VTune (for Linux) sampling activity with the events
'Packed Single-precision Floating-point Streaming SIMD Extension Instructions Retired' and 'Clockticks':


0x1eef6 219         1842  5006            for ( k=0; k<n; k++ )
        220         0     0                 {                                                            
0x1efd9 221         64438 12381               sre+=(m1r*m2r+m1i*m2i);                        
0x1f002 222         13996 4692                sim+=(m1r*m2i-m1i*m2r);                        
        223         0     0                 }                                                            
0x1f3cc 224         0     41              rr=sre; ri=sim;                                          

                    ^
                    SIMD
                          ^ 
                          Clockticks

(view in monospace font to make sense of it).

k, n and j are integers; everything else is a double scalar or a pointer, so
no float is involved. Nevertheless, VTune counts the above-mentioned packed
single-precision events, and they make up a significant fraction of the
overall FLOP count (measured with another tool by summing x87, packed and
scalar SP and DP SIMD).
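
For reference, the loop can be boiled down to something like the sketch below.
Only the two accumulation statements are taken from the listing above; the
function name, the array arguments and the loads of m1r, m1i, m2r and m2i are
assumptions made for illustration and are not part of the original code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the profiled loop. Everything except the two
   accumulation statements is assumed: the real code that sets m1r, m1i,
   m2r and m2i is not shown in the excerpt. All floating-point data is
   double; only k and n are integers. */
void cdot_sketch(const double *ar, const double *ai,
                 const double *br, const double *bi,
                 int n, double *rr, double *ri)
{
    double sre = 0.0, sim = 0.0;
    int k;

    for (k = 0; k < n; k++) {
        /* assumed loads, one complex element from each input per iteration */
        double m1r = ar[k], m1i = ai[k];
        double m2r = br[k], m2i = bi[k];

        sre += (m1r * m2r + m1i * m2i);   /* line 221 in the listing */
        sim += (m1r * m2i - m1i * m2r);   /* line 222 in the listing */
    }

    *rr = sre;
    *ri = sim;
}
```

Building something of this shape with the same compiler and flags and then
looking at the generated assembly for the loop body should show which
instructions the packed single-precision event is being attributed to.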

So what is going on here?

TIA,
Georg.