Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Effect of FPSWA on code performance

Is there a general characterization of how much FPSWAs (floating-point software assists) can affect the performance of an application?
I am using RHEL 3.0 U1 with the Intel compilers V8.0. Even ifort and icc themselves trigger FPSWAs when compiling some codes.
How much should I be worried about FPSWAs?
Thank you.
Warm regards,
Honored Contributor III

The easiest way to evaluate the effect of FPSWAs is to run your application in both modes (flush-to-zero on and off), assuming you don't need to separate the effect of SIR stalls from that of FPSWA events. It is quite possible, however, for the (much more frequent) SIR stalls to take more time than the FPSWAs. Both kinds of event are eliminated by setting ftz. You can change the flush-to-zero mode by executing the intrinsic in …, or, with Intel compilers that support it, by compiling main() with either -ftz (abrupt underflow) or -ftz- (gradual underflow). For gcc, abrupt underflow is part of the -ffast-math package.
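As a rough illustration of the "run it in both modes" approach, here is a minimal C sketch, assuming a toolchain where building the same file with -ftz versus -ftz- (Intel), or with and without -ffast-math (gcc), switches the underflow mode. The function names and the kernel are hypothetical; the kernel just produces a denormal result on every iteration, and whether that costs an FPSWA trap, a SIR stall, or some other slow path depends on the processor.

```c
#include <float.h>
#include <time.h>

/* Hypothetical helper: returns 1 if denormal results survive (gradual
   underflow, e.g. -ftz-), 0 if they are flushed to zero (abrupt underflow,
   e.g. -ftz or gcc -ffast-math). */
int underflow_is_gradual(void) {
    volatile float smallest_normal = FLT_MIN;      /* volatile: block constant folding */
    volatile float half = smallest_normal * 0.5f;  /* denormal unless flushed */
    return half != 0.0f;
}

/* Hypothetical denormal-heavy kernel: each multiply yields a denormal
   under gradual underflow (or 0.0f under flush-to-zero).
   Returns the elapsed CPU time in seconds. */
double time_denormal_kernel(long iters) {
    volatile float x = FLT_MIN;
    volatile float sink = 0.0f;
    clock_t t0 = clock();
    for (long i = 0; i < iters; i++) {
        sink = x * 0.5f;
    }
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Build it twice, once per mode, call underflow_is_gradual() to confirm which mode the binary actually got, and compare the times returned by time_denormal_kernel() for the same iteration count.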

ifort -O3 implies -ftz, so you may want to set -ftz or -ftz- explicitly to override what the optimization level implies.

Time spent in FPSWAs is likely to be included in the difference between your user and system time. So, if that difference is negligible, you needn't be concerned about FPSWAs.
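To read the user and system times from inside the program, a minimal POSIX sketch using getrusage() might look like this (the helper names are hypothetical; the shell's time command reports the same two figures).

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Converts a struct timeval to seconds. */
static double tv_seconds(struct timeval tv) {
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical helper: stores the user and system CPU time consumed so far
   by this process. Returns 0 on success, -1 if getrusage() fails. */
int cpu_times(double *user_s, double *sys_s) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *user_s = tv_seconds(ru.ru_utime);  /* time spent executing user code */
    *sys_s = tv_seconds(ru.ru_stime);   /* time spent in the kernel on our behalf */
    return 0;
}
```

Calling cpu_times() at the end of a run gives the two numbers to compare as suggested above.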

SIR stalls typically take 2 cycles; FPSWAs take hundreds. How much effect each has depends on their relative frequency, which is strongly data- and code-dependent.
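The trade-off can be made concrete with a back-of-the-envelope model: total stall cycles are roughly the SIR stall count times its cost plus the FPSWA count times its cost. The sketch below keeps both costs as parameters, since "2 cycles" and "hundreds" are only typical figures; the function names are hypothetical and the counts in the usage note are made up for illustration.

```c
/* Hypothetical estimate of stall cycles from event counts. The per-event
   costs are assumptions (about 2 cycles per SIR stall, hundreds per FPSWA);
   substitute measured values for your system. */
double estimated_stall_cycles(double sir_stalls, double fpswa_events,
                              double cycles_per_sir, double cycles_per_fpswa) {
    return sir_stalls * cycles_per_sir + fpswa_events * cycles_per_fpswa;
}

/* Fraction of the total run spent in these stalls (0 if total is unknown). */
double stall_fraction(double stall_cycles, double total_cycles) {
    return total_cycles > 0.0 ? stall_cycles / total_cycles : 0.0;
}
```

For example, with a hypothetical 1,000,000 SIR stalls and 10,000 FPSWAs at an assumed 300 cycles each, estimated_stall_cycles(1e6, 1e4, 2.0, 300.0) gives 5,000,000 cycles, of which 2,000,000 come from SIR stalls, so the cheap but frequent stalls are of the same order as the rare expensive ones.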
I don't know why EL3 chooses to print the FPSWA events on the console. Since EL3 doesn't support an X server or alternate consoles on my box, this is particularly annoying. (Yes, I've heard of VNC.)
The 7.1 compilers behave about the same as 8.0. Under normal circumstances the compilers don't incur enough FPSWAs to affect compile time; in fact, the time spent printing them on the console could exceed the time the events themselves take.
Note: the number of cycles the pipeline stalls due to FPSWA can be seen with the event BE_L1D_FPU_Bubble.fpu. The frequency of SIR (safe instruction recognition) stalls can be counted with the events Fp_False_SIRstall and fp_True_SIRstall. These count, respectively, the number of times SIR stalled the pipeline because of borderline numerical values but no FPSWA assistance turned out to be required (false), and the number of times FPSWA was really needed (true).
This tends to be dominated by denormals in single-precision FP codes, and the stalls can be eliminated by compiling with flush-to-zero (-ftz).
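Since single-precision denormals are the usual trigger, one quick check is to scan the data for subnormal values. Here is a minimal C99 sketch using fpclassify() (the function name is hypothetical). It only inspects stored values, so denormals that arise as intermediate results from normal inputs won't be counted.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: counts subnormal (denormal) values in a
   single-precision array. A high count suggests the data may cause SIR
   stalls or FPSWAs when processed with gradual underflow. */
size_t count_subnormal_floats(const float *data, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (fpclassify(data[i]) == FP_SUBNORMAL)
            count++;
    }
    return count;
}
```

For instance, an array holding 1.0f, 0.0f, 1e-40f, -1e-40f and 1e-30f contains two subnormals (the two values of magnitude 1e-40).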