Poor performance with underflows

john_lord · 04-17-2003

My software implements an algorithm that is prone to floating-point underflows. I get much better performance on a Sun workstation, with underflows set to zero, than on an Intel-based PC. Can anyone suggest the best choice of CVF compiler options for optimal performance on the PC? Or do I need to consider modifying the algorithm to get good performance on an Intel processor?
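If changing the algorithm is an option, one approach is to flush tiny values to zero explicitly in the code, so the behavior no longer depends on compiler or hardware settings. Below is a minimal sketch in C (the thread itself has no code), with hypothetical names (`flush_tiny`, `decay_sweeps`, `FLUSH_THRESHOLD`). It assumes that values below the threshold can safely be treated as zero for your algorithm, which you would need to verify.

```c
#include <float.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical default: flush exactly the values that would be subnormal.
   A larger, problem-specific threshold also works and flushes earlier. */
#define FLUSH_THRESHOLD DBL_MIN

/* Returns 0.0 when |x| is below the threshold, otherwise x unchanged. */
static double flush_tiny(double x, double threshold)
{
    return fabs(x) < threshold ? 0.0 : x;
}

/* Hypothetical kernel: every sweep scales each element by d (|d| < 1),
   so values decay geometrically toward zero. Flushing right after each
   update replaces a subnormal result with 0.0 before it is used again.
   Pass threshold = 0.0 to disable the flush. */
void decay_sweeps(double *v, size_t n, double d, int sweeps, double threshold)
{
    for (int s = 0; s < sweeps; ++s)
        for (size_t i = 0; i < n; ++i)
            v[i] = flush_tiny(v[i] * d, threshold);
}
```

With `FLUSH_THRESHOLD`, the multiply that first produces a subnormal still runs once per element, but its result never feeds further arithmetic. A threshold chosen from the scale of your problem (larger than `DBL_MIN`) avoids even that single subnormal operation.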

Steven_L_Intel1 · 04-17-2003

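Use /fpe:0 (or set Floating-Point Exception Handling to 0 in the project settings), which sets underflowed results to zero. Note that this also turns overflows and divisions by zero into errors instead of producing exceptional values such as Infinity.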

Steve

TimP · 04-17-2003

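If you have a choice in your algorithm, minimizing stores to memory and the reloads that follow them will help greatly when you are using x87 code such as CVF generates: x87 registers hold intermediates in extended format with a much wider exponent range, so a value usually becomes denormal only when it is stored to a variable in memory. Use scalar variables for temporaries within a loop. Switching from single to double precision should also help significantly, because the wider exponent range of double precision keeps many values out of the denormal range.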
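A minimal C sketch of both suggestions, with hypothetical function names: a dot product that keeps its running sum in an array element next to one that keeps it in a local double and stores once, plus a check that a value such as 1e-39 is denormal as a float but normal as a double. The memory-versus-register difference matters most for x87 code; the sketch only shows the loop structure.

```c
#include <math.h>
#include <stddef.h>

/* Nonzero if x is subnormal when held as a float / as a double. */
int subnormal_as_float(float x)   { return fpclassify(x) == FP_SUBNORMAL; }
int subnormal_as_double(double x) { return fpclassify(x) == FP_SUBNORMAL; }

/* Pattern to avoid: y may alias a or x, so the compiler generally has to
   write y[j] back to memory (and reload it) on every iteration. */
void dot_through_memory(const float *a, const float *x, size_t n,
                        float *y, size_t j)
{
    y[j] = 0.0f;
    for (size_t i = 0; i < n; ++i)
        y[j] += a[i] * x[i];
}

/* Preferred: accumulate in a local double that can stay in a register and
   store once; only the final result is rounded back to float. */
void dot_through_scalar(const float *a, const float *x, size_t n,
                        float *y, size_t j)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += (double)a[i] * x[i];
    y[j] = (float)sum;
}
```

Here 1e-39 lies below `FLT_MIN` (about 1.18e-38) but far above `DBL_MIN` (about 2.23e-308), so `subnormal_as_float(1e-39f)` is true while `subnormal_as_double(1e-39)` is false.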

Assuming that you are running on a P4, which is particularly sensitive to this problem (second only to IA64), the simplest way to improve performance is to use a compiler that generates SSE code, such as IFL with -QxW, and to use its abrupt-underflow (flush-to-zero) options (IFL -O3 or -Qftz).
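For SSE code, flush-to-zero is a bit (FTZ) in the MXCSR control register, and it can also be set from C. A minimal sketch with hypothetical function names, assuming an x86 build where scalar float arithmetic is compiled to SSE (the default for x86-64 compilers); `_MM_SET_FLUSH_ZERO_MODE` comes from `<xmmintrin.h>`. The setting applies per thread and has no effect on x87 code such as CVF generates.

```c
#include <float.h>
#include <math.h>

/* Assumption: FTZ only affects arithmetic the compiler emits as SSE.
   GCC and Clang define __SSE_MATH__ when scalar float math uses SSE
   (the x86-64 default); MSVC uses SSE for floating point on x64. */
#if defined(__SSE_MATH__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE_FTZ 1
#else
#define HAVE_SSE_FTZ 0
#endif

/* Turns flush-to-zero on or off for SSE arithmetic in the calling thread.
   Returns 1 if the MXCSR bit was changed, 0 if this build has no SSE math. */
int set_sse_flush_to_zero(int on)
{
#if HAVE_SSE_FTZ
    _MM_SET_FLUSH_ZERO_MODE(on ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
    return 1;
#else
    (void)on;
    return 0;
#endif
}

/* Computes FLT_MIN / 2 at run time (volatile prevents constant folding).
   Under gradual underflow the result is subnormal; under FTZ it is +0. */
float half_of_flt_min(void)
{
    volatile float x = FLT_MIN;
    return x * 0.5f;
}

/* Nonzero if x is a subnormal float. */
int is_subnormal_f(float x)
{
    return fpclassify(x) == FP_SUBNORMAL;
}
```

Calling `set_sse_flush_to_zero(1)` once near the start of each thread that runs the underflow-prone loops makes every SSE result that would be subnormal come out as zero instead.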

Steven_L_Intel1 · 04-17-2003

I got mail from one of our local experts in this area, and he reminded me that on x86, flushing underflows to zero is slower than allowing the denormals. The opposite is true on Itanium systems. Combine this with Tim Prince's good advice.
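Since the better choice evidently depends on the processor and on how flushing is done, measuring on your own machine is the safest check. A minimal timing sketch in C with hypothetical names; it runs a simple recurrence whose values stay normal and one whose values stay subnormal, using `clock()` from the C standard library for CPU time.

```c
#include <float.h>
#include <time.h>

/* Runs acc = acc * 0.5 + c for n steps and returns the CPU time in
   seconds. The final acc goes to *result so the loop is not removed. */
double time_recurrence(double c, long n, double *result)
{
    volatile double vc = c;   /* keeps the compiler from specializing on c */
    double acc = 0.0;
    clock_t start = clock();
    for (long i = 0; i < n; ++i)
        acc = acc * 0.5 + vc;
    clock_t stop = clock();
    *result = acc;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* c = 1.0 keeps acc normal (it settles at 2.0). c = DBL_MIN / 4 keeps acc
   in the subnormal range (it settles at DBL_MIN / 2) under gradual
   underflow; with flush-to-zero those subnormal results become 0.0. */
void compare_normal_vs_subnormal(long n, double *t_normal, double *t_subnormal)
{
    double r;
    *t_normal = time_recurrence(1.0, n, &r);
    *t_subnormal = time_recurrence(DBL_MIN / 4.0, n, &r);
}
```

Build and run it once with default settings and once with the flush-to-zero option you are considering, then compare `t_subnormal` with `t_normal`. The numbers only describe the processor and compiler settings you measured on.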

Steve