Intel® Fortran Compiler

Flush To Zero and Performance

k_sarnath
Hi There!

We are developing a supernodal factorizer for a sparse symmetric system (mainly to understand Intel optimizations and how to go about writing a neat solver).

We use double precision and we are ~2.5x slower than an equivalent MKL-based system (on a single core; we run only on single-core systems as part of preliminary testing).

We use cache-aligned structures and SSE2.

In this context, I am wondering what FTZ (Flush to Zero) is all about. The solver output does contain numbers of very small magnitude (x.yyyE-16). Will enabling FTZ give us some performance?

By the way, does MKL internally use single precision whenever possible? (Just wondering.) Are there any environment variables that can control such behavior?

Thanks for any advice,
Best Regards,
Sarnath

3 Replies
mecej4
TimP
ftz is set by default in ifort. It actually affects only the initialization code compiled into the main program. On current and past CPUs, it avoids the extra time spent producing subnormal results (0 < abs(x) < tiny(x)). Unlike on Itanium, ftz has no effect on performance when no gradual underflow occurs.
MKL would normally be expected to use the same precision internally as its arguments.
Any library function that alters the ftz setting (other than those meant for changing it) is expected to restore it to what it was at entry. You can change it at run time, at the cost of a function call and a pipeline flush.
CPUs to be introduced next year are designed so that gradual underflow doesn't degrade performance in the usual contexts. This should preserve performance of vectorized code without requiring ftz.
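
To make the subnormal range concrete, here is a minimal sketch in Python, used purely for illustration. The helper names `is_subnormal` and `flush_to_zero` are hypothetical, and `flush_to_zero` only emulates the effect in software; it does not set the hardware mode that ftz controls. For IEEE double precision, tiny(x) is about 2.2e-308, so values around 1e-16 are ordinary normal numbers and would not be touched by ftz.

```python
import math
import sys

# Smallest positive *normal* double, about 2.2e-308.
# This plays the role of Fortran's tiny(1.0d0).
TINY = sys.float_info.min


def is_subnormal(x: float) -> bool:
    """True if x is nonzero and smaller in magnitude than TINY."""
    return x != 0.0 and abs(x) < TINY


def flush_to_zero(x: float) -> float:
    """Software emulation of flush-to-zero: subnormals become signed zero."""
    return math.copysign(0.0, x) if is_subnormal(x) else x


if __name__ == "__main__":
    for value in (1.234e-16, TINY, TINY / 2.0, 5e-324):
        print(value, is_subnormal(value), flush_to_zero(value))
```

Under these assumptions, a value such as 1.234e-16 is reported as normal and passes through unchanged, while TINY / 2.0 is subnormal and is flushed to zero.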
k_sarnath
Thanks for the link and the reply! It was useful.

I think I really need to block my matrix out completely. (Right now, I create blocks on the fly as the factorization proceeds.) Maybe I should consider restructuring the data structure from scratch.

MKL is really awesome. It looks like a big wall in front of me, and all I have been doing is banging my head against it. Maybe I should read some papers and get the basics right.

Anyway, Thanks to all you guys,

Best Regards,
Sarnath