Beginner

What happens if DAZ bit is set but isn't supported?

Hello,

I've been profiling some SSE instructions on our target hardware and have stumbled onto the FTZ and DAZ flags. Turning on the FTZ flag greatly increases speed, and turning on DAZ increases it a bit more (for the first instruction that receives denormal input).

This site is awesome, http://software.intel.com/en-us/articles/x87-and-sse-floating-point-assists-in-ia-32-flush-to-zero-f..., and it notes that the DAZ flag was not supported on earlier hardware. There's even a link to a document that tells me how to check for DAZ support. Out of curiosity, I have to ask: what happens if you try to set the DAZ bit on hardware that doesn't support it? Does the MXCSR register change? Or is it an unused bit, so that setting it is simply ineffective?
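For readers who want to see what such a check might look like, here is a minimal sketch in C. It assumes the usual approach of reading the MXCSR_MASK field (bytes 28 to 31 of the FXSAVE image) and testing bit 6, with a mask of 0 treated as the default 0x0000FFBF. The function names `daz_supported_from_mask` and `read_mxcsr_mask` are made up for illustration, the inline assembly assumes GCC or Clang on x86, and on other targets the read simply returns 0. As far as I know, Intel's manuals describe writing a reserved MXCSR bit as raising a general-protection fault rather than being silently ignored, which is why checking the mask first is the cautious route.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MXCSR_DAZ_BIT      0x0040u      /* bit 6 of MXCSR */
#define MXCSR_DEFAULT_MASK 0x0000FFBFu  /* used when MXCSR_MASK reads as 0 */

/* Hypothetical helper: decide DAZ support from a raw MXCSR_MASK value. */
int daz_supported_from_mask(uint32_t mxcsr_mask)
{
    if (mxcsr_mask == 0)
        mxcsr_mask = MXCSR_DEFAULT_MASK;  /* default mask has bit 6 clear */
    return (mxcsr_mask & MXCSR_DAZ_BIT) != 0;
}

/* Hypothetical helper: read MXCSR_MASK from the FXSAVE image.
   Assumes GCC/Clang on x86 with FXSR; returns 0 on any other target. */
uint32_t read_mxcsr_mask(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    struct fx_area { unsigned char bytes[512]; } __attribute__((aligned(16)));
    struct fx_area area;
    uint32_t mask;

    memset(&area, 0, sizeof area);
    assert(((uintptr_t)&area & 15u) == 0);       /* FXSAVE needs 16-byte alignment */
    __asm__ volatile("fxsave %0" : "+m"(area));  /* store the FPU/SSE state image */
    memcpy(&mask, area.bytes + 28, sizeof mask); /* MXCSR_MASK lives at offset 28 */
    return mask;
#else
    return 0;
#endif
}
```

Calling `daz_supported_from_mask(read_mxcsr_mask())` before touching the DAZ bit gives a yes/no answer; on a non-x86 build this sketch conservatively reports "not supported".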

2 Replies
Black Belt

I think I remember CPUs where it was possible to flip the DAZ bit with no effect. As I understand it, the Core i7-2 (second-generation Core i7) architecture is supposed to eliminate the effect of the FTZ and DAZ settings on performance in the cases normally encountered.
Beginner

Thanks for the info! My Core i7-2600 does handle denormals the same as normal floats for certain instructions. I don't have an extensive list of how they all perform, but I profiled pairs of addps and mulps instructions over 100,000,000 iterations. Here are my results; they're estimates in milliseconds:

            addps    mulps
normals     58.5     59
denormals   58.5     8050
FTZ+DAZ     58.5     59
DAZ         58.5     59
FTZ         58.5     4120

I can't complain about that; in fact, I'm impressed that addps works just as fast with or without denormals. I was tipped off about the difference in denormal handling between certain instructions by research that a man by the name of Bruce Dawson had done, http://www.altdevblogaday.com/2012/05/20/thats-not-normalthe-performance-of-odd-floats/.

I've attached the code that is profiled, for anyone who is curious. Addps and Mulps are the important functions; the rest sets MXCSR with the right flags and copies normal/denormal values into the source.
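For anyone reading without the attachment, the MXCSR part of a test like this could be sketched roughly as follows. This is not the attached code, only an illustration under some assumptions: it uses `_mm_getcsr` and `_mm_setcsr` from `<xmmintrin.h>`, the name `denormal_flushed` is made up, and it checks behavior rather than timing. It multiplies one denormal by 1.0f under the requested flags and reports whether the result came back as zero. It also assumes the CPU supports DAZ when that bit is passed in.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

#define MXCSR_FTZ 0x8000u  /* bit 15: flush-to-zero */
#define MXCSR_DAZ 0x0040u  /* bit 6: denormals-are-zero */

/* Hypothetical helper: multiply a denormal by 1.0f with the given
   FTZ/DAZ flags set in MXCSR. Returns 1 if the result is zero,
   0 if the denormal survived, -1 if SSE is not available in this build. */
int denormal_flushed(unsigned flags)
{
#if defined(__SSE__)
    const unsigned saved = _mm_getcsr();
    volatile float denorm = 1.0e-40f;  /* below FLT_MIN, so subnormal */
    volatile float one = 1.0f;
    volatile float out;                /* volatile keeps the multiply between the MXCSR writes */
    float result;
    uint32_t bits;

    /* Clear both flags, then set only the requested ones. */
    _mm_setcsr((saved & ~(MXCSR_FTZ | MXCSR_DAZ)) | flags);
    out = _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(denorm), _mm_set_ss(one)));
    _mm_setcsr(saved);                 /* restore the caller's MXCSR */

    /* Inspect the bit pattern so the check itself does no float math. */
    result = out;
    memcpy(&bits, &result, sizeof bits);
    return (bits & 0x7FFFFFFFu) == 0;
#else
    (void)flags;
    return -1;
#endif
}
```

With no flags set the denormal should pass through unchanged, and with `MXCSR_DAZ` the input should be treated as zero. A timing loop around addps or mulps would set the flags the same way before entering the loop.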