Intel® Fortran Compiler

Can I disable SSE on Mac OS?

md2581
Beginner
Problem:

I got different results on a Mac compared to what I got on a Linux laptop. On both machines I use the 9.1 compiler with default compile flags. My program is small and uses BLAS to compute a few *gigantic* matrix products. The program worked fine on Linux, but behaved "weirdly" on the Mac. It seemed to be some kind of precision problem, simply from looking at the iterations printed on the screen. So I have been working with compiler options all day to see if I could make the Mac program behave identically to the Linux program, without luck.
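For what it's worth, here is a toy example (not my real program, just the kind of accumulation I mean) where the last printed digits can differ between a build that uses the x87 FPU and one that uses SSE:

program toy_sum
  implicit none
  integer :: i
  double precision :: s
  s = 0d0
  do i = 1, 1000000
     ! partial sums may be carried in 80-bit x87 registers on the Linux build,
     ! but are rounded to 64 bits after every operation in the SSE build
     s = s + 1d0/(dble(i)*dble(i))
  end do
  print '(es24.17)', s
end program toy_sum

It is this kind of last-digit drift that I think is sending my iterations off on different paths.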

Inverse solution:

Then I started to study this forum and learned about SSE. Now I can force my Linux build to behave equally badly by adding the -xP option. Aha! So it seems that using SSE reduces precision enough to make my program go bad.

Question:

Is there any way I can make the Mac version behave as it would without SSE?

Or, will compiling the program with IFORT 10.0 solve the problem somehow?
7 Replies
Steven_L_Intel1
Employee
What you're seeing on the Linux side is the use of the old x87 floating point registers and the tendency for single precision computations to be computed in double. This will give you inconsistent results, though the results tend to be "better" than if everything were done in the declared precision.

I'm not aware that there is a way of disabling the use of SSE on the Mac compiler. I suggest that you look at using double precision more consistently (watch especially the use of single precision literals) so that your application behaves well on all platforms.
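For example, a single precision literal quietly limits the accuracy of a double precision assignment (a made-up fragment, not from your code):

double precision :: x
x = 0.1     ! single precision constant: only about 7 significant digits survive
x = 0.1d0   ! double precision constant: full double precision accuracy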
TimP
Honored Contributor III
If you have written a matrix multiplication the normal way, with dot product accumulations, x87 code gives you useful extra precision, at significant expense in performance. You would have several options to regain the accuracy in SSE code, e.g. forcing the matrix multiply into double precision:
c = matmul(a,real(b,kind(1d0)))
probably faster as
c(i,j) = dot_product(a(i,:),real(b(:,j),kind(1d0)))
(write this out in Fortran 77 if you prefer; you may find a way to vectorize it with ifort 10)
or (as in many commercial applications) make double precision copies of your arrays and use dgemm from a BLAS library such as MKL.
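That last option might look like this (a sketch only; I'm assuming square n-by-n single precision arrays a, b, c that are already filled in, and the standard BLAS dgemm interface):

real, allocatable :: a(:,:), b(:,:), c(:,:)
double precision, allocatable :: ad(:,:), bd(:,:), cd(:,:)
allocate (ad(n,n), bd(n,n), cd(n,n))
ad = a                       ! promote copies to double precision
bd = b
call dgemm('N', 'N', n, n, n, 1d0, ad, n, bd, n, 0d0, cd, n)
c = cd                       ! round the result back to single precision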

md2581
Beginner
Thanks for the comments. All variables and constants are declared as double precision. It seems that my application really needs the extra precision that the FPU gives, i.e. the inconsistent 80-bit arithmetic. I understand that the inconsistency is an issue, but still, my machine is equipped with an x87 FPU and I should be able to use it. Regarding speed, the Mac program (64-bit, 3 GHz, using SSE) is really not that much faster than my T60 laptop (32-bit, 2.3 GHz, using the FPU). And it really doesn't matter how fast a non-working version of a program is. I read somewhere that Intel's stated mission was that the Mac compiler would not be a subset of the Linux compiler. Now I'm looking into exotic ventures with "extended precision BLAS", which is supposed to do e.g. DGEMM with arbitrary internal precision (while my FPU is asleep).



Steven_L_Intel1
Employee
Your FPU is not "asleep". There are a lot of transistors on the chip dedicated to SSE processing. Please understand that the use of the x87 registers can yield inconsistent results, depending on when the compiler chooses to round to declared precision. It may also depend on what precision mode the OS initializes the x87 control word to - I know this differs between Linux and Windows.

I'll ask the developers to see if there is a way to do this.
Steven_L_Intel1
Employee
Ok, here's the story. Our C++ compiler supports a switch "-fp-model extended" which does what you want, but the Fortran compiler does not. I'm not sure why. I've been asked to ask you to submit a Feature Request to Intel Premier Support asking for it. If you can supply a test program that demonstrates the difference, it would be helpful.

You can try the switch -mp and see if it helps. Our code generator people really don't want you to use it (it does horrible things to optimization).

In the meantime, I'll go back to my earlier comment: relying on some calculations being done in 80-bit precision will lead to an unstable application whose results will likely change as optimizations change. I strongly recommend finding a different solution than relying on 80-bit x87 arithmetic.
md2581
Beginner
Thank you. I sincerely appreciate and trust your recommendations. I am very surprised to learn that my application was relying on something that could be inconsistent between compilations (I actually thought that all the flops were done in 64-bit precision). Now I just have to figure out how to do extended precision linear algebra, which I understand is not the compiler's problem.

Thanks again. :)


Steven_L_Intel1
Employee
We do support REAL(16), which you can use for the specific calculations where you need more precision. It is done in software and will be much slower than single or double precision. This may be a better bet for you.
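For example, accumulating a dot product in REAL(16) might look like this (sketch only; ddot16 is just a name I made up):

function ddot16(n, a, b) result(d)
  implicit none
  integer, intent(in) :: n
  real(8), intent(in) :: a(n), b(n)
  real(8) :: d
  real(16) :: s
  integer :: i
  s = 0.0_16
  do i = 1, n
     s = s + real(a(i),16)*real(b(i),16)   ! accumulate in quad precision (done in software)
  end do
  d = real(s, 8)                           ! round to double only once, at the end
end function ddot16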