Forcing SSE use for floating point

marcos · 02-18-2009

Is there a way to force the use of SSE instead of the x87 FPU for all floating-point operations? There seems to be an option for that in the Linux version of ICC, but I can't find it on Windows.

I am using ICC 10.0 on Windows Vista 64.

Thanks in advance.

TimP · 02-18-2009

The Windows x64 compiler avoids x87 code altogether. As on Linux, the default is SSE2, but more strictly so: there are no loopholes, not even with /Od or /Op. Don't try /Qlongdouble; let long double be treated as double.
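
If you want to check what a particular build actually does, C99 exposes the evaluation mode through the FLT_EVAL_METHOD macro in <float.h>. The snippet below is only a sketch: the helper names are made up for illustration, and I am assuming your compiler sets the macro to match the code it generates, which is worth verifying on older versions.

```c
#include <float.h>
#include <stdio.h>

/* Map a C99 FLT_EVAL_METHOD value to a short description.
   The function name is hypothetical; the values are those defined by C99. */
const char *describe_eval_method(int method)
{
    switch (method) {
    case 0:  return "each operation is evaluated in its own type (typical of SSE/SSE2 code)";
    case 1:  return "float and double are both evaluated as double";
    case 2:  return "everything is evaluated as long double (typical of x87 code)";
    default: return "indeterminate or implementation-defined";
    }
}

/* Print what the current compiler and options report. */
void print_eval_method(void)
{
    printf("FLT_EVAL_METHOD = %d: %s\n",
           (int)FLT_EVAL_METHOD,
           describe_eval_method((int)FLT_EVAL_METHOD));
}
```

Calling print_eval_method() from main in builds made with different option sets shows whether the reported mode changes.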

marcos · 02-19-2009

Sorry, I forgot to say that I am using the 32-bit compiler, not the 64-bit one.

Also, the problem I am trying to solve (precision issues with float operations) disappears whenever I increase the precision of floating-point operations (/fp:double, /fp:extended, ...), but then performance suffers. Normally, my command-line parameters are:

[cpp]/EHsc /Gd /GS /GR /Qprec /Ob2 /MD -G7 -O3 -QaxW -Qipo[/cpp]

TimP · 02-19-2009

/fp:double forces expressions to be evaluated in double precision. In the 32-bit compiler, this might be done with x87 code, which would be faster than SSE2 promotions and would normally give the same result. You should find out where your application requires double and write it in explicitly, so you don't suffer the performance loss everywhere.
Up through ICL 10.1, /QaxW generates both an x87 code path and an SSE2 code path, with the choice made by run-time CPU recognition. AMD CPUs would get the x87 path, so they would often get the effect of /fp:double.
/fp:extended would require expressions to be evaluated by x87, with the precision mode set to 64 bits.
These /fp promotions would normally disable vectorization wherever they take effect.
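
To illustrate what "write it in" means, here is a minimal sketch that promotes only the accumulator of a hot loop to double instead of raising precision for the whole program. The function names are hypothetical, and the comparison assumes each float operation is really rounded to float (as with SSE code); x87 code that keeps the total in a register may behave differently.

```c
#include <stddef.h>

/* Sum n copies of x, keeping the running total in single precision. */
float sum_in_float(float x, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; ++i)
        total += x;   /* rounded to float on every step under SSE */
    return total;
}

/* Same loop, but the accumulator is explicitly double. */
double sum_in_double(float x, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i)
        total += x;   /* x is promoted to double; rounding error stays small */
    return total;
}
```

With x = 0.1f and n = 10000000, the float version ends up tens of thousands away from 1,000,000 under strict single-precision arithmetic, while the double version stays within a small fraction of 1. Only this one loop pays for the extra precision.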

marcos · 02-19-2009

What I would like to do is remove any inconsistencies caused by mixing FPU and SSE code, by forcing the use of SSE. Something similar to what -mfpmath=sse does in gcc (someone told me that option actually exists in ICC for Linux). Is there something like that on Win32?

TimP · 02-19-2009

You should have seen enough hints by now. If you use the Intel 11.x compilers and don't set any options that imply x87, you get SSE everywhere it doesn't cost performance, and in some places where it does. I think the most likely place for 11.x to produce x87 code without being asked is complex arithmetic. If you used gcc with -ffast-math, you would get worse "inconsistencies." If you facilitate vectorization, you may even get SSE2 math functions where you would otherwise get more accurate x87 ones. By the way, -mfpmath=sse doesn't determine which math libraries you get with gcc either, on Windows or on Linux.
The inconsistencies, if you want to call them that, which you get with /fp:double are no different between SSE2 and x87.
You seem to be using earlier compilers, with options specifically implying you want various mixtures of x87, so it's hard to know your goal.
The instructions you get with /Qprec-div- /Qprec-sqrt- (as implied by some of your quoted options) are SSE instructions, but their results do not conform to the IEEE standard.
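
To show the kind of difference relaxed division can cause, here is a compiler-independent sketch that replaces a division with a multiplication by the reciprocal. It is not the code ICC emits, the function names are made up, and it assumes strict double evaluation (no x87 extended intermediates).

```c
/* IEEE-conforming division: one correctly rounded operation. */
double divide_exact(double a, double b)
{
    return a / b;
}

/* Division rewritten as multiplication by a reciprocal:
   two roundings instead of one, so the last bit can differ. */
double divide_via_reciprocal(double a, double b)
{
    double r = 1.0 / b;
    return a * r;
}
```

For a = b = 49.0, divide_exact returns exactly 1.0, while divide_via_reciprocal returns a value one unit in the last place below 1.0. Both are computed entirely with SSE2 instructions, so the discrepancy has nothing to do with x87.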

marcos · 02-22-2009

I finally managed to try with ICC 11.0.72, and it fixed the problem.

Thanks.