Option -xcore-avx2

Maicon_A_ · ‎04-18-2016

Hello everyone!

I'm using ICC version 2016.1.150 and have run into a problem when compiling for processors with AVX2. With -O0 and -xcore-avx2, the code runs without problems. With -O1 and -xcore-avx2, however, the code runs but produces a wrong result.

Furthermore, with -O1 (or -O2 or -O3) and -xavx, the code produces the correct result. In other words, I think ICC is generating optimized assembly that changes the program's behavior and, consequently, the final result.

Finally, I tested with GCC 4.8.2 and found no problem when compiling with -mavx2 and -O3.

Thanks!

TimP · ‎04-18-2016

There were bug fixes in 16.0.2, but in general AVX2 code generation has been better than in those old gcc releases. We would need a specific example. If you care, it's possible to demonstrate the special roundoff of FMA.

KitturGanesh · ‎04-18-2016

Hi Maicon,
Also, if it involves floating-point consistency issues, you should try the option "-fimf-arch-consistency=true". This option gives the same results on all processors of the same architecture.

Kittur

Maicon_A_ · ‎04-19-2016

Thanks, Kittur and Tim, for replying to my question.

I tried compiling with "-fimf-arch-consistency=true" a few minutes ago but, unfortunately, the results are still the same. It is worth mentioning that the O1 flag does not enable auto-vectorization. Thus, I cannot understand why the program's results change between -xavx and -xcore-avx2, which are instruction sets for vectorization.

More specifically, my application performs many floating-point calculations. I noticed that some calculations result in not-a-number (-nan) when compiling with -xcore-avx2, while the results are correct when compiling with -xavx or with "-xcore-avx2 and O0".

Another question: which optimization flags does O1 enable? I read the ICC manual and did not find this information. If I knew, I could disable them one by one to find out which optimization is causing the problem.

Thanks again!

Maicon

TimP · ‎04-19-2016

Targeting a newer instruction set than your CPU supports is always a problem, even without vectorization. For example, AVX2 optimization with FMA applies to scalar operations and would give an illegal instruction fault on CPUs up through Ivy Bridge.

I try never to set SSE4.2 when SSE4.1 runs faster on Westmere, nor AVX-I, which has no advantage other than to fault when run on Sandy Bridge.
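One way to guard against the illegal instruction fault described above is to ask the CPU what it supports before calling code built for AVX2 and FMA. The sketch below assumes GCC or Clang on x86 and uses their `__builtin_cpu_supports` builtin; the function name `cpu_has_avx2_fma` is hypothetical, and ICC's own mechanisms are not shown.

```c
/* Returns 1 if the running CPU reports both AVX2 and FMA, 0 otherwise.
   Assumption: GCC or Clang targeting x86. On any other compiler or
   target this sketch conservatively returns 0. */
int cpu_has_avx2_fma(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();   /* harmless if already initialized */
    return __builtin_cpu_supports("avx2") &&
           __builtin_cpu_supports("fma");
#else
    return 0;
#endif
}
```

For the check to be useful, the file containing it has to be compiled for a baseline instruction set. Otherwise the check itself could fault on an older CPU.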

KitturGanesh · ‎04-19-2016

Hi Maicon,
Without knowing what the test case looks like, it's hard to see what's going on. If FMAs are involved, an FMA may give a different result from separate multiply and add operations. At -O1 and above, with AVX2 targeted, the compiler will try to replace a multiply followed by an add with a single FMA, which leads to different results at different optimization levels. The only solution in such a scenario is to disable FMA explicitly.

You can try the following options and give it a shot:

1) -fp-model precise -no-fma -fimf-arch-consistency=true

2) -fp-model strict (also disables FMA, but has a wider impact)

If the above still doesn't resolve it, please attach a small reproducer for us to try out so we can file an issue accordingly - appreciate much.

Thanks,
Kittur

Maicon_A_ · ‎04-19-2016

Hi Kittur and Tim!

It works when compiling with -xcore-avx2 -no-fma -prec-div -prec-sqrt.

As you said, I think it is a problem related to FMA and floating-point precision.
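The thread does not establish which of the three flags made the difference. As background on `-prec-div`: without it, a compiler may compute A/B as A * (1/B), which is not always the correctly rounded quotient. A minimal sketch of that effect, with hypothetical function names and an input picked only for illustration:

```c
/* Correctly rounded IEEE division. */
double div_exact(double a, double b)
{
    return a / b;
}

/* Division replaced by multiplication with a rounded reciprocal.
   The volatile forces 1/b to be rounded to double first. */
double div_by_reciprocal(double a, double b)
{
    volatile double r = 1.0 / b;
    return a * r;
}
```

For a = b = 49, `div_exact` returns exactly 1.0, while `div_by_reciprocal` returns a value slightly below 1.0.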

I will test other options (fp-model) to see whether performance improves.

Thank you very much!

Maicon

KitturGanesh · ‎04-19-2016

Great, glad to know that resolved your issue. FYI, in the next 17.0 compiler release there will be a single switch that combines those options, so you can achieve the same result, including disabling FMA.

Kittur

TimP · ‎04-19-2016

It seems in my testing of 17 that prec-div has come on unexpectedly. I couldn't find documentation of any changes there. Typical that beta test features don't get documented in time.

KitturGanesh · ‎04-19-2016

Hi Tim, can you file a separate issue (e.g. in IPS) against the beta product for the doc change you're referring to, so that it's triaged, filed with the developers, and recorded as part of the beta? Appreciate much.
Cheers,
Kittur

TimP · ‎04-19-2016

IPS 6000160333

KitturGanesh · ‎04-20-2016

Hi Tim, thanks much for filing that issue in IPS, which is already being addressed by Devorah - appreciate much.

Kittur