Bug report

pascalc · 07-15-2012

I would like to report the following behavior.

[cpp]#include <algorithm>
#include <iostream>
#include <iterator>

void test(double const * A)
{
    double M[9];
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            // source rows are 0, 1, 3 (row 2 is skipped)
            M[i+3*j] = A[(i < 2 ? i : i+1) + 4 * j];
        }
    }
    std::copy(M, M + 9, std::ostream_iterator<double>(std::cout, " "));
    std::cout << std::endl;
}

int main()
{
    double A[16];
    for (int k = 0; k < 16; ++k) A[k] = k;
    test(A);
    return 0;
}[/cpp]

In Debug mode the output is:

0 1 3 4 5 7 8 9 11

which is the expected behavior. In Release mode however, the output is:

0 0 1 4 4 5 8 9 11
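To make the expected values explicit: reading A as a 4x4 matrix stored column by column, the loop copies rows 0, 1 and 3 (the ternary skips row 2) of columns 0 to 2 into the 3x3 matrix M. The sketch below restates the loop as a function that writes into a caller-supplied buffer and checks it against the Debug output; the names `extract_submatrix` and `matches_expected_output` are hypothetical and not part of the original report.

```cpp
#include <cassert>  // for assert() in the usage check

// Hypothetical helper: same indexing as test() above, but writing the 3x3
// result into M instead of printing it. A is read as a 4x4 column-major
// matrix; M receives rows 0, 1 and 3 of columns 0..2.
void extract_submatrix(double const * A, double * M)
{
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            M[i + 3 * j] = A[(i < 2 ? i : i + 1) + 4 * j];
        }
    }
}

// Returns true when the result for A[k] = k equals the Debug output
// 0 1 3 4 5 7 8 9 11. Returning a bool instead of calling assert() here keeps
// the check active in Release builds, where NDEBUG compiles assert() away.
bool matches_expected_output()
{
    double A[16];
    for (int k = 0; k < 16; ++k) A[k] = k;

    double M[9];
    extract_submatrix(A, M);

    const double expected[9] = {0, 1, 3, 4, 5, 7, 8, 9, 11};
    for (int n = 0; n < 9; ++n) {
        if (M[n] != expected[n]) return false;
    }
    return true;
}
```

On the affected compiler version, a Release build with the same options might make `matches_expected_output()` return false, though whether the defect still triggers once the loop sits in a separate function like this has not been verified.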

I am using ICL version 12.1 Build 20120130 with Visual Studio version 10.0.40219.1 SP1Rel on Windows 7, running on an HP EliteBook 8540w (Intel Core i7 Q840 CPU).

Hubert_H_Intel · 07-15-2012

I do not see a problem with the Release build; the output is the same as for the Debug build.
Could you please provide the full list of options you used to build the project?
Regards,
Hubert.

pascalc · 07-16-2012

Compiler flags:

/Zi /nologo /W3 /O2 /Oi /Qipo /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /EHsc /GS /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Fp"Release\intel_compiler_bug.pch" /Fa"Release" /Fo"Release" /Fd"Release\vc100.pdb" /Gd

Linker flags:

/OUT:"Visual Studio 2010\Projects\intel_compiler_bug\Release\intel_compiler_bug.exe" /INCREMENTAL:NO /NOLOGO "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /MANIFEST /ManifestFile:"Release\intel_compiler_bug.exe.intermediate.manifest" /ALLOWISOLATION /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"Visual Studio 2010\Projects\intel_compiler_bug\Release\intel_compiler_bug.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /PGD:"Visual Studio 2010\Projects\intel_compiler_bug\Release\intel_compiler_bug.pgd" /LTCG /TLBID:1 /DYNAMICBASE /NXCOMPAT /MACHINE:X86

I created the project from scratch so these should be the default options.

Hubert_H_Intel · 07-16-2012

The option /fp:precise is causing the issue. With the default /fp:fast you don't see the problem. Do you really need value safety in your application?
Hubert.

pascalc · 07-16-2012

This answer is surprising, and also incorrect: with /fp:fast I now get:

0 0 0 4 4 4 8 8 8

pascalc · 07-16-2012

Sorry, my bad, the answer is now correct with /fp:fast in this example.

However, I doubt this is the cause of the problem, because I found this bug in a project that is actually compiled with /fp:fast. Moving code around, even slightly, does make the bug disappear.

In any case I would expect a correct behavior with any reasonable compiler option.

Hubert_H_Intel · 07-16-2012

Thanks for the clarification. Let me investigate further.
Hubert.

Hubert_H_Intel · 07-16-2012

It looks like an optimizer/vectorizer bug. I tested the current Intel Composer XE 2011 Update 11 (Compiler XE 12.1 Update 5). The problem is that the values of the array M are not carried out of the nested for loops correctly, although M is declared at function scope.
Workarounds would be to disable the vectorizer for the inner loop (add #pragma novector in front of the inner for loop) or to use /O1 for the whole function test (add #pragma optimize("", off) / #pragma optimize("", on) around the function).
Did you see the problem only recently (after a compiler update), or has it existed for a longer time?
Hubert.

pascalc · 07-17-2012

Thank you for your answer. I cannot tell whether this bug is present in earlier versions of the compiler: I found it while attempting to compile, with the Intel Compiler, a project that has so far always been built with the Visual C++ compiler.

Fixing a function once the bug has been found to affect it is indeed easy. The solutions you propose work fine. I found that changing the code into

[cpp]for (int i = 0; i < 3; ++i) {
    int i0 = (i < 2 ? i : i+1);
    for (int j = 0; j < 3; ++j) {
        M[i+3*j] = A[i0 + 4 * j];
    }
}[/cpp]

also works fine. However, my concern is making sure this bug does not silently affect other functions. If you have a more specific description of the bug and guidelines to avoid it, that would be great, because if I understand correctly, the only safe way right now is to disable vectorization or /O2 completely.
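One way to watch for this kind of silent miscompilation is a differential test: run the original loop and the rewritten one on the same inputs, built with the same Release options, and compare the results element by element. A minimal sketch follows; `extract_ternary`, `extract_hoisted` and `versions_agree` are hypothetical names. Such a test can only flag a mismatch for the inputs and code layout it exercises, so it does not prove the absence of the bug elsewhere.

```cpp
#include <cassert>  // for assert() in the usage check

// Hypothetical name: the loop exactly as posted in the original report.
void extract_ternary(double const * A, double * M)
{
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            M[i + 3 * j] = A[(i < 2 ? i : i + 1) + 4 * j];
}

// Hypothetical name: the rewritten loop with the ternary hoisted out.
void extract_hoisted(double const * A, double * M)
{
    for (int i = 0; i < 3; ++i) {
        const int i0 = (i < 2 ? i : i + 1);  // source row, computed once per i
        for (int j = 0; j < 3; ++j)
            M[i + 3 * j] = A[i0 + 4 * j];
    }
}

// Fills A with several distinct patterns and requires both versions to
// produce identical values. Returns false on the first mismatch.
bool versions_agree()
{
    double A[16], M1[9], M2[9];
    for (int pattern = 0; pattern < 8; ++pattern) {
        for (int k = 0; k < 16; ++k)
            A[k] = pattern * 100.0 + k * 1.5;
        extract_ternary(A, M1);
        extract_hoisted(A, M2);
        for (int n = 0; n < 9; ++n)
            if (M1[n] != M2[n]) return false;
    }
    return true;
}
```

Using non-trivial fill patterns rather than only A[k] = k makes it less likely that a wrong index happens to read a matching value.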

Hubert_H_Intel · 07-17-2012

From the coding perspective (and for the autovectorizer/optimizer) it's better anyway to "outsource" the ternary operator: assign its value to a temporary variable and use that in the loop.
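Taking the same idea one step further, the source rows can be listed once in a small table, so the inner loop contains only indexed loads and stores. This is a hypothetical variant (`extract_with_row_table` is an illustrative name), not something proposed in the thread, and whether it avoids the defect on the affected compiler version has not been tested.

```cpp
#include <cassert>  // for assert() in the usage check

// Hypothetical variant: the source rows {0, 1, 3} are listed once in a table,
// so the loop body has no ternary at all.
void extract_with_row_table(double const * A, double * M)
{
    // rows[i] equals (i < 2 ? i : i + 1) for i = 0, 1, 2
    static const int rows[3] = {0, 1, 3};
    for (int i = 0; i < 3; ++i) {
        const int r = rows[i];
        for (int j = 0; j < 3; ++j)
            M[i + 3 * j] = A[r + 4 * j];
    }
}
```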

Disabling the vectorizer or high optimizer as a workaround should be applied for the respective loops/functions only. Switching them off globally may hurt the overall performance significantly.

But it's definitely a bug in the Intel Compiler; it should compute the correct result and optimize/vectorize correctly in any case. I'm going to file a defect.

The optimizer workaround at function level looks like this:

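[cpp]#include <algorithm>
#include <iostream>
#include <iterator>

#pragma optimize("", off)  // turn optimization off for the functions that follow
void test(double const * A)
{
    double M[9];
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            M[i+3*j] = A[(i < 2 ? i : i+1) + 4 * j];
        }
    }
    std::copy(M, M + 9, std::ostream_iterator<double>(std::cout, " "));
    std::cout << std::endl;
}
#pragma optimize("", on)   // restore the optimization settings from the command line[/cpp]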

The workaround that disables the vectorizer (for the inner loop only) looks like this:

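[cpp]#include <algorithm>
#include <iostream>
#include <iterator>

void test(double const * A)
{
    double M[9];
    for (int i = 0; i < 3; ++i) {
#pragma novector  // Intel compiler: do not vectorize the next loop
        for (int j = 0; j < 3; ++j) {
            M[i+3*j] = A[(i < 2 ? i : i+1) + 4 * j];
        }
    }
    std::copy(M, M + 9, std::ostream_iterator<double>(std::cout, " "));
    std::cout << std::endl;
}[/cpp]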

I hope this helps. I'll let you know once I have news about a bugfix.
Regards,
Hubert.