Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Bug report

pascalc
Beginner
541 Views

I would like to report the following behavior.

[cpp]
#include <algorithm>
#include <iostream>
#include <iterator>

void test(double const * A)
{
    double M[9];
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            M[i+3*j] = A[(i < 2 ? i : i+1) + 4 * j];
        }
    }
    std::copy(M, M + 9, std::ostream_iterator<double>(std::cout, " "));
    std::cout << std::endl;
}

int main()
{
    double A[16];
    for (int k = 0; k < 16; ++k)
        A[k] = k;
    test(A);
    return 0;
}
[/cpp]

In debug mode the output is:

0 1 3 4 5 7 8 9 11

which is the expected behavior. In Release mode however, the output is:

0 0 1 4 4 5 8 9 11

I am using ICL Version 12.1 Build 20120130 with Visual Studio Version 10.0.40219.1 SP1Rel on Windows 7 running on an HP EliteBook 8540w (Intel Core i7 Q840 CPU).
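
For anyone who wants to check a given compiler build without comparing the printed numbers by eye, the reproducer can be made self-checking. The following is only a minimal sketch and not part of the original report: the names `gather3x3` and `matches_expected` are hypothetical, and the expected values are simply the debug-mode output listed above.

```cpp
#include <algorithm>

// Same gather as test(): for each group of four elements of A, copy
// elements 0, 1 and 3 into the 3x3 array M (hypothetical helper name).
void gather3x3(double const* A, double* M) {
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            M[i + 3 * j] = A[(i < 2 ? i : i + 1) + 4 * j];
        }
    }
}

// Returns true when the gather produces the values seen in the debug build.
bool matches_expected() {
    double A[16];
    for (int k = 0; k < 16; ++k) A[k] = k;

    double M[9];
    gather3x3(A, M);

    double const expected[9] = {0, 1, 3, 4, 5, 7, 8, 9, 11};
    return std::equal(M, M + 9, expected);
}
```

Calling `matches_expected()` from `main` and returning a nonzero exit code when it fails would let a build script flag the affected configuration.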

0 Kudos
9 Replies
Hubert_H_Intel
Employee
I do not see a problem with the release build. The output is the same as for the debug build.
Could you please provide the full list of options you used to build the project?
Regards,
Hubert.
pascalc
Beginner

Compiler flags:

/Zi /nologo /W3 /O2 /Oi /Qipo /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /EHsc /GS /Gy /fp:precise /Zc:wchar_t /Zc:forScope /Fp"Release\intel_compiler_bug.pch" /Fa"Release" /Fo"Release" /Fd"Release\vc100.pdb" /Gd

Linker flags:

/OUT:"Visual Studio 2010\Projects\intel_compiler_bug\Release\intel_compiler_bug.exe" /INCREMENTAL:NO /NOLOGO "kernel32.lib" "user32.lib" "gdi32.lib" "winspool.lib" "comdlg32.lib" "advapi32.lib" "shell32.lib" "ole32.lib" "oleaut32.lib" "uuid.lib" "odbc32.lib" "odbccp32.lib" /MANIFEST /ManifestFile:"Release\intel_compiler_bug.exe.intermediate.manifest" /ALLOWISOLATION /MANIFESTUAC:"level='asInvoker' uiAccess='false'" /DEBUG /PDB:"Visual Studio 2010\Projects\intel_compiler_bug\Release\intel_compiler_bug.pdb" /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /PGD:"Visual Studio 2010\Projects\intel_compiler_bug\Release\intel_compiler_bug.pgd" /LTCG /TLBID:1 /DYNAMICBASE /NXCOMPAT /MACHINE:X86

I created the project from scratch so these should be the default options.

Hubert_H_Intel
Employee
The option /fp:precise is causing the issue. With the default /fp:fast you don't see the problem. Do you really need value safety in your application?
Hubert.
pascalc
Beginner

This answer is surprising, and also incorrect: with /fp:fast I now get:

0 0 0 4 4 4 8 8 8

pascalc
Beginner
Sorry, my mistake: the output is now correct with /fp:fast in this example.
However, I doubt this is the cause of the problem, because I found this bug in a project that is actually compiled with /fp:fast. Moving things around, even slightly shifting code, does indeed make the bug disappear.
In any case I would expect correct behavior with any reasonable compiler option.
Hubert_H_Intel
Employee

Thanks for the clarification. Let me investigate further.
Hubert.

Hubert_H_Intel
Employee
It looks like an optimizer/vectorizer bug. I tested the current Intel Composer XE 2011 Update 11 (Compiler XE 12.1 Update 5). The problem is that the array M is not carried out of the nested for loops correctly, although it is declared at function scope.
Workarounds would be to disable the vectorizer for the inner loop (add #pragma novector in front of the inner for loop) or to use /O1 for the whole function test (add #pragma optimize("", off) / #pragma optimize("", on) around the function).
Did you see the problem only recently (with a compiler update), or has it existed for a longer time?
Hubert.
pascalc
Beginner

Thank you for your answer. I could not tell if this bug is present in earlier versions of the compiler -- I found this bug while attempting to compile with Intel Compiler a project that has always been compiled with Visual C++ compiler so far.

Fixing a function once the bug has been found to affect it is indeed easy. The solutions you propose work fine. I found that changing the code into

[cpp]
for (int i = 0; i < 3; ++i) {
    int i0 = (i < 2 ? i : i+1);
    for (int j = 0; j < 3; ++j) {
        M[i+3*j] = A[i0 + 4 * j];
    }
}
[/cpp]

also works fine. However, my concern is to make sure this bug does not silently affect other functions. If you had a more specific description of the bug and guidelines to avoid it, that would be great, because if I understand correctly the only safe way right now is to completely disable vectorization or /O2 mode.
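
One way to look for silent occurrences elsewhere, sketched here under assumptions and not as a guaranteed detector, is to compare a suspicious loop nest against a reference version that the optimizer is unlikely to vectorize. In the sketch below the reference uses `volatile` loop counters for that purpose. All function names are hypothetical, and whether `volatile` is enough to keep a particular compiler version from transforming the loop is an assumption that should be verified, for example against the compiler's vectorization report.

```cpp
#include <algorithm>

// Loop nest under test, written as in the original report.
void gather_optimized(double const* A, double* M) {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            M[i + 3 * j] = A[(i < 2 ? i : i + 1) + 4 * j];
}

// Reference version: the volatile counters must be re-read on every use,
// which is assumed (not guaranteed) to keep these loops scalar.
void gather_reference(double const* A, double* M) {
    for (volatile int i = 0; i < 3; i = i + 1) {
        for (volatile int j = 0; j < 3; j = j + 1) {
            int const ii = i;  // plain copies for the index arithmetic
            int const jj = j;
            int const row = (ii < 2 ? ii : ii + 1);
            M[ii + 3 * jj] = A[row + 4 * jj];
        }
    }
}

// Runs both versions on the same input and reports whether they agree.
bool same_result() {
    double A[16];
    for (int k = 0; k < 16; ++k) A[k] = k;

    double fast[9];
    double ref[9];
    gather_optimized(A, fast);
    gather_reference(A, ref);
    return std::equal(fast, fast + 9, ref);
}
```

A check like `same_result()` only covers the loops it is written for, so it complements the per-loop workarounds but does not replace a fix in the compiler.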

Hubert_H_Intel
Employee

From the coding perspective (and for the autovectorizer/optimizer) it is better anyway to move the ternary operator out of the inner loop: assign its value to a temporary variable and use that variable in the loop.

Disabling the vectorizer or high optimizer as a workaround should be applied for the respective loops/functions only. Switching them off globally may hurt the overall performance significantly.

But it's definitely a bug in the Intel Compiler; it should compile the code and optimize/vectorize it correctly in any case. I'm going to file a defect.

The optimizer workaround at function level looks like:

[cpp]
#pragma optimize("", off)
void test(double const * A)
{
    double M[9];
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            M[i+3*j] = A[(i < 2 ? i : i+1) + 4 * j];
        }
    }
    std::copy(M, M + 9, std::ostream_iterator<double>(std::cout, " "));
    std::cout << std::endl;
}
#pragma optimize("", on)
[/cpp]


The workaround for disabling the vectorizer (on inner loop) looks like:

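[cpp]
void test(double const * A)
{
    double M[9];
    for (int i = 0; i < 3; ++i) {
        // Disable vectorization for the inner loop only.
#pragma novector
        for (int j = 0; j < 3; ++j) {
            M[i+3*j] = A[(i < 2 ? i : i+1) + 4 * j];
        }
    }
    std::copy(M, M + 9, std::ostream_iterator<double>(std::cout, " "));
    std::cout << std::endl;
}
[/cpp]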


I hope this helps. I'll let you know once I have news about a bugfix.
Regards,
Hubert.
