Internal compiler error 010101_239

Manuel_P_
Beginner
Hi guys,

I condensed our project down to a piece of code that reproduces the following issue. When I compile it in the Release configuration (Debug works), I get this compiler error:

1>------ Build started: Project: ng-gtest, Configuration: Release x64 ------
1> CilkTest.cpp
1>" : error : 010101_239
1>
1> compilation aborted for General\CilkTest.cpp (code 4)
========== Build: 0 succeeded, 1 failed, 3 up-to-date, 0 skipped ==========

This is our compiler:

Intel(R) C++ Intel(R) 64 Compiler XE for Intel(R) 64, version 14.0.3
Package ID: w_ccompxe_2013_sp1.3.202
OS: Windows 7, x64

This is the code:

    #include <math.h>  // header name lost in the archive; <math.h> assumed from the math calls below

    const int VecSize = 8;

    const short* acdata;
    const short* lowdata;
    const unsigned short* meas_data;
    const unsigned short* rdval;
    short trident[2 * VecSize];
    short speed[2 * VecSize];
    float spdfact[VecSize];
    float spdfact2[VecSize];
    float tdat[VecSize];
    float array1[VecSize];
    float array2[VecSize];
    const float *input_01;
    const float *input_02;
    float val0;
    float vvv9;
    float agn;

    void get_g(float ag[VecSize], const float ae[VecSize], const float* pp)
    {
        float a01[VecSize];
        a01[:] = ae[:];
        if (a01[:] >= 360.0f) a01[:] -= 360.0f;
        if (a01[:] >= 360.0f) a01[:] -= 360.0f;
        if (a01[:] < 0.0f) a01[:] += 360.0f;
        if (a01[:] < 0.0f) a01[:] += 360.0f;
        int i0[VecSize], i1[VecSize];
        i1[:] = static_cast<int>(a01[:]);
        i0[:] = i1[:] + 1;
        float g0[VecSize], g1[VecSize];
        g1[:] = pp[i1[:]];
        g0[:] = pp[i0[:]];
        ag[:] = g0[:] - (g1[:] - g0[:]) * (a01[:] - static_cast<float>(i0[:]));
    }

    void f(float prlo[VecSize], const int cntr)
    {
        float cop[VecSize];
        short cod[2 * VecSize];
        short maxm[2 * VecSize];
        cod[0:VecSize:2] = acdata[cntr:VecSize];
        cod[1:VecSize:2] = lowdata[cntr:VecSize];
        maxm[0:VecSize:2] = lowdata[cntr:VecSize];
        maxm[1:VecSize:2] = acdata[cntr:VecSize];
        cop[:] = (1.0f / float(255 * 255)) * static_cast<float>(
            cod[0:VecSize:2] * trident[0:VecSize:2] + cod[1:VecSize:2] * trident[1:VecSize:2]);
        float music[VecSize];
        music[:] = (1.0f / float(16383 * 16383)) * static_cast<float>(
            maxm[0:VecSize:2] * speed[0:VecSize:2] + maxm[1:VecSize:2] * speed[1:VecSize:2]);
        float velo2[VecSize];
        float brigh[VecSize];
        velo2[:] = static_cast<float>(rdval[cntr:VecSize]) * spdfact[:];
        brigh[:] = asinf(velo2[:]);
        float denom[VecSize];
        denom[:] = cop[:] * array1[:] + array2[:] / brigh[:];
        float accel[VecSize];
        accel[:] = atanf(music[:] / denom[:]);
        accel[:] = atan2f(accel[:], velo2[:]);
        bool haMask[VecSize];
        haMask[:] = accel[:] < 0.0f;
        if (haMask[:] & (music[:] > 0.0f)) accel[:] += float(9.81 / 2);
        if (!haMask[:] & (music[:] < 0.0f)) accel[:] -= float(9.81 / 2);
        float diff[VecSize];
        diff[:] = array1[:] * brigh[:] - array2[:] * cop[:] * velo2[:];
        float prod[VecSize];
        prod[:] = sinf(accel[:]) * diff[:];
        float accel2[VecSize];
        accel2[:] = atanf(prod[:] / music[:]);
        float halter[VecSize] = { 0.0f };
        if (cop[:] <= 0.0f) halter[:] = 2.8182963f;
        float valter[VecSize];
        valter[:] = cop[:] > 0 ? velo2[:] - tdat[:] : velo2[:] + tdat[:];
        if (music[:] == 0)
        {
            accel[:] = halter[:];
            accel2[:] = valter[:];
        }
        accel[:] *= float(9.81);
        accel2[:] *= float(9.81);
        float v8[VecSize];
        v8[:] = 25.7385f - accel2[:];
        float hxx[VecSize];
        hxx[:] = fabsf(accel[:]) * float(1.38e-23);
        float hnn[VecSize];
        hnn[:] = -accel[:];
        float vgg[VecSize], gx2[VecSize], hms[VecSize], sv[VecSize];
        get_g(vgg, accel2, input_02);
        get_g(gx2, v8, input_02);
        get_g(hms, hnn, input_01);
        sv[:] = ((1.0f - hxx[:]) * (val0 - vgg[:])) + (hxx[:] * (vvv9 - gx2[:]));
        float p0[VecSize];
        p0[:] = hms[:] - sv[:];
        prlo[:] = static_cast<float>(meas_data[cntr:VecSize]) * spdfact2[:] - p0[:] - agn;
    }

12 Replies
Barry_T_Intel
Employee

I've been unable to reproduce the problem. I extracted the code into a file, and issued the following command:

bash-3.2$ icl test.cpp
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0 Build 20140303
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

test.cpp
Microsoft (R) Incremental Linker Version 9.00.30729.01
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:test.exe
test.obj
LIBCMT.lib(crt0.obj) : error LNK2019: unresolved external symbol main referenced in function __tmainCRTStartup
test.exe : fatal error LNK1120: 1 unresolved externals

So the file was successfully compiled and failed in the linker. The problem may already be fixed. I'm using a nightly compiler build from March 3 which may be newer than the compiler you've got. To be sure, I'll need a Visual Studio build log to reproduce the command line options.

  - Barry

Brandon_H_Intel
Employee

I'm using the same compiler with Microsoft Visual Studio* 2013, and I'm not seeing a problem:

 

1>------ Build started: Project: test, Configuration: Release x64 ------
1> icl /Qvc12 "/Qlocation,link,C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64" /Zi /W3 /O2 /Oi /Qipo /Qftz- -D __INTEL_COMPILER=1400 -D WIN32 -D NDEBUG -D _CONSOLE -D _LIB -D _UNICODE -D UNICODE /EHsc /MD /GS /Gy /Zc:wchar_t /Zc:forScope /Fox64\Release\ /Fdx64\Release\vc120.pdb /TP test.cpp
1>
1> Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.3.202 Build 20140422
1> Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
1>
1> test.cpp
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

 

Can you turn off the Startup banner (/nologo) and see if your command line matches mine above? And are you using a different Visual Studio version?

Manuel_P_
Beginner

Hi,

I think you need vectorization to reproduce the problem. I use Visual Studio Premium 2013. This is my output:

1>------ Build started: Project: ng-gtest, Configuration: Release x64 ------
1> icl /Qvc12 "/Qlocation,link,D:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64" /Zi /W3 /MP /O3 /Oi /Qip /Qftz -D __INTEL_COMPILER=1400 -D WIN32 -D NDEBUG -D _CONSOLE -D NG_EXPORTS -D NG_DLL_ID=1 -D USE_TBB_PARALLEL -D USE_TBB_RWLOCK -D NOMINMAX -D BOOST_FILESYSTEM_NO_DEPRECATED -D _WINDLL -D _SCL_SECURE_NO_WARNINGS -D BOOST_MULTI_INDEX_DISABLE_SERIALIZATION -D NOMINMAX -D _WINDLL -D _VARIADIC_MAX=10 -D _UNICODE -D UNICODE /EHsc /MD /GS /fp:precise /QxSSE3 /Zc:wchar_t /Zc:forScope /Qstd=c++11 /Qrestrict /Fo.\tmp\msvc_x64_ur\ /Fd.\tmp\msvc_x64_ur\vc120.pdb /TP General\CilkTest.cpp
1>
1> Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.3.202 Build 20140422
1> Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
1>
1> CilkTest.cpp
1> *** Compiling Cilk test code in Debug (fails to compile in Release with w_ccompxe_2013_sp1.3.202.
1>" : error : 010101_239
1>
1> compilation aborted for General\CilkTest.cpp (code 4)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

Barry_T_Intel
Employee

/fp:precise seems to be the critical option.  I've submitted CQ256573 on this problem.

Thank you for reporting it.

   - Barry

Manuel_P_
Beginner

You are right, /fp:precise and /fp:strict don't work. /fp:fast and /fp:fast=2 both work.

Thanks, Barry, for your investigations.

Cheers,

Martin

Brandon_H_Intel
Employee

I see it too. Looks like something that broke between the 12.1 and 13.0 compilers. Unfortunately, I don't see any easy workarounds beyond not using /fp:precise, which isn't ideal. When there's progress made on the investigation here, I'll update the thread.

Manuel_P_
Beginner

Hi Brandon,

unfortunately it's worse than I thought. Although /fp:fast compiles successfully, the resulting code is buggy.

I was able to get the code to compile with /fp:precise by splitting the function into smaller pieces and marking them __declspec(noinline), and this binary passed our tests. The binary built with /fp:fast caused an access violation whether or not I split the function.
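
A minimal sketch of the splitting workaround described above. It uses plain loops instead of Cilk Plus array notation so that it builds with any C++ compiler. The `NOINLINE` macro and the names `stage_trig`, `stage_scale` and `f_split` are hypothetical and only illustrate the shape of the change; they are not the functions from the actual project, and the thread does not confirm that this exact split avoids the internal error.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical portability macro: __declspec(noinline) for MSVC-compatible
// compilers on Windows, the GNU-style attribute elsewhere.
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

const int VecSize = 8;

// First piece: the trigonometric step, kept out of line so the optimizer
// treats it as a separate unit instead of merging it back into the caller.
NOINLINE void stage_trig(float out[VecSize], const float in[VecSize]) {
    for (int i = 0; i < VecSize; ++i)
        out[i] = std::atan(in[i]);
}

// Second piece: the final scaling step.
NOINLINE void stage_scale(float out[VecSize], const float in[VecSize], float scale) {
    for (int i = 0; i < VecSize; ++i)
        out[i] = in[i] * scale;
}

// The formerly monolithic function becomes a thin driver that passes
// intermediate results between the pieces through local arrays.
void f_split(float result[VecSize], const float input[VecSize]) {
    assert(result != nullptr && input != nullptr);
    float tmp[VecSize];
    stage_trig(tmp, input);
    stage_scale(result, tmp, 9.81f);
}
```

Calling `f_split(out, in)` gives the same values as the unsplit computation; the only intended difference is that each stage is compiled as its own function.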

From that I would conclude that there is an issue with Cilk Plus which leads to either an internal error or buggy code.

Unfortunately I cannot provide test data at this stage. It's hard to extract this from our test environment.

Regards,
Martin

Brandon_H_Intel
Employee

Martin,

It may be a good idea to check what kind of access violation it is. If it's a bad address, then we're probably stuck until we can get a test case from you, but if it's a stack overflow or something along those lines, it might be easier to work around and reproduce.

Manuel_P_
Beginner

Hi Brandon,

the stack seems to be corrupt. The line "get_g(gx2, v8, input_02);" is compiled into these instructions.

000007FEE36DA28A  movsxd      rbp,dword ptr [rsp+20h]  
000007FEE36DA28F  movsxd      r10,dword ptr [rsp+30h]  
000007FEE36DA294  movsxd      rdi,dword ptr [rsp+24h]  
000007FEE36DA299  movsxd      r11,dword ptr [rsp+34h]  
000007FEE36DA29E  movsxd      r8,dword ptr [rsp+28h]  
000007FEE36DA2A3  movsxd      r12,dword ptr [rsp+38h]  
000007FEE36DA2A8  vmovss      xmm2,dword ptr [rax+rbp*4]  
000007FEE36DA2AD  vmovss      xmm15,dword ptr [rax+rbp*4+4]  
000007FEE36DA2B3  vmovss      xmm9,dword ptr [rax+r10*4]  
000007FEE36DA2B9  vmovss      xmm4,dword ptr [rax+r10*4+4]  
000007FEE36DA2C0  movsxd      r9,dword ptr [rsp+2Ch]  
000007FEE36DA2C5  movsxd      r14,dword ptr [rsp+3Ch]  
000007FEE36DA2CA  vinsertps   xmm7,xmm2,dword ptr [rax+rdi*4],10h  
000007FEE36DA2D1  vinsertps   xmm5,xmm15,dword ptr [rax+rdi*4+4],10h  
000007FEE36DA2D9  vinsertps   xmm8,xmm9,dword ptr [rax+r11*4],10h  
000007FEE36DA2E0  vinsertps   xmm2,xmm4,dword ptr [rax+r11*4+4],10h  
000007FEE36DA2E8  vinsertps   xmm6,xmm7,dword ptr [rax+r8*4],20h  
000007FEE36DA2EF  vinsertps   xmm3,xmm5,dword ptr [rax+r8*4+4],20h  
000007FEE36DA2F7  vinsertps   xmm1,xmm8,dword ptr [rax+r12*4],20h  
000007FEE36DA2FE  vinsertps   xmm7,xmm2,dword ptr [rax+r12*4+4],20h  
000007FEE36DA306  vinsertps   xmm10,xmm6,dword ptr [rax+r9*4],30h  
000007FEE36DA30D  vinsertps   xmm6,xmm3,dword ptr [rax+r9*4+4],30h  
000007FEE36DA315  vinsertps   xmm14,xmm1,dword ptr [rax+r14*4],30h  
000007FEE36DA31C  vinsertps   xmm9,xmm7,dword ptr [rax+r14*4+4],30h 

 

r11 has this value: 0xffffffff80000000, therefore the instruction at address 000007FEE36DA2D9 (vinsertps   xmm8,xmm9,dword ptr [rax+r11*4],10h) causes an access violation.

As you can see, r11 is loaded from the stack with movsxd      r11,dword ptr [rsp+34h], and the stack at address [rsp+34h] is: 00 00 00 80.
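
The arithmetic behind that register value can be checked with a small sketch. The helper names `dword_from_bytes`, `movsxd` and `scaled_displacement` are hypothetical models of what the instructions do, not compiler output. Note that 0x80000000 is also the value the SSE float-to-int conversion instructions return for NaN or out-of-range inputs, which would be consistent with a bad `static_cast<int>` in `get_g`, although nothing in the dump itself proves that.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reassemble the four bytes seen at [rsp+34h] (00 00 00 80, lowest address
// first) into the 32-bit value a little-endian x86-64 load would read.
std::uint32_t dword_from_bytes(const unsigned char b[4]) {
    return  static_cast<std::uint32_t>(b[0])
         | (static_cast<std::uint32_t>(b[1]) << 8)
         | (static_cast<std::uint32_t>(b[2]) << 16)
         | (static_cast<std::uint32_t>(b[3]) << 24);
}

// Model of movsxd: treat the dword as signed, then widen it to 64 bits.
std::int64_t movsxd(std::uint32_t raw) {
    std::int32_t as_signed;
    std::memcpy(&as_signed, &raw, sizeof as_signed);  // bit-for-bit copy
    return static_cast<std::int64_t>(as_signed);      // sign extension
}

// Byte displacement that the [rax + r11*4] addressing mode adds to rax.
std::int64_t scaled_displacement(std::int64_t index) {
    return index * 4;
}

// Worked check using the values from the post.
void check_r11_value() {
    const unsigned char stack_bytes[4] = {0x00, 0x00, 0x00, 0x80};
    const std::int64_t r11 = movsxd(dword_from_bytes(stack_bytes));
    assert(static_cast<std::uint64_t>(r11) == 0xffffffff80000000ULL);
    assert(r11 == INT32_MIN);                          // -2147483648
    assert(scaled_displacement(r11) == -8589934592LL); // 8 GiB below rax
}
```

An effective address 8 GiB below the table base is almost certainly unmapped, which matches the access violation at the `vinsertps` instruction.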

 

Regards,

Martin

Manuel_P_
Beginner

Hi Brandon,

the stack seems to be corrupt.

The source code line "get_g(gx2, v8, input_02);" gets translated into these instructions:

000007FEE36DA299  movsxd      r11,dword ptr [rsp+34h]  
000007FEE36DA29E  movsxd      r8,dword ptr [rsp+28h]  
000007FEE36DA2A3  movsxd      r12,dword ptr [rsp+38h]  
000007FEE36DA2A8  vmovss      xmm2,dword ptr [rax+rbp*4]  
000007FEE36DA2AD  vmovss      xmm15,dword ptr [rax+rbp*4+4]  
000007FEE36DA2B3  vmovss      xmm9,dword ptr [rax+r10*4]  
000007FEE36DA2B9  vmovss      xmm4,dword ptr [rax+r10*4+4]  
000007FEE36DA2C0  movsxd      r9,dword ptr [rsp+2Ch]  
000007FEE36DA2C5  movsxd      r14,dword ptr [rsp+3Ch]  
000007FEE36DA2CA  vinsertps   xmm7,xmm2,dword ptr [rax+rdi*4],10h  
000007FEE36DA2D1  vinsertps   xmm5,xmm15,dword ptr [rax+rdi*4+4],10h  
000007FEE36DA2D9  vinsertps   xmm8,xmm9,dword ptr [rax+r11*4],10h  
000007FEE36DA2E0  vinsertps   xmm2,xmm4,dword ptr [rax+r11*4+4],10h 

 

The instruction at address 000007FEE36DA2D9 (vinsertps   xmm8,xmm9,dword ptr [rax+r11*4],10h) causes the access violation, because r11 has the value 0xffffffff80000000. As you can see, r11 is loaded from the stack (movsxd      r11,dword ptr [rsp+34h]), and the stack at address [rsp+34h] is: 00 00 00 80.

 

Regards,

Martin

 

Manuel_P_
Beginner

Hi again Brandon,

after some investigation I can say that the reason for the wrong values on the stack is a mathematical instability of the algorithm caused by the trigonometric functions and divisions which obviously only occurs with /fp:fast but not with /fp:precise. Actually the algorithm was designed to be robust against mathematical instabilities.
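
One way to keep an unstable intermediate result from turning into a wild table index is to validate the value before the float-to-int conversion. The following is a sketch under assumptions: `safe_angle_index` is a hypothetical helper, not part of the posted code, and it handles one scalar element rather than a whole array section.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: wrap an angle in degrees into the table range and
// convert it to an index. NaN and infinity are rejected *before* the
// float-to-int conversion, where they would otherwise give a garbage index.
// Returns false when no valid index exists; index_out is then left untouched.
bool safe_angle_index(float angle_deg, int table_size, int& index_out) {
    assert(table_size > 0);
    if (!std::isfinite(angle_deg))
        return false;                              // NaN or +-inf
    float wrapped = std::fmod(angle_deg, 360.0f);  // now in (-360, 360)
    if (wrapped < 0.0f)
        wrapped += 360.0f;                         // now in [0, 360]
    int idx = static_cast<int>(wrapped);           // finite and small: safe
    if (idx >= table_size)
        idx = table_size - 1;                      // clamp to the last entry
    index_out = idx;
    return true;
}

// Worked checks for a hypothetical 361-entry table.
void check_safe_angle_index() {
    int idx = -1;
    assert(!safe_angle_index(std::nanf(""), 361, idx) && idx == -1);
    assert(safe_angle_index(370.5f, 361, idx) && idx == 10);
    assert(safe_angle_index(-30.0f, 361, idx) && idx == 330);
}
```

One caveat: some compilers' fast floating-point modes are allowed to assume that NaN and infinity never occur and may optimize such checks away, so a guard like this is most dependable when the translation unit is built with a precise floating-point model.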

 

So we do need to get that code compiled with /fp:precise. Did you raise a bug?

 

Regards,

Martin

Nick_M_3
New Contributor I

martin.toeltsch@symena.com wrote:

after some investigation I can say that the reason for the wrong values on the stack is a mathematical instability of the algorithm caused by the trigonometric functions and divisions which obviously only occurs with /fp:fast but not with /fp:precise. Actually the algorithm was designed to be robust against mathematical instabilities.

Well, perhaps.  However, I suggest that you would be better off attending to its robustness.  In one place, you have:

    float hxx[VecSize];
    hxx[:] = fabsf(accel[:]) * float(1.38e-23);
    ...
    sv[:] = ((1.0f - hxx[:]) * (val0 - vgg[:])) + (hxx[:] * (vvv9 - gx2[:]));

That raises alarm bells in my head!  At best, you have a very poorly scaled problem, and poor scaling is a classic cause of numerical problems.  Despite modern belief, the use of floating-point (as distinct from fixed-point) is NOT a solution to all scaling problems, though you would have to look at some very serious (and probably 1960s or 1970s) textbooks to see discussions of the issues.
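
The scaling concern can be made concrete with a short sketch. The functions `blend_weight` and `blend` are hypothetical scalar stand-ins for the `hxx[:]` and `sv[:]` expressions quoted above, and the input 100.0f is an arbitrary magnitude chosen for illustration; the real data range is not known from the thread.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Scalar stand-in for hxx[:] = fabsf(accel[:]) * float(1.38e-23).
float blend_weight(float accel) {
    return std::fabs(accel) * 1.38e-23f;
}

// Scalar stand-in for the sv[:] expression: (1 - w) * a + w * b.
float blend(float w, float a, float b) {
    return (1.0f - w) * a + w * b;
}

// Worked check: for moderate inputs the weight is far below single-precision
// resolution, so it has no effect on the blended result at all.
void check_scaling() {
    const float w = blend_weight(100.0f);   // about 1.38e-21
    assert(w > 0.0f && w < FLT_EPSILON);    // nonzero, but tiny next to 1.0f
    const float one_minus_w = 1.0f - w;
    assert(one_minus_w == 1.0f);            // the subtraction is a no-op
    assert(blend(w, 3.0f, 5.0f) == 3.0f);   // the second term vanishes
}
```

With weights this small the expression reduces to its first term in single precision, so the `hxx` factor contributes nothing unless `accel` is astronomically large.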

 
