I've been unable to reproduce the problem. I extracted the code into a file, and issued the following command:
bash-3.2$ icl test.cpp
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0 Build 20140303
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
test.cpp
Microsoft (R) Incremental Linker Version 9.00.30729.01
Copyright (C) Microsoft Corporation. All rights reserved.
-out:test.exe
test.obj
LIBCMT.lib(crt0.obj) : error LNK2019: unresolved external symbol main referenced in function __tmainCRTStartup
test.exe : fatal error LNK1120: 1 unresolved externals
So the file compiled successfully and failed only in the linker, which is expected because the extracted code has no main function. The problem may already be fixed: I'm using a nightly compiler build from March 3, which may be newer than the compiler you have. To be sure, I'll need a Visual Studio build log so I can reproduce your command-line options.
- Barry
I'm using the same compiler with Microsoft Visual Studio* 2013, and I'm not seeing a problem:
1>------ Build started: Project: test, Configuration: Release x64 ------
1> icl /Qvc12 "/Qlocation,link,C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64" /Zi /W3 /O2 /Oi /Qipo /Qftz- -D __INTEL_COMPILER=1400 -D WIN32 -D NDEBUG -D _CONSOLE -D _LIB -D _UNICODE -D UNICODE /EHsc /MD /GS /Gy /Zc:wchar_t /Zc:forScope /Fox64\Release\ /Fdx64\Release\vc120.pdb /TP test.cpp
1>
1> Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.3.202 Build 20140422
1> Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
1>
1> test.cpp
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========
Can you turn off the Startup banner (/nologo) and see if your command line matches mine above? And are you using a different Visual Studio version?
/fp:precise seems to be the critical option. I've submitted CQ256573 on this problem.
Thank you for reporting it.
- Barry
You are right, /fp:precise and /fp:strict don't work. /fp:fast and /fp:fast=2 both work.
Thanks, Barry, for your investigations.
Cheers,
Martin
I see it too. Looks like something that broke between the 12.1 and 13.0 compilers. Unfortunately, I don't see any easy workarounds beyond not using /fp:precise, which isn't ideal. When there's progress made on the investigation here, I'll update the thread.
Hi Brandon,
unfortunately it's worse than I thought. Although /fp:fast compiles successfully, the resulting code is buggy.
I was able to get the code to compile with /fp:precise by splitting the function into smaller pieces and using __declspec(noinline), and this binary passed our tests. The binary with /fp:fast caused an access violation, no matter if I split the function or not.
From that I would conclude that there is an issue with CilkPlus which leads to either an internal error or buggy code.
Unfortunately I cannot provide test data at this stage. It's hard to extract this from our test environment.
Regards,
Martin
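The workaround described above (splitting one large function into smaller pieces that the compiler may not inline back together) can be sketched roughly as follows. This is only an illustration: the stage functions, their names, and the arithmetic are hypothetical and are not taken from the code in this thread. The `NOINLINE` macro is an assumption added so the sketch also builds with compilers that do not understand `__declspec(noinline)`.

```cpp
#include <cassert>
#include <cstddef>

// __declspec(noinline) is the MSVC-compatible spelling (also accepted by icl
// on Windows); the fallback is the GCC/Clang attribute.
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Hypothetical first piece: scale the input in place.
NOINLINE void scale_stage(float* data, std::size_t n, float factor) {
    for (std::size_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}

// Hypothetical second piece: reduce the scaled data to a single sum.
NOINLINE float sum_stage(const float* data, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

// The former monolithic function now only calls the pieces in order.
// Because the pieces are not inlined, the optimizer sees each one separately.
float process(float* data, std::size_t n, float factor) {
    assert(data != nullptr || n == 0);
    scale_stage(data, n, factor);
    return sum_stage(data, n);
}
```

Keeping the pieces out of line limits how much code the optimizer handles at once, which is presumably why it sidesteps the internal error here; it does not address the underlying compiler defect.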
Martin,
It may be a good idea to check what kind of access violation it is. If it's a bad address, then we're probably stuck until we can get a test case from you, but if it's a stack overflow or something along those lines, it might be easier to work around and reproduce.
Hi Brandon,
the stack seems to be corrupt. The line "get_g(gx2, v8, input_02);" is compiled into these instructions.
000007FEE36DA28A movsxd rbp,dword ptr [rsp+20h]
000007FEE36DA28F movsxd r10,dword ptr [rsp+30h]
000007FEE36DA294 movsxd rdi,dword ptr [rsp+24h]
000007FEE36DA299 movsxd r11,dword ptr [rsp+34h]
000007FEE36DA29E movsxd r8,dword ptr [rsp+28h]
000007FEE36DA2A3 movsxd r12,dword ptr [rsp+38h]
000007FEE36DA2A8 vmovss xmm2,dword ptr [rax+rbp*4]
000007FEE36DA2AD vmovss xmm15,dword ptr [rax+rbp*4+4]
000007FEE36DA2B3 vmovss xmm9,dword ptr [rax+r10*4]
000007FEE36DA2B9 vmovss xmm4,dword ptr [rax+r10*4+4]
000007FEE36DA2C0 movsxd r9,dword ptr [rsp+2Ch]
000007FEE36DA2C5 movsxd r14,dword ptr [rsp+3Ch]
000007FEE36DA2CA vinsertps xmm7,xmm2,dword ptr [rax+rdi*4],10h
000007FEE36DA2D1 vinsertps xmm5,xmm15,dword ptr [rax+rdi*4+4],10h
000007FEE36DA2D9 vinsertps xmm8,xmm9,dword ptr [rax+r11*4],10h
000007FEE36DA2E0 vinsertps xmm2,xmm4,dword ptr [rax+r11*4+4],10h
000007FEE36DA2E8 vinsertps xmm6,xmm7,dword ptr [rax+r8*4],20h
000007FEE36DA2EF vinsertps xmm3,xmm5,dword ptr [rax+r8*4+4],20h
000007FEE36DA2F7 vinsertps xmm1,xmm8,dword ptr [rax+r12*4],20h
000007FEE36DA2FE vinsertps xmm7,xmm2,dword ptr [rax+r12*4+4],20h
000007FEE36DA306 vinsertps xmm10,xmm6,dword ptr [rax+r9*4],30h
000007FEE36DA30D vinsertps xmm6,xmm3,dword ptr [rax+r9*4+4],30h
000007FEE36DA315 vinsertps xmm14,xmm1,dword ptr [rax+r14*4],30h
000007FEE36DA31C vinsertps xmm9,xmm7,dword ptr [rax+r14*4+4],30h
r11 has this value: 0xffffffff80000000, therefore the instruction at address 000007FEE36DA2D9 (vinsertps xmm8,xmm9,dword ptr [rax+r11*4],10h) causes an access violation.
As you can see, r11 is loaded from the stack with movsxd r11,dword ptr [rsp+34h], and the stack at address [rsp+34h] is: 00 00 00 80.
Regards,
Martin
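To make the arithmetic in the post above concrete, here is a small sketch that mimics what `movsxd r11,dword ptr [rsp+34h]` and the `[rax+r11*4]` addressing do with the bytes 00 00 00 80. The helper names are made up for illustration. As a side note, 0x80000000 is also the value the x86 float-to-integer conversion instructions return for NaN or out-of-range inputs, so a bad floating-point value converted to an index is one plausible source of it; the disassembly shown does not confirm that.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret four little-endian bytes as a signed 32-bit integer,
// as a dword load from memory does on x86.
std::int32_t load_i32_le(const unsigned char* bytes) {
    assert(bytes != nullptr);
    std::uint32_t u = std::uint32_t(bytes[0])
                    | (std::uint32_t(bytes[1]) << 8)
                    | (std::uint32_t(bytes[2]) << 16)
                    | (std::uint32_t(bytes[3]) << 24);
    std::int32_t v;
    std::memcpy(&v, &u, sizeof v);
    return v;
}

// Mimic movsxd: sign-extend the 32-bit value into a 64-bit register.
std::uint64_t sign_extend_to_u64(std::int32_t v) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(v));
}

// Byte offset used by [rax + index*4], wrapping modulo 2^64 like the hardware.
std::uint64_t scaled_offset(std::uint64_t index) {
    return index * 4u;
}
```

For the bytes 00 00 00 80 the loaded value is INT32_MIN, the sign-extended register value is 0xffffffff80000000, and the scaled offset is 0xfffffffe00000000, that is, 8 GiB below the base address in rax.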
Hi Brandon,
the stack seems to be corrupt.
The source code line "get_g(gx2, v8, input_02);" gets translated into these instructions:
000007FEE36DA299 movsxd r11,dword ptr [rsp+34h]
000007FEE36DA29E movsxd r8,dword ptr [rsp+28h]
000007FEE36DA2A3 movsxd r12,dword ptr [rsp+38h]
000007FEE36DA2A8 vmovss xmm2,dword ptr [rax+rbp*4]
000007FEE36DA2AD vmovss xmm15,dword ptr [rax+rbp*4+4]
000007FEE36DA2B3 vmovss xmm9,dword ptr [rax+r10*4]
000007FEE36DA2B9 vmovss xmm4,dword ptr [rax+r10*4+4]
000007FEE36DA2C0 movsxd r9,dword ptr [rsp+2Ch]
000007FEE36DA2C5 movsxd r14,dword ptr [rsp+3Ch]
000007FEE36DA2CA vinsertps xmm7,xmm2,dword ptr [rax+rdi*4],10h
000007FEE36DA2D1 vinsertps xmm5,xmm15,dword ptr [rax+rdi*4+4],10h
000007FEE36DA2D9 vinsertps xmm8,xmm9,dword ptr [rax+r11*4],10h
000007FEE36DA2E0 vinsertps xmm2,xmm4,dword ptr [rax+r11*4+4],10h
The instruction at address 000007FEE36DA2D9 (vinsertps xmm8,xmm9,dword ptr [rax+r11*4],10h) causes the access violation, because r11 has the value 0xffffffff80000000. As you can see, r11 is loaded from the stack (movsxd r11,dword ptr [rsp+34h]), and the stack at address [rsp+34h] is: 00 00 00 80.
Regards,
Martin
Hi again Brandon,
after some investigation I can say that the reason for the wrong values on the stack is a mathematical instability of the algorithm caused by the trigonometric functions and divisions which obviously only occurs with /fp:fast but not with /fp:precise. Actually the algorithm was designed to be robust against mathematical instabilities.
So we do need to get that code compiled with /fp:precise. Did you raise a bug?
Regards,
Martin
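Independently of the compiler bug, a bad floating-point value can be stopped before it is used as an array index. The following is a minimal sketch of such a guard, assuming the index is derived from a float and the table has `n` entries; the function name and the clamping policy are hypothetical and not taken from the code discussed here. Note that under fast floating-point modes a compiler may assume NaN and infinity never occur and optimize the finiteness check away, so a guard like this is only reliable when compiled with a precise model.

```cpp
#include <cassert>
#include <cmath>

// Convert a float to an index in [0, n-1].
// Returns -1 when x is NaN or infinite, so the caller can handle it explicitly.
int checked_index(float x, int n) {
    assert(n > 0);
    if (!std::isfinite(x)) {
        return -1;                      // reject NaN and +/-infinity
    }
    if (x <= 0.0f) {
        return 0;                       // clamp below
    }
    if (x >= static_cast<float>(n - 1)) {
        return n - 1;                   // clamp above
    }
    return static_cast<int>(x);         // in range: truncates toward zero
}
```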
martin.toeltsch@symena.com wrote:
after some investigation I can say that the reason for the wrong values on the stack is a mathematical instability of the algorithm caused by the trigonometric functions and divisions which obviously only occurs with /fp:fast but not with /fp:precise. Actually the algorithm was designed to be robust against mathematical instabilities.
Well, perhaps. However, I suggest that you would be better off attending to its robustness. In one place, you have:
float hxx[VecSize];
hxx[:] = fabsf(accel[:]) * float(1.38e-23);
...
sv[:] = ((1.0f - hxx[:]) * (val0 - vgg[:])) + (hxx[:] * (vvv9 - gx2[:]));
That raises alarm bells in my head! At best, you have a very poorly scaled problem, and poor scaling is a classic cause of numerical problems. Despite modern belief, the use of floating-point (as distinct from fixed-point) is NOT a solution to all scaling problems, though you would have to look at some very serious (and probably 1960s or 1970s) textbooks to see discussions of the issues.
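The scaling concern can be illustrated with a short sketch. In single precision, any `h` smaller than about 3e-8 is absorbed when subtracted from 1.0f, so with a factor of 1.38e-23 the weight `(1.0f - hxx)` is exactly 1.0f unless `fabsf(accel)` reaches roughly 2e15. The function names below are made up, and the real range of `accel` is not known from the thread, so this shows the mechanism rather than diagnosing the actual code.

```cpp
#include <cassert>
#include <cmath>

// True when (1.0f - h) rounds back to exactly 1.0f in single precision.
bool absorbed_by_one(float h) {
    assert(h >= 0.0f);
    volatile float r = 1.0f - h;   // volatile forces rounding to float
    return r == 1.0f;
}

// Same shape as the blend in the quoted code: (1 - h) * a + h * b.
float blend(float h, float a, float b) {
    return (1.0f - h) * a + h * b;
}
```

With `h = 1.38e-23f`, `absorbed_by_one(h)` is true and `blend(h, 2.0f, 3.0f)` returns exactly 2.0f: the second term contributes nothing to the result.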