- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ok, so this is the code:
/// Integral processing. MNOALIAS MFORCEINLINE ValueType GetFrom01Cubic_Integral(MUINT32 posf) const { // Get the new wavetable index and interpolation coefficient. const int index = posf >> Integral_FloatBits; static const float Integral_Coef = float(1.0 / (1 << Integral_FloatBits)); const float x = float(int(posf & (MFloatINT32::Max >> PrecomputeCountLog)) * Integral_Coef); // Get output sample. const float y0 = PrecomputedPtr[index-1]; const float y1 = PrecomputedPtr[index+0]; const float y2 = PrecomputedPtr[index+1]; const float y3 = PrecomputedPtr[index+2]; return MInterpolation::GetCubic(y0, y1, y2, y3, x); // For some reason this doesn't get vectorized... //return MInterpolation::GetCubic(PrecomputedPtr[index-1], PrecomputedPtr[index+0], PrecomputedPtr[index+1], PrecomputedPtr[index+2], x); }; /// Retrieves multiple samples of the generator at specified positions from interval 0..1 (even excluding the range). /// Returns new position after the interval. template<class type> MNOALIAS double GetFrom01Cubic(MNOALIAS type* dst, double pos, double posinc, int cnt) const { MFloatINT32 posf = pos, posincf = posinc; MVECTORALWAYS for (int i=0; i<cnt; i++) { dst = GetFrom01Cubic_Integral(posf); posf += posincf; }; // Get new position. pos = posf.GetFloat(); return pos; };
And this is where it crashed:
00007FFEF307E151 vpaddd xmm3,xmm3,xmm1 00007FFEF307E155 vmovd r15,xmm13 00007FFEF307E15A mov edi,r11d 00007FFEF307E15D shr r11,20h 00007FFEF307E161 mov r12d,r15d 00007FFEF307E164 shr r15,20h 00007FFEF307E168 vmovd xmm15,dword ptr [rbx+rdi*4+0Ch] <<<<<<<<<<<<<<<< 00007FFEF307E16E vmovd xmm11,dword ptr [rbx+r11*4+0Ch] 00007FFEF307E175 vmovd xmm6,dword ptr [rbx+r12*4+0Ch] 00007FFEF307E17C vmovd xmm14,dword ptr [rbx+r15*4+0Ch] 00007FFEF307E183 vmovd xmm10,dword ptr [rbx+r15*4+8] 00007FFEF307E18A vinsertf128 ymm12,ymm7,xmm8,1 00007FFEF307E190 vcvtdq2ps ymm5,ymm12 00007FFEF307E195 vpunpcklqdq xmm8,xmm15,xmm11
For simple reason - rdi = 0xFFFFFFFF, which indeed is -1 in 32-bit, but not in 64-bit environment... It worked fine with 2015 upd 3 (or which was it).
Really ICC is like a minefield... If MSVC would have a good dispatchable vectorizer, there would be no need to ICC at all...
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @meldaproduction,
Pardon the delay, I was just forwarded to look into this issue. I'll investigate this and get back to you soon. (I've interacted with you many times before on this forum. Thanks for filing the issue and appreciate your patience, as always).
Regards,
Kittur
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Kithur,
very well, please let me know!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @medaprodution: Pardon me, can you please attach the preprocessed file (compiler with /QP option) that should generate the .I file. Also, can you please also let me know the full command line options you used for the compile as well. Addionally, the OS and VS version you compiled with as well. Appreciate your help and for your patience, as always.
_Kittur
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Kittur, sorry bu as always, I cannot send you the source codes. The command line wouldn't solve much. But it is using profile guided optimizations and -ip.
Anyways it IS working correctly with 2015 update 3, so it must have been introduced recently. It is also only caused in the vectorizer, normal code works fine. And the cause is, as I explained, because it didn't initialize correctly the rdi register to 0xfffffffffffffff (gazzilion f meaning -1), but instead just 0xffffffff, which is -1 in 32-bit, but not in 64-bit.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, looks like the issue may have been fixed then. Appreciate your prompt response and letting me know.
_Kittur
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ok, when will an update fixing this be available?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The latest release is 16.0 Update 2 which you should try out and I need to check if there's any upcoming release at all. I based on your response that it worked in 15.0 update 3. The problem is I don't have a reproducer so I can't check if an issue is filed that's related to your issue so I can't file any issue with the developers :-( Appreciate your understanding....;.
_Kittur
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sorry but I don't see update 2 in my downloads, there's just update 1
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Oops, I meant update 2 which will be released next week. Yes, you're correct update 1 is what you will see presently which is the current release. Update 2 is schedule to be out next week, thanks!
_Kittur
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ok, please remind me here when it is released.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Absolutely, thanks for your patience again.
_kittur
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ok, so the Update 2 does NOT help! The vectorizer is still buggy!... I'm actually surprised you don't have more reports. I mean using -1 as index isn't so unordinary and the results are well... horryfying... This is a really massive bug...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@vojtech: Sorry to hear update 2 still shows the issue. The problem is I couldn't reproduce the issue on our end and unfortunately you're not able to attach a reproducer :-( I've again touched base with our vectorizer group to see if they can investigate based on the small code snippet you have in this thread. I'll keep you updated accordingly, appreciate much.
_Kittur
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Well, I'm afraid since V2016 the compiler is unusable here, just for this one case. Sure I could try tampering with it, offseting the whole array by 1 or something, so that the faulty "-1" won't be a problem, but I fear ICL is just way too dangerous... Who knows how many more bugs are there... For the record I tried compiling it without the profile generated stuff, same results, same errorneous vectorized code. I'm actually surprised you couldn't simulate it, I might be wrong of course, but the reason of implementing -1 as +0xFFFFFFFF is just strikingly obvious.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ok, so bad news... Check this out:
00007FFE50BD7CA9 lea esi,[rsi+rcx*8] 00007FFE50BD7CAC vmovd r11,xmm10 00007FFE50BD7CB1 vpaddd xmm3,xmm3,xmm1 00007FFE50BD7CB5 vmovd r15,xmm13 00007FFE50BD7CBA mov edi,r11d 00007FFE50BD7CBD shr r11,20h 00007FFE50BD7CC1 mov r12d,r15d 00007FFE50BD7CC4 shr r15,20h 00007FFE50BD7CC8 vmovd xmm15,dword ptr [rbx+rdi*4+0Ch] <<<<<< CRASH 00007FFE50BD7CCE vmovd xmm11,dword ptr [rbx+r11*4+0Ch] 00007FFE50BD7CD5 vmovd xmm6,dword ptr [rbx+r12*4+0Ch] 00007FFE50BD7CDC vmovd xmm14,dword ptr [rbx+r15*4+0Ch] 00007FFE50BD7CE3 vmovd xmm10,dword ptr [rbx+r15*4+8]
In this case rdi is 0x000000000014bff0, r11-15 are 12, 26... so logical index numbers. The thing is, rdi isn't even set anywhere in the function! It looks like it's a faulty code, not rdi trying to be -1 after all, that one before was probably just a coincidence.
And I repeat: it DOES work with compiler v2015 and with v2016 it works in 32-bit, just 64-bit is not correct. I'm trying v2016 update 2.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @vojtech,
Thanks for the detailed input which I've passed on to the vec team again. Appreciate your patience and I'll update you as soon as they are done investigating accordingly.
_Kittur
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Kittur,
ok, so I have a testcase for you. See attached. I basically stripped the whole thing to a tiny little source code, that I could send you, which is attached, along with the building script (which you'd need to fix, but it gives you the idea about options I used, but it actually does that with pretty much any options...). It will crash, differently, so it is not just wrongly set rdi as it seemed before.
Then if you change line 142 from
const float y0 = PrecomputedPtr[index - 1];
to
const float y0 = PrecomputedPtr[index + 0];
It will work!! So right now the compiler is pretty much unusable. I could probably hack this somehow, but the reason why it creates wrong code is unknown to me, so I cannot be sure the bug doesn't occur anywhere else...
I'm using VS express 2015, ICC 2016 update 1 or 2, both exhibit the problem, only in 64-bit, 32-bit works fine, ICC 2015 worked fine as well.
Please let me know. Also my updates will expire in May afaik, so I hope you'll make it work until then. I won't pay extra just to make the software fixed of course.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Understood, thanks much. I'll download the test case, try to reproduce and will pass on to the vectorizer team. Sure, I'll keep you updated accordingly. Again, thanks much for your help & patience.
_Kittur
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you very much for a test case. Works if code is generated for SSE4.2, right? Greatly helped in narrowing down.
Sorry for the frustration it has caused.
Hideki --- vectorizer development
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Hideki, it's SSE2 plus AVX dispatching. The problems are with the AVX code (though I didn't check on any SSE2 no AVX machine).

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page