ICC 2016 upd 1 generates incorrect vectorization code -> CRASH... again...

meldaproduction · ‎02-04-2016

Ok, so this is the code:

		
	/// Integral processing.
	MNOALIAS MFORCEINLINE ValueType GetFrom01Cubic_Integral(MUINT32 posf) const
	{ 
		// Get the new wavetable index and interpolation coefficient.
		const int index = posf >> Integral_FloatBits;
		static const float Integral_Coef = float(1.0 / (1 << Integral_FloatBits));
		const float x = float(int(posf & (MFloatINT32::Max >> PrecomputeCountLog)) * Integral_Coef);

		// Get output sample.
		const float y0 = PrecomputedPtr[index-1];
		const float y1 = PrecomputedPtr[index+0];
		const float y2 = PrecomputedPtr[index+1];
		const float y3 = PrecomputedPtr[index+2];
		return MInterpolation::GetCubic(y0, y1, y2, y3, x);

		// For some reason this doesn't get vectorized...
		//return MInterpolation::GetCubic(PrecomputedPtr[index-1], PrecomputedPtr[index+0], PrecomputedPtr[index+1], PrecomputedPtr[index+2], x);
	};
	
	/// Retrieves multiple samples of the generator at specified positions from interval 0..1 (even excluding the range).
	/// Returns new position after the interval.
	template<class type>
		MNOALIAS double GetFrom01Cubic(MNOALIAS type* dst, double pos, double posinc, int cnt) const
	{
		MFloatINT32 posf = pos, posincf = posinc;

		MVECTORALWAYS
		for (int i=0; i<cnt; i++)
		{
			dst = GetFrom01Cubic_Integral(posf);
			posf += posincf;
		};

		// Get new position.
		pos = posf.GetFloat();
		return pos;
	};

And this is where it crashed:

00007FFEF307E151  vpaddd      xmm3,xmm3,xmm1  
00007FFEF307E155  vmovd       r15,xmm13  
00007FFEF307E15A  mov         edi,r11d  
00007FFEF307E15D  shr         r11,20h  
00007FFEF307E161  mov         r12d,r15d  
00007FFEF307E164  shr         r15,20h  
00007FFEF307E168  vmovd       xmm15,dword ptr [rbx+rdi*4+0Ch]  <<<<<<<<<<<<<<<<
00007FFEF307E16E  vmovd       xmm11,dword ptr [rbx+r11*4+0Ch]  
00007FFEF307E175  vmovd       xmm6,dword ptr [rbx+r12*4+0Ch]  
00007FFEF307E17C  vmovd       xmm14,dword ptr [rbx+r15*4+0Ch]  
00007FFEF307E183  vmovd       xmm10,dword ptr [rbx+r15*4+8]  
00007FFEF307E18A  vinsertf128 ymm12,ymm7,xmm8,1  
00007FFEF307E190  vcvtdq2ps   ymm5,ymm12  
00007FFEF307E195  vpunpcklqdq xmm8,xmm15,xmm11

For simple reason - rdi = 0xFFFFFFFF, which indeed is -1 in 32-bit, but not in 64-bit environment... It worked fine with 2015 upd 3 (or which was it).

Really ICC is like a minefield... If MSVC would have a good dispatchable vectorizer, there would be no need to ICC at all...

KitturGanesh · ‎02-05-2016

Hi @meldaproduction,
Pardon the delay, I was just forwarded to look into this issue. I'll investigate this and get back to you soon. (I've interacted with you many times before on this forum. Thanks for filing the issue and appreciate your patience, as always).
Regards,
Kittur

meldaproduction · ‎02-06-2016

Hi Kithur,

very well, please let me know!

KitturGanesh · ‎02-08-2016

Hi @medaprodution: Pardon me, can you please attach the preprocessed file (compiler with /QP option) that should generate the .I file. Also, can you please also let me know the full command line options you used for the compile as well. Addionally, the OS and VS version you compiled with as well. Appreciate your help and for your patience, as always.
_Kittur

meldaproduction · ‎02-08-2016

Hi Kittur, sorry bu as always, I cannot send you the source codes. The command line wouldn't solve much. But it is using profile guided optimizations and -ip.

Anyways it IS working correctly with 2015 update 3, so it must have been introduced recently. It is also only caused in the vectorizer, normal code works fine. And the cause is, as I explained, because it didn't initialize correctly the rdi register to 0xfffffffffffffff (gazzilion f meaning -1), but instead just 0xffffffff, which is -1 in 32-bit, but not in 64-bit.

KitturGanesh · ‎02-08-2016

Thanks, looks like the issue may have been fixed then. Appreciate your prompt response and letting me know.
_Kittur

meldaproduction · ‎02-08-2016

Ok, when will an update fixing this be available?

KitturGanesh · ‎02-09-2016

The latest release is 16.0 Update 2 which you should try out and I need to check if there's any upcoming release at all. I based on your response that it worked in 15.0 update 3. The problem is I don't have a reproducer so I can't check if an issue is filed that's related to your issue so I can't file any issue with the developers :-( Appreciate your understanding....;.

_Kittur

meldaproduction · ‎02-09-2016

Sorry but I don't see update 2 in my downloads, there's just update 1

KitturGanesh · ‎02-09-2016

Oops, I meant update 2 which will be released next week. Yes, you're correct update 1 is what you will see presently which is the current release. Update 2 is schedule to be out next week, thanks!

_Kittur

meldaproduction · ‎02-09-2016

Ok, please remind me here when it is released.

KitturGanesh · ‎02-09-2016

Absolutely, thanks for your patience again.

_kittur

vojtech_m_ · ‎02-18-2016

Ok, so the Update 2 does NOT help! The vectorizer is still buggy!... I'm actually surprised you don't have more reports. I mean using -1 as index isn't so unordinary and the results are well... horryfying... This is a really massive bug...

KitturGanesh · ‎02-18-2016

@vojtech: Sorry to hear update 2 still shows the issue. The problem is I couldn't reproduce the issue on our end and unfortunately you're not able to attach a reproducer :-( I've again touched base with our vectorizer group to see if they can investigate based on the small code snippet you have in this thread. I'll keep you updated accordingly, appreciate much.

_Kittur

vojtech_m_ · ‎02-18-2016

Well, I'm afraid since V2016 the compiler is unusable here, just for this one case. Sure I could try tampering with it, offseting the whole array by 1 or something, so that the faulty "-1" won't be a problem, but I fear ICL is just way too dangerous... Who knows how many more bugs are there... For the record I tried compiling it without the profile generated stuff, same results, same errorneous vectorized code. I'm actually surprised you couldn't simulate it, I might be wrong of course, but the reason of implementing -1 as +0xFFFFFFFF is just strikingly obvious.

vojtech_m_ · ‎02-23-2016

Ok, so bad news... Check this out:

00007FFE50BD7CA9  lea         esi,[rsi+rcx*8]  
00007FFE50BD7CAC  vmovd       r11,xmm10  
00007FFE50BD7CB1  vpaddd      xmm3,xmm3,xmm1  
00007FFE50BD7CB5  vmovd       r15,xmm13  
00007FFE50BD7CBA  mov         edi,r11d  
00007FFE50BD7CBD  shr         r11,20h  
00007FFE50BD7CC1  mov         r12d,r15d  
00007FFE50BD7CC4  shr         r15,20h  
00007FFE50BD7CC8  vmovd       xmm15,dword ptr [rbx+rdi*4+0Ch]  <<<<<< CRASH
00007FFE50BD7CCE  vmovd       xmm11,dword ptr [rbx+r11*4+0Ch]  
00007FFE50BD7CD5  vmovd       xmm6,dword ptr [rbx+r12*4+0Ch]  
00007FFE50BD7CDC  vmovd       xmm14,dword ptr [rbx+r15*4+0Ch]  
00007FFE50BD7CE3  vmovd       xmm10,dword ptr [rbx+r15*4+8]

In this case rdi is 0x000000000014bff0, r11-15 are 12, 26... so logical index numbers. The thing is, rdi isn't even set anywhere in the function! It looks like it's a faulty code, not rdi trying to be -1 after all, that one before was probably just a coincidence.

And I repeat: it DOES work with compiler v2015 and with v2016 it works in 32-bit, just 64-bit is not correct. I'm trying v2016 update 2.

KitturGanesh · ‎02-23-2016

Hi @vojtech,
Thanks for the detailed input which I've passed on to the vec team again. Appreciate your patience and I'll update you as soon as they are done investigating accordingly.
_Kittur

vojtech_m_ · ‎02-23-2016

Hi Kittur,

ok, so I have a testcase for you. See attached. I basically stripped the whole thing to a tiny little source code, that I could send you, which is attached, along with the building script (which you'd need to fix, but it gives you the idea about options I used, but it actually does that with pretty much any options...). It will crash, differently, so it is not just wrongly set rdi as it seemed before.

Then if you change line 142 from

const float y0 = PrecomputedPtr[index - 1];

to

const float y0 = PrecomputedPtr[index + 0];

It will work!! So right now the compiler is pretty much unusable. I could probably hack this somehow, but the reason why it creates wrong code is unknown to me, so I cannot be sure the bug doesn't occur anywhere else...

I'm using VS express 2015, ICC 2016 update 1 or 2, both exhibit the problem, only in 64-bit, 32-bit works fine, ICC 2015 worked fine as well.

Please let me know. Also my updates will expire in May afaik, so I hope you'll make it work until then. I won't pay extra just to make the software fixed of course.

KitturGanesh · ‎02-23-2016

Understood, thanks much. I'll download the test case, try to reproduce and will pass on to the vectorizer team. Sure, I'll keep you updated accordingly. Again, thanks much for your help & patience.

_Kittur

Hideki_I_Intel · ‎02-23-2016

Thank you very much for a test case. Works if code is generated for SSE4.2, right? Greatly helped in narrowing down.
Sorry for the frustration it has caused.

Hideki --- vectorizer development

vojtech_m_ · ‎02-24-2016

Hi Hideki, it's SSE2 plus AVX dispatching. The problems are with the AVX code (though I didn't check on any SSE2 no AVX machine).