Beginner

i350-T4 Windows Server 2012 R2 VMQ blue screens during live migration

i350-T4 NIC

Windows Server 2012 R2

All four i350 ports configured as a Windows LBFO team (switch independent, dynamic load balancing)

Converged networking: Hyper-V vSwitch bound to the LBFO team, with vNICs configured on the vSwitch for host OS operations (Management, Cluster/CSV, and Live Migration)

VLAN tagging in use on VMs and vNICs, except the vNIC used for management, which is 'native'

VMQ enabled on all i350 ports

SR-IOV disabled on all i350 ports

Server 2012 R2 Hyper-V cluster

Fully patched with all currently available update rollups and hotfixes

Drivers 19.3 (latest from the Intel website)

In the above configuration, the destination server blue screens during live migration. I can sometimes get one live migration to work, but a second attempt to live migrate a different VM to the same destination host causes that host to blue screen.

I can reproduce this issue very easily on any host in the cluster. They all show the same behaviour.

If I disable VMQ, the issue stops.
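
For anyone wanting to apply the same workaround across all four team members, here is a minimal sketch in Python that builds the PowerShell commands. `Disable-NetAdapterVmq` and `Get-NetAdapterVmq` are the standard NetAdapter cmdlets on Server 2012 R2; the adapter names `NIC1` to `NIC4` and the helper functions are hypothetical, so substitute the names your hosts actually report. By default the script only prints the commands (dry run).

```python
import subprocess


def vmq_disable_commands(adapter_names):
    """Build one PowerShell command per physical adapter to turn VMQ off."""
    return [f'Disable-NetAdapterVmq -Name "{name}"' for name in adapter_names]


def run_powershell(command, dry_run=True):
    """Print the command; execute it only when dry_run is False.

    Executing requires a Windows host and an elevated session.
    """
    print(command)
    if not dry_run:
        subprocess.run(
            ["powershell.exe", "-NoProfile", "-Command", command],
            check=True,
        )


# Hypothetical adapter names; list the real ones with Get-NetAdapterVmq.
TEAM_MEMBERS = ["NIC1", "NIC2", "NIC3", "NIC4"]

for cmd in vmq_disable_commands(TEAM_MEMBERS):
    run_powershell(cmd)  # dry run: prints the command without executing it
```

The matching `Enable-NetAdapterVmq` cmdlet turns VMQ back on once a fixed driver is installed.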

Also, we don't see this issue with the same hardware and the same configuration on Server 2012 (non-R2), though I note that the NIC driver is different there (e1r63x64.sys on 2012 as opposed to e1r64x64.sys on 2012 R2).

Crash dump analysis always shows the faulting driver as e1r64x64.sys.

BugCheck 1E, {ffffffffc0000005, fffff802be6a2550, ffffd000575b3b58, ffffd000575b3360}

*** ERROR: Module load completed but symbols could not be loaded for e1r64x64.sys
Probably caused by : e1r64x64.sys ( e1r64x64+280e7 )

Followup: MachineOwner
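
As an aid to reading the summary line, here is a small Python sketch that splits it into the bugcheck code and its four arguments. The name for 0x1E comes from the `!analyze -v` output in this post, and 0xC0000005 is the NTSTATUS value for an access violation; the lookup tables are deliberately tiny and the function names are illustrative only.

```python
import re

# Only the codes that appear in this crash dump are listed.
BUGCHECK_NAMES = {0x1E: "KMODE_EXCEPTION_NOT_HANDLED"}
NTSTATUS_NAMES = {0xC0000005: "STATUS_ACCESS_VIOLATION"}


def parse_bugcheck(line):
    """Return (code, [arg1..arg4]) from a 'BugCheck XX, {...}' line."""
    m = re.match(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    if m is None:
        raise ValueError("not a BugCheck summary line")
    code = int(m.group(1), 16)
    args = [int(a.strip(), 16) for a in m.group(2).split(",")]
    return code, args


def describe(line):
    """Map the raw numbers to names where they are known."""
    code, args = parse_bugcheck(line)
    status = args[0] & 0xFFFFFFFF  # Arg1 is a sign-extended 32-bit NTSTATUS
    return {
        "bugcheck": BUGCHECK_NAMES.get(code, hex(code)),
        "exception": NTSTATUS_NAMES.get(status, hex(status)),
        "fault_address": hex(args[1]),
    }


LINE = ("BugCheck 1E, {ffffffffc0000005, fffff802be6a2550, "
        "ffffd000575b3b58, ffffd000575b3360}")
print(describe(LINE))
```

For bugcheck 0x1E, Arg1 is the unhandled exception code and Arg2 is the address where the exception occurred, matching the argument list in the analysis output.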

 

---------

18: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

KMODE_EXCEPTION_NOT_HANDLED (1e)
This is a very common bugcheck. Usually the exception address pinpoints
the driver/function that caused the problem. Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffffc0000005, The exception code that was not handled
Arg2: fffff802be6a2550, The address that the exception occurred at
Arg3: ffffd000575b3b58, Parameter 0 of the exception
Arg4: ffffd000575b3360, Parameter 1 of the exception

Debugging Details:
------------------

WRITE_ADDRESS: unable to get nt!MmNonPagedPoolStart
unable to get nt!MmSizeOfNonPagedPoolInBytes
ffffd000575b3360

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

FAULTING_IP:
nt!ExQueryDepthSList+0
fffff802`be6a2550 8b01 mov eax,dword ptr [rcx]

EXCEPTION_PARAMETER1: ffffd000575b3b58

EXCEPTION_PARAMETER2: ffffd000575b3360

BUGCHECK_STR: 0x1E_c0000005

DEFAULT_BUCKET_ID: WIN8_DRIVER_FAULT

PROCESS_NAME: System

CURRENT_IRQL: 0

ANALYSIS_VERSION: 6.3.9600.17237 (debuggers(dbg).140716-0327) amd64fre

EXCEPTION_RECORD: 0000000000000001 -- (.exr 0x1)
Cannot read Exception record @ 0000000000000001

TRAP_FRAME: ffffe800b6200000 -- (.trap 0xffffe800b6200000)
Unable to read trap frame at ffffe800`b6200000

LAST_CONTROL_TRANSFER: from fffff802be7efefb to fffff802be768ca0

STACK_TEXT:
ffffd000`575b2b38 fffff802`be7efefb : 00000000`0000001e ffffffff`c0000005 fffff802`be6a2550 ffffd000`575b3b58 : nt!KeBugCheckEx
ffffd000`575b2b40 fffff802`be779846 : 00000000`00000000 fffff800`35d0c991 ffffe800`b1172d02 ffffd000`575b2e29 : nt!KiFatalFilter+0x1f
ffffd000`575b2b80 fffff802`be757d56 : 00000000`00000000 fffff802`be6e19a6 ffffe000`516d3f90 00000000`00000000 : nt! ?? ::FNODOBFM::`string'+0x696
ffffd000`575b2bc0 fffff802`be7701ed : 00000000`00000000 ffffd000`575b2d60 ffffd000`575b3b58 ffffd000`575b2d60 : nt!_C_specific_handler+0x86
ffffd000`575b2c30 fffff802`be6fd3a5 : 00000000`00000001 fffff802`be615000 ffffd000`575b3b00 fffff800`00000000 : nt!RtlpExecuteHandlerForException+0xd
ffffd000`575b2c60 fffff802`be6fc25f : ffffd000`575b3b58 ffffd000`575b3860 ffffd000`575b3b58 ffffe800`b12ee480 : nt!RtlDispatchException+0x1a5
ffffd000`575b3330 fffff802`be7748c2 : 00000000`00000001 fffffa80`1b6de000 ffffe800`b6200000 00000000`00000000 : nt!KiDispatchException+0x61f
ffffd000`575b3a20 fffff802`be772dfe : 00000000`00000011 00000000`00000002 00000000`00000001 fffff802`be8a929a : nt!KiExceptionDispatch+0xc2
ffffd000`575b3c00 fffff802`be6a2550 : fffff800`35d04875 ffffe800`b0f3c870 ffffd000`575b3e00 ffffe000`517cd000 : nt!KiGeneralProtectionFault+0xfe
ffffd000`575b3d98 fffff800`35d04875 : ffffe800`b0f3c870 ffffd000`575b3e00 ffffe000`517cd000 00000000`00000000 : nt!ExQueryDepthSList
ffffd000`575b3da0 fffff800`372520e7 : ffffe000`517ce540 ffffe000`517cd000 ffffe800`b1496c60 00000000`00000000 : NDIS!NdisFreeNetBufferList+0xb5
ffffd000`575b3e20 fffff800`372528a9 : ffffe000`517ce540 ffffe000`517cd000 00000000`00000001 00000000`00000000 : e1r64x64+0x280e7
ffffd000`575b3e50 fffff800`37252c00 : ffffe000`517ce540 00000000`00000001 00000000`00000000 ffffe000`517cd000 : e1r64x64+0x288a9
ffffd000`575b3e90 fffff800`37264a9d : ffffe000`517cd000 ffffe000`00000001 ffffe000`00000001 ffff0001`00000001 : e1r64x64+0x28c00
ffffd000`575b3ec0 fffff800`37261c7b : 00000000`00000000 ffffd000`575469a0 ffffe000`517cd000 00000000`00000000 : e1r64x64+0x3aa9d
ffffd000`575b3f00 fffff800`3725a909 : 00000000`00000002 00000000`00000000 ffffe000`517cd000 ffffd000`575469a0 : e1r64x64+0x37c7b
ffffd000`575b3f50 fffff800`3725b02b : ffffe800`b528cde0 fffff800`35d04671 ffffd000`575b40f0 ffffe000`51105ad0 : e1r64x64+0x30909
ffffd000`575b3fc0 fffff800`35d8f0fa : ffffe800`b5b87868 ffffe800`b5b87858 ffffe800`b5b87854 ffffe800`b0d501a0 : e1r64x64+0x3102b
ffffd000`575b4030 fffff800`35d033a3 : ffffe800`b0d501a0 ffffd000`575b40e9 ffffe800`b5b87820 00000000`00000011 : NDIS!ndisMInvokeOidRequest+0x4e
ffffd000`575b4070 fffff800`35d04324 : 00000000`00000000 ffffe800`b0d501a0 ffffe800`b5b87868 00000000`00000000 : NDIS!ndisMDoOidRequest+0x39b
ffffd000`575b4150 fffff800`35d0475e : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : NDIS!ndisQueueOidRequest+0x4c4
ffffd000`575b42f0 fffff800`3679719e : ffffe800`b147b8c0 00000000`00010224 ffffe800`b147b8c0 ffffe000`52bf4010 : NDIS!NdisFOidRequest+0xc2
ffffd000`575b43b0 fffff800`35d038de : ffffe800`b5b87820 ffffe000`51105ad0 00000000`00000000 ffffe000`52bea010 : wfplwfs!LwfLowerOidRequest+0x6e
ffffd000`575b43e0 fffff802`be6e19a6 : ffffd000`575b46d0 ffffd000`575af000 00000000`00000000 00000000`00000000 : NDIS!ndisFDoOidRequestInternal+0x2ee
ffffd000`57...
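
The stack reads top-down from the bugcheck back to the caller chain: the fault is in nt!ExQueryDepthSList, called from NDIS!NdisFreeNetBufferList, which in turn was called from e1r64x64 while it was servicing an OID request. A rough Python sketch of that triage, finding the first frame that is not in the kernel or NDIS, might look like this. The `OS_MODULES` set and the function names are assumptions made for illustration, and only three frames from the trace above are included.

```python
import re

# Assumption: modules treated as part of Windows itself for this triage.
OS_MODULES = {"nt", "NDIS"}


def frame_module(frame):
    """Return the module name of one STACK_TEXT frame, or None if not a frame."""
    # A frame is "child-sp return-addr : four args : module!symbol+offset".
    parts = frame.strip().split(" : ", 2)
    if len(parts) != 3:
        return None
    m = re.match(r"[A-Za-z0-9_]+", parts[2].strip())
    return m.group(0) if m else None


def first_non_os_module(frames):
    """Walk frames from the top and return the first module outside OS_MODULES."""
    for frame in frames:
        module = frame_module(frame)
        if module is not None and module not in OS_MODULES:
            return module
    return None


# Three consecutive frames copied from the trace in this post.
FRAMES = [
    "ffffd000`575b3d98 fffff800`35d04875 : ffffe800`b0f3c870 ffffd000`575b3e00 "
    "ffffe000`517cd000 00000000`00000000 : nt!ExQueryDepthSList",
    "ffffd000`575b3da0 fffff800`372520e7 : ffffe000`517ce540 ffffe000`517cd000 "
    "ffffe800`b1496c60 00000000`00000000 : NDIS!NdisFreeNetBufferList+0xb5",
    "ffffd000`575b3e20 fffff800`372528a9 : ffffe000`517ce540 ffffe000`517cd000 "
    "00000000`00000001 00000000`00000000 : e1r64x64+0x280e7",
]

print(first_non_os_module(FRAMES))
```

This mirrors the "Probably caused by" line from the debugger, which names e1r64x64+280e7.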
0 Kudos
7 Replies

Thanx for posting to our blog site.

I appreciate your frustration. I asked our virtualization guru, who indicated that this issue has been fixed and that the fix will be available in the next release of the drivers, which I believe is scheduled for Q4 of this year.

Thanx,

Patrick

0 Kudos
Beginner

Hi,

Thanks for the information.

Could I ask:

Is it possible for you to post any technical detail about the issue, i.e. the cause and why the driver is faulting? (A private message is fine. I don't intend to republish this information; it's just for my own knowledge.)

Is it possible for me to get hold of a release candidate prior to the official release? This issue is serious, and it's preventing us from going into production on this cluster and on another one we are about to build as part of our transition to Server 2012 R2.

I would be happy to sign an NDA or do anything else you might need in that regard. I'm also happy to feed back my testing results for your use.

Failing that, can you be more specific about the release date? We're in Q4 now, so this could mean any time between now and January. That's a very wide time window indeed.

Many thanks for your help.

0 Kudos

Unfortunately, since this is not an open source OS, the details of issues are not available to the public as they are for our open source drivers.

The next release is going to happen in the extreme near future. They are doing the absolute final regression testing as I type this. Can't give an exact date, but if I were a betting man (which I'm not), I'd guess in the next week or two.

Hope that helps a bit.

- Patrick

0 Kudos
Beginner

Can I double-check which driver version this fix went into?

Thanks.

0 Kudos
Beginner

Hi. This is serving as a bump. I am currently talking to Microsoft support and the kernel debugging team, but I would really like to know whether this fix has made it into a driver version and, if so, which was the first one. Thanks.

0 Kudos
Valued Contributor I

Dear mrrorschach,

Thanks for writing back. I will further check on this.

Sincerely,

Sandy

0 Kudos
Valued Contributor I

Dear mrrorschach,

 

This issue is fixed in version 20.0 of the network card's driver.

You may download it from the Intel® Download Center: https://downloadcenter.intel.com/product/59062/Intel-Ethernet-Server-Adapter-I350-T2
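
To check a set of hosts against that answer, a minimal Python sketch can compare the installed driver release with 20.0. It assumes the release is written as dotted integers such as "19.3" (the release mentioned at the top of this thread), which is the driver package release and not the file version of e1r64x64.sys; the function names are illustrative.

```python
def parse_version(text):
    """Turn a dotted release string such as '19.3' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# First release reported in this thread to contain the fix.
FIXED_IN = parse_version("20.0")


def has_vmq_fix(installed):
    """True if the installed driver release is 20.0 or later."""
    return parse_version(installed) >= FIXED_IN


for release in ("19.3", "20.0"):
    print(release, has_vmq_fix(release))
```

Comparing tuples of integers avoids the string-comparison trap where "9.0" would sort after "20.0".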

 

Have a great day!

 

 

Sincerely,

 

Sandy

 

 

0 Kudos