JJob

NetEffect 10Gb Server 2012 SMBDirect BSOD under load (0x0D1)

I've built a proof-of-concept environment for Server 2012 with Scale-Out File Servers, SMBDirect (using iWARP) and Hyper-V 2012 nodes.

Essentially I've got 4 Scale-Out File Server nodes that host the Fibre Channel CSVs (with CSV caching enabled, 20% of RAM) and share out the storage via Continuously Available shares using SMB3 and RDMA/SMBDirect.

Each of the 4 file servers and 6 Hyper-V servers has a single 10Gb RDMA adapter (the Hyper-V servers also use dedicated X520-DA2 NICs for VM networking). The file server and Hyper-V server RDMA adapters are on the same L2 VLAN on a common Cisco Nexus 5K switch.

Everything was working pretty well until I reached about 250 concurrent VMs. Periodically, a file server node would BSOD (0xD1, DRIVER_IRQL_NOT_LESS_OR_EQUAL, smbdirect.sys), but the file cluster handled these failures gracefully.

As I increased load further, Hyper-V servers started failing with the same error.

Past a certain load, a Hyper-V node failure would cause its VMs to fail over to other nodes, which put heavy load on those nodes and made them BSOD in turn (a cascade that even reached the file servers).

I was able to stabilize the environment by disabling NetworkDirect in the adapter properties (essentially turning off RDMA), and have since taken the workload to over 535 running VMs.
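
For anyone wanting to repeat that workaround across several nodes without clicking through the adapter properties, here is a rough sketch of scripting it. It assumes the `Disable-NetAdapterRdma` / `Enable-NetAdapterRdma` cmdlets from the Server 2012 NetAdapter module have the same effect as the NetworkDirect setting in the GUI, and the adapter name `RDMA1` is a made-up placeholder. It defaults to a dry run that only prints the command.

```python
import subprocess


def rdma_command(adapter_name, enable=False):
    """Build the PowerShell argument list that toggles RDMA on one adapter."""
    # Assumption: these NetAdapter cmdlets match the NetworkDirect GUI setting.
    cmdlet = "Enable-NetAdapterRdma" if enable else "Disable-NetAdapterRdma"
    # Inside a single-quoted PowerShell string, a quote is escaped by doubling it.
    safe_name = adapter_name.replace("'", "''")
    return ["powershell.exe", "-NoProfile", "-Command",
            f"{cmdlet} -Name '{safe_name}'"]


def set_rdma(adapter_name, enable=False, dry_run=True):
    """Print the command (dry run) or execute it and return its exit code."""
    cmd = rdma_command(adapter_name, enable)
    if dry_run:
        print(" ".join(cmd))
        return 0
    # Must run elevated on the Windows node that owns the adapter.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # "RDMA1" is a hypothetical adapter name; substitute the real one.
    set_rdma("RDMA1", enable=False, dry_run=True)
```

Running `Get-NetAdapterRdma` on the node afterwards should show whether the adapter still reports RDMA as enabled.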

While I understand that the crash dump isn't directly pointing to the N2E63x64.sys driver, these errors are typically driver related. I am using the "latest" drivers (v1.185.11.11, 10/19/2012), and the issue only appears under load. I am fully patch compliant and have installed all recommended 2012 and Hyper-V cluster hotfixes outlined in KB2784261.

File servers are HP380G6 servers (2x L5630, 48GB RAM), and Hyper-V servers are HP585G7 servers (4x AMD 6172, 256GB RAM). All have the latest BIOS and drivers from HP.

Has anyone else seen similar behavior? And most importantly... how do we fix it?

Thanks!
