
Intel X710 vs VMware ESX: crash and reboot

EZheg
Beginner

Hi,

I have a batch of Intel X710-DA2 adapters (around 50 boards) and a similar number of servers running ESX 6.0. The problem: as soon as a server starts exchanging traffic over the X710, it reboots. I'm writing here rather than to VMware support because when the adapter stays idle (we use the onboard copper gigabit I350 adapters to work around the issue), the server is rock stable. With Mellanox ConnectX-3 EN boards (we recently acquired a couple for testing purposes) the server doesn't crash either. So I'm quite sure the cause is either the board or its driver.

As for the Intel drivers for ESX: the problem persists across all available driver versions from 1.2.48 to 2.0.6 (we also tried 1.4.28 in the middle). The NVM firmware version doesn't seem to matter either: today we ran the tests on firmware 5.05 with the 2.0.6 driver, and the uptime was just a couple of minutes before the server rebooted. I've also tried disabling TSO and LRO, but this didn't change the result.

I would greatly appreciate it if someone could help me resolve this issue, because right now the only solution available to us is switching to the Mellanox boards, which is quite expensive given the large number of servers.

Thanks.

idata
Employee

Hi drookie,

Thank you for the post. Could you share the following information?

1) What server system is used (brand and model)?

2) What is the brand and model of the fiber module or SFP+ module used on the X710-DA2?

3) Is the X710-DA2 embedded in the system, or is it a separate adapter that can be plugged in and removed?

Thanks,

wb

 

 

EZheg
Beginner

Sure.

1) All of the servers are Supermicro SYS-1028GR-TR systems, the motherboard is X10DRG-H.

2) All of the servers are plugged into a Juniper EX4600 with passive DAC cables. We use the same cables with the Mellanox boards. The SFPs on the Intel end are flashed with Intel firmware, but the origin and version of the firmware are unknown, as is the exact DAC manufacturer. I guess this is the famous Chinese "Noname" brand.

3) These are the discrete adapters, plugged into the PCI-E x8 slot.

Recently we found a couple of stable servers using X710 boards, with uptime measured in dozens of days (usually they crash within minutes). One thing is common to both (we found two): they are running NVM version 4.25. Right now the lowest version on the Intel site is 4.42, so is there any chance we could get the 4.25 tarball? It seems to have vanished from the internet. I am aware that the downgrade version from 5.05 is 4.42, but some of my boards report NVM version "0.00", so I'm pretty sure I can flash them with 4.25.

And one more thing. While flashing one of the boards to version 4.42 my session got disconnected, and now the NVM utility reports "Access error" every time the board is examined. Does this mean the board is now broken?

Thanks.

EZheg
Beginner

Update: at least some of the boards reporting NVM version 0.00 refuse to flash:

[root@hv07:/tmp/ESXi_x64_442] ./nvmupdate64e

Intel(R) Ethernet NVM Update Tool
NVMUpdate version 1.28.19.4
Copyright (C) 2013 - 2016 Intel Corporation.

WARNING: To avoid damage to your device, do not stop the update or reboot or power off the system during this update.

Inventory in progress. Please wait [|.........]

Num Description                              Ver.  DevId S:B    Status
=== ======================================== ===== ===== ====== ===============
01) Intel(R) I350 Gigabit Network Connection       1521  00:001 Update not
                                                                available
02) Intel(R) Ethernet Converged Network      0.00  1572  00:129 Update not
    Adapter X710-2                                              available

Tool execution completed with the following status: Device not found
Press any key to exit.

idata
Employee

 

Hi Drookie,

Thank you for providing the detailed information. For the X710-DA2, please use one of the supported fiber modules that we recommend on our website at http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000007045.html

Can you check whether you can use a supported fiber module?

Thanks,

wb

EZheg
Beginner

Well, I have an update and more questions.

1) I was told we are using Juniper-branded DACs there. So, no "made in China by Noname" brand, definitely.

2) I'm sorry for misinforming you (I was misinformed myself): it is now possible that _some_ of the boards aren't Intel-manufactured, especially the ones reporting NVM version 0.00. These may have been manufactured by other vendors. Some boards, however, are definitely Intel BLKs; I saw the Intel stickers myself on the photo our field engineer sent me from the site. We are investigating further. However, I'm seeing one adapter with NVM version 4.53 that refuses to flash to 5.05 with "Access error", though it's operable.

3) Could you please clarify what the phrase "Other brands of SFP optical modules do not work with the Intel® Ethernet CNA X710 Series." means? Do they just lack physical connectivity, or does it mean these modules/DACs could lead to an ESX crash? At the same time, these DACs work just fine with Mellanox. As for whether we can use SFP+ modules instead of the DAC cables: I guess we cannot, because DAC cables are far cheaper than a pair of SFP+ modules, so this doesn't seem to be an option. As for the DACs, the document at the link you provided states that only Leoni and Amphenol DACs aren't supported; I guess that leaves any Juniper DACs as supported, right?

In conclusion: we still have the problem, and my initial statement about the Intel X710 boards still stands, because the board that ESX 6.0 was crashing with has been proven to be an Intel-manufactured X710 adapter.

Looking forward to hearing from you about whether this can be solved.

idata
Employee

Hi Drookie,

Thank you for the additional information.

1) You can use any direct attach cable that complies with the SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Please refer to the FAQ at http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000007045.html

"Any SFP passive or active limiting direct attach copper cable that complies with the SFF-8431 v4.1 and SFF-8472 v10.4 specifications is compatible. We participate in testing with other members of the Ethernet Alliance to make sure there is interoperability between cables and host ports that meet these specifications."

2) The SFP optical module is an optical type of connection, which I think is not applicable in your case since you mentioned you need to use DAC. The supported SFP+ optical modules are the ones stated on our website:

http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000007045.html

3) With regard to the firmware upgrade, please reply to my private message.

Thanks,

wb

 

 

 

 

idata
Employee

Hi Drookie,

Please execute the commands below and provide the output.

1) lspci -vv | grep -i "ethernet controller"

2) ethtool -i <interface>
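
With several ports per host it may be easier to collect both pieces of information in one pass. The following is only a rough sketch, assuming a Linux host with pciutils and ethtool installed; the function names (list_ethernet_controllers, iface_for_pci, report_all) are made up for illustration and are not part of any Intel tool.

```shell
#!/bin/sh
# Print the PCI address (first field) of every Ethernet controller
# found in `lspci` output supplied on stdin. The match is
# case-insensitive because lspci prints "Ethernet controller".
list_ethernet_controllers() {
    grep -i "ethernet controller" | awk '{ print $1 }'
}

# Print the interface name(s) that sysfs associates with a PCI address.
# Assumption: lspci omitted the PCI domain, so 0000: is prepended
# unless the address already contains one.
iface_for_pci() {
    case "$1" in
        *:*:*) addr="$1" ;;
        *)     addr="0000:$1" ;;
    esac
    ls "/sys/bus/pci/devices/$addr/net" 2>/dev/null
}

# Run `ethtool -i` (driver and firmware details) for every Ethernet port.
report_all() {
    lspci | list_ethernet_controllers | while read -r pci; do
        for ifc in $(iface_for_pci "$pci"); do
            echo "== $pci $ifc"
            ethtool -i "$ifc"
        done
    done
}
```

Calling report_all as root prints one `ethtool -i` section per port; nothing runs until the function is called.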

 

 

Thanks,

wb

 

EZheg
Beginner

Hello,

We decided to give KVM a chance on this very same machine and replaced the failed adapter with a new Intel X710 one (its NVM version is 4.53, but as we saw earlier this doesn't affect the stability of ESX in any way). Under Linux I get:

[root@kvm15 Linux_x64]# lspci -vv | grep -i "ethernet controller"
01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
81:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
81:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

[root@kvm15 Linux_x64]# ethtool ens5f0
Settings for ens5f0:
    Supported ports: [ FIBRE ]
    Supported link modes:   10000baseT/Full
    Supported pause frame use: Symmetric
    Supports auto-negotiation: No
    Advertised link modes:  Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Speed: 10000Mb/s
    Duplex: Full
    Port: Direct Attach Copper
    PHYAD: 0
    Transceiver: external
    Auto-negotiation: off
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x0000000f (15)
                           drv probe link timer
    Link detected: yes

[root@kvm15 Linux_x64]# ethtool ens5f1
Settings for ens5f1:
    Supported ports: [ ]
    Supported link modes:   1000baseT/Full
                            10000baseT/Full
    Supported pause frame use: Symmetric
    Supports auto-negotiation: Yes
    Advertised link modes:  1000baseT/Full
                            10000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Speed: Unknown!
    Duplex: Unknown! (255)
    Port: Other
    PHYAD: 0
    Transceiver: external
    Auto-negotiation: off
    Supports Wake-on: d
    Wake-on: d
    Current message level: 0x0000000f (15)
                           drv probe link timer
    Link detected: no
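
When comparing many hosts, the two fields that matter most in this kind of output (Speed and Link detected) can be pulled out with a small filter. This is only a sketch: the function name summarize_link is made up for illustration, and it assumes the `ethtool` text layout matches the field names shown in this post.

```shell
#!/bin/sh
# summarize_link: read `ethtool <iface>` output on stdin and print a
# single line with the reported speed and link state.
summarize_link() {
    awk -F': ' '
        /^[[:space:]]*Speed:/         { speed = $2 }
        /^[[:space:]]*Link detected:/ { link  = $2 }
        END { printf "speed=%s link=%s\n", speed, link }
    '
}

# Usage (interface names taken from this post):
#   ethtool ens5f0 | summarize_link   # prints: speed=10000Mb/s link=yes
#   ethtool ens5f1 | summarize_link   # prints: speed=Unknown! link=no
```

The filter only reads text, so it can also be run over saved output collected from other machines.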

idata
Employee

Hi Drookie,

Thank you for the information provided.

Rgds,

wb

 

idata
Employee

Hi Drookie,

As mentioned, there is no issue when using firmware 4.25. Can you help by providing the marking (serial number) of a working X710 and of a non-working X710?

The serial number is found on the white sticker on the physical network adapter.

Format: 15 digits + 6 digits + 6-3

You can try using the SSU tool below to extract the system information:

https://downloadcenter.intel.com/download/26735/Intel-System-Support-Utility-for-the-Linux-Operating-System

Regards,

wb

 

idata
Employee

Hi Drookie,

Please feel free to provide the information.

Rgds,

wb

 

MNuhf
Beginner

FYI: we had to replace every single Juniper-branded DAC cable with Tripp Lite ones; with the Juniper cables we got either no link or strange results. The HPE UEFI error log stated that the SFP was incompatible. The Tripp Lite cables worked just fine. I think we had to do this after updating the NVM/driver, possibly to 4.53.

idata
Employee

Hi Hypervision,

Thank you for sharing the information.

Rgds,

wb

 
