DHend8
Beginner
8,236 Views

Intel X710 woes

We have 8 brand new HPE DL380 Gen10 servers. Each of these servers has two HPE Ethernet 10Gb 2-port 562SFP+ Adapters: one embedded and one on a PCI card. This card is based on the Intel X710 controller.

https://www.hpe.com/us/en/product-catalog/servers/server-adapters/pip.hpe-ethernet-10gb-2-port-562sf... HPE Ethernet 10Gb 2-port 562SFP+ Adapter OID8245220 | HPE™

We hooked up two DAC cables, one from each card, to our Juniper switch. The two ports on the Juniper switch are standard access ports. After installing the HPE-customized version of ESXi 6.5 U1, we went into the console and added the two nics. We entered an IP address, gateway, mask, and DNS servers and then rebooted the host. During the reboot we kept a continuous ping going to the IP address of the ESXi management interface. The ping returned partway through the boot process, then stopped once the host had fully booted. If I remove one of the 10Gb nics from the management network, the ping returns. If I use two of the 1Gb interfaces on this server, which are based on a Broadcom chipset, the management interface works fine. The server has had all of its firmware upgraded. We are running driver version 1.5.8 and firmware version 10.2.5. The firmware came from the link below; I think it corresponds to VMware's firmware version 6.00.
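For anyone reproducing the isolation step described above (pulling one 10Gb port out of the management network), this is a rough sketch from the ESXi shell. The vSwitch name and vmnic numbers are assumptions, not taken from this thread; check your own host first.

```shell
# Sketch only; vSwitch0 and vmnic5 are hypothetical names. Verify with:
esxcli network nic list                  # list vmnics and their drivers
esxcli network vswitch standard list     # show uplinks per vSwitch

# Drop one X710 port from the management vSwitch while testing
esxcli network vswitch standard uplink remove -u vmnic5 -v vSwitch0

# Re-add it once the test is done
esxcli network vswitch standard uplink add -u vmnic5 -v vSwitch0
```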

https://support.hpe.com/hpsc/swd/public/detail?sp4ts.oid=1008830010&swItemId=MTX_87c83853cb5a4bc5949... tab1 Drivers & Software - HPE Support Center.

I have a ticket opened with HPE but wanted to find out if Intel might have a solution for this

39 Replies
idata
Community Manager
944 Views

Hi HendersonD,

Thank you for posting in the Wired Communities. I am sorry to hear about this issue and would like to clarify whether it occurs only with the Juniper switch. Have you tried checking with Juniper switch support?

I understand you have already contacted HP* support. As this is an OEM NIC, we have no control over the customization and modification of the firmware and driver, or other manufacturing changes made by the OEM vendor; the OEM driver is the one suited to an OEM NIC, and its driver and firmware versions differ from those for the retail Intel X710 NIC.

As you mentioned, if you remove the 10Gb NIC from the system the ping continues to work. Did HP confirm that this network card is a compatible model for this server with the customized ESXi image?

I will check whether there is any further information; please continue to work with HP on the investigation.

Regards,

Sharon T

DHend8
Beginner
944 Views

This appears to be a vlan tagging issue: External Switch Tagging should work but does not, while Virtual Switch Tagging does work. All the details are shown below. I am thinking this has to be the nic in these servers, which is based on the Intel X710, for two reasons:

  1. If I use two of the 1Gb ports on this server for the management network using External Switch Tagging, it works just fine. The 1Gb ports are Broadcom nics
  2. I have a 5 year old IBM server that has two 10Gb nics plugged into the exact same switch. It is setup with External Switch Tagging on the management network and it works fine. The 10Gb nics in this server are Emulex

Other people in this forum have reported vlan tagging issues with the X710 under various flavors of Linux. Could it be the driver for this card under ESXi is the issue?
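For context on the two modes discussed above: with External Switch Tagging (EST) the switch access port owns the VLAN and the ESXi port group's VLAN ID stays 0; with Virtual Switch Tagging (VST) the vSwitch tags the frames and the switch port must trunk that VLAN. A hedged sketch of flipping a management port group between the two from the ESXi shell; the port group name and VLAN ID 10 are assumptions:

```shell
# Virtual Switch Tagging: the vSwitch tags frames; the switch port must
# trunk VLAN 10 for this to pass traffic (VLAN ID 10 is hypothetical)
esxcli network vswitch standard portgroup set -p "Management Network" --vlan-id 10

# External Switch Tagging: frames leave the host untagged; the switch
# access port applies the VLAN
esxcli network vswitch standard portgroup set -p "Management Network" --vlan-id 0
```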

idata
Community Manager
944 Views

Hi HendersonD,

Thank you for the additional information. There is a known issue with ESXi i40en driver version 1.5.8: another user has posted about Malicious Driver Detection events with the Intel X710 series. I am not sure whether you see that error in your logs; if this is related to Malicious Driver Detection, we are still waiting for the next driver release. With the additional information provided, we will do some checking. Thank you.

Regards,

Sharon T
DHend8
Beginner
944 Views

Which log would show the malicious driver detection error? vmkernel log or some other log?

idata
Community Manager
491 Views

Hi HendersonD,

It is in the VMware logs.

Thanks,

Sharon T
DHend8
Beginner
944 Views

Sharon,

I am convinced that the driver for this nic card is the root cause of the issues we are seeing. Why do I say that?

  1. I have another server plugged into the same switch using External Switch Tagging and an active/active setup for the ESXi management network and it is working fine. This server uses 10Gb Emulex nics
  2. Even with the HPE servers using the Intel X710 nic, if I use two of the 1Gb ports on this server with External Switch Tagging and active/active for the ESXi management network, it works fine. The 1Gb nics on these HPE servers use a Broadcom nic
  3. I had a Juniper engineer look at our switch setup which is very simple and he said everything is configured correctly
  4. I had two different VMware engineers look at our setup and said it is not a problem with ESXi. Both of them said they have seen multiple problems with X710 nics
  5. I have spent nearly 20 hours of my time trying to troubleshoot this issue including opening tickets with Juniper, HPE, and VMware
  6. There are numerous reports of problems with the Intel X710 on this forum, VMware's forums, and several other sites. Several of these postings talk about vlan tagging issues

I am hoping the new i40en driver fixes this issue. If you could pass on this information (in particular the two diagrams in this thread showing External Switch Tagging not working and Virtual Switch Tagging working), I would appreciate it. It seems I cannot yet get anyone to accept ownership of this issue: Intel makes the controller that goes in these cards, VMware makes the driver the card uses, and HPE and Dell put this card in their server products. Surely everyone can get together to solve problems with this nic that stretch back two years.

Any ETA on when the newest i40en driver will be released?

idata
Community Manager
491 Views

Hi HendersonD,

Further checking: disabling the LLDP hardware engine does not mean we are dropping LLDP frames. It disables the hardware LLDP engine embedded in the silicon so that software can manage LLDP frames; the X710 will then forward LLDP traffic to the network stack. Disabling the hardware LLDP engine will likely resolve most of your issues.

We do not have an ETA for the driver and will update you once there is further information to share. Please feel free to update me with the result. Thanks.

Regards,

Sharon T
DHend8
Beginner
491 Views

So the issue appears to be the LLDP offload functionality on this card. Will the new i40en driver fix:

  • The LLDP offload problem?
  • The Malicious Driver Detection issue?
idata
Community Manager
491 Views

Hi Henderson,

Thank you for the reply. Some additional information about LLDP:

Each Intel® Ethernet 700 Series Network Adapter has a built-in hardware LLDP engine, which is enabled by default. The LLDP engine receives and consumes LLDP frames and also replies to the LLDP frames it receives; it does not forward LLDP frames to the network stack of the operating system. Some applications may not function correctly in environments that require LLDP frames to be forwarded to the network stack. To avoid this, the user must disable the adapter's hardware LLDP engine.

1) I will need to check further.

2) The Malicious Driver Detection issue we are aware of will be addressed in the next i40en driver release. The next driver is going through the VMware certification process and is expected to be released in a couple of weeks.

I understand Maverick85 tried the command but it did not work. Have you had a chance to try the command? Please feel free to update me.

Thanks,

Sharon T
DHend8
Beginner
491 Views

Link Layer Discovery Protocol (LLDP) is a layer 2 neighbor discovery protocol that allows devices to advertise device information to their directly connected peers/neighbors. I want the nics in my HPE ESXi hosts to advertise their identity to the Juniper switches they are connected to. This makes it easy within the ESXi web interface to ensure that everything is connected correctly. It seems like odd behavior that the hardware engine in the X710 nic is set to receive and consume LLDP frames.
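For reference, the workaround discussed earlier in this thread (disabling the hardware LLDP engine so that LLDP frames reach the host's network stack) is usually applied along these lines. This is a sketch only; module parameter names can vary between driver releases, so confirm with `esxcli system module parameters list -m i40en` before relying on it.

```shell
# ESXi / i40en: turn off the firmware LLDP agent on both ports
# (one 0 per port), then reboot the host for it to take effect
esxcli system module parameters set -m i40en -p LLDP=0,0
esxcli system module parameters list -m i40en | grep -i lldp   # confirm

# Linux / i40e (recent kernels): the equivalent per-interface private flag
ethtool --set-priv-flags eth0 disable-fw-lldp on
```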

TShil
Beginner
491 Views

Sharon T,

The command did work, but only with the firmware/driver combo below on our Dell R730xd system:

Dell Firmware 18.3.6

i40en Driver 1.5.8-1OEM.650.0.0.4598673.vib (VMW-ESX-6.5.0-i40en-1.5.8-7759470.zip)
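For anyone applying the same combination: an async driver offline bundle like the one named above is typically installed from the ESXi shell roughly as follows. The datastore path is an assumption; put the host in maintenance mode first and reboot afterwards.

```shell
# Install the offline bundle (the path is hypothetical)
esxcli software vib install -d /vmfs/volumes/datastore1/VMW-ESX-6.5.0-i40en-1.5.8-7759470.zip

# After the reboot, confirm which i40en VIB is active
esxcli software vib list | grep i40en
```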

idata
Community Manager
491 Views

Hi Maverick85,

Thank you for the confirmation.

Regards,

Sharon T
DHend8
Beginner
491 Views

Here is an article that lays out the LLDP problem with these nics nicely:

http://thenicholson.com/where-did-my-host-go/ Where did my host go.... - Virtual Ramblings

The Intel nic simply does not work correctly with LLDP unless you use some type of workaround.

We just put in one Broadcom nic today and it worked fine. We may end up swapping out all of the Intel nics for Broadcom nics.

nbala3
Beginner
491 Views

Hi All,

I have re-installed the server with the HPE Custom Image for ESXi 6.5 U1, which comes with the i40en 1.5.6 driver. When I deployed the VMs, they had no connectivity; when I updated to i40e version 2.0.7, connectivity came back, but the intermittent performance issue persists. So the issue is the same on both 6.0 U3 and 6.5 U1 for the Intel X710 NIC (HPE Ethernet 10Gb 2-port 562SFP+ Adapter). I am not saying the adapter is faulty, as I tested with multiple adapters of the same model.

The issue is with the adapter/driver combination: I do not get VM performance when using i40e, and the network does not pass traffic at all when i40en is used. This is the case for 6.0 U3 and for 6.5 U1 and U2.

There was a bug fix released for the i40en driver last week ( https://kb.vmware.com/s/article/53080 VMware Knowledge Base ). I applied it too with no success. I think the only way out is replacing the adapter, as the next driver update will surely take a long time. I couldn't find Malicious Driver Detection events in the logs.

I have also disabled TSO and LRO. VMware support couldn't find the exact issue either, other than saying we need to check with the vendor (HP) to remediate the driver issue. I am not using VLAN tagging or port bindings on the switch.

2018-05-09T08:57:23.295Z cpu82:65773)i40en: i40en_PfTxqWait:1979: Tx queue request timeout

2018-05-09T08:57:23.295Z cpu82:65773)WARNING: i40en: i40en_TxDisableEnableHw:2129: Tx ring 0 disable timeout

2018-05-09T08:57:23.295Z cpu82:65773)i40en: indrv_Stop:1950: stopping vmnic6

2018-05-09T08:57:42.807Z cpu28:68725 opID=db44c98e)World: 12235: VC opID esxcli-ab-a299 maps to vmkernel opID db44c98e

2018-05-09T08:57:42.807Z cpu28:68725 opID=db44c98e)Uplink: 14445: Setting speed/duplex to (0 AUTO) on vmnic6.

:~] esxcli network nic get -n vmnic7
   Advertised Auto Negotiation: false
   Advertised Link Modes: 10000BaseT/Full
   Auto Negotiation: false
   Cable Type: FIBRE
   Current Message Level: 15
   Driver Info:
      Bus Info: 0000:4e:00.1
      Driver: i40e
      Firmware Version: 6.00 0x8000366c 1.1825.0
      Version: 2.0.7
   Link Detected: true
   Link Status: Up
   Name: vmnic7
   PHYAddress: 0
   Pause Autonegotiate: false
   Pause RX: false
   Pause TX: false
   Supported Ports: FIBRE
   Supports Auto Negotiation: false
   Supports Pause: true
   Supports Wakeon: false
   Transceiver: external
   Virtual Address: 00:50:56:5b:c8:f0
   Wakeon: None

:~] esxcli network nic get -n vmnic7
   Advertised Auto Negotiation: true
   Advertised Link Modes: Auto, 10000BaseSR/Full
   Auto Negotiation: true
   Cable Type: FIBRE
   Current Message Level: -1
   Driver Info:
      Bus Info: 0000:4e:00:1
      Driver: i40en
      Firmware Version: 10.2.5
      Version: 1.5.6
   Link Detected: true
   Link Status: Up
   Name: vmnic7
   PHYAddress: 0
   Pause Autonegotiate: false
   Pause RX: true
   Pause TX: true
   Supported Ports: FIBRE
   Supports Auto Negotiation: true
   Supports Pause: true
   Supports Wakeon: false
   Transceiver:
   Virtual Address: 00:50:56:5b:c8:f0
   Wakeon: None
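The two listings above differ mainly in driver, firmware version, autonegotiation, and pause settings. A small portable sketch (plain sh/awk, with sample lines inlined from the i40e listing above) that pulls those fields out of saved `esxcli network nic get` text for a side-by-side comparison:

```shell
#!/bin/sh
# Extract selected fields from saved `esxcli network nic get` output.
# The inlined sample reproduces lines from the i40e listing above.
extract() {
    awk -F': *' '
        { sub(/^ +/, "", $1) }   # strip the indentation esxcli adds
        $1 ~ /^(Driver|Version|Firmware Version|Auto Negotiation|Pause RX|Pause TX)$/ {
            print $1 "=" $2
        }' "$1"
}

cat > nic-i40e.txt <<'EOF'
Auto Negotiation: false
Driver: i40e
Firmware Version: 6.00 0x8000366c 1.1825.0
Version: 2.0.7
Pause RX: false
Pause TX: false
EOF

extract nic-i40e.txt
```

Running `extract` over a second saved file from the i40en run would make the differing lines obvious with a plain `diff` of the two outputs.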

idata
Community Manager
491 Views

Hi HendersonD and all,

We have a blog post at the link below with information about LLDP on Intel® Ethernet 700 Series network adapters; I hope you find it useful:

https://communities.intel.com/community/tech/wired/blog/2018/05/22/using-intel-ethernet-700-series-n...

Please feel free to update me.

Thanks,

Sharon T
DHend8
Beginner
491 Views

I see a new i40en driver has been released for ESXi 6.7. Will there be a new driver released for ESXi 6.5 to fix the Malicious Driver Detection Event issue?

idata
Community Manager
491 Views

Hi HendersonD,

Please be informed that drivers addressing the MDD issue for ESXi 6.0 and ESXi 6.5 will be available in a future release. Thanks.

Regards,

Sharon T
ABank3
Beginner
491 Views

Sorry to resurrect an old post, but I have been doing a lot of searching around these issues with these cards, as we have a 3-node vSphere cluster hit by the Malicious Driver Detection issue. VMware is pretty much washing its hands of it. A previous post back in May stated that a driver update would be issued to resolve these issues; we are now three months on and there is still no sign of an update.

It seems the only real fix that people have with any success is to swap the NICs out for Broadcom NICs.

Is there any update on this issue and a fix, please?

idata
Community Manager
491 Views

Hi HendersonD,

We've sent you a private message with the information regarding the new driver for ESXi 6.5 that fixes the Malicious Driver Detection issue.

Best Regards,

Vince T.

Intel Customer Support
TShil
Beginner
944 Views

HendersonD,

Thank you for your post. We were experiencing the same issues with the X710 and have spent weeks looking for a solution. We have working servers in the cluster, but our new server with the X710 has management network failures after reboot, and only multiple restarts of the management network keep it up. We have confirmed with Juniper that the switch configuration looks good, and VMware sees nothing of note. We were working with Dell when we came across your post.

We have tried different firmware/driver versions as well as switch configurations, and nothing has fixed the issue. When we test by installing i350 copper cards connected to the same switch, everything works fine.

So far your solution is working for us while we wait for a permanent fix. Thanks again.
