Michal235

I219-V Hardware timestamping stops working under load

Hello,
I'm a software developer trying to use the hardware timestamping feature of my I219-V NIC (integrated on an MSI Z270 PC MATE). So far I have a working PTPv2 master/slave service, but long-term tests showed random failures. Further testing narrowed the issue down to a potential bug in the Windows 10 driver for the I219-V (e1d68x64.sys): RX-side timestamping stops working every time a timestamped packet is discarded.

Under normal network load this happens at random, anywhere from 15-30 minutes to a few hours in, but running a high-bandwidth load test shortens the time to failure to about 5-30 seconds. Once timestamping stops, a device (driver) restart is needed to get it working again (until the next timestamped packet is dropped).

I have prepared a minimal test application (IntelTimestampingTest), which I'm attaching along with its source code. The test requires a PTPv2 grandmaster, which can be a hardware device or simply a ptpd instance running in a Linux VM. I'm also attaching the ptpd master config file I used in my tests and a screenshot of the moment of failure.
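The symptom fits a simple model. The toy sketch below assumes, as the datasheet excerpt quoted further down suggests, that the NIC has a single RX timestamp latch that stays locked until the host reads it while processing the timestamped packet. The function name and input format are hypothetical; this is not Intel's driver logic.

```python
def simulate_rx_timestamps(packets):
    """Toy model of a single RX timestamp latch (assumption, not driver code).

    packets: list of (hw_time, dropped) tuples for PTP packets, in arrival order.
    Returns the timestamp the host sees for each delivered packet
    (None means the packet arrived without a timestamp).
    """
    locked = False  # True while the latch holds an unread timestamp
    received = []
    for hw_time, dropped in packets:
        captured = not locked  # the NIC only timestamps if the latch is free
        if captured:
            locked = True
        if dropped:
            continue  # discarded before the host sees it: latch stays locked
        if captured:
            received.append(hw_time)
            locked = False  # host read the timestamp, which unlocks the latch
        else:
            received.append(None)
    return received


print(simulate_rx_timestamps([(1, False), (2, True), (3, False), (4, False)]))
# → [1, None, None]
```

Packet 2 is timestamped and then dropped, so nothing ever reads the latch and every later packet arrives without a timestamp, which is the behaviour described above.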

Steps to reproduce the problem on a clean Windows 10 install:
1) Install Windows 10 (20H2, build 19042.928), wait for it to download and install all drivers and updates, then install any missing drivers manually (the Intel chipset driver in my case).
2) Install the Microsoft Visual C++ Redistributable 2019 (needed for the v142 build of the test application: https://aka.ms/vs/16/release/vc_redist.x64.exe).
3) Download and install the latest (26.2) driver for the I219-V from https://downloadcenter.intel.com/product/82186/intel-ethernet-connection-i219-v.
4) In Registry Editor, find the driver settings key for the adapter (under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc1-08002be10318}) and add a DWORD value called TimeSync with a value of 1 (as described in the readme at https://github.com/Avnu/gptp, under the "Windows Specific" section).
5) Enable PTP Hardware Timestamp in PROSet ACU.
6) To make the problem appear faster, make the following changes in PROSet ACU (not required, but they reduce testing time):
- Disable Flow Control,
- Disable Interrupt Moderation,
- Disable Packet Priority & VLAN,
- Set Receive Buffers to minimum (80),
- Disable Receive Side Scaling,
- Disable all offload options (IPv4 Checksum Offload, Large Send Offload (IPv4, IPv6), Protocol ARP and NS Offloads, TCP/UDP Checksum Offloads (IPv4, IPv6)).
7) Reboot.
8) Run the attached IntelTimestampingTest.exe from the command line, passing your interface name or GUID as the argument (typically: "IntelTimestampingTest.exe Ethernet"), and allow access in Windows Firewall if prompted.
9) Set up a PTPv2 grandmaster (ptpd running on a Mac in my case) to send UDP unicast Sync messages to the PC under test, and set the sync rate to 32 packets per second. The Sync messages should show up in IntelTimestampingTest.exe. Each line contains the RX timestamp, the current value of the SYSTIM register, the PTP packet sequence number and the current number of discarded packets (from NDIS).
10) Run a network stress test and increase the load until the discard counter in IntelTimestampingTest.exe starts increasing rapidly. I found that running iperf at 900 Mbps RX and 400 Mbps TX together with a 4K YouTube live video or a Full HD twitch.tv livestream in MS Edge is very effective at triggering the problem.
11) As soon as one of the timestamped packets is dropped (signalled by "Packet drop detected!" in IntelTimestampingTest.exe), RX timestamping stops working until the NIC is restarted (TX timestamping still works correctly).
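For anyone reproducing this with their own tooling, a dropped Sync message can be spotted as a gap in the PTP sequenceId of consecutive received messages. The sketch below is a hypothetical reimplementation, not the code of IntelTimestampingTest; it assumes the grandmaster increments the 16-bit sequenceId by one per Sync and that it wraps from 65535 to 0. The NDIS discard counter is not used because it counts every discarded packet, not only PTP ones.

```python
def missing_sync_count(prev_seq, seq):
    """Sync messages lost between two received sequenceIds (16-bit, wrapping)."""
    diff = (seq - prev_seq) & 0xFFFF  # forward distance modulo 2**16
    return diff - 1 if diff else 0    # diff == 0: duplicate, nothing lost


def find_drops(seq_ids):
    """Return (index, lost) for every received Sync that follows a gap."""
    events = []
    for i in range(1, len(seq_ids)):
        lost = missing_sync_count(seq_ids[i - 1], seq_ids[i])
        if lost:
            events.append((i, lost))
    return events


print(find_drops([65533, 65534, 0, 1, 4]))
# → [(2, 1), (4, 2)]
```

Reordered packets would show up as a very large gap with this check; on a single LAN segment that should be rare, but it is worth keeping in mind when reading the results.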

I found a description of a similar issue in the Intel I217 Ethernet Controller datasheet (available here: https://www.mouser.com/datasheet/2/612/i217-ethernet-controller-datasheet-257741.pdf), on page 294 (12.1.2.2 Time stamping mechanism). The datasheet states: "In some cases on the RX path a packet that was timestamped might be lost and not get to the host, to avoid lock condition the SW should keep a watch dog timer to clear locking of the time stamp register." I believe this watchdog timer is not working correctly in the Windows driver, because RX timestamping never recovers, no matter how long I wait.
The same problem was discussed during Linux ixgbe driver development here: https://patchwork.ozlabs.org/project/netdev/patch/1336632413-19135-7-git-send-email-jeffrey.t.kirshe....
A solution to this issue is also implemented in the current Linux driver for the I219-V (e1000e, available here: https://github.com/torvalds/linux/blob/7f75285ca572eaabc028cf78c6ab5473d0d160be/drivers/net/ethernet...).
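The watchdog the datasheet asks for could look roughly like the sketch below. It is an illustration under assumptions, not the e1000e or Intel Windows implementation: the latch is modelled in software, `TIMEOUT_S` is a hypothetical value, and `read_and_unlock` stands in for reading the timestamp register to release the lock.

```python
TIMEOUT_S = 1.0  # hypothetical: must exceed normal RX processing latency


class RxTimestampLatch:
    """Software stand-in for the RX timestamp register (assumption)."""

    def __init__(self):
        self.locked_since = None  # capture time, or None when unlocked

    def capture(self, now):
        """NIC side: latch a timestamp if free. Returns True on capture."""
        if self.locked_since is not None:
            return False
        self.locked_since = now
        return True

    def read_and_unlock(self):
        """Host side: reading the timestamp unlocks the latch."""
        self.locked_since = None


def watchdog_tick(latch, now, timeout=TIMEOUT_S):
    """Run periodically; clears a lock older than timeout. Returns True if cleared."""
    if latch.locked_since is not None and now - latch.locked_since > timeout:
        latch.read_and_unlock()  # throw away the stale timestamp
        return True
    return False


latch = RxTimestampLatch()
latch.capture(0.0)                # timestamped packet, then dropped
print(latch.capture(0.5))         # False: next packet gets no timestamp
print(watchdog_tick(latch, 1.5))  # True: stale lock cleared
print(latch.capture(1.6))         # True: timestamping works again
```

The timeout has to be longer than the normal delay between the NIC latching a timestamp and the driver consuming it; otherwise the watchdog could discard a timestamp that a packet still waiting in the receive ring is about to claim.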
Unfortunately, the Windows drivers are not open source like the Linux drivers, so I can only ask here for a solution. I hope the collected information helps solve the problem quickly. If a driver update fixes this problem, please let me know which version includes the fix.

Michal235

Bump

I really need a fix for this problem.

Michal235

Bump
