Ethernet Products
Determine ramifications of Intel® Ethernet products and technologies
5764 Discussions

I219-LM silent data corruption, Linux

Tomasz
Beginner
7,878 Views

Dear forum members,

The issue was first detected while transferring a large 3 TB file over scp. Since the ssh protocol has built-in data integrity checks, the transfer failed repeatedly, often towards the end, with the error "message authentication code incorrect". I tried different encryption schemes and the result was the same error.

In order to troubleshoot the problem I performed the following steps:

1. Installed the newest driver from the Intel website (3.8.4-NAPI) on top of a stock Debian kernel 4.19.0-10-amd64. The issue persists.

2. Used the "socat" tool to eliminate potential issues with scp. I ran a raw TCP/IP transfer via socat and piped the output into sha256 to get a fingerprint. The fingerprint of the transferred file was incorrect in most cases (out of around six transfers, only one completed with the correct fingerprint).

3. Installed a different Ethernet card based on a different chip; transfers completed with no errors (tried 3 times).

4. Installed Windows 10 and ran the same test using Cygwin-based socat on the Intel chip: no errors (repeated 5 times).
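For anyone who wants to repeat the check in step 2 without socat, here is a minimal Python sketch of the same idea: accept one raw TCP connection and compute the SHA-256 of everything received. The function names, the chunk size and the port argument are my own assumptions for illustration; this is not the exact command line used in the tests above.

```python
import hashlib
import socket

CHUNK = 1 << 20  # read up to 1 MiB per call (arbitrary choice)

def sha256_of_stream(read_chunk):
    """Hash everything returned by read_chunk(n) until it returns b''."""
    digest = hashlib.sha256()
    while True:
        block = read_chunk(CHUNK)
        if not block:  # empty bytes means end of stream
            break
        digest.update(block)
    return digest.hexdigest()

def receive_and_hash(port):
    """Accept one TCP connection on `port` and return the SHA-256 of its payload."""
    with socket.create_server(("", port)) as server:
        conn, _addr = server.accept()
        with conn:
            return sha256_of_stream(conn.recv)
```

On the sending side, the same `sha256_of_stream` can be called with the `read` method of a file opened in binary mode, and the two hex digests compared by eye. A mismatch with no error reported by the kernel or driver is the symptom described here.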

 

The problem happens on a Supermicro X11SCA-F motherboard; the chip is integrated on the board. I run the current stable version of Debian. I tried replacing the motherboard, but the replacement had the same issue.

There are no errors in the log files. The software does not detect any issues with the device and it apparently believes that all the data passed up the TCP/IP stack is correct, while in fact some bits in this 3 TB-long stream are flipped. It seems the error happens, on average, once per around 1 TB of data.
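Since a single whole-file fingerprint only says that something differs, a per-block comparison helps to narrow down where the flipped bits are. Below is a rough sketch, assuming the file is hashed in fixed-size blocks on both the sender and the receiver and the two lists are then compared. The function names and the 64 MiB block size are hypothetical choices for illustration.

```python
import hashlib

def block_digests(path, block_size=64 * 1024 * 1024):
    """Return a list of (offset, sha256 hex digest) for each block of the file."""
    digests = []
    offset = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:  # end of file
                break
            digests.append((offset, hashlib.sha256(block).hexdigest()))
            offset += len(block)
    return digests

def mismatched_offsets(digests_a, digests_b):
    """Return offsets of blocks whose digests differ between the two lists.

    Assumes both lists were produced with the same block size and that
    the files have the same length.
    """
    return [off for (off, da), (_, db) in zip(digests_a, digests_b) if da != db]
```

With roughly one error per 1 TB, a 3 TB file should show only a handful of mismatched blocks, which can then be compared byte by byte.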

 

Any advice on how to proceed would be greatly appreciated.

Thanks!

Tomasz

0 Kudos
24 Replies
Tomasz
Beginner
1,263 Views

Hello Alfred,

 

I am still collecting more data. I have some clues as to what is going on and I will let you know once I am sure. The problem is difficult due to long times needed for testing.

 

Best,

Tomasz

0 Kudos
AlfredoS_Intel
Moderator
1,260 Views

Hi Tomasz,

Thank you for your response.

No problem. You can have all the additional time that you need.

We will check in with you after 5 business days to see whether you need more time.


Best Regards,

Alfred S

Intel® Customer Support


0 Kudos
AlfredoS_Intel
Moderator
1,245 Views

Hi Tomasz,

We are just following up.

It looks like you need more time to carry out the recommendations that we have provided.

We will follow up again after 3 business days. Should we not hear from you, our system may automatically close the thread.



Best Regards,

Alfred S

Intel Customer Support 


0 Kudos
AlfredoS_Intel
Moderator
1,235 Views

Hi Tomasz, 

We need to close this thread since we have not received a response from you; you may be busy or preoccupied at the moment. We know that it is important to you to get this resolved, and it is equally important to us to give you the right solution. As much as we would like to assist you, we need to close the thread to attend to other customers. We hope for your consideration and understanding.


If you need any additional information, please submit a new question, as this thread will no longer be monitored.


Thank you for contacting Intel® and have a great week!




Best Regards,

Alfred S

Intel® Customer Support


0 Kudos