We have a problem with one server.
There are two identical servers on the same power network. Each server has redundant PSUs and two CPUs installed. From time to time one of the PSUs is unplugged for a while. The first server works correctly.
But the second one gives these warnings:
The speed of processor 6 in group 0 is being limited by system firmware. The processor has been in this reduced performance state for 71 seconds since the last report.
"The network interface "Intel(R) I350 Gigabit Network Connection # 2" has begun resetting. There will be a momentary disruption in network connectivity while the hardware resets.
Reason: The network driver did not respond to an OID request in a timely fashion.
This network interface has reset 5 time(s) since it was last initialized."
"Intel(R) I350 Gigabit Network Connection # 2
Network link is disconnected."
As a result, the server hangs about once every 2-3 weeks. It is a surveillance server.
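To see how often each warning occurs, the message text can be tallied with a short script. This is only a minimal sketch in Python: it assumes the warnings have been exported from Event Viewer as plain text with the same wording as quoted above, and the function name `summarize` is made up for illustration.

```python
import re
from collections import Counter

# Patterns follow the wording of the two warnings quoted above.
THROTTLE_RE = re.compile(
    r"The speed of processor (\d+) in group (\d+) is being limited by system firmware\.\s+"
    r"The processor has been in this reduced performance state for (\d+) seconds"
)
NIC_RESET_RE = re.compile(r"This network interface has reset (\d+) time\(s\)")


def summarize(log_text):
    """Return total throttled seconds per (group, processor) and the highest NIC reset count."""
    throttled = Counter()
    for cpu, group, seconds in THROTTLE_RE.findall(log_text):
        throttled[(int(group), int(cpu))] += int(seconds)
    resets = [int(n) for n in NIC_RESET_RE.findall(log_text)]
    return dict(throttled), max(resets, default=0)


# Sample text copied from the warnings above.
sample = (
    "The speed of processor 6 in group 0 is being limited by system firmware. "
    "The processor has been in this reduced performance state for 71 seconds "
    "since the last report.\n"
    "This network interface has reset 5 time(s) since it was last initialized.\n"
)
print(summarize(sample))
```

Running the same script on the logs of both servers would show whether the healthy server ever logs these warnings at all.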
What we tried to do:
- BIOS, FW and drivers are all the latest versions. Didn't help.
- Changed the power plan to High Performance and also set the CPU minimum speed to 100% - didn't help.
- Unticked the option "Allow to disable this device to save power" for the LAN adapters - didn't help.
Most interesting is that when only one PSU is plugged in and working, there are no errors. When you plug in the second PSU without a power cord, the errors appear. And it makes no difference which PSU you unplug.
Seems like this problem is hardware related. But what shall we try to change? Power Distribution Module or motherboard?
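One way to check the PSU observation against the logs is to test whether each warning falls inside a period when one PSU had no power. The following is a rough sketch in Python. The function name `events_in_windows` and all timestamps are hypothetical placeholders, not values from the real event log; the actual unplug periods and warning times would have to be filled in by hand.

```python
from datetime import datetime


def events_in_windows(event_times, windows):
    """Split events into those inside any (start, end) window and those outside."""
    inside, outside = [], []
    for t in event_times:
        if any(start <= t <= end for start, end in windows):
            inside.append(t)
        else:
            outside.append(t)
    return inside, outside


FMT = "%Y-%m-%d %H:%M"

# Hypothetical placeholder data: one period with a PSU unpowered, two warnings.
windows = [
    (datetime.strptime("2018-01-01 10:00", FMT), datetime.strptime("2018-01-01 11:00", FMT)),
]
events = [datetime.strptime(s, FMT) for s in ("2018-01-01 10:15", "2018-01-01 12:30")]

inside, outside = events_in_windows(events, windows)
print(len(inside), "warning(s) while a PSU was unpowered,", len(outside), "otherwise")
```

If nearly all warnings land inside the unpowered periods on the failing server, that supports the power path (PSU or Power Distribution Module) as the trigger rather than the OS configuration.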
In an effort to better assist you, can you please provide some details of the software and infrastructure you have installed?
What is the chassis model?
Can you share the SysInfo logs from the failing server with us?
There is a power generator, which powers one PSU in each server, and a UPS, which powers the other PSU in each server.
Chassis is: Supermicro CSE-846E16-R1200B
Event log is attached.
The OS on both servers is Windows Server 2016 Standard 16C.
Unfortunately, the file you sent to us is not from the board, we need the board logs.
To download the SysInfo logs from the board, please follow the steps at this link: https://www.intel.com/content/www/us/en/support/articles/000023940/server-products/server-boards.html
I apologize for the delay. We were trying to access the file, but it seems to be encrypted.
Also, it appears to be different from the one that we require.
Can you please check the procedure at this link? https://www.intel.com/content/www/us/en/support/articles/000023940/server-products/server-boards.html
It will take time.
But right now I have a brand new Compute Card CD1M3128MK here, with the latest BIOS and drivers and a clean Windows 10 Pro install, and I see the exact same problem.
So it should be related to some driver, not a hardware issue.
Can you please verify that you are using the latest driver version for the following:
a. Chipset (Motherboard).
b. Intel® Rapid Storage Technology.
Also, can you test disabling "Enhanced Intel SpeedStep" in the BIOS configuration?
It is simpler for me to test with this ComputeCard here than with the server at the client site.
It is CD1M3128MK ComputeCard. Clean Windows 10 Pro 1803 build.
All the latest drivers and BIOS are installed from downloadcenter.intel.com
There is no RapidStorage driver for this card.
There is no setting like SpeedStep in the BIOS for this product.
There are no errors related to LAN now, only the CPU error remains.
What can I try next?
I also have another ComputeCard model and I can test with that as well.
The latest version for chipset is:
Version: 10.1.17479.8054 (Latest). The easiest way is to download it from our website and install it.
Please let us know as soon as you verify the information provided.
To better assist you and get more accurate information, we need the logs from the server. Please follow the steps at this link: https://downloadcenter.intel.com/download/26991/System-Information-Retrieval-Utility-SysInfo-?v=t