I have an S2600WFT (R1304WFTYS) with 2 x Xeon Silver 4108 and 2 x KINGSTON 32GB 2Rx4 2933MHz PC4-23400 ECC DDR4 installed in CPU1 A1 and CPU2 A1, for a total of 64GB.
The machine had been in service for a year when, last week, the DIMM installed in CPU2 A1 failed. On reboot, the DIMM in CPU2 A1 was reported as disabled, with the status "installed&failed".
I pulled two known-good KINGSTON DIMMs (the exact same model) from a working machine and installed them in the failed machine, i.e. I replaced the DIMMs in both CPU1 A1 and CPU2 A1. On reboot I get the same message: the DIMM in CPU2 A1 is disabled and only 32GB is available.
What am I missing here to clear the disabled report?
Thank you for reaching out to Intel Communities. I will gladly help you.
Based on your description and troubleshooting, I would suggest taking a look at the article "How to Troubleshoot Failed RAM Memory in a Server Node": https://www.intel.com/content/www/us/en/support/articles/000030516/server-products.html
That said, it seems you have approached this in the right way, and it is likely that the DIMM slot is defective, as noted at the end of that article, in which case the board would need to be replaced.
There is a tool online where you can find the warranty information for your product: https://supporttickets.intel.com/warrantyinfo
I would suggest getting support from a representative for Intel® Server Products if you need more information on how to process a warranty replacement: https://www.intel.com/content/www/us/en/support/products/1201/server-products.html
Thought I would give an update. To recap:
My S2600WFT system with 64GB (2 x KINGSTON 32GB 2Rx4 2933MHz PC4-23400 ECC DDR4) installed in CPU1 A1 and CPU2 A1 reported a memory failure in CPU2 A1 at 1am one morning. I powered down the server, removed both DIMMs and installed two replacements from a working machine. The CPU2 A1 slot still reported "installed&failed".
At that point I took the server from the data center back to my office. I powered it up and the same "installed&failed" condition remained for the CPU2 A1 slot.
I moved the CPU2 A1 DIMM to CPU1 A2 and powered up: no errors, and 64GB reported.
I then installed 2 additional KINGSTON 32GB DIMMs in CPU2 A1 and CPU2 A2. 128GB was reported with no problems, and the server has been back in the data center without further issues.
I can only conclude that the server needed to "see" the CPU2 A1 slot empty before it would reset the failed status. All very bizarre.