S2600WFT - Slow memory, intermittently since BIOS 02.01.0013/02.01.0014

bbs2web
Novice

Hi,

 

We have observed high system CPU utilisation under Linux since upgrading an R1208WFTYSR (S2600WFT) to BIOS 02.01.0013 in December (the latest at the time). This continues after upgrading to BIOS 02.01.0014 today (the latest now).

 

Herewith a CPU utilisation graph for a virtualisation host node where approximately 80% of the guest utilisation (cyan) appears as system time (brown):

bbs2web_0-1642534986734.png
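For anyone wanting to reproduce the system-time figure without a monitoring stack, here is a minimal sketch that derives the system share from two samples of the aggregate `cpu` line in `/proc/stat`. The function names are hypothetical and the sampling interval is an arbitrary choice; the field order is the one documented for Linux.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat on Linux.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu' line into a dict of cumulative tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def system_share(before, after):
    """Fraction of elapsed CPU ticks spent in system (kernel) time."""
    # guest and guest_nice are already counted inside user and nice,
    # so only the first eight fields go into the total.
    keys = FIELDS[:8]
    delta = {k: after.get(k, 0) - before.get(k, 0) for k in keys}
    total = sum(delta.values())
    return delta["system"] / total if total else 0.0


def sample(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, 'interval' seconds apart (Linux only)."""
    with open(path) as f:
        first = parse_cpu_line(f.readline())
    time.sleep(interval)
    with open(path) as f:
        second = parse_cpu_line(f.readline())
    return system_share(first, second)
```

Calling `sample()` on the host returns the system share over the interval as a fraction between 0 and 1.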

 

CPU benchmarks on this node are, however, consistent with those of identical siblings that are performing normally:

bbs2web_1-1642535202151.png

 

Okay, not 100% identical. Some nodes have 1 TiB (24 x 64 GiB) Samsung M386A8K40DM2-CVF modules whereas the problematic node has 2 DIMMs in each channel, so 1 TiB (24 x 64 GiB) Micron 36ASF8G72PZ-2G9B2 modules.

bbs2web_4-1642536513099.png

https://www.micron.com/products/dram-modules/rdimm/part-catalog/mta36asf8g72pz-2g9/mta36asf8g72pz-2g...

 

In 7 out of 8 cold boots (i.e. removing power from the PSUs, or powering the system off and on via the BMC), memory benchmarks show a drastic difference. The systems with Samsung memory consistently perform at about 6.1 GiB/s, whereas the problematic system with the Micron memory runs at either 133 MiB/s (7 out of 8 cold boots) or 6.1 GiB/s (1 out of 8 cold boots).

bbs2web_2-1642535756559.png
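As a rough cross-check that needs no extra tooling, a sketch along the following lines times a large in-memory copy and puts the two reported figures in proportion. It is not the benchmark behind the numbers above; the function names and buffer size are assumptions, and a single-threaded Python copy will not report the same absolute values as a dedicated tool. It should still make a drop of this size obvious.

```python
import time


def copy_throughput_gib_s(size_mib=256, repeats=5):
    """Best-of-N throughput of copying a buffer, in GiB copied per second."""
    # Fill with a non-zero pattern so every page is actually backed by RAM.
    src = bytearray(b"\xa5" * (size_mib * 1024 * 1024))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full read of src and one full write of dst
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        del dst
    return (size_mib / 1024) / best


def slowdown(fast_gib_s, slow_mib_s):
    """Ratio between a rate given in GiB/s and a rate given in MiB/s."""
    return fast_gib_s * 1024 / slow_mib_s
```

With the figures reported here, `slowdown(6.1, 133)` comes to roughly 47, i.e. the slow boots run at about 2% of normal memory throughput.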

 

After a restart, performance remains at the same level until the system is next cold or warm booted:

bbs2web_3-1642536237110.png

 

We've set the BIOS to log correctable errors, and the system event logs show no notifications. I presume an initialisation problem, as the system performs absolutely normally after most warm boots (10 out of 15 of those initialise the memory at 6.1 GiB/s).
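The error counters can also be read from the Linux side, independently of the BMC event log. The sketch below assumes an EDAC driver is loaded for the memory controllers, so that the kernel exposes `ce_count` and `ue_count` files per controller under `/sys/devices/system/edac/mc`; the function name is hypothetical.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": n, "ue_count": n}} from EDAC sysfs."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        # ce_count = correctable errors, ue_count = uncorrectable errors.
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc.name] = entry
    return counts
```

An empty result means no memory controllers are registered with EDAC, which is not the same as zero errors.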

 

The platform has been in production for a while, and this issue is new since we upgraded the BIOS on the 21st of December. The system operated at full performance for almost 2 years prior to the firmware upgrade.

 

We have syscfg dumps of the BMC and BIOS, should they be relevant.

 

PS: Yes, we did reflash; yes, we ran startup.nsh twice; yes, we reset the BIOS to defaults; and yes, we set KCS to 'allow all' before starting the update process.

 

Regards

David Herselman

3 Replies
JoseH_Intel
Moderator

Hello bbs2web,


Thank you for joining the Intel community.


Thank you for your detailed post. At first glance, I found that both of the modules used:


Samsung M386A8K40DM2-CVF

Micron 36ASF8G72PZ-2G9B2


do not show up as validated in the Intel Server Configurator tool Purley THOL (intel.com) for this system.

Based on that, there is a chance that both systems (problematic and non-problematic) might eventually exhibit the same issue.


Is there any chance that you could try validated memory in the problematic system? You can find the validated RAM list on the same website shared above.


Jose A.

Intel Customer Support Technician

For firmware updates and troubleshooting tips, visit:

https://intel.com/support/serverbios


JoseH_Intel
Moderator

Hello bbs2web,


I am just following up to double-check whether you found the provided information useful. If you have further questions, please don't hesitate to ask. If you consider the issue resolved, please let us know so we can mark this thread as closed. I will try to reach you one last time next Tuesday, the 25th. After that, we will mark the thread as closed.


Jose A.

Intel Customer Support Technician



JoseH_Intel
Moderator

Hello bbs2web,


We will now mark this thread as closed. If you have further issues or questions, please submit a new topic.


Jose A.

Intel Customer Support Technician


