
Intel S2600STBR constant NVMe Smart Errors on Intel VROC

GTobi1
New Contributor II

Hello,

I have an Intel S2600STBR with the BIOS, ME, BMC, FRU/SDR, and SSD firmware all updated to the latest versions.

 

  • Backplane is an AUP8X25S3NVDK in an Intel P4304XXMUXX chassis.
  • Boot drives are a RAID1 pair of Kioxia CD6 Enterprise Class 960GB U.2 drives.
  • Data drives are a RAID1 pair of Intel SSD DC P4510 2TB U.2 drives.
  • OS is Windows Server 2022 Standard with the latest updates applied.
  • VROC driver is the latest, 7.7.0.1273; the chipset, LAN, and video drivers are also the latest.

I am getting constant SMART errors, randomly across all 4 drives, warning of an at-risk SMART event. If the errors are cleared, they usually reappear after a reboot, but not on the same drive. If the errors are cleared and the server is shut down, they do not reappear on the next boot; if the server is rebooted instead, they do.

I suspect a bug in the OS, a Windows update, or the Intel VROC driver or software.
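
To check that pattern rather than rely on memory, each occurrence could be logged and tallied by boot type and by drive. A minimal sketch in Python follows; the entries and drive labels are made-up placeholders, not readings from this server. Warnings that cluster on warm reboots and spread evenly over the drives would point away from a single failing drive.

```python
from collections import Counter

# Hypothetical observation log: (how the server was started, drives flagged afterwards).
# Entries and drive labels are illustrative placeholders only.
observations = [
    ("reboot", ["kioxia-0"]),
    ("cold_boot", []),
    ("reboot", ["intel-1"]),
    ("reboot", ["kioxia-1"]),
    ("cold_boot", []),
]

def summarize(observations):
    """Count SMART warnings per boot type and per drive."""
    by_boot = Counter()
    by_drive = Counter()
    for boot_type, flagged in observations:
        by_boot[boot_type] += len(flagged)
        by_drive.update(flagged)
    return by_boot, by_drive

by_boot, by_drive = summarize(observations)
print("warnings by boot type:", dict(by_boot))
print("warnings by drive:", dict(by_drive))
```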

Has anyone else experienced the same and found a fix?

GT

JoseH_Intel
Moderator

Hello GTobi1,


Thank you for joining the Intel community


Can you tell us where you are getting the SMART errors from? Are they showing up in the Windows Event Viewer? In the image (which is a bit blurry) it looks like a Windows notification.

Could you please share the exact Kioxia SSD model? According to the Purley Tested Hardware List in the Intel Server Configurator tool (intel.com), the validated 960GB models are the following:

  • KCD51LUG960G (960GB, NVMe, 2.5 inch, PCIe 3.0)
  • KCM51RUG960G (960GB, NVMe, 2.5 inch, PCIe 3.0)
  • KCM6XRUL960G (960GB, NVMe, 2.5 inch, 1DWPD)
  • KXD51RUE960G (960GB, NVMe, 2.5 inch, PCIe 3.0)

I do not see any CD6 model among them.


Regards


Jose A.

Intel Customer Support Technician

For firmware updates and troubleshooting tips, visit:

https://intel.com/support/serverbios


GTobi1
New Contributor II

Hello Jose,

 

Thanks for coming back to me.

 

The SMART errors are showing up in the VROC console as yellow bangs, randomly across all 4 drives, both the Intel and the Kioxia ones. The RAID1 arrays do not span controllers. The VROC console gives us the option to clear the SMART errors and sometimes to clear the metadata.

The drive model we used is the KCD61LUL960G, which is a U.3 drive that is supposedly backwards compatible with U.2. It was offered to us by the Intel distributor when the 1TB drives were not available.
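
For reference, checking that part number against the four validated 960GB models quoted above can be sketched as follows. This is only an exact string match against the list as posted in this thread; it assumes nothing about how Intel's configurator tool matches models, and `is_validated` is a made-up helper name.

```python
# Part numbers copied from the validated 960GB list quoted earlier in this thread.
VALIDATED_960GB = {
    "KCD51LUG960G",
    "KCM51RUG960G",
    "KCM6XRUL960G",
    "KXD51RUE960G",
}

def is_validated(model: str) -> bool:
    """Return True if the model string exactly matches a listed part number."""
    return model.strip().upper() in VALIDATED_960GB

print(is_validated("KCD61LUL960G"))  # the CD6 drive used here -> False
```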

BTW, I have found it difficult to find the relevant Intel HCLs, and even drivers, since the support website refresh.

GT 

JoseH_Intel
Moderator

Hello GTobi1,


Please try the following steps:


  • Reset the SMART errors on the drives and modify the cooling policy of the server.
  • Download the update package. Reflash the BIOS and then update the FRU/SDR manually and separately: run UpdBIOS.nsh and, once it completes, run UpdS2600STBFRUSDR.nsh, instead of running startup.nsh as in the usual firmware update process.
  • If using Windows Server 2019, downgrade the VROC driver from 7.6 to 7.5.9.1013. This version resolves a random reboot issue for NVMe VROC.

If the above does not help:

  • Shut down and remove the power cables from the PSU. (This will reset the BMC.)
  • Wait 10 seconds, then reconnect the power cables.
  • Wait for the blue LED on the front panel to go off, and only then power on the system. (Roughly 60 seconds; this allows the BMC to finish booting.)
  • Wait around 30 seconds after power-on, then log in to the BMC and collect a new set of debug logs.
  • Boot to Windows, collect the following, and submit them to Intel Support:
  1. OS event log
  2. Windows System Event log
  3. Windows Application Event log
  4. VROC system report
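
The two Windows event logs in the list above can be exported with the built-in `wevtutil epl` command. A minimal Python sketch that builds and runs those commands is shown here, assuming an elevated prompt on the server; the output folder name is a placeholder. The BMC debug log and the VROC system report still have to be collected from their own tools.

```python
import platform
import subprocess
from pathlib import Path

# Windows event channels to export; "System" and "Application" are standard channel names.
CHANNELS = ["System", "Application"]

def build_export_commands(out_dir):
    """Build one `wevtutil epl <channel> <file>` command per channel."""
    out = Path(out_dir)
    return [["wevtutil", "epl", ch, str(out / f"{ch}.evtx")] for ch in CHANNELS]

def export_logs(out_dir):
    """Run the export commands; only meaningful on Windows."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_export_commands(out_dir):
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    folder = "intel_support_logs"  # placeholder output folder name
    if platform.system() == "Windows":
        export_logs(folder)
    else:
        # On other systems, just show what would be run.
        for cmd in build_export_commands(folder):
            print(" ".join(cmd))
```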


Regards


Jose A.

Intel Customer Support Technician



GTobi1
New Contributor II

Thanks Jose,

Will attempt what you recommend.

 

You said 

  • If using Windows Server 2019, downgrade the VROC driver from 7.6 to 7.5.9.1013. This version resolves a random reboot issue for NVMe VROC.

We actually plan to use Windows Server 2022. What is the recommended driver version for 2022?

 

GT

JoseH_Intel
Moderator

Hello GTobi1,


Even though Windows Server 2022 is listed as a tested OS for this system (see Tested Operating Systems for Intel® Server Board S2600ST Family...), the only drivers available at the moment are chipset, NIC, and graphics. There are no VROC drivers available for Windows Server 2022 yet. Most likely this is being worked on, but for now the 2019 VROC driver needs to be used. See: Support for Intel® Server Board S2600STBR


Regards


Jose A.

Intel Customer Support Technician



JoseH_Intel
Moderator

Hello GTobi1,


I am following up to check whether you found the provided information useful. If you have further questions, please don't hesitate to ask. If you consider the issue resolved, please let us know so we can mark this thread as closed. I will try to reach you again next Thursday the 17th. After that, the thread will be automatically archived.


Regards


Jose A.

Intel Customer Support Technician



JoseH_Intel
Moderator

Hello GTobi1,


We will proceed to mark this thread as closed. If you have further issues or questions just go ahead and submit a new topic.


Regards


Jose A.

Intel Customer Support Technician


