Youssef_Hilali
Employee

CATERR issue (IPMI LOG) using VROC

Is there a limit on the number of SSDs a customer can use with VROC? See my customer's question below.

A customer of ours has a server with the following specs:

| Qty | Product code | Product name | Notes/remarks |
|-----|--------------|--------------|---------------|
| 1 | SYS-1029U-TN10RT | Supermicro SuperServer 1029U-TN10RT | 4x SAS/SATA and 6x NVMe bays are pre-connected. |
| 2 | CD8069504214002 | Intel Xeon Gold 5215 Processor (13.75M Cache, 2.50 GHz) FC-LGA14B | 10 Cores / 20 Threads, 2.5GHz Base / 3.4GHz Turbo |
| 12 | NT32GA72D4NBX3P-HR | 32GB 2666MHz REG ECC server memory | |
| 1 | AOC-S3008L-L8e | IO SUPERM AOC-S3008L-L8e HBA | SD will flash this HBA to IR mode to be able to create RAID 1 for the OS drives |
| 1 | CBL-SAST-0699 | Supermicro Mini-SAS HD to 4 SATA (2x 75CM / 2x 90CM) | |
| 2 | SSDSC2KB480G801 | Intel® SSD D3-S4510 Series (480GB, 2.5in SATA 6Gb/s, 3D2, TLC) | RAID 1 |
| 1 | AOC-VROCINTMOD | Intel SSD Only Upgrade module (RAID 0/1/10/5) | |
| 6 | SSDPE2KX020T801 | Intel P4510 2.0TB, SSD 2.5in 15mm, NVMe, PCIe3.0 x4 | RAID 5 |
| 1 | AOC-STGN-i2S | Standard LP, 2x 10GbE SFP+, PCI-E x8, Intel® 82599ES | |
| 1 | _ASSEMBLY | ---------- Assembly ---------- | |

They use MS Windows Server 2019 in combination with MS System Center DPM to back up their virtual machines.

 

* INTEL VROC:

 

The problem is that when they back up with DPM to the RAID 5 volume (6x Intel P4510 2.0TB), the server crashes and reboots. The IPMI log reports “processor, CATERR issue”.

We asked Supermicro what this error means, and they said it could be anything between the CPU and a PCIe device.

To troubleshoot, we replaced all CPUs and memory and removed the HBA and NIC, leaving only the RAID 5 (6x Intel P4510 2.0TB).

 

The issue keeps coming back.

 

So we thought it might be the NVMe drives. We tried a single NVMe drive, and that seems to work.
Next we tried RAID 5 with 4 NVMe drives, and the issue no longer appears.

It seems that using more than 4 NVMe drives in RAID causes issues in combination with MS DPM.

 

Are we doing something wrong, or is this a known issue?

 

So now we have configured the server with 5x NVMe drives: RAID 5 over 4 drives and 1 drive as a hot spare.

That seems to work in combination with DPM.

BrusC_Intel
Moderator

Hello, @Youssef_Hilali.

 

Good day,

 

Thank you for contacting the Intel Community Support.

 

I received your ticket regarding the error message involving an Intel® VROC configuration. I will be glad to assist you.

 

The number of drives in use is well within the limits listed in the supported configurations guide (page 5).

- Intel® Virtual RAID on CPU (Intel® VROC) Supported Configurations: https://www.intel.com/content/www/us/en/support/articles/000030310/memory-and-storage/ssd-software.h...


 

I will need to investigate a bit more to check whether there is more information we can provide. In the meantime, would you mind providing a VROC report (Help > System report > Save)?

 

Regards,

 

Bruce C.

Intel Customer Support Technician

A Contingent Worker at Intel

Youssef_Hilali
Employee

Hi Bruce,

 

Thanks for picking this up. Please see the customer's question about your request below.

 

Which tool can they use on Windows Server 2019 Core Edition, and where can they download it?

 

regards

Youssef

Youssef_Hilali
Employee

@BrusC_Intel 

 

I noticed I didn't tag you in my previous reply.

 

regards

Youssef

BrusC_Intel
Moderator

Hello, @Youssef_Hilali.

 

Thank you for the response.

 

I will let you know if more information is required. Please review the following.

 

Data Protection Manager and the IPMI log are outside our scope of support. Because of the error "processor, CATERR issue", the log should be reviewed by the motherboard manufacturer. They may find some hints about what the issue could be, as there is no information we can share on it.

 

The error itself (CATERR) stands for catastrophic error. It is often seen when the system has issues with the CPU or RAM. However, since this happens exactly when they use 5 or more drives, we suggest removing some devices from the PCI slots and leaving only the switch cards on the riser, to see whether the system is running out of resources when more drives are added to the current configuration.

 

Let me know if it is possible to perform this test or if you have other questions and concerns.

 

Best regards,

 

Bruce C.

Intel Customer Support Technician

A Contingent Worker at Intel

Youssef_Hilali
Employee

Hi @BrusC_Intel 

 

I will ask the customer whether they can try your suggestion and get back to me. It is strange that the same issue happens on two different systems (ASUS and Supermicro); how can that be explained?

I will get back to you once I have more information from the customer.

 

regards

Youssef

BrusC_Intel
Moderator

Hello, @Youssef_Hilali.

 

Thank you for the response and information.

 

Let me know if you receive any updates from them. I will also let you know if I manage to get additional information.

 

Best regards,

 

Bruce C.

Intel Customer Support Technician

A contingent Worker at Intel

Youssef_Hilali
Employee

@BrusC_Intel

See the customer's feedback below. They also said that they had already contacted Supermicro, that there was no specific information in the log related to this issue, and that it could be anything.

We already swapped in different CPUs and memory and removed all other components, leaving only the motherboard, CPUs, memory and NVMe drives.
Since this issue occurred on both an ASUS and a Supermicro server, we don't think it's vendor-related. The problem only occurs when more than 4 NVMe drives are used in RAID in combination with Microsoft DPM. Just copying files won't crash the system, but using DPM to back up, for example, VMs does crash it. With just 4 NVMe drives in a RAID via VROC, it does not crash. We replicated this over and over again, with the same result.

I hope this feedback helps.

Regards
Youssef
Youssef_Hilali
Employee

hi @BrusC_Intel 

Attached are the log files the customer exported from the server having issues.

 

regards

Youssef

Youssef_Hilali
Employee

hi @BrusC_Intel 

 

The customer just told me that it now crashes even in the 4-SSD configuration. See the update below and the attached log file.

Bad news, Youssef.

 

Our conclusion that it only happens when using more than 4 NVMe drives in RAID with VROC has been debunked, since it has crashed again.

 

They are using 4x NVMe in RAID 5 and 1x drive as a hot spare.

 

It just crashed again after using DPM to back up their VMs. Below is the IPMI message, and the NVMe log is attached.

 

[Screenshot: IPMI message]

 

 

We really don’t know what is causing this anymore, and we’re going to replace the servers again with new servers based on SATA SSDs, because we can’t wait for the servers to crash again. The servers are used in their production environment.

 

 

regards

Youssef

BrusC_Intel
Moderator

Hello, @Youssef_Hilali.

 

Thank you very much for all the additional information.

 

Please allow me to review the details and I will contact you privately via e-mail in order to continue with the support.

 

Best regards,

 

Bruce C.

Intel Customer Support Technician

A contingent Worker at Intel
