Server Products
Data Center Products including boards, integrated systems, Intel® Xeon® Processors, and RAID Storage

RS3DC040 overheating warnings again

SAljoshin
New Contributor I

Hello!

We have another server with this strange issue.

The server started reporting RAID card overheating errors, but the card is not actually overheating.

We replaced the RAID card with a new one, but the problem persists.

The motherboard is an S2600CW2.

The RAID controller logs are attached, and the RAID controller FW is the latest available.

How can we fix it?

P.S.:

We had a similar problem with an S2600WT board. It started in the same way, but then another problem appeared with the battery controller.

idata
Employee

Hello Sergei,

 

 

Thank you for contacting Intel(R) Technical Support.

 

 

I understand that the RAID controller is reporting overheating issues, but that these reports are not correct.

 

 

You mentioned that even when you replaced the RAID card you are still getting the error.

 

Have you updated the motherboard firmware?

 

Can you please indicate the model of the RAID controller?

 

Have you tested the RAID controller in another Server Board?

 

 

Looking forward to your response.

 

 

Best regards,

 

Jeremiah A,

 

Intel(R) Technical Support
SAljoshin
New Contributor I

Hello Jeremiah,

The case is that the server was sold to a customer and worked well for a few months.

Then it started to give these errors. We updated the FW of the RAID controller, but it didn't help, so we decided to replace the RAID controller with a new one.

But that didn't solve the issue. The RAID controller is an RS3DC040 with the latest FW installed.

We didn't have an opportunity to test it in another platform, as we had to send it to AWR.

The motherboard FW is not the latest. We can try to upgrade it, but I am quite sure that it will not fix the issue.

Yes, we had a similar issue in a ticket that was opened recently and still has no solution (another issue occurred there as well).

The problem started in the same way, but after that the server started to beep continuously when the RAID BBU module was connected, with no error logged anywhere.

That server has the same controller model but a different motherboard. So yes, we have seen an identical problem with another board.

Sergei

idata
Employee

Hello Sergei,

 

 

Thank you for your update.

 

 

In that other case, the issue is related to the BBU, while in this case you are reporting that the card gives overheating notifications even after you replaced the RAID card. These look like two different issues. Is this RAID controller using the same BBU? Have you checked for any fan issues or degraded disks? Can you please update the FW on the motherboard?

 

 

Looking forward to your results.

 

 

Best regards,

 

Jeremiah A.

 

Intel(R) Technical Support

 

SAljoshin
New Contributor I

Yes, the card gives the error even after being replaced with a new one.

The fans are OK, and Active System Console does not report any issues with them either.

The drives are fine, and there are no warnings in the RAID controller logs.

The RAID controller does not have a BBU here.

We may try to upgrade the motherboard FW, but that might be problematic.

Regards,

Sergei

idata
Employee

Hi Sergei,

 

 

Please let me know the outcome of the firmware update.

 

 

Regards,

 

Jeremiah A.

 

Intel(R) Technical Support
idata
Employee

Hello Sergei,

 

 

I hope you are doing well today.

 

 

I'm following up to find out whether you were able to update the Server Board firmware and what the outcome was.

 

 

If you have any questions, please let me know.

 

 

Regards,

 

Jeremiah A.

 

Intel(R) Technical Support
idata
Employee

Hello Sergei,

 

 

I haven't heard back from you regarding this case.

 

 

I will proceed with closing this case.

 

 

If you still need more assistance from us, do not hesitate to contact us again.

 

 

Best regards,

 

Jeremiah A.
SAljoshin
New Contributor I

Hello,

The customer had an opportunity to update the motherboard BIOS to the latest version, but the RAID controller overheating alert is still present.

What next? As I mentioned before, the actual temperature is 16C and there is no real overheating. The RAID controller was replaced with a new one.

Regards,

Sergei

idata
Employee

Hello Sergei,

 

 

Thank you for your update.

 

 

I will investigate this issue in depth, as the problem doesn't seem to be the RAID controller: you already replaced the controller and are still getting the same result with the new one.

 

As soon as I get new information I will contact you right away.

 

 

Regards,

 

Jeremiah A.
idata
Employee

Sergei,

 

 

In the meantime, can you please ask your customer to clear the CMOS and clear the SEL?

 

 

Regards,

 

Jeremiah A.
SAljoshin
New Contributor I

Jeremiah,

That's done.

By the way, my question is: where does the RAID controller take this information from? Is it a sensor on the RAID controller or something else?

Our theory is that one of the hard drives might be reporting a wrong temperature.

Sergei

idata
Employee

Hello Sergei,

 

 

Thank you for your update.

 

 

How many HDDs is the server system using?

 

 

Regards,

 

Jeremiah A.
idata
Employee

Hello Sergei,

 

 

I hope you are doing well today.

 

 

I'm still investigating this issue. I will get back to you as soon as possible with new information.

 

 

Your prompt response is highly appreciated.

 

 

Regards,
SAljoshin
New Contributor I

There are 3 SSDs and 9 HDDs of 6TB each.

Our technician will go there next week and replace the chassis, motherboard, and RAID controller. So only the drives, CPU, and memory will stay the same.

If that does not solve the problem, then apparently we will have to replace all the drives.

Sergei

idata
Employee

Hello Sergei,

 

 

Thank you for your update.

 

 

I would appreciate it if you could send me the TTY RAID log as well as the sysinfo for the server board/system.

 

 

If you have any questions, please let me know.

 

 

Regards,

 

Jeremiah A.

 

idata
Employee

Hi Sergei,

 

 

I hope you are doing well today.

 

 

Just following up with you to see if you were able to gather the information requested in the previous post.

 

 

Looking forward to your comments.

 

 

Regards,

 

Jeremiah A.
SAljoshin
New Contributor I

Hello,

The TTY RAID log file is attached, along with the SEL log, but the SEL is empty because you asked us to clear it.

Sergei

idata
Employee

Hi Sergei,

 

 

Thank you for your update.

 

 

We will check into this and get back to you as soon as possible with new information.

 

 

Regards,

 

Jeremiah A.
idata
Employee

Hello Sergei,

 

 

I hope you are doing well today.

 

 

I was able to check the logs, and what I found is as follows:

 

There is a temperature-related issue in the logs, but it points to a hard drive:

11652:2018-10-29, 15:41:21 Critical: Enclosure PD 04(c Port 0 - 3/p1) temperature sensor 1 above error threshold.

This is one example; the entry repeats several times across the logs, always for the same drive.

Port 0 - 3 refers to the port on the controller, and PD 04 means drive number 3. The RAID structure is nevertheless optimal, and consistency checks and patrol reads complete with no issues.

Based on the RAID log, I think it is a drive that is overheating, not the RAID adapter. I suggest running tests on that hard drive and replacing it if needed.
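
If it helps to confirm that every temperature entry really belongs to the same drive, the log can be tallied with a small script. The following is only a rough Python sketch, not an Intel tool: the regular expression is modelled on the single entry quoted above, so other firmware versions may format the line differently, and the function name count_temperature_events is made up for this illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one log entry quoted in this thread (assumption:
# other entries of this type follow the same layout).
ENTRY = re.compile(
    r"^(?P<seq>\d+):(?P<date>\d{4}-\d{2}-\d{2}),\s*(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<severity>\w+):\s+Enclosure PD (?P<pd>[0-9A-Fa-f]+)\((?P<loc>[^)]*)\)\s+"
    r"temperature sensor (?P<sensor>\d+) above (?P<level>\w+) threshold"
)


def count_temperature_events(lines):
    """Count temperature-threshold entries per (PD id, port location)."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            counts[(match["pd"], match["loc"])] += 1
    return counts


# The only sample used here is the entry quoted above.
sample = [
    "11652:2018-10-29, 15:41:21 Critical: Enclosure PD 04(c Port 0 - 3/p1) "
    "temperature sensor 1 above error threshold.",
]

counts = count_temperature_events(sample)
for (pd, loc), n in counts.most_common():
    print(f"PD {pd} ({loc}): {n} event(s)")
```

To run it against a full log, pass the open log file (or its lines) to count_temperature_events instead of the sample list. If more than one PD shows up in the result, the single-drive explanation would need a second look.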

I hope this information helps to clarify the issue.

 

 

If you have any questions, please let me know.

 

 

Best regards,

 

Jeremiah A.