Hello!
We have another server with this strange issue.
The server started reporting RAID card overheating errors, but the card is not actually overheating.
We replaced the RAID card with a new one, and the problem is still there.
The motherboard is an S2600CW2.
RAID controller logs are attached, and the RAID controller FW is the latest available.
How can we fix it?
P.S.:
We had a similar problem with an S2600WT board. It started the same way, but then another problem appeared with the battery controller.
Hello Sergei,
Thank you for contacting Intel(R) Technical Support.
I understand that the RAID controller reports overheating even though it is not actually overheating.
You mentioned that you are still getting the error even after replacing the RAID card.
Have you updated the motherboard firmware?
Can you please indicate the model of the RAID controller?
Have you tested the RAID controller in another Server Board?
Looking forward to your response.
Best regards,
Jeremiah A,
Intel(R) Technical Support
Hello Jeremiah,
The situation is that the server was sold to a customer and worked well for a few months.
Then it started to give these errors. We updated the RAID controller FW, but it didn't help, so we decided to replace the RAID controller with a new one.
That didn't solve the issue either. The RAID controller is an RS3DC040 with the latest FW installed.
We didn't have an opportunity to test it in another platform, as we had to send it to AWR.
The motherboard FW is not the latest. We can try to upgrade it, but I am quite sure that it will not fix the issue.
Yes, we had a similar issue in a ticket that was opened recently and still has no solution (another issue occurred there as well).
That problem started the same way, but afterwards the server started to beep continuously whenever the RAID BBU module was connected, with no error logged anywhere.
That server has the same controller model but a different motherboard. So yes, we have seen the same problem with another board.
Sergei
Hello Sergei,
Thank you for your update.
In that case, the earlier issue is related to the BBU, whereas in this case you report that the card raises overheating notifications even after you replaced the RAID card. These look like two different issues. Is this RAID controller using the same BBU? Have you checked for any fan issues or degraded disks? Can you please update the FW on the motherboard?
Looking forward to your results.
Best regards,
Jeremiah A.
Intel(R) Technical Support
Yes, the card gives the error even after being replaced with a new one.
The fans are OK, and Active System Console does not report any issues with them either.
The drives are fine, and there are no warnings in the RAID controller logs.
The RAID controller does not have a BBU here.
We may try to upgrade the motherboard FW, but that might be problematic.
Regards,
Sergei
Hi Sergei,
Please let me know the outcome of the firmware update.
Regards,
Jeremiah A.
Intel(R) Technical Support
Hello Sergei,
I hope you are doing well today.
I'm following up to find out whether you were able to update the Server Board firmware and what the outcome was.
If you have any questions, please let me know.
Regards,
Jeremiah A.
Intel(R) Technical Support
Hello Sergei,
I haven't heard back from you regarding this case.
I will proceed with closing this case.
If you still need assistance from us, do not hesitate to contact us again.
Best regards,
Jeremiah A.
Hello,
The customer had an opportunity to update the motherboard BIOS to the latest version, but the RAID controller overheating alert is still present.
What next? As I mentioned before, the temperature there is actually 16 °C and there is no real overheating. The RAID controller was replaced with a new one.
Regards,
Sergei
Hello Sergei,
Thank you for your update.
I will investigate this issue in depth, as the problem doesn't seem to be the RAID controller: you have already replaced the controller and are still getting the same result with the new one.
As soon as I get new information I will contact you right away.
Regards,
Jeremiah A.
Sergei,
In the meantime, can you please ask your customer to clear the CMOS and clear the SEL?
Regards,
Jeremiah A.
Jeremiah,
That's done.
By the way, my question is: where does the RAID controller get this information from? Is it a sensor on the RAID controller or something else?
Our theory is that it might be a wrong temperature reported by one of the hard drives.
Sergei
Hello Sergei,
Thank you for your update.
How many HDDs is the server system using?
Regards,
Jeremiah A.
Hello Sergei,
I hope you are doing well today.
I'm still investigating this issue. I will get back to you as soon as possible with new information.
Your prompt response is highly appreciated.
Regards,
There are 3 SSDs and 9 x 6 TB HDDs.
Our technician will go there next week and replace the chassis, motherboard and RAID controller, so only the drives, CPU and memory will stay the same.
If that does not solve the problem, then apparently we will have to replace all the drives.
Sergei
Hello Sergei,
Thank you for your update.
I would appreciate it if you could send me the TTY RAID log as well as the sysinfo for the server board/system.
If you have any questions, please let me know.
Regards,
Jeremiah A.
Hi Sergei,
I hope you are doing well today.
Just following up with you to see if you were able to gather the information requested in the previous post.
Looking forward to your comments.
Regards,
Jeremiah A.
Hi Sergei,
Thank you for your update.
We will check into this and get back to you as soon as possible with new information.
Regards,
Jeremiah A.
Hello Sergei,
I hope you are doing well today.
I was able to check the logs, and what I found is as follows:
There is a temperature-related issue, but it points to a hard drive:
11652:2018-10-29, 15:41:21 Critical: Enclosure PD 04(c Port 0 - 3/p1) temperature sensor 1 above error threshold.
This is one example; the entry repeats several times across the logs, always for the same drive.
It refers to Port 0-3 on the controller, PD 04, meaning drive number 3. Yet the RAID structure is optimal, and consistency checks and patrol reads complete with no issues.
Based on the RAID log, I think a drive is overheating, not the RAID adapter. I suggest testing that hard drive and replacing it if needed.
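For anyone who wants to repeat this check on their own exported log, here is a minimal Python sketch that counts temperature threshold events per physical drive. It is only an illustration: the pattern is modelled on the single entry quoted above, the "warning" level is an assumption that does not appear in this thread, and the function name `count_temperature_events` is hypothetical. Other firmware versions may format these entries differently.

```python
import re
from collections import Counter

# Pattern modelled on the one log entry quoted in this thread.
# Assumption: "warning" threshold events share the same layout as "error" ones.
EVENT_RE = re.compile(
    r"^(?P<seq>\d+):(?P<date>\d{4}-\d{2}-\d{2}),\s*(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<severity>\w+):\s+Enclosure PD (?P<pd>[0-9A-Fa-f]+)\((?P<location>[^)]*)\)\s+"
    r"temperature sensor (?P<sensor>\d+) above (?P<level>error|warning) threshold"
)


def count_temperature_events(lines):
    """Count temperature threshold events, keyed by (PD id, location)."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            counts[(match["pd"], match["location"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "11652:2018-10-29, 15:41:21 Critical: Enclosure PD 04(c Port 0 - 3/p1) "
        "temperature sensor 1 above error threshold.",
    ]
    for (pd, location), n in count_temperature_events(sample).most_common():
        print(f"PD {pd} ({location}): {n} temperature event(s)")
```

If all events land on a single PD, as they did here, that drive is the first candidate for testing.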
I hope this information helps to clarify the issue.
If you have any questions, please let me know.
Best regards,
Jeremiah A.