After installing Fedora on my new server, I'm seeing the following error messages in the log:
EDAC sbridge: Failed to register device with error -22.
EDAC sbridge: Couldn't find mci handler
As far as I can tell, everything is working fine, but I want to avoid any errors or missing functionality in the future. So far my searching has only turned up ECC memory errors (like this one: https://forums.linuxmint.com/viewtopic.php?t=230579), but those are usually accompanied by other errors saying that ECC is disabled. I'm not sure which other log files might have information, or whether this issue even needs attention.
Does anyone know how to continue investigating this error?
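One small first step: Linux kernel functions conventionally report failures as a negative errno value, so "error -22" refers to errno 22, which is EINVAL ("Invalid argument") on Linux. That only says the registration call rejected something as invalid, not why. A minimal Python sketch that decodes such messages; `decode_kernel_error` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
import errno
import os
import re
from typing import Optional


def decode_kernel_error(message: str) -> Optional[str]:
    """Return the errno name and description for an 'error -N' kernel message."""
    match = re.search(r"error (-\d+)", message)
    if match is None:
        return None  # no negative error code in this line
    code = -int(match.group(1))  # kernel reports -errno, so flip the sign
    name = errno.errorcode.get(code, f"unknown errno {code}")
    return f"{name} ({os.strerror(code)})"


line = "EDAC sbridge: Failed to register device with error -22."
print(decode_kernel_error(line))
```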
Here's the output of grepping /var/log/messages for edac, sbridge, and mci:
Dec 22 13:29:03 hostname_removed kernel: ERST: Error Record Serialization Table (ERST) support is initialized.
Dec 22 13:29:03 hostname_removed kernel: pstore: using zlib compression
Dec 22 13:29:03 hostname_removed kernel: pstore: Registered erst as persistent store backend
Dec 22 13:29:03 hostname_removed kernel: ghes_edac: This EDAC driver relies on BIOS to enumerate memory and get error reports.
Dec 22 13:29:03 hostname_removed kernel: ghes_edac: Unfortunately, not all BIOSes reflect the memory layout correctly.
Dec 22 13:29:03 hostname_removed kernel: ghes_edac: So, the end result of using this driver varies from vendor to vendor.
Dec 22 13:29:03 hostname_removed kernel: ghes_edac: If you find incorrect reports, please contact your hardware vendor
Dec 22 13:29:03 hostname_removed kernel: ghes_edac: to correct its BIOS.
Dec 22 13:29:03 hostname_removed kernel: ghes_edac: This system has 16 DIMM sockets.
Dec 22 13:29:03 hostname_removed kernel: EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
Dec 22 13:29:03 hostname_removed kernel: EDAC MC1: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
Dec 22 13:29:03 hostname_removed kernel: GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
Dec 22 13:29:03 hostname_removed kernel: Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
Dec 22 13:29:03 hostname_removed kernel: Non-volatile memory driver v1.3
--
Dec 22 13:29:08 hostname_removed kernel: RAPL PMU: hw unit of domain pp0-core 2^-16 Joules
Dec 22 13:29:08 hostname_removed kernel: RAPL PMU: hw unit of domain package 2^-16 Joules
Dec 22 13:29:08 hostname_removed kernel: RAPL PMU: hw unit of domain dram 2^-16 Joules
Dec 22 13:29:08 hostname_removed kernel: EDAC sbridge: Couldn't find mci handler
Dec 22 13:29:08 hostname_removed kernel: EDAC sbridge: Couldn't find mci handler
Dec 22 13:29:08 hostname_removed kernel: EDAC sbridge: Failed to register device with error -22.
Dec 22 13:29:08 hostname_removed kernel: intel_rapl: Found RAPL domain package
Dec 22 13:29:08 hostname_removed kernel: intel_rapl: Found RAPL domain core
Dec 22 13:29:08 hostname_removed kernel: intel_rapl: Found RAPL domain dram
--
Dec 23 08:24:58 hostname_removed kernel: ERST: Error Record Serialization Table (ERST) support is initialized.
Dec 23 08:24:58 hostname_removed kernel: pstore: using zlib compression
Dec 23 08:24:58 hostname_removed kernel: pstore: Registered erst as persistent store backend
Dec 23 08:24:58 hostname_removed kernel: ghes_edac: This EDAC driver relies on BIOS to enumerate memory and get error reports.
Dec 23 08:24:58 hostname_removed kernel: ghes_edac: Unfortunately, not all BIOSes reflect the memory layout correctly.
Dec 23 08:24:58 hostname_removed kernel: ghes_edac: So, the end result of using this driver varies from vendor to vendor.
Dec 23 08:24:58 hostname_removed kernel: ghes_edac: If you find incorrect reports, please contact your hardware vendor
Dec 23 08:24:58 hostname_removed kernel: ghes_edac: to correct its BIOS.
Dec 23 08:24:58 hostname_removed kernel: ghes_edac: This system has 16 DIMM sockets.
Dec 23 08:24:58 hostname_removed kernel: EDAC MC0: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
Dec 23 08:24:58 hostname_removed kernel: EDAC MC1: Giving out device to module ghes_edac.c controller ghes_edac: DEV ghes (INTERRUPT)
Dec 23 08:24:58 hostname_removed kernel: GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.
Dec 23 08:24:58 hostname_removed kernel: Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
Dec 23 08:24:58 hostname_removed kernel: Non-volatile memory driver v1.3
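The output above looks like `grep` run with a few lines of context (each match plus its neighbours, with `--` between separate groups). To apply the same filter to other logs, such as saved `dmesg` output from each OS, a minimal Python sketch might look like this. `grep_with_context` and `grep_file` are hypothetical helper names, and reading /var/log/messages typically requires root.

```python
import re
from typing import List


def grep_with_context(lines: List[str], pattern: str, context: int = 3) -> List[str]:
    """Mimic `grep -i -C context`: matching lines plus neighbours, groups split by '--'."""
    regex = re.compile(pattern, re.IGNORECASE)
    keep = set()
    for i, line in enumerate(lines):
        if regex.search(line):
            # Keep the match and up to `context` lines on either side.
            keep.update(range(max(0, i - context), min(len(lines), i + context + 1)))
    output: List[str] = []
    previous = None
    for i in sorted(keep):
        if previous is not None and i != previous + 1:
            output.append("--")  # separator between non-adjacent groups, as grep prints
        output.append(lines[i])
        previous = i
    return output


def grep_file(path: str, pattern: str, context: int = 3) -> List[str]:
    """Run grep_with_context over a log file (not called below; needs read access)."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return grep_with_context(log.read().splitlines(), pattern, context)


# Sample lines copied from the log excerpt above.
sample = [
    "kernel: RAPL PMU: hw unit of domain dram 2^-16 Joules",
    "kernel: EDAC sbridge: Couldn't find mci handler",
    "kernel: EDAC sbridge: Failed to register device with error -22.",
    "kernel: intel_rapl: Found RAPL domain package",
]
for line in grep_with_context(sample, r"edac|sbridge|mci", context=1):
    print(line)
```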
Hello Joseph,
Thank you for contacting Intel® Technical Support.
Be aware that this server system is out of warranty. At this point, the best way forward is to run the Intel® System Information Retrieval Utility. It collects logs from the BMC, which monitors the hardware, so we can take a close look and see whether the issue is hardware related. You can download the utility from the Intel Download Center (https://downloadcenter.intel.com/download/26991/System-Information-Retrieval-Utility-SysInfo-), and instructions for running it are in the user guide (https://downloadmirror.intel.com/26991/eng/Intel_Sysinfo_UserGuide_V1.02.pdf).
Please try this and let me know your results.
Best regards,
Jeremiah A.
Intel® Technical Support
Thank you for the response! We're doing a clean install of Windows Server for other reasons, so I'll try that utility if the problem persists after the migration.
Hello Joseph,
Please let me know your results.
Regards,
Jeremiah A.
Intel(R) Technical Support
Hi Joseph,
I investigated this issue further and found the following:
1. This error is not really an error at all; it is just a warning letting you know that ECC is not enabled in the BIOS.
2. For the warning to go away, you need to enable error-correcting code (ECC) memory in your BIOS.
3. It is an option in the memory (RAM) settings.
Please try the above instructions and let me know your results.
Regards,
Jeremiah A.
Intel(R) Technical Support
Hello Jeremiah,
Thank you for looking into this issue!
The Memory Configuration page of the Advanced BIOS settings does not have an option to enable/disable ECC. The page shows the following:
Memory Configuration
Total Memory 64 GB (grayed out)
Effective Memory 65536 MB (grayed out)
Current Configuration Independent (grayed out)
Current Memory Speed DDR3-1066 (grayed out)
Memory Operating Speed Selection [Auto]
Phase Shedding [Enabled]
Memory SPD Override [Enabled]
Patrol Scrub [Enabled]
Demand Scrub [Enabled]
Correctable Error Threshold [10]
> Memory RAS and Performance Configuration
DIMM Information
DIMM_A1 4GB Installed / Operational
DIMM_A2 4GB Installed / Operational
DIMM_B1 4GB Installed / Operational
DIMM_B2 4GB Installed / Operational
DIMM_C1 4GB Installed / Operational
DIMM_C2 4GB Installed / Operational
DIMM_D1 4GB Installed / Operational
DIMM_D2 4GB Installed / Operational
DIMM_E1 4GB Installed / Operational
DIMM_E2 4GB Installed / Operational
DIMM_F1 4GB Installed / Operational
DIMM_F2 4GB Installed / Operational
DIMM_G1 4GB Installed / Operational
DIMM_G2 4GB Installed / Operational
DIMM_H1 4GB Installed / Operational
DIMM_H2 4GB Installed / Operational
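As a quick consistency check on this listing: 16 slots at 4 GB each gives 64 GB, and 64 × 1024 = 65536 MB, which matches both the Total Memory and Effective Memory fields, as well as the kernel's "This system has 16 DIMM sockets" message (every socket is populated). A small Python sketch of that arithmetic; `total_installed_gb` is a hypothetical helper, and the slot lines are generated to mirror the DIMM names above rather than read from the BIOS.

```python
import re
from typing import List


def total_installed_gb(dimm_lines: List[str]) -> int:
    """Sum the capacities of lines that start like 'DIMM_A1 4GB ...'."""
    total = 0
    for line in dimm_lines:
        match = re.match(r"DIMM_\w+\s+(\d+)GB\b", line)
        if match:
            total += int(match.group(1))
    return total


# Sample data mirroring the listing: banks A-H, two slots each, 4 GB per DIMM.
slots = [f"DIMM_{bank}{n} 4GB" for bank in "ABCDEFGH" for n in (1, 2)]
total_gb = total_installed_gb(slots)
print(total_gb, "GB =", total_gb * 1024, "MB")
```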
Also, we switched the OS to Ubuntu Server 16.04 LTS, and the error does not appear in dmesg there, so the problem seems to exist only on Fedora.
Should we look in other logs to see if the error message is in another location, or is this likely a non-issue?
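Besides the logs, one place to look is the EDAC sysfs tree. The kernel's EDAC documentation describes per-controller directories under /sys/devices/system/edac/mc/ (mc0, mc1, ...) with attributes such as mc_name, size_mb, ce_count and ue_count. A minimal Python sketch that dumps those attributes, assuming that standard layout; `read_memory_controllers` is a hypothetical helper name. Given the boot log above, where MC0 and MC1 were handed to ghes_edac, you might expect the registered controllers to belong to ghes_edac rather than sbridge, but that is an inference from the log, not something confirmed in this thread.

```python
from pathlib import Path
from typing import Dict

# Assumed standard EDAC sysfs location; absent if no EDAC controller registered.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")
ATTRIBUTES = ("mc_name", "size_mb", "ce_count", "ue_count")


def read_memory_controllers(root: Path = EDAC_MC_ROOT) -> Dict[str, Dict[str, str]]:
    """Map each mcN directory to the EDAC attributes it exposes."""
    controllers: Dict[str, Dict[str, str]] = {}
    if not root.is_dir():
        return controllers  # EDAC not loaded, or no memory controllers registered
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        info: Dict[str, str] = {}
        for attr in ATTRIBUTES:
            attr_path = mc_dir / attr
            if attr_path.is_file():
                info[attr] = attr_path.read_text().strip()
        controllers[mc_dir.name] = info
    return controllers


if __name__ == "__main__":
    for name, info in read_memory_controllers().items():
        print(name, info)  # e.g. which driver owns the controller, error counts
```

Comparing this output between the Fedora and Ubuntu installs could show whether the same driver ends up owning the memory controllers on both.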
Also, the reply formatting doesn't seem to work for a quote with courier font family. I apologize for the misaligned text.
Hello Joseph,
Thank you for your quick response. It looks like the issue is related to an unknown problem with Fedora, the OS your client is using.
As a last option, you can ask your client to run the Intel® System Information Retrieval Utility mentioned in my previous email to rule out hardware issues.
Please let me know your results.
Regards,
Jeremiah A.
Intel(R) Technical Support
Hello Joseph,
I hope you are doing well today.
I'm following up to see whether you were able to obtain the logs from the tool I sent you.
Please let me know your results.
Regards,
Jeremiah A.
Intel(R) Technical Support
Dear Jeremiah,
Thank you again for your suggestions. Unfortunately, this is a side-project for both of us engineers involved, and we needed to put it on the back-burner for other things at the moment. When we come back to it, likely in the summer, I will surely return here for the excellent support you and your colleagues provide!
Sincerely,
Joe
Hello Joe,
OK, I will proceed with closing this case. Once you are ready to continue, please open a new case and refer to this one so we can keep assisting you; we will be more than glad to do so.
I wish you the best in your projects and thank you for contacting Intel(R) Technical Support.
Best regards,
Jeremiah A.
Intel Technical Support