Embedded Intel Atom® Processors
Technological Conversations about Intel Atom® Hardware, Software, Firmware, Graphics

INTEL C3758 IERR_N ERROR

M_Serhan_Ozyigit
Beginner
644 Views

Hi,

 

I have a custom board based on the C3758 SoC. When I try to boot OpenBSD, the board hangs at random. Sometimes it does not even reach the OS. Sometimes it boots the OS successfully and reaches the console, but after a while it hangs again at random.

 

I examined the hardware during a hang and found that the IERR_N error signal is asserted. The other error signals (MCERR and the ERROR[N] signals) are inactive.

 

Also, we cannot find any fault in the clocks or power supply rails when the hang occurs.

 

What can we do about this problem?

 

What can trigger the IERR_N signal?

4 Replies
Diego_INTEL
Moderator
607 Views

Hello @M_Serhan_Ozyigit,

 

Thank you for contacting Intel Embedded Community.

 

IERR_N indicates an unrecoverable internal error. The signal is driven by the Punit (the SoC power management unit), so it reports a power-related error.

The following are some possible cases of such catastrophic errors:
• Retirement watchdog time-out from the core
• Internal error detected by the SoC power management circuitry

 

Best regards,

 

@Diego_INTEL 

M_Serhan_Ozyigit
Beginner
572 Views

Hi Diego,

 

First, thank you for your answer.

 

1. What can cause a "Retirement watchdog time-out from the core"?

2. I have LEDs on the board that show the sleep states and the PLTRST signal. When I get an IERR_N error and the board freezes, I see that the sleep state 0 power-good signal is still asserted and the platform reset is deasserted (the board is not in reset). Shouldn't these signals be deasserted in the event of a power failure?

 

Thank you.

M_Serhan_Ozyigit
Beginner
504 Views

Also, sometimes when the board is booting up, I get an error message like this:

CPU Index 3 - APIC 16 Unexpected Exception:18 @ 10:7f792136 - Halting
[EMERG] Code: 0 eflags: 00000046 cr2: 00000000
[EMERG] eax: 7f79d7d0 ebx: 00000004 ecx: 0000001b edx: 00000001
[EMERG] edi: 000302f8 esi: 00000000 ebp: 00000000 esp: 7f7c0fdc

This message repeats for every CPU core, and then the board hangs. What is the cause and meaning of this exception?
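
For reference, if the vector number in that message is printed in decimal (an assumption about the firmware's log format), vector 18 in the x86 exception table is the Machine Check exception (#MC), which would be consistent with a hardware-level fault rather than a software bug. The sketch below shows one way to decode such a line. The regular expression and the helper name `decode_exception_line` are hypothetical and modeled only on the message quoted above.

```python
import re

# Architecturally defined x86 exception vectors (Intel SDM, Volume 3).
X86_EXCEPTIONS = {
    0: "#DE Divide Error",
    1: "#DB Debug",
    2: "NMI Interrupt",
    3: "#BP Breakpoint",
    4: "#OF Overflow",
    5: "#BR BOUND Range Exceeded",
    6: "#UD Invalid Opcode",
    7: "#NM Device Not Available",
    8: "#DF Double Fault",
    10: "#TS Invalid TSS",
    11: "#NP Segment Not Present",
    12: "#SS Stack-Segment Fault",
    13: "#GP General Protection",
    14: "#PF Page Fault",
    16: "#MF x87 Floating-Point Error",
    17: "#AC Alignment Check",
    18: "#MC Machine Check",
    19: "#XM SIMD Floating-Point Exception",
}

# Assumed layout: decimal CPU index, APIC ID and vector, then hex CS:IP.
LINE_RE = re.compile(
    r"CPU Index (\d+) - APIC (\d+) Unexpected Exception:(\d+)"
    r" @ ([0-9a-fA-F]+):([0-9a-fA-F]+)"
)


def decode_exception_line(line):
    """Return the decoded fields of one exception line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    vector = int(m.group(3))
    return {
        "cpu_index": int(m.group(1)),
        "apic_id": int(m.group(2)),
        "vector": vector,
        "name": X86_EXCEPTIONS.get(vector, "reserved or unknown"),
        "cs": int(m.group(4), 16),
        "ip": int(m.group(5), 16),
    }


if __name__ == "__main__":
    sample = ("CPU Index 3 - APIC 16 Unexpected Exception:18 "
              "@ 10:7f792136 - Halting")
    print(decode_exception_line(sample))
```

Running the decoder over a full boot log makes it easy to check whether every core reports the same vector and the same instruction pointer, which helps separate a single shared cause from per-core faults.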
Diego_INTEL
Moderator
485 Views

Hello @M_Serhan_Ozyigit,

 

Regarding the watchdog time-out: it can sometimes be related to memory problems, such as incompatible memory, bad memory, or a failure in the processor's memory controllers.

https://www.kernel.org/doc/html/v5.9/watchdog/watchdog-api.html

https://www.makeuseof.com/fix-clock-watchdog-timeout-error-windows/

 

I'm reviewing the document #558579.

https://www.intel.com/content/www/us/en/secure/content-details/558579/intel-atom-processor-c3000-product-family-external-design-specification-eds-volumes-1-2-3-and-4.html?DocID=558579

 

On page 270:

IERR_N: Internal Error: This active-low signal indicates to the external circuitry that the SoC has detected an error.
While the SoC active-low output signal PMU_PLTRST_N is asserted, this signal is not valid and must be ignored by the platform board circuitry.

From page 288:

Board designs must not consider IERR_N and MCERR_N valid until after the PMU_PLTRST_N (Platform Reset) signal is deasserted by the SoC. When the SoC is powered-up or a cold boot, the IERR_N and MCERR_N signals may be unstable and falsely signal an internal error or machine check error before the platform reset is deasserted by the SoC.
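
The validity rule in the two excerpts above can be sketched as a small check for board-management firmware or a test script. This is only an illustration: the function name `is_genuine_ierr` and the convention that each argument is the sampled pin level (0 = low, 1 = high) are assumptions, not part of the EDS.

```python
def is_genuine_ierr(pltrst_n: int, ierr_n: int) -> bool:
    """Decide whether a sampled IERR_N level should be treated as a real error.

    Both signals are active-low, so a level of 0 means "asserted".
    IERR_N is ignored while PMU_PLTRST_N is asserted (platform in reset),
    because the EDS states it may be unstable until reset is released.
    """
    in_platform_reset = (pltrst_n == 0)
    ierr_asserted = (ierr_n == 0)
    return ierr_asserted and not in_platform_reset


if __name__ == "__main__":
    # Reset released, IERR_N low: a real internal error indication.
    print(is_genuine_ierr(pltrst_n=1, ierr_n=0))
    # Still in reset, IERR_N low: must be ignored.
    print(is_genuine_ierr(pltrst_n=0, ierr_n=0))
```

In the situation described in this thread (platform reset already deasserted and IERR_N low), the check treats the assertion as a valid error indication rather than a power-up glitch.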

 

Have you verified if all voltage rails are correct? 

 

Also, I found a debug handbook for Xeon (a server processor) that may be worth checking for your case:

Document #576242 - System Hang Issue Debug Handbook

https://www.intel.com/content/www/us/en/secure/content-details/576242/intel-xeon-processor-scalable-memory-family-skylake-system-hang-issue-debug-handbook.html?wapkw=576242&DocID=576242

 

Best regards,

 

@Diego_INTEL 
