My Alarm Management Module is handling IPMI traps from a Mullins TIGW1U. I'd like to know what to do (user corrective action) in case of a processor event alarm.
This alarm is mapped to the IPMI traps below. Can anyone advise on the corrective action for each of these event traps?
1) processorIERREvent - 18.104.22.168.4.1.322.214.171.124.5.1000.40.1 --> internal error
2) processorFRB1Event - 126.96.36.199.4.1.3188.8.131.52.5.1000.40.3 --> fault resilient boot error, BIST failure
3) processorFRB2Event - 184.108.40.206.4.1.3220.127.116.11.5.1000.40.4 --> fault resilient boot error, hang in POST failure
4) processorFRB3Event - 18.104.22.168.4.1.322.214.171.124.5.1000.40.5 --> fault resilient boot error, initialization failure
5) processorConfigurationErrorEvent - 126.96.36.199.4.1.3188.8.131.52.5.1000.40.6 --> configuration error
6) processorSMBIOSUncorrectableCPUEvent - 184.108.40.206.4.1.3220.127.116.11.5.1000.40.10 --> SMBIOS uncorrectable CPU error
I would remove all add-in cards and try again.
IERR is the most maligned error code.
It means the CPU is not able to make forward progress in its operational code, i.e. it is waiting for something.
Most often it is an add-in card failing rather than the processor.
The rest of the errors look like a cascade from the first.
These are the three types of FRB timer, all indicating a hang.
The configuration error could be the hung CPU, a bad CPU, or mismatched CPUs.
The last one is telling you it is dead and could not recover itself.
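If the alarm module is meant to show a corrective action, the advice in this reply could be encoded as a lookup table. This is a minimal sketch under assumptions: the trap names come from the question, the action strings only paraphrase this thread (they are not vendor guidance), and `CORRECTIVE_ACTIONS`, `ROOT_CAUSE` and `triage` are hypothetical names. Because the FRB and other events tend to cascade from IERR, the sketch acts on IERR first when several traps arrive together.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical mapping: trap name (from the question) -> suggested action
# (paraphrased from this thread, not official vendor guidance).
CORRECTIVE_ACTIONS = {
    "processorIERREvent": "Remove all add-in cards and retry; an add-in card is the most likely cause.",
    "processorFRB1Event": "Probably a cascade from a hang; troubleshoot as for IERR first.",
    "processorFRB2Event": "Probably a cascade from a hang; troubleshoot as for IERR first.",
    "processorFRB3Event": "Probably a cascade from a hang; troubleshoot as for IERR first.",
    "processorConfigurationErrorEvent": "Check for a hung CPU, a bad CPU, or mismatched CPUs.",
    "processorSMBIOSUncorrectableCPUEvent": "The CPU failed and could not recover; suspect the processor.",
}

# Assumed root cause when several processor traps arrive in one batch.
ROOT_CAUSE = "processorIERREvent"


def triage(traps: Iterable[str]) -> tuple[str, str]:
    """Return (trap to act on, suggested action) for a batch of trap names."""
    received = [t for t in traps if t in CORRECTIVE_ACTIONS]
    if not received:
        raise ValueError("no known processor event trap in batch")
    # Prefer IERR if present; otherwise act on the first known trap received.
    primary = ROOT_CAUSE if ROOT_CAUSE in received else received[0]
    return primary, CORRECTIVE_ACTIONS[primary]
```

For example, `triage(["processorFRB2Event", "processorIERREvent"])` selects `processorIERREvent`, so the operator is told to start with the add-in cards rather than chase each FRB alarm separately.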
If it works with no add-in cards installed, add the cards back one at a time until it fails again.
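The isolation procedure above can be sketched as a loop that re-adds cards one at a time and stops at the first failure. This is a hedged illustration only: `find_failing_card` and `boots_ok` are hypothetical, and `boots_ok` stands in for the manual step of powering on with the given cards installed and checking whether a processor alarm is raised. Raising an error when the system fails with no cards is an assumed design choice, since in that case the cards are not the cause.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence


def find_failing_card(
    cards: Sequence[str],
    boots_ok: Callable[[list[str]], bool],
) -> Optional[str]:
    """Re-add cards one at a time; return the card whose addition makes the
    system fail, or None if every configuration works."""
    if not boots_ok([]):
        # Fails even with no add-in cards: suspect the CPU or board instead.
        raise RuntimeError("fails with no add-in cards installed")
    installed: list[str] = []
    for card in cards:
        installed.append(card)
        # Pass a copy so the check cannot modify our list of installed cards.
        if not boots_ok(list(installed)):
            return card
    return None
```

With a stand-in check such as `lambda cs: "raid" not in cs`, calling `find_failing_card(["nic", "raid", "hba"], ...)` stops at `"raid"`, the point where the system "fails again".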