Yesterday we found 1 pc of Dagger that failed at the 2C test with failure code "GET_ROMMON". Based on the failure symptom shown in the failed log and our preliminary debug analysis (a process issue was ruled out by visual inspection and 2D/5D X-ray test), the failure is related to U1_F1 (FPGA). An A-B-A swap test confirmed that it is a true component issue, so the part needs to be sent for FA.
1. Please provide full device details.
Device name: IC, PLD-FPGA, EP4CGX75D-7, FBGA672, 1.2V, 1.0mm, PB-Free, C-TEMP (0 to 70°C)
Full part number: CISH-16-4405-01/ EP4CGX75DF27C7N
2. What is the failure rate? What is the failure rate vs. tested sample? Example: 2 out of 100 units.
3. What is the failure symptom? Please elaborate the failure symptom in detail.
Get ROMMON failed at the 2C station.
4. When did the failure happen? How did you discover the failure?
2019/02/14. The board failed at the 2C station with failure code Get ROMMON failure; the POST code LEDs (CR0/CR3/CR4/CR6_P80) stay solidly lit.
5. How did you determine the failure? Please elaborate the procedures.
The board first failed to get ROMMON at 0°C, then failed again on retest at room temperature (25°C). We can duplicate the failure at room temperature.
6. Did the failed unit ever work before the failure?
Yes, but it was rejected by your site for FA. Cisco now requires us to send it to you for FA.
7. Did they violate solder re-flow temperature profiles, moisture sensitivity? Please provide the re-flow temperature profiles.
8. Did you swap the failure device to a known good board? Is the failure following the device or board?
We are swapping the device onto a known good board.
9. Is this a prototype build or volume/mass production?
10. Kindly provide quantitative investigation results that could prove the failure is induced by the Intel FPGA.
See the attachment for FA.
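For reference, the swap logic behind question 8 can be written down as a small decision rule. This is only a sketch: the function name and the two boolean inputs are hypothetical and not part of any test station software. It assumes each swap result has been reproduced enough times to be trusted, which matters for an intermittent failure.

```python
def classify_swap(suspect_fails_on_good_board: bool,
                  good_part_fails_on_original_board: bool) -> str:
    """Interpret a component swap between a failing board and a known good board.

    suspect_fails_on_good_board: the suspect device, moved to a known good
        board, still shows the failure.
    good_part_fails_on_original_board: a known good device, fitted to the
        original failing board, still shows the failure.
    """
    if suspect_fails_on_good_board and not good_part_fails_on_original_board:
        return "failure follows the device"
    if good_part_fails_on_original_board and not suspect_fails_on_good_board:
        return "failure stays with the board"
    # Both fail or both pass: the swap alone does not isolate the cause.
    return "inconclusive"


if __name__ == "__main__":
    print(classify_swap(True, False))
```

A result of "inconclusive" would usually mean repeating the swap or looking at an interaction between the device and the board.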
Noted with all the information provided. The respective team will contact you directly for the next action.
Email Details :
a)Contact name and email address: Bruce.Wu/Bruce.firstname.lastname@example.org
b)Company details: Flex.
c)Debugging steps that were done:
1) Took the failed board (FDO230700G6) for failure analysis and reviewed the failed log
from the 2C station, as follows:
Initializing Hardware ...
Initializing Hardware ...
Checking for PCIe device presence...
%ERROR% - Did not find CPLD. Read failure!
%ERROR: Critical device not found on 00:01.00
2) Put the failed board on a debug station at room temperature and powered it on. The
board is not stable: sometimes it boots up normally, sometimes it fails with the get
ROMMON symptom (the failure can be duplicated). The failure log and POST code status
are the same as at the 2C station.
3) Based on the failure log and previous debug experience, the failure is related to the
FPGA (U1_F1). We checked the FPGA and the surrounding parts and found no process issue.
We also measured the impedances and voltages related to the FPGA and CPU and compared
them with a known pass board; all of them are normal.
4) Visual inspection (including 2D and 5D X-ray test) of the whole PCBA, especially the
FPGA (U1_F1) and CPU, found no process issue.
5) Captured the signals between the FPGA and CPU and found that CPU_LPC_AD<0/1/2/3>
show abnormalities compared with a known pass board; the details are as below:
6) Replaced the FPGA with a new one; the failure symptom disappeared.
d)Device Purchase Order: 85YY73943
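The log review in step 1 can be automated with a small script that looks for the two error lines quoted above. This is a minimal sketch, not the 2C station's own tooling: the pattern names and the `scan_boot_log` helper are hypothetical, and the patterns cover only the messages shown in this post.

```python
import re

# Patterns taken from the two %ERROR lines in the failed log above.
ERROR_PATTERNS = {
    "cpld_not_found": re.compile(r"%ERROR% - Did not find CPLD"),
    "critical_device_missing": re.compile(
        r"%ERROR: Critical device not found on (\S+)"
    ),
}

SAMPLE_LOG = """Initializing Hardware ...
Initializing Hardware ...
Checking for PCIe device presence...
%ERROR% - Did not find CPLD. Read failure!
%ERROR: Critical device not found on 00:01.00
"""


def scan_boot_log(text):
    """Return (pattern_name, captured_address_or_None) for each matching line."""
    hits = []
    for line in text.splitlines():
        for name, pattern in ERROR_PATTERNS.items():
            match = pattern.search(line)
            if match:
                # Only the second pattern captures a bus address.
                detail = match.group(1) if match.groups() else None
                hits.append((name, detail))
    return hits


if __name__ == "__main__":
    for name, detail in scan_boot_log(SAMPLE_LOG):
        print(name, detail)
```

Because the failure is intermittent at room temperature, running such a scan over the logs of many power cycles would give a pass/fail count per board rather than a single observation.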
Noted with thanks. Please note that the respective team will contact you directly for the next action.
We will update you on the findings through private message, since the FA details contain confidential information.
I really appreciate your understanding.
Hi Intel team,
As this is a fabrication defect found in the samples, can you provide the CN so that the 3 pcs can be scrapped on your side? Also, can we return defective parts to you if components show the "GET_ROMMON" failure in the future? We believe this would help your investigation. Thanks!
Device #2 and #3:
Device #2 and #3 failed functional tests at low temperatures. Additional characterization showed:
• Device #2 failed the transceiver output buffer test and the transceiver ICDR (Interpolator Clock Data Recovery) speed test at 25°C and 0°C.
• Device #3 failed the transceiver output buffer test and the transceiver ICDR (Interpolator Clock Data Recovery) speed test at 0°C only.
The ICC values of both ERMA devices are comparable to those of a factory standard device, which rules out
electrical overstress (EOS) damage as the cause of the functional failures.
The device failures are believed to be caused by a random defect. Such a defect is introduced into a
device during the wafer fabrication process and can cause a latent failure. This kind of fabrication defect
is random in nature and does not pose a reliability concern for other devices.