JChen576
Beginner
455 Views

Stratix 10 SX SoC Development Kit fails to boot at random

Hi All,

 

We recently bought a Stratix 10 SX SoC Development Kit (Model Name: DEV KIT DKSOC1SSXLA) and received an extra HPS HiLo DDR Card this week.

We plugged the DDR Card into the evaluation board and followed the instructions in the "QUICK START GUIDE" in the box.

However, we found the kit to be extremely unstable.

Sometimes it boots into Linux; other times it hangs during boot or shows an error message, seemingly at random.

The flash image is the original default image that shipped with the kit.

We read Intel's documentation and compared it against the settings on the board, but found no clue.

 

Did anyone have this issue before? How can I solve this problem?

Thanks

 

 

These are some of the error messages seen during boot:

* Case 1:

===================================================================

U-Boot SPL 2017.09 (Sep 22 2018 - 07:29:05) 

MPU 1000000 kHz 

L3 main 400000 kHz 

Main VCO 2000000 kHz 

Per VCO 2000000 kHz 

EOSC1 25000 kHz 

HPS MMC 50000 kHz 

UART 100000 kHz 

DDR: Initializing Hard Memory Controller 

DDR: Calibration success 

SDRAM: Initializing ECC 0x00000000 - 0x80000000 

SDRAM-ECC: Initialized success with 1343 ms 

DDR: HMC init success 

DDR: 2048 MiB 

DDR: Running SDRAM size sanity check 

DDR: SDRAM size check passed! 

QSPI: Reference clock at 400000000 Hz 

Trying to boot from MMC1 

"Synchronous Abort" handler, esr 0x96000210 

ELR: ffe08efc 

LR: ffe08e04 

x 0: 0000000000000000 x 1: 0000000000018404 

x 2: 00000000a0000037 x 3: 0000000000000015 

x 4: 000000003fa00320 x 5: 000000000000006c 

x 6: 00000000ffe12f63 x 7: 0000000000000003 

x 8: 0000000000000230 x 9: 0000000000000080 

x10: 00000000ffe3dbec x11: 00000000ffe12a10 

x12: 0000000000000176 x13: 0000000000000454 

x14: 00000000ffe3dd6c x15: 00000000ffe12a10 

x16: 0000000000030a0f x17: f63f9fa5faadbfed 

x18: 00000000ffe3de90 x19: 000000003fa00800 

x20: 0000000000000000 x21: 00000000ffe3dd00 

x22: 00000000ffe3dbe0 x23: 00000000000007bd 

x24: 0000000000000029 x25: 0000000000000001 

x26: 0000000080020000 x27: 122c43677d8ff77f 

x28: df6e6e611f01fbfd x29: 00000000ffe3dc20 

 

Resetting CPU ... 

 

===================================================================
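As a reading aid for the dump above, the `esr 0x96000210` value can be split into its architectural fields. The sketch below is not from the thread. It assumes the standard AArch64 ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]; for data aborts, DFSC in ISS[5:0], WnR in ISS[6], EA in ISS[9]) and carries only the few table entries needed here. The function name `decode_esr` is hypothetical.

```python
# Minimal sketch: decode an AArch64 ESR_ELx value as printed by the
# U-Boot exception handler. The lookup tables are deliberately partial.

EC_NAMES = {
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

DFSC_NAMES = {
    0x10: "Synchronous External abort, not on translation table walk",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR value into EC, IL, ISS and, for data aborts, DFSC/WnR/EA."""
    ec = (esr >> 26) & 0x3F          # exception class, bits [31:26]
    iss = esr & 0x1FFFFFF            # instruction specific syndrome, bits [24:0]
    out = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not in this partial table"),
        "il": (esr >> 25) & 1,       # instruction length bit
        "iss": iss,
    }
    if ec in (0x24, 0x25):           # data abort: decode the fault fields
        dfsc = iss & 0x3F
        out["dfsc"] = dfsc
        out["dfsc_name"] = DFSC_NAMES.get(dfsc, "not in this partial table")
        out["write"] = bool((iss >> 6) & 1)   # WnR: True = write, False = read
        out["ea"] = (iss >> 9) & 1            # external abort type bit
    return out


if __name__ == "__main__":
    for key, value in decode_esr(0x96000210).items():
        print(f"{key}: {value}")
```

Under those assumptions, 0x96000210 decodes to a data abort taken at the current exception level, caused by a synchronous external abort on a read. That points to a bus-level error on a memory or device access, but on its own it does not say which component is at fault.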

* Case 2:

===================================================================

U-Boot SPL 2017.09 (Sep 22 2018 - 07:29:05)

MPU 1000000 kHz

L3 main 400000 kHz

Main VCO 2000000 kHz

Per VCO 2000000 kHz

EOSC1 25000 kHz

HPS MMC 50000 kHz

UART 100000 kHz

DDR: Initializing Hard Memory Controller

DDR: Triggerring emif_reset

DDR: emif_reset triggered successly

DDR: Triggerring emif_reset

DDR: emif_reset triggered successly

DDR: Triggerring emif_reset

DDR: emif_reset triggered successly

DDR: Error as SDRAM calibration failed

DDR: Initialization failed.

### ERROR ### Please RESET the board ###

===================================================================
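Because the failures are intermittent, it can help to capture the serial console over many power cycles and count how often each outcome occurs. The following is only a sketch of that idea: the marker strings are copied from the two SPL logs above, while the function names, the category labels, and the assumption that each captured boot is available as one text string are hypothetical.

```python
from collections import Counter

# Marker strings copied verbatim from the SPL logs in this thread.
CAL_FAIL = "DDR: Error as SDRAM calibration failed"
CAL_OK = "DDR: Calibration success"
SYNC_ABORT = '"Synchronous Abort" handler'


def classify_boot(log: str) -> str:
    """Assign one captured boot log to a coarse outcome category."""
    if CAL_FAIL in log:
        return "ddr_calibration_failed"
    if SYNC_ABORT in log:
        # Distinguish aborts that happen after DDR reported success.
        if CAL_OK in log:
            return "sync_abort_after_ddr_ok"
        return "sync_abort"
    if CAL_OK in log:
        return "ddr_ok_no_abort_seen"
    return "unknown"


def tally(logs) -> Counter:
    """Count outcomes over an iterable of captured boot logs."""
    return Counter(classify_boot(log) for log in logs)


if __name__ == "__main__":
    samples = [
        "DDR: Calibration success\nTrying to boot from MMC1\n"
        '"Synchronous Abort" handler, esr 0x96000210\n',
        "DDR: Triggerring emif_reset\n"
        "DDR: Error as SDRAM calibration failed\n",
    ]
    for outcome, count in tally(samples).items():
        print(outcome, count)
```

A tally like this does not explain the root cause, but it gives a failure rate that can be compared before and after a change such as swapping the DDR card or the boot image.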

 

 

368 Views

Hi,

 

Thanks for the information. I noticed that you are using an old version of U-Boot. May I also know which Quartus version you are working with?

 

We recommend that you use the latest supported version of U-Boot. The guide below covers booting directly from the prebuilt GHRD image.

https://rocketboards.org/foswiki/Documentation/S10GSRDBootLinuxFromSDCard180

 

Prebuilt image

https://rocketboards.org/foswiki/Documentation/GSRDTagging

 

Creating the boot loader

https://rocketboards.org/foswiki/Documentation/BuildingBootloader

 

JChen576
Beginner
368 Views

Thank you, @EberL_Intel! I'm using Quartus Prime Design Software Version 18.1.0 Build 222 09/21/2018 SJ Pro Edition.

 

I tried two releases, 2019.04 and 2018.10:

https://releases.rocketboards.org/release/2019.04/gsrd/s10_gsrd/ghrd_1sx280lu2f50e2vg_hps.jic.gz

https://releases.rocketboards.org/release/2018.10/gsrd/s10_gsrd/ghrd_1sx280lu2f50e2vg_hps.jic

Both have the same issue.

 

I also tried the BTS with bts_ddr4.sof and got a large number of "Detected Errors" (521 million errors in 27 seconds). Do you know if that's normal?

 

The BTS I am using is the "L-Tile Production" package from https://www.intel.com/content/www/us/en/programmable/products/boards_and_kits/dev-kits/altera/strati...

 

 

Thanks,

Josh

368 Views

Hi Josh,

 

I recommend that you try programming the .sof via the JTAG configuration scheme rather than programming the .jic into flash (see the section on programming the .sof via JTAG in the guide below).

https://rocketboards.org/foswiki/Documentation/S10GSRDBootLinuxFromSDCard180

 

In my experience, bts_ddr4.sof is not the .sof file that should be used for programming here.

 

Regards.

 

JChen576
Beginner
368 Views

Thanks for your support, @EberL_Intel.

 

I tried booting over JTAG with this .sof, https://releases.rocketboards.org/release/2019.04/gsrd/s10_gsrd/ghrd_1sx280lu2f50e2vg_hps.sof, and the same problem occurs.

 

My BTS DDR4 test follows the instructions in "5. Board Test System" and "5.3.8. The DDR4 Tab" in

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/ug/ug-intel-s10-soc-devl-ki...

 

Regards,

368 Views

Hi,

 

Just to confirm, does following the guide here still lead to the same error?

https://rocketboards.org/foswiki/Documentation/S10GSRDBootLinuxFromSDCard180

 

May I know how many boards are affected?

 

You mentioned: "and got an extra HPS HILO DDR Card this week. We plug the DDR Card onto the evaluation board."

Does this mean you replaced or changed the dev kit's original DDR card? If so, may I know the part number of the card you are using?

JChen576
Beginner
368 Views

Thanks for your reply, @EberL_Intel.

 

I only have one Stratix 10 SX dev board. I followed the guide and still get the error.

The .sof I tried to boot over JTAG is https://releases.rocketboards.org/release/2019.04/gsrd/s10_gsrd/ghrd_1sx280lu2f50e2vg_hps.sof, and the same problem occurs.

 

There was no HPS HiLo DDR4 card in our kit box when it arrived.

I believe the reason is described in https://www.intel.com/content/dam/altera-www/global/en_US/support/boards-kits/stratix10/dcl-ddr4-hil...

So we obtained a separate HPS HiLo DDR4 card with our local vendor's help and installed it on the mainboard.

The card is the "MEM MODULE HILDCDDR44GA DDR4 HiLo Daughter Card". The label on the card reads "ALTERA DDR4 X72 DAUGHTER CARD".

 

We are trying to get another HPS HiLo DDR4 card from the local vendor, but it will take weeks to arrive.

 

In the meantime, we ran the Board Test System (BTS).

BTS shows many "Detected Errors" in the DDR tab.

In addition, the results for "FMCA", "FMCB", and so on do not all match the user guide (Intel Stratix 10 SX SoC Development Kit User Guide).

For example, the FMCB tab shows "PLL lock: Partially Locked" in our BTS run.

Our BTS test follows the instructions in "5. Board Test System".

One of our questions is whether the BTS results can be relied on.

If the BTS results are correct, does that mean we should exchange the dev kit for one with a pre-installed HPS HiLo DDR4 card from our local vendor?

 

Regards,

VWoll
Beginner
368 Views

Hi,

 

We are also having the exact same issue with our production boards. In some cases, our board does not boot.

 

This occurs on our own design, but we are observing the exact same problems as above. More specifically our device is:

 

FPGA: 1SX280HU3F50E2VG

We are compiling this design using Quartus Prime Version 19.4.0 Build 64 12/04/2019 SC Pro Edition

 

This error does not occur every time. If you leave the board powered on long enough, it eventually starts to work, but not always.

 

Some additional observations:

 

1) When this occurs, JTAG programming is also likely to fail. We do not know why.

 

2) The likelihood of this happening increases if we spray the board with flux remover.

 

We get the following error messages:

 

U-Boot SPL 2017.09-00187-g70eb145123 (Apr 06 2020 - 19:11:55)
MPU 1000000 kHz
L3 main 400000 kHz
Main VCO 2000000 kHz
Per VCO 2000000 kHz
EOSC1 125000 kHz
HPS MMC 50000 kHz
UART 100000 kHz
DDR: Initializing Hard Memory Controller
DDR: Triggerring emif_reset
DDR: emif_reset triggered successly
DDR: Triggerring emif_reset
DDR: emif_reset triggered successly
DDR: Triggerring emif_reset
DDR: emif_reset triggered successly
DDR: Error as SDRAM calibration failed
DDR: Initialization failed.
### ERROR ### Please RESET the board ###

 

Or, we get this:

 

U-Boot SPL 2017.09-00187-g70eb145123 (Apr 06 2020 - 19:11:55)
MPU 1000000 kHz
L3 main 400000 kHz
Main VCO 2000000 kHz
Per VCO 2000000 kHz
EOSC1 125000 kHz
HPS MMC 50000 kHz
UART 100000 kHz
DDR: Initializing Hard Memory Controller
DDR: Calibration success
SDRAM: Initializing ECC 0x00000000 - 0x80000000
SDRAM-ECC: Initialized success with 1357 ms
DDR: HMC init success
DDR: 2048 MiB
DDR: Running SDRAM size sanity check
DDR: SDRAM size check passed!
QSPI: Reference clock at 400000000 Hz
Trying to boot from MMC1
"Synchronous Abort" handler, esr 0x96000210
ELR: ffe08fb0
LR: ffe09034
x 0: 0000000000000001 x 1: 00000000000003e8
x 2: 0000000000000020 x 3: 0000000000000015
x 4: 00000000ffe3d640 x 5: 0000000000000001
x 6: 0000000000000040 x 7: 00000000ffe3db00
x 8: 0000000000000200 x 9: 0000000000000080
x10: 0000000080000010 x11: 0000000080000014
x12: 0000000000000176 x13: 0000000000000454
x14: 00000000ffe3dd6c x15: 00000000ffe12a60
x16: 0000000000030a10 x17: e3fb9eb93f97afdf
x18: 00000000ffe3de90 x19: 000000003fa00800
x20: 00000000ffe3d7a8 x21: 0000000000000010
x22: 00000000ffe3db00 x23: 0000000000000004
x24: 0000000000000400 x25: 000000000003a980
x26: 00000000000008a4 x27: 000000000000be80
x28: 0000000000000010 x29: 00000000ffe3d6a0

Resetting CPU ...

resetting ...
Mailbox: Issuing mailbox cmd REBOOT_HPS

After which it fails again.

 

We're very interested in what you recommend, and how to fix this.

 

VWoll
Beginner
368 Views

By the way, we have observed this problem on 2 other production boards, but it is very inconsistent, and we do not have a way to reliably trigger this problem. Sometimes the issue lasts for a day or two, before clearing up on its own.

 

This seems to be similar to the issue reported here: https://github.com/kraj/meta-altera/issues/164

 

In that case, it looks like this may have to do with the SDM or CMF firmware. During compilation, using 19.4, we get the following warning:

 

Warning (19729): Current CMF data structure hash (0xA2C420AC) is older version than latest CMF data structure but still allowable.

This might be transition period. You should update your CMF to latest version with hash { 0x9603E739 } [Add operation to send JTAG ID to LSM]

 

Does this have some bearing on the issue?
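To check several builds for the same warning, the two hash values can be pulled out of the message text. This is a minimal sketch that assumes the warning wording is exactly as quoted above and that the first hash is the current one and the second is the suggested one. The function name `cmf_hashes` is hypothetical, and how the warning text is obtained from the compile output is left out.

```python
import re

# Matches 32-bit hex values such as 0xA2C420AC.
HASH_RE = re.compile(r"0x[0-9A-Fa-f]{8}")


def cmf_hashes(warning: str):
    """Return the current and suggested CMF hashes from warning 19729 text.

    Returns None if the text does not look like the CMF hash warning.
    """
    if "CMF data structure hash" not in warning:
        return None
    hashes = HASH_RE.findall(warning)
    if len(hashes) < 2:
        return None
    # Assumption: first value is the current hash, second is the suggested one.
    return {"current": hashes[0], "latest": hashes[1]}


if __name__ == "__main__":
    text = (
        "Warning (19729): Current CMF data structure hash (0xA2C420AC) is older "
        "version than latest CMF data structure but still allowable.\n"
        "This might be transition period. You should update your CMF to latest "
        "version with hash { 0x9603E739 } [Add operation to send JTAG ID to LSM]"
    )
    print(cmf_hashes(text))
```

Comparing the extracted values across boards that fail and boards that do not would show whether the firmware version correlates with the problem, though it would not prove causation.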
