Arria 10: Remote Update may brick FPGA and Factory Fallback won't work

FabianL
Novice

Hello,

we have observed critical failures while testing potential error scenarios for a remote update of the FPGA bitstream in the attached SPI flash device.

We could repeatedly trigger cases in which the FPGA's internal fallback mechanism to the factory load does not work. We do not use any bitstream encryption.

 

Test scenarios:

  1. Erased flash & partially programmed application load image --> Fallback mechanism works as expected
  2. Invalid application load image location, i.e. the start of the application load is shifted by 1-10 bytes (manually induced error scenario) --> The reprogramming sequence starts but never completes, and no fallback to the factory load is performed. => The FPGA is completely unresponsive until it is programmed via JTAG

 

It is obvious that the 2nd scenario is a rather exotic error case. However, we require a robust setup and have to make sure that the FPGA remains accessible under any circumstances, so we need the factory fallback mechanism to work reliably!

My best guess is that this is related to the following note in section 1.3.1, Remote System Configuration Mode, which says that the factory fallback mechanism won't work for Arria 10 FPGAs if the last 576 bytes of the bitstream are corrupted.

Note: The fallback to the factory image does not work under the following conditions: If the last 576 bytes of an unencrypted application image bitstream are corrupted. 

Intel recommends that you examine the last 576 bytes of the unencrypted application image before triggering the application image configuration.

 

But I have noticed that the binary images of the FPGA bitstream vary in size, so there is no fixed memory location at which to check these 576 bytes. Is there any way to identify this section?
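
One possible approach, sketched under assumptions rather than taken from Intel documentation: whoever writes the image knows its length, so the tail can be located relative to the start address instead of at a fixed flash location. In the sketch below, `read_flash` is a hypothetical callback standing in for whatever flash access the design provides, and `image` is assumed to be the exact byte sequence that was written (for example the .rpd contents). It only compares bytes against that reference; it does not identify the 576-byte section inside an unknown bitstream.

```python
TAIL_LEN = 576  # size of the critical tail named in the user guide note


def tail_matches(read_flash, start_addr, image):
    """Return True if the last TAIL_LEN bytes in flash equal those of `image`.

    read_flash(addr, length) -> bytes is a hypothetical flash read callback.
    `image` is the exact byte sequence that was written at start_addr.
    """
    if len(image) < TAIL_LEN:
        return False
    tail_addr = start_addr + len(image) - TAIL_LEN
    return read_flash(tail_addr, TAIL_LEN) == image[-TAIL_LEN:]


# Demo against a simulated flash (a plain bytearray, erased to 0xFF).
flash = bytearray(b"\xff" * 4096)
image = bytes(range(256)) * 4  # 1024-byte stand-in for a bitstream
flash[0x100:0x100 + len(image)] = image


def read_flash(addr, length):
    return bytes(flash[addr:addr + length])


ok_before = tail_matches(read_flash, 0x100, image)
flash[0x100 + len(image) - 1] ^= 0x01  # corrupt the final byte of the image
ok_after = tail_matches(read_flash, 0x100, image)
```

Running such a check in the factory image before triggering reconfiguration would catch a damaged tail, but it relies on the image length being stored somewhere the factory image can read.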

 

My questions:

  1. Why is the factory configuration fallback mechanism not working in the scenario described above? The factory load image is valid!
  2. What method does Intel recommend to make the factory fallback mechanism work reliably?
  3. How can I examine/validate an FPGA bitstream in flash memory before executing it?

Thanks a lot for any help

Best regards

Fabian

 

FabianL
Novice

Hello,

is there any advice on this topic?

 

Thanks

Fabian

FvM
Honored Contributor II

Hi,
did you try enabling the configuration watchdog? This requires servicing the watchdog in the application design, of course.

Regards
Frank

FabianL
Novice

hi Frank,

 

Thanks for the hint. That actually helps to trigger the fallback to the Factory image.

 

However, 2 questions remain:

  1. Any idea why the regular CRC check is not causing a factory fallback?
  2. Our application design is not actively servicing the watchdog. We are using the Avalon IP "Remote Update Intel FPGA IP" version 19.1.0. We do not set any of the watchdog registers in the application image, so I would have expected the watchdog to time out and cause a factory image fallback. But this is not happening. Instead, the factory image fallback works reliably with an invalid image.

Please don't get my 2nd point wrong: this is exactly the behavior that I was asking for. But it is not what I expect from the datasheets, and I would like to understand it before introducing this into production.

 

Thanks

best regards

Fabian

lixy
Employee

Hi Fabian,

 

1- For question 1, do you mean that you have seen a CRC error, but the factory image fallback didn't happen?

2- For question 2, do you mean that, in application image user mode, you didn't assert the RU_nRSTIMER signal, but the factory image fallback didn't happen?


 

Best Regards,

Xiaoyan

 

FabianL
Novice

Hi Xiaoyan,

 

  1. What I did was place the application image in flash memory shifted by 1-10 bytes (manually induced error scenario).
    • E.g. expected application image address: 0x01000020, actual start position in flash: 0x01000018.
    • So in the described case I expect a CRC error, since the image read from the expected address is missing its first 8 bytes.
  2. Yes, in the application image I am not asserting the RU_nRSTIMER signal, but the factory image fallback didn't happen.
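
The address example in point 1 works out as follows, as a small sketch. The sparse simulated flash and the 64-byte stand-in image are illustrative assumptions only; how the Arria 10 configuration logic reacts to such data is exactly the open question here.

```python
EXPECTED_ADDR = 0x01000020  # address configured as the application image start
ACTUAL_ADDR = 0x01000018    # address at which the image was actually written

shift = EXPECTED_ADDR - ACTUAL_ADDR  # number of leading bytes that are skipped

image = bytes(range(1, 65))  # 64-byte stand-in for the bitstream

# Sparse simulated flash: address -> byte value. Unwritten cells are erased.
flash = {ACTUAL_ADDR + i: b for i, b in enumerate(image)}

# Bytes seen by a reader that starts at EXPECTED_ADDR and reads len(image) bytes.
seen = bytes(flash.get(EXPECTED_ADDR + i, 0xFF) for i in range(len(image)))
```

Here `shift` is 8, and `seen` is the image without its first 8 bytes, followed by erased (0xFF) bytes where the reader runs past the end of the written data.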

 

best regards

Fabian

lixy
Employee

Hi Fabian,


1- For the CRC_ERROR, I am still checking internally what the exact trigger condition for this CRC error is. May I confirm how you changed the application image start address? For example, did you set the start address at 0x00 to 0x1F in the flash to 0x01000020, but set the application image address to 0x01000018 when generating the JIC file?


2- For the watchdog question, may I know how you enabled the watchdog?

According to section 1.3.1.1, Remote System Upgrade State Machine, of the IP user guide, "In the DTA mode, the watchdog timer is disabled by default. You cannot enable the watchdog timer in the initial or first application image loaded upon powering up the device".

If you set the boot page to the application image page (for example, Page_1) when generating the JIC file, so that the FPGA boots directly into the application image (DTA), then the watchdog is not enabled when the first application image is loaded.


Best Regards,

Xiaoyan


FabianL
Novice

Hi Xiaoyan

 

  1. I do the following:
    • Set the start address at 0x00 to 0x1F in the flash to 0x01000020
    • Generate a .rpd file and load the binary manually at address 0x01000018 in the flash memory
    • I leave the factory & boot sectors of the flash memory untouched. This wouldn't be the case if I directly loaded the .jic file using Intel programming tools
  2. We have this boot procedure:
    1. Boot into factory image (0x20 as boot address in flash boot sector 0x00 to 0x1F). We have certain HW which is sensitive to boot-up timing, so we need this to guarantee an identical and reliable boot-up procedure.
    2. Boot from factory load into application image
      1. Check for power-up boot: read the RU_RECONFIG_TRIGGER_CONDITIONS register for power-up state (0)
        • Do not reconfigure if bit 4, 2, 1, or 0 is set
      2. Set AnF bit: write "1" to RU_CONFIGURATION_MODE
      3. Set application image address: RU_PAGE_SELECT
      4. Enable watchdog: set RU_WATCHDOG_TIMEOUT & RU_WATCHDOG_ENABLE
      5. Reconfigure: write "1" to RU_RECONFIG
    3. In application mode we only read RU_RECONFIG_TRIGGER_CONDITIONS as status info
      • We write neither the RU_WATCHDOG_ENABLE nor the RU_RESET_TIMER register
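
To make the order of operations in step 2 of the boot procedure concrete, here is a minimal sketch of it against a simulated register bus. `FakeBus` is a hypothetical stand-in for the Avalon-MM master and is not part of any Intel tool. The register offsets are the ones quoted elsewhere in this thread, except the offset 0x0 for RU_RECONFIG_TRIGGER_CONDITIONS, which is an assumption and should be checked against the register map.

```python
# Register word offsets as quoted in this thread, except where noted.
RU_RECONFIG_TRIGGER_CONDITIONS = 0x0  # assumption: offset not stated in the thread
RU_WATCHDOG_TIMEOUT = 0x1
RU_WATCHDOG_ENABLE = 0x2
RU_PAGE_SELECT = 0x3
RU_CONFIGURATION_MODE = 0x4
RU_RECONFIG = 0x6

# Bits 4, 2, 1 and 0: if any of them is set, do not reconfigure.
BLOCKING_BITS = (1 << 4) | (1 << 2) | (1 << 1) | (1 << 0)


class FakeBus:
    """Hypothetical stand-in for the Avalon-MM master; records every write."""

    def __init__(self, trigger_conditions=0):
        self.regs = {RU_RECONFIG_TRIGGER_CONDITIONS: trigger_conditions}
        self.writes = []

    def read(self, offset):
        return self.regs.get(offset, 0)

    def write(self, offset, value):
        self.regs[offset] = value
        self.writes.append((offset, value))


def boot_application(bus, app_addr, wd_timeout):
    """Run sub-steps 1 to 5 of step 2; return True if reconfiguration was triggered."""
    if bus.read(RU_RECONFIG_TRIGGER_CONDITIONS) & BLOCKING_BITS:
        return False  # a trigger condition is recorded: stay in the factory image
    bus.write(RU_CONFIGURATION_MODE, 1)         # set AnF bit
    bus.write(RU_PAGE_SELECT, app_addr)         # application image start address
    bus.write(RU_WATCHDOG_TIMEOUT, wd_timeout)  # watchdog timeout value
    bus.write(RU_WATCHDOG_ENABLE, 1)            # enable watchdog
    bus.write(RU_RECONFIG, 1)                   # trigger reconfiguration
    return True


cold = FakeBus(trigger_conditions=0)
started = boot_application(cold, 0x01000020, 0xFFF)

after_fallback = FakeBus(trigger_conditions=1 << 1)
retried = boot_application(after_fallback, 0x01000020, 0xFFF)
```

With a power-up value of 0 the five writes are issued in order; with any of the listed bits set, nothing is written and the design stays in the factory image.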

 

The watchdog fix helps to trigger the fallback if the application load turns out to be corrupt between steps 2 and 3. I would not expect this scenario to fall under DTA mode, so I would assume that the following does not apply: "In the DTA mode, the watchdog timer is disabled by default. You cannot enable the watchdog timer in the initial or first application image loaded upon powering up the device".

However, the application image we are loading is the first application image, and I'm not sure whether the FPGA is able to distinguish between factory and application image based on the image flash address.

 

best regards

Fabian

FabianL
Novice

Hi Xiaoyan,

 

is there any news on this topic?

 

best regards

Fabian

lixy
Employee

Hi Fabian,


  1. First, can you confirm that in the normal scenario the factory/application images are configured successfully? E.g. the expected factory/application image loads when RU_PAGE_SELECT is set to the same start address that was used when generating the .jic file.
  2. Besides, our internal team has tested remote update on the A10 SX SoC dev kit via the AVMM interface, with some of the RU registers set as shown below. You can use our settings as a reference, or could you share your parameter settings so that we can replicate the issue? https://www.intel.com/content/www/us/en/docs/programmable/683695/19-4-19-1-0/register-map-12579.html

 

 

Register name        Address offset  Write value
RU_WATCHDOG_ENABLE   0x2             1
RU_WATCHDOG_TIMEOUT  0x1             F
RU_PAGE_SELECT       0x3             01E28000 (refer to the start address used when generating the .jic file)
RU_CTL_NUPDT         0x7             1

 

 

 


Best Regards,

Xiaoyan


FabianL
Novice

Hi Xiaoyan,

 

  1. Yes, I verified RU_PAGE_SELECT and it is set correctly.
  2. Our settings are close to yours, but slightly different. We use these settings in the factory load, which we load after power-up:

Register name          Address offset  Write value                    Comment
RU_WATCHDOG_ENABLE     0x2             1                              Only written in factory load (application load does not write this register)
RU_WATCHDOG_TIMEOUT    0x1             FFF
RU_PAGE_SELECT         0x3             01E28000 (.jic start address)
RU_CTL_NUPDT           0x7             we do not write this register
RU_CONFIGURATION_MODE  0x4             1
RU_RECONFIG            0x6             1                              To trigger reconfiguration

 

best regards

Fabian

FabianL
Novice

Hi Xiaoyan, 

 

is there any news on this topic?

 

best regards

Fabian

FabianL
Novice

Hello,

 

it has already been 2 months without any news on this topic.

The fact that the factory fallback mechanism is not reliable and not fully understood is still a major problem for us.

Is there any news from your side?

 

best regards

Fabian

Farabi
Employee

Hello Fabian,


Sorry to keep you waiting. Here are a few steps to check for debugging.


Could you please check the following?

1- Confirm the factory image is correctly programmed at the boot address.

2- Ensure the application image is programmed at a known valid address.

3- Verify that the .jic file start address matches the RU_PAGE_SELECT value.


Application image validation:

1- Confirm the application image can be loaded successfully under normal conditions.

2- Confirm the FPGA is able to enter user mode (CONF_DONE -> HIGH).

3- Confirm the watchdog is not enabled - no writes to RU_RESET_TIMER.


Factory image validation:

1- Power up and confirm the device boots into the factory image.

2- Read RU_RECONFIG_TRIGGER_CONDITIONS to confirm the power-up state (Bit0 = 0).

3- Set the following parameters in the factory image:

a. RU_PAGE_SELECT = application image address

b. RU_CONFIGURATION_MODE = 1 (Application Mode)

c. RU_WATCHDOG_TIMEOUT = set the correct value

d. RU_WATCHDOG_ENABLE = 1

4- Trigger reconfiguration by writing 1 to RU_RECONFIG.


Test the fallback mechanism:

1- Manually corrupt the application image (erase some part or misalign it).

2- Check whether the FPGA fails to enter user mode.

3- Check whether the FPGA falls back to the factory image.

4- Read RU_RECONFIG_TRIGGER_CONDITIONS and check that it reflects the correct cause (Bit1 = watchdog timeout).
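
For logging the result of step 4, a small helper like the following sketch can list which bits are set in the value read back. It deliberately does not name the individual bits, since their meaning should be taken from the register map. The 5-bit width is an assumption based on the bit numbers mentioned in this thread.

```python
def set_bits(value, width=5):
    """Return the indices of the bits set in `value`, lowest first."""
    return [i for i in range(width) if value & (1 << i)]


def describe(value):
    """Format a RU_RECONFIG_TRIGGER_CONDITIONS value for a log line."""
    bits = set_bits(value)
    if not bits:
        return "no trigger bits set (power-up state)"
    return "trigger bits set: " + ", ".join(str(b) for b in bits)


power_up = describe(0b00000)
after_test = describe(0b00010)
```

Logging the raw bit indices after every fallback test makes it easier to compare the observed cause with the one the documentation predicts.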


If you have followed all of the above and still observe that the fallback mechanism doesn't kick in, we will need your design for escalation.


regards,

Farabi


Farabi
Employee

Hello,


Do you have any further questions?


regards,

Farabi


FabianL
Novice

Hello Farabi,

 

thanks for the reply. As mentioned before, if I enable the watchdog as described in your post, the fallback mechanism kicks in.

 

But this leaves me with 2 critical questions:

  1. When we do not use the watchdog (RU_WATCHDOG_ENABLE = 0), the factory fallback does not work when the application image is misaligned. Why does the factory fallback not happen in this case? I expect a misaligned application load to trigger a CRC error.
  2. Our application design is not actively servicing the watchdog. We are using the Avalon IP "Remote Update Intel FPGA IP" version 19.1.0. We do not set any of the watchdog registers in the application image, so I would have expected the watchdog to time out and cause a factory image fallback. But this is not happening. The documentation indicates that a watchdog timeout may occur after entering application user mode. Hence I would expect that an application image that does not service the watchdog would trigger a factory fallback. Why is this not the case?

 

Having these questions open makes the whole fallback mechanism appear unreliable, so I would be very thankful if this behavior could be clarified.

 

Thanks.

 

kind regards

Fabian

FabianL
Novice

Hello Farabi,

 

is there any news concerning my 2 questions?

 

best regards
