Arria 10: Remote Update may brick FPGA and Factory Fallback won't work

FabianL
Novice

Hello,

 

We have observed some critical failures while testing various potential error scenarios for a remote update of the FPGA bitstream in the attached SPI flash device.

 

We could repeatedly trigger cases in which the FPGA's internal fallback mechanism to the factory load does not work. We do not use any bitstream encryption.

 

Test scenarios:

  1. Erased flash & partially programmed application load image --> Fallback mechanism works as expected
  2. Invalid application load image location, i.e. the start of the application load is shifted by 1-10 bytes (manually induced error scenario) --> The reprogramming sequence starts but never completes, and no fallback to the factory load is performed. => The FPGA is completely unresponsive until it is programmed via JTAG

 

The 2nd scenario is obviously a more exotic error scenario. However, we require a robust setup and have to make sure that the FPGA remains accessible under any circumstances, so we need the factory fallback mechanism to work reliably!

 

My best guess is that this is related to the note in section 1.3.1, "Remote System Configuration Mode", which says that the factory fallback mechanism won't work for Arria 10 FPGAs if the last 576 bytes of the bitstream are corrupted:

Note: The fallback to the factory image does not work under the following conditions: If the last 576 bytes of an unencrypted application image bitstream are corrupted. 

Intel recommends that you examine the last 576 bytes of the unencrypted application image before triggering the application image configuration.

 

But I have noticed that the binary images of the FPGA bitstream vary in size, so these 576 bytes are not at a fixed memory location that I could check. Is there any way to identify this section?
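One possible way to approach this is to locate the tail relative to the image length instead of at a fixed address. The Python sketch below assumes that the length of the application image is known (for example from the size of the .rpd file that was written) and that the end of that file coincides with the end of the bitstream; any padding in the file is not accounted for. The function name and the `read_flash` callback are hypothetical. The check only compares the last 576 bytes read back from flash with the reference file and says nothing about how the device itself evaluates them.

```python
TAIL_LEN = 576  # size of the critical tail named in the user guide note


def tail_matches(read_flash, image_start, reference):
    """Compare the last TAIL_LEN bytes in flash against the reference image.

    read_flash(addr, length) -> bytes is a hypothetical flash read callback.
    image_start is the flash address at which the image is expected.
    reference is the image content as bytes (e.g. the file that was written).
    """
    if len(reference) < TAIL_LEN:
        raise ValueError("reference image is shorter than the tail length")
    tail_offset = len(reference) - TAIL_LEN
    expected = reference[tail_offset:]
    actual = read_flash(image_start + tail_offset, TAIL_LEN)
    return actual == expected


# Demonstration with bytearrays standing in for the flash device.
image = bytes(range(256)) * 4  # 1024-byte dummy image, not a real bitstream
start = 0x100

flash_ok = bytearray(b"\xff" * 4096)
flash_ok[start:start + len(image)] = image
ok_aligned = tail_matches(lambda a, n: bytes(flash_ok[a:a + n]), start, image)

# Image written 8 bytes too early, similar to the shifted-image scenario.
flash_shifted = bytearray(b"\xff" * 4096)
flash_shifted[start - 8:start - 8 + len(image)] = image
ok_shifted = tail_matches(lambda a, n: bytes(flash_shifted[a:a + n]), start, image)
```

In this demonstration the aligned copy passes the check and the shifted copy fails it, so the factory image could refuse to trigger reconfiguration in the second case.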

 

My Questions:

  1. Why is the factory configuration fallback mechanism not working in the scenario described above? The factory load image is valid!
  2. What method does Intel recommend to make the factory fallback mechanism work reliably?
  3. How can I examine/validate an FPGA bitstream in flash memory before executing it?

Thanks a lot for any help

Best regards

Fabian

 

20 Replies
FabianL
Novice

Hello,

 

is there any advice on this topic?

 

Thanks

Fabian

FvM
Honored Contributor II

Hi,
did you try enabling the configuration watchdog? This requires serving the watchdog in the application design, of course.

Regards
Frank

FabianL
Novice

hi Frank,

 

Thanks for the hint. That actually helps to trigger the fallback to the Factory image.

 

However 2 questions remain:

  1. Any idea why the regular CRC check is not causing a factory fallback?
  2. Our application design is not actively serving the watchdog. We are using the Avalon IP "Remote Update Intel FPGA IP" Version 19.1.0. We do not set any of the watchdog registers in the application image, so I would have expected the watchdog to trigger and cause a factory image fallback. But this is not happening. Instead, the factory image fallback works reliably with an invalid image.

 

So please don't get my 2nd point wrong: this is exactly the behavior that I was asking for. But it is not what I expect from the datasheets, and I would like to understand it before introducing this into production.

 

Thanks

best regards

Fabian

lixy
Employee

Hi Fabian,

 

1- For question 1, do you mean that you have seen a CRC error, but the factory image fallback didn't happen?

2- For question 2, do you mean that, in application image user mode, you didn't assert the RU_nRSTIMER signal, but the factory image fallback didn't happen?

lixy_0-1745924841234.png
lixy_1-1745924861431.png

 

Best Regards,

Xiaoyan

 

FabianL
Novice

Hi Xiaoyan,

 

  1. What I did was to place the application image in flash memory shifted by 1-10 bytes (manually induced error scenario).
    • e.g. expected application image address: 0x01000020, actual start position in flash: 0x01000018.
    • So in the described case, I expect a CRC error, since the image is missing its first 8 bytes.
  2. Yes. In the application image I am not asserting the RU_nRSTIMER signal, but the factory image fallback didn't happen.

 

best regards

Fabian

lixy
Employee

Hi Fabian,


1- For the CRC_ERROR, I am still checking internally what the exact trigger condition for this CRC error is. May I confirm how you changed the application image start address? For example, did you set the start address at 0x00 to 0x1F in the flash to 0x01000020, but set the application image address to 0x01000018 when generating the JIC file?


2- For the Watchdog question, may I know how you enabled the watchdog?

According to the 1.3.1.1. Remote System Upgrade State Machine of the IP user guide, "In the DTA mode, the watchdog timer is disabled by default. You cannot enable the watchdog timer in the initial or first application image loaded upon powering up the device".

When you generate the JIC file, if you set the boot page to the application image page (for example, Page_1) to allow the FPGA to boot directly from the application image (DTA), then the watchdog is actually not enabled when the first application image is loaded.


Best Regards,

Xiaoyan


FabianL
Novice

Hi Xiaoyan

 

  1. I do the following:
    • set the start address at 0x00 to 0x1F in the flash as 0x01000020
    • Generate a .rpd file and load the binary manually at address 0x01000018 in the flash memory
    • I leave the factory & boot sectors of the flash memory untouched. This wouldn't be the case if I directly loaded the .jic file using the Intel programming tools
  2. We have this boot procedure:
    1. Boot into the factory image (0x20 as boot address in flash boot sector 0x00 to 0x1F). We have certain HW which is sensitive to boot-up timing, so we need this to guarantee an identical and reliable boot-up procedure.
    2. Boot from the factory load into the application image
      1. Check for power-up boot: read the RU_RECONFIG_TRIGGER_CONDITIONS register for the power-up state (0)
        • do not reconfigure if bit 4, 2, 1 or 0 is set
      2. Set AnF bit: write "1" to RU_CONFIGURATION_MODE
      3. Set application image address: RU_PAGE_SELECT
      4. Enable watchdog: set RU_WATCHDOG_TIMEOUT & RU_WATCHDOG_ENABLE
      5. Reconfigure: write "1" to RU_RECONFIG
    3. In Application mode we only read the RU_RECONFIG_TRIGGER_CONDITIONS as status info
      • We do not write the RU_WATCHDOG_ENABLE nor RU_RESET_TIMER registers
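The factory-image part of this procedure (step 2) can be sketched as follows. This is a minimal Python model and not the actual design: `FakeBus` is a hypothetical stand-in for the Avalon-MM access layer, the register offsets are the ones quoted elsewhere in this thread, and the offset 0x0 for RU_RECONFIG_TRIGGER_CONDITIONS is an assumption that should be checked against the register map. The address and timeout values are only examples.

```python
# Register offsets as quoted in this thread.
RU_WATCHDOG_TIMEOUT = 0x1
RU_WATCHDOG_ENABLE = 0x2
RU_PAGE_SELECT = 0x3
RU_CONFIGURATION_MODE = 0x4
RU_RECONFIG = 0x6
# Assumed offset, not stated in this thread; verify against the register map.
RU_RECONFIG_TRIGGER_CONDITIONS = 0x0

# Step 2.1: do not reconfigure if bit 4, 2, 1 or 0 is set.
BLOCKING_TRIGGER_BITS = (1 << 4) | (1 << 2) | (1 << 1) | (1 << 0)


class FakeBus:
    """Hypothetical register bus that records writes for inspection."""

    def __init__(self, trigger_conditions=0):
        self.regs = {RU_RECONFIG_TRIGGER_CONDITIONS: trigger_conditions}
        self.writes = []

    def read(self, offset):
        return self.regs.get(offset, 0)

    def write(self, offset, value):
        self.regs[offset] = value
        self.writes.append((offset, value))


def boot_into_application(bus, app_address, watchdog_timeout):
    """Run steps 2.1 to 2.5; return True if reconfiguration was triggered."""
    trigger = bus.read(RU_RECONFIG_TRIGGER_CONDITIONS)
    if trigger & BLOCKING_TRIGGER_BITS:
        return False  # stay in the factory image
    bus.write(RU_CONFIGURATION_MODE, 1)               # set AnF bit
    bus.write(RU_PAGE_SELECT, app_address)            # application image address
    bus.write(RU_WATCHDOG_TIMEOUT, watchdog_timeout)  # watchdog timeout value
    bus.write(RU_WATCHDOG_ENABLE, 1)                  # enable watchdog
    bus.write(RU_RECONFIG, 1)                         # trigger reconfiguration
    return True


# Power-up case: no trigger bits set, so reconfiguration is started.
power_up = FakeBus(trigger_conditions=0)
started = boot_into_application(power_up, 0x01000020, 0xFFF)

# Case with one of the blocking bits set: the factory image stays active.
after_fallback = FakeBus(trigger_conditions=1 << 4)
restarted = boot_into_application(after_fallback, 0x01000020, 0xFFF)
```

The recorded `writes` list makes the order of register accesses explicit, which is the part that matters when comparing this procedure with the settings used by others in the thread.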

 

The watchdog fix helps to trigger the fallback if the application load turns out to be corrupt between steps 2 and 3. I would not assume that this scenario falls under the DTA mode, so I would assume that the following does not apply: "In the DTA mode, the watchdog timer is disabled by default. You cannot enable the watchdog timer in the initial or first application image loaded upon powering up the device".

However, the application image we are loading is the first application image, and I'm not sure whether the FPGA is able to distinguish between factory and application image based on the image flash address.

 

best regards

Fabian

FabianL
Novice

Hi Xiaoyan,

 

is there any news on this topic?

 

best regards

Fabian

lixy
Employee

Hi Fabian,


  1. First, can you confirm that in the normal scenario the factory/application images are configured successfully? E.g. the device loads the factory/application images as expected when RU_PAGE_SELECT is set to the same start address that was used when generating the .jic file.
  2. Besides, our internal team has tested remote update on the A10 SX SoC dev kit via the AVMM interface, where we set some of the RU registers as shown below. Maybe you can use our settings as a reference, or could you share your parameter settings so that we can replicate the issue? https://www.intel.com/content/www/us/en/docs/programmable/683695/19-4-19-1-0/register-map-12579.html

 

 

Register name         Address offset   Write value
RU_WATCHDOG_ENABLE    0x2              1
RU_WATCHDOG_TIMEOUT   0x1              F
RU_PAGE_SELECT        0x3              01E28000 (refer to the start address used when generating the .jic file)
RU_CTL_NUPDT          0x7              1

 

 

 


Best Regards,

Xiaoyan


FabianL
Novice

Hi Xiaoyan,

 

  1. Yes I verified the RU_PAGE_SELECT and it is set correctly.
  2. Our settings are close to yours, but slightly different. We use these settings in the factory load which we load after Power-up

Register name           Address offset   Write value                                                                Comment
RU_WATCHDOG_ENABLE      0x2              1                                                                          Only written in the factory load (the application load does not write this register)
RU_WATCHDOG_TIMEOUT     0x1              FFF
RU_PAGE_SELECT          0x3              01E28000 (refer to the start address used when generating the .jic file)
RU_CTL_NUPDT            0x7              we do not write this register
RU_CONFIGURATION_MODE   0x4              1
RU_RECONFIG             0x6              1                                                                          To trigger reconfiguration

 

best regards

Fabian

FabianL
Novice

Hi Xiaoyan, 

 

is there any news on this topic?

 

best regards

Fabian

FabianL
Novice

Hello,

 

it has already been 2 months without any news on this topic.

The fact that the factory fallback mechanism is not reliable and not fully understood is still a major problem for us.

Is there any news from your side?

 

best regards

Fabian

Farabi
Employee

Hello Fabian,


Sorry to keep you waiting. Here are a few steps to check for debugging.


Could you please check the points below?

1- confirm factory image is correctly programmed at boot address.

2- ensure application image is programmed at known valid address.

3- verify the .jic file start address matches the RU_PAGE_SELECT value.


Application Image validation.

1- Confirm the application image can be loaded successfully under normal conditions

2- Confirm the FPGA is able to enter user mode (CONF_DONE -> HIGH)

3- Confirm watchdog is not enabled - no writes to RU_RESET_TIMER


Factory Image validation.

1- Power up and confirm it boots into the factory image.

2- Read RU_RECONFIG_TRIGGER_CONDITIONS to confirm the power-up state (Bit0 = 0)

3- Set the parameters below in the factory image:

a. RU_PAGE_SELECT = application image address

b. RU_CONFIGURATION_MODE = 1 (Application Mode)

c. RU_WATCHDOG_TIMEOUT = set the correct value

d. RU_WATCHDOG_ENABLE = 1

4- Trigger reconfiguration by writing 1 to RU_RECONFIG


Test fallback mechanism.

1- Manually corrupt the application image (erase some part or misalign it)

2- Check if the FPGA fails to enter user mode

3- Check if the FPGA falls back to the factory image

4- Read RU_RECONFIG_TRIGGER_CONDITIONS and check that it reflects the correct cause (Bit1 = watchdog timeout)


If you have followed all of the above and still observe that the fallback mechanism doesn't kick in, we will need your design for escalation.


regards,

Farabi


Farabi
Employee

Hello,


Do you have any further questions?


regards,

Farabi


FabianL
Novice

Hello Farabi,

 

thanks for the reply. As mentioned before, if I enable the watchdog as described in your post, the fallback mechanism kicks in.

 

But this leaves me with 2 critical questions:

  1. When we do not use the watchdog (RU_WATCHDOG_ENABLE = 0), the factory fallback does not work when the application image is misaligned. Why does the factory fallback not happen in this case? I expect a misaligned application load to trigger a CRC error.
  2. Our application design is not actively serving the watchdog. We are using the Avalon IP "Remote Update Intel FPGA IP" Version 19.1.0. We do not set any of the watchdog registers in the application image, so I would have expected the watchdog to trigger and cause a factory image fallback. But this is not happening. The documentation indicates that a watchdog timeout may occur after entering application user mode. Hence I would expect that an application image that does not service the watchdog triggers a factory fallback. Why is this not the case?

 

FabianL_0-1757329160669.png

 

Leaving these questions open makes the whole fallback mechanism appear unreliable. So I would be very thankful if this behavior could be clarified.

 

Thanks.

 

kind regards

Fabian

FabianL
Novice

Hello Farabi,

 

is there any news concerning my 2 questions?

 

best regards

Farabi
Employee

Hello Fabian,


Sorry for taking some time to answer your question. I have just transferred this case to myself so I can monitor it individually.


1- Why doesn't the factory fallback occur when the application image is misaligned and the watchdog is disabled?

<ANS> A misaligned image might not trigger a CRC error, because the FPGA may not recognize the bitstream header correctly. If the image is corrupted in a way that prevents configuration from starting, the CRC logic is never invoked, so the fallback mechanism is not activated. Please check whether the last 576 bytes of your bitstream are corrupted. If they are, this will prevent fallback to the factory image as well. During this failure, can you check the nSTATUS signal?


2- Why doesn't the watchdog trigger fallback when not served in the application image?

<ANS> From the Remote Update IP document: the watchdog timer only starts counting after the FPGA enters user mode. If the application image is invalid, the FPGA never reaches user mode and the watchdog is never activated. The watchdog must be explicitly enabled via RU_WATCHDOG_ENABLE in the factory image. If this is not done, the watchdog is disabled by default.


In your setup:

1- You correctly enabled the watchdog in the factory image.

2- The application image does not serve the watchdog (no writes to RU_RESET_TIMER), which should trigger fallback only if the image enters user mode.

3- If the image is misaligned and fails to enter user mode, the watchdog never starts -> fallback will never occur.


regards,

Farabi



FabianL
Novice

Hello Farabi,

Thanks very much for the reply.

 

  1. Thanks for the explanation about the missing CRC error. So I guess the only way to safely deal with this is to enable the watchdog. That is fine for us.
  2. I'm sorry, but I don't fully understand your answer. We have two scenarios (see also here):
    1. Misaligned Image:
      1. Enable Watchdog in Factory Image
      2. Trigger reconfiguration (write 1 to RU_CONFIGURATION_MODE & RU_RECONFIG)
      3. Reconfiguration fails due to misaligned image --> Watchdog triggers 
      4. Fallback to factory mode
      5. ==> This case is working as expected. Good Case!
    2. Aligned valid Image
      1. Enable Watchdog in Factory Image
      2. Trigger reconfiguration (write 1 to RU_CONFIGURATION_MODE & RU_RECONFIG)
      3. Application Image starts. Application Image does not serve or actively disable the watchdog!
      4. Since the application image does not serve the watchdog, I would expect a factory fallback due to watchdog triggering. NOTE: We do not talk about further reconfiguration triggered from within application image. We only do reconfiguration from within the factory load.
      5. ==> This is not happening. And I don't understand why. Or is the watchdog automatically disabled once a valid application image is loaded?
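To make the open point concrete, the two scenarios can be checked against the rule given in the previous reply (the watchdog only starts counting once the device is in user mode). The following Python sketch is only a toy model of that stated rule, not of the device: the function name is hypothetical, CRC-triggered fallback is deliberately left out, and "does not reach user mode" for the misaligned image is an assumption. The observed outcomes are the ones reported in this thread.

```python
def predicted_fallback(watchdog_enabled, reaches_user_mode, serves_watchdog):
    """Fallback prediction under the stated rule: the watchdog timer only
    counts after user mode is reached, and only if it was enabled."""
    if not watchdog_enabled:
        return False
    if not reaches_user_mode:
        return False  # timer never starts according to the stated rule
    return not serves_watchdog


# (name, watchdog_enabled, reaches_user_mode, serves_watchdog, observed_fallback)
scenarios = [
    ("misaligned image", True, False, False, True),
    ("aligned valid image", True, True, False, False),
]

# Collect the scenarios where the rule and the observation disagree.
mismatches = [
    name
    for name, enabled, user_mode, serves, observed in scenarios
    if predicted_fallback(enabled, user_mode, serves) != observed
]

for name in mismatches:
    print(f"{name}: prediction and observation differ")
```

Under these assumptions both scenarios end up in `mismatches`, which is why the explanation given so far does not yet account for the observed behavior.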

 

best regards

Fabian

Farabi
Employee

You should see nSTATUS like below:

config_timing_diagram.png

 

regards,

Farabi

Farabi
Employee

Hello,


Thanks for your update. I am not sure what happened to your design, but here are a few things to check:


1- Can you make sure that RU_WATCHDOG_ENABLE = 1 is written before triggering reconfiguration, and that the setting persists during configuration? Some reconfigurations reset settings.


2- Please make sure the watchdog timeout is not too long, e.g. don't set RU_WATCHDOG_TIMEOUT = 0xFFF (this is too long)


3- Please confirm the application image does not contain any logic that somehow triggers/modifies the RU_RESET_TIMER register.
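For point 2, a rough way to compare timeout settings is to convert the register value into counter cycles. The Python sketch below assumes that the 12-bit RU_WATCHDOG_TIMEOUT value forms the upper 12 bits of a 29-bit counter; this layout is an assumption that should be verified against the device documentation. The watchdog clock frequency is not stated in this thread, so it is left as a parameter, and both function names are hypothetical.

```python
def watchdog_cycles(register_value, counter_bits=29, register_bits=12):
    """Counter cycles until timeout, assuming the register holds the
    upper register_bits of a counter_bits-wide counter (an assumption)."""
    if not 0 <= register_value < (1 << register_bits):
        raise ValueError("register value out of range")
    return register_value << (counter_bits - register_bits)


def watchdog_seconds(register_value, clock_hz):
    """Timeout in seconds for a given watchdog clock frequency in Hz."""
    return watchdog_cycles(register_value) / clock_hz


short = watchdog_cycles(0xF)     # value from the Intel test setup in this thread
long_ = watchdog_cycles(0xFFF)   # value used in the factory image in this thread
ratio = long_ / short            # how much longer 0xFFF is than 0xF
```

Under this assumption, 0xFFF waits 273 times longer than 0xF, so a fallback that looks missing may simply not have been waited for long enough.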


regards,

Farabi

