Arria 10 HPS Bridge Lockup after Reset

TomCarpenter
New Contributor I

I'm having an incredibly frustrating issue with an Arria 10 HPS which is bordering on the bizarre.

Short story:

  •  The processor boots and runs fine on first run, all peripherals in the FPGA can be accessed fine.
  •  After issuing a cold reset, any attempt to access peripherals in the FPGA locks up the processor.
  •  After the watchdog resets the processor, access to peripherals is fully functional again.

This is a highly repeatable issue - after any attempt to reset the processor from the FPGA, I can no longer access peripherals in the FPGA from the processor. After the watchdog resets it, all is fine again.

---

The HPS device is instantiated in Platform Designer as follows (there are other components, but they make no difference at this point).

TomCarpenter_1-1720198802793.png

Currently, one level up in the hierarchy, the signals are assigned as follows:

  • `hps_warm_reset` is tied to 0 (not used)
  • `hps_cold_reset` is the signal I am asserting to trigger a reset of the processor
  • `hps_h2f_cold_reset` is fed out of the block and along with other sources is ultimately used to generate the `lwint_reset` and `proc_reset` signals which are synchronised to their corresponding clock domain (async assert, sync deassert).
  • `hps_axi_slave` is unused (terminated)
  • `hps_axi_master` is connected to various sources and runs on one clock domain (200MHz)
  • `h2f_lw_axi_master` is connected to some simple peripherals - e.g. a system ID as shown in the screenshot above. This is on a second clock domain (100MHz).
  • `hps_io` is connected to the `lwint_reset` and `proc_reset` signals so I can monitor whether the domains are active or held in reset.

There is actually a lot of other stuff in the design not shown (e.g. PCIe core) but that shouldn't have any impact.

 

---

On power up, the FPGA portion of the Arria 10 SX660 device is configured from an ASx4 source. The HPS is also reset using the external physical pins, and configured to boot from an EMMC device.

The physical HPS reset pins are released after the FPGA is configured. However, the HPS remains held in reset by the `f2h_cold_reset_req` signal in the FPGA until I am ready for the processor to boot.

Once released, a customised U-Boot SPL preloader is launched. Beyond device tree settings, the only modifications to the standard preloader are removing the SDRAM configuration (there is no DDR attached and the HPS EMIF is disabled) and disabling the data/instruction caches and the MMU (*).

The preloader boots through fine. It then loads a bare-metal image from the eMMC into on-chip RAM and launches it. There is no OS - I'm basically trying to use it as a replacement for a Nios processor from an older design.

The bare-metal image checks and waits until the FPGA AXI masters are released from reset, and then enables the HPS bridge. The same behaviour happens if I let the preloader enable the HPS bridge before the bare metal application is run, so there shouldn't be anything specifically wrong with the bridge configuration.
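As an illustration of the wait-then-enable sequence described above, here is a minimal sketch (not the actual code from this design). The function names, the bounded poll count and the register pointers passed in are all hypothetical, and the bit polarity (1 = held in reset) is an assumption that should be checked against the register map. On the target, the pointers would refer to whatever status register exposes the fabric reset signals and to the bridge reset register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Poll a status register until every bit in `mask` reads 0, giving up after
 * `max_polls` reads. Returns 1 on success, 0 on timeout. */
static int wait_until_clear(const volatile uint32_t *reg, uint32_t mask,
                            unsigned max_polls)
{
    assert(reg != NULL);
    for (unsigned i = 0; i < max_polls; ++i) {
        if ((*reg & mask) == 0u) {
            return 1;
        }
    }
    return 0;
}

/* Release the selected bridges by clearing their reset bits.
 * Assumed polarity: a set bit means the bridge is held in reset. */
static void release_bridges(volatile uint32_t *brgmodrst, uint32_t bridge_mask)
{
    assert(brgmodrst != NULL);
    *brgmodrst &= ~bridge_mask;
}

/* Hypothetical bring-up sequence: only touch the bridges once the fabric
 * side reports that its reset domains have been released. */
static int bring_up_bridges(const volatile uint32_t *fabric_reset_status,
                            uint32_t fabric_reset_mask,
                            volatile uint32_t *brgmodrst,
                            uint32_t bridge_mask,
                            unsigned max_polls)
{
    if (!wait_until_clear(fabric_reset_status, fabric_reset_mask, max_polls)) {
        return 0; /* fabric still in reset: leave the bridges alone */
    }
    release_bridges(brgmodrst, bridge_mask);
    return 1;
}
```

Bounding the poll means a fabric domain that never leaves reset shows up as a reported failure rather than a silent hang.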

Once the bridges are enabled, I try to access a peripheral in the FPGA, such as the system ID.

During the first run after power-on, everything works fine. The bridges are enabled and I can read the system ID OK.

----

If I then issue a cold reset request from the FPGA (not the physical reset pins), the processor will reboot.

I see the preloader runs through fine, and my bare metal application is successfully launched.

The HPS bridges are enabled fine (both the ALT_RSTMGR_BRGMODRST_ADDR register and the ALT_SYSMGR_NOC_IDLESTAT_ADDR register show them as not in reset and not idle).

However as soon as the processor tries to read any register address on the FPGA side of the bridge (both lightweight and regular interfaces), the processor locks up and the debugger can no longer interact with it. (**)

 

Eventually after about 10 seconds, the watchdog resets the processor, it reboots, runs through the preloader, and launches my bare metal application. Everything works fine again - I can properly access peripherals in the FPGA.

 

Any thoughts what is wrong here? Or even any ideas where I could start looking?

 

There seems to be very little information about using this for anything other than booting Linux, so if there are any resources for bare-metal applications out there, it would be handy to point me to them also.

 

---

(*) If the caches aren't disabled, everything is incredibly unstable and there are lots of spurious data aborts, so I've made sure they are fully disabled by the preloader

(**) Curiously, the debugger is able to access peripherals in the FPGA after a reset, but as soon as the processor tries to access anything, it locks up.

JingyangTeh
Employee

Hi


I am Jingyang and will be helping you out.

Let me try to reproduce the issue here.

First, how do you read and write with the processor and without the processor?

When reading or writing without the processor, do you mean you are using the JTAG debugger through System Console?

When using the processor, are you running code that prints out those register values?

Does the system hang if you try to read or write at the U-Boot stage?


Regards

Jingyang, Teh



JingyangTeh
Employee

Hi


Any update on this case?


Regards

Jingyang, Teh


TomCarpenter
New Contributor I

Hi Jingyang,

 

Apologies, I was away last week.

By JTAG debugger, I'm referring to using a USB-Blaster and the debugger in ARM Development Studio (2022).

I'm not using U-Boot proper as it is too big to use without DDR RAM, only the U-Boot SPL. For reference, this is the U-Boot repository I'm using: https://github.com/UARPGitHub/u-boot-socfpga - it's a slightly modified version of the v2023.10 branch, which has had a few minor tweaks to cope with there being no DDR RAM (basically changing the defconfig along with removing some of the code which initialised the DDR RAM).

The FPGA is configured from an EPCQL device using Active Serial.

 

---

 

After much playing around, there appear to be two separate problems going on simultaneously, which is why everything was completely confusing.

 

The first was that, for some reason, Qsys/Platform Designer occasionally and inexplicably added a second reset controller with synchronous assert/deassert. This meant that the FPGA side of the HPS bridge was not being reset: the clock was no longer present, so the synchronous assert did nothing. After regenerating from Qsys without changing anything, the stray reset controller disappeared, leaving the intended async assert/sync deassert controller that was already there and allowing the bridge to be reset properly on a cold processor reset.
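To illustrate why the reset style matters when the clock is absent, here is a small behavioural model in C. It is only a sketch: the two-stage chain, the type and the function names are hypothetical and are not taken from the generated HDL. A set bit means "reset asserted", and bit 1 of the chain is the output stage.

```c
#include <assert.h>
#include <stddef.h>

/* Two-flop reset chain. Bit 0 is the first stage, bit 1 the output stage. */
typedef struct {
    unsigned chain;
} reset_sync_t;

static int reset_out(const reset_sync_t *s)
{
    assert(s != NULL);
    return (int)((s->chain >> 1) & 1u);
}

/* Async assert: the reset input drives the flops' asynchronous preset,
 * so it takes effect immediately, with no clock edge required. */
static void async_assert_input(reset_sync_t *s, int reset_in)
{
    assert(s != NULL);
    if (reset_in) {
        s->chain = 0x3u;
    }
}

/* Clock edge for the async-assert/sync-deassert style: while the input is
 * low, a 0 is shifted through, so the output deasserts after two edges. */
static void async_sync_clock_edge(reset_sync_t *s, int reset_in)
{
    assert(s != NULL);
    s->chain = reset_in ? 0x3u : ((s->chain << 1) & 0x3u);
}

/* Clock edge for the fully synchronous style: the input is only ever
 * sampled here, so nothing changes while the clock is stopped. */
static void full_sync_clock_edge(reset_sync_t *s, int reset_in)
{
    assert(s != NULL);
    s->chain = ((s->chain << 1) | (reset_in ? 1u : 0u)) & 0x3u;
}
```

In this model, asserting the reset input with no clock edges sets the output of the async-assert chain straight away, while the fully synchronous chain stays deasserted until edges arrive, which matches the behaviour described above.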

 

This seems to be a weird Qsys bug. I regenerated the code again after changing something completely unrelated (an Av-MM address) and the stray second reset controller appeared in the HDL again. It then disappeared after another regenerate with no changes. So who knows what that is about. I'll just have to keep checking the HDL after each generate to make sure it is not there. Weird.
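Checking the HDL after each generate could be partly automated. The sketch below counts how many times a given string occurs in a text buffer, on the assumption that the generated top-level file has already been read into memory and that reset controller instances can be recognised by a module name such as `altera_reset_controller` (an assumption; check what your generated HDL actually contains). The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of `needle` in `text`.
 * An empty needle is treated as "no matches". */
static size_t count_occurrences(const char *text, const char *needle)
{
    assert(text != NULL && needle != NULL);

    size_t needle_len = strlen(needle);
    size_t count = 0;

    if (needle_len == 0) {
        return 0;
    }
    for (const char *p = strstr(text, needle); p != NULL;
         p = strstr(p + needle_len, needle)) {
        ++count;
    }
    return count;
}
```

The expected count depends on the design (one controller per reset domain is normal), so the useful check is a comparison against the count from a known-good generate rather than against a fixed number.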

 

----

 

I can only assume the second part of the problem might be some sort of caching issue but I could be wrong.

 

On a first boot after resetting the processor, U-Boot SPL runs and then executes my bare-metal program. After running through the various C initialisation routines, the program reads the GPI register in the FPGA Manager (0xffd03014) using a simple dereference: "x = *((volatile unsigned int*)0xffd03014)".

This works fine, I get the value.

I then do this read again a few lines later, and again it works fine, I get the value. My software then runs perfectly fine.
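For reference, the dereference above can be wrapped in a small helper, which makes it easier to add tracing or a breakpoint around every register access. This is a minimal sketch: `mmio_read32` is a hypothetical name, and the data synchronisation barrier after the read is an optional addition (not something the original code is known to do) that is only compiled for 32-bit ARM targets. It can help pin down exactly which access stalls.

```c
#include <assert.h>
#include <stdint.h>

/* Read one 32-bit memory-mapped register. */
static inline uint32_t mmio_read32(uintptr_t addr)
{
    assert((addr & 0x3u) == 0u); /* 32-bit register reads should be word aligned */

    uint32_t value = *(const volatile uint32_t *)addr;

#if defined(__arm__)
    /* Optional: wait for the access to complete before continuing. */
    __asm__ volatile("dsb sy" ::: "memory");
#endif
    return value;
}
```

On the target this would be called as `mmio_read32(0xFFD03014u)` for the GPI register mentioned above.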

 

In some instances, I have to call back to the start of the software without resetting the processor (a sort of manual warm reset) because I'm jumping between two bare-metal images (a recovery image and an application image which are both in on-chip RAM at the same time but different address regions).

If I jump to the entry point of my already loaded RAM image (e.g. `BX 0xFFE30040`) without resetting the processor, it runs the bare-metal program again from the start, skipping the preloader (no hardware was reset, so this shouldn't matter). I was originally trying to do this using a warm reset but that was causing other issues, so now I'm trying to skip the reset altogether.
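In C, the jump described above can be written as a call through a function pointer. The sketch below is illustrative only: `jump_to_image`, `image_entry_t` and the `demo_entry` stand-in are hypothetical names, and converting an integer address to a function pointer is implementation-defined behaviour that happens to work on the usual bare-metal toolchains.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*image_entry_t)(void);

/* Branch to an image that is already loaded at `entry_addr`.
 * Nothing is reset here: peripherals, bridges and any other hardware
 * state are left exactly as the previous run configured them. */
static void jump_to_image(uintptr_t entry_addr)
{
    assert(entry_addr != 0u);

    image_entry_t entry = (image_entry_t)entry_addr;
    entry();
}

/* Hypothetical stand-in for an image entry point, so that the helper
 * can be exercised on a host machine. */
static int demo_entry_calls;

static void demo_entry(void)
{
    ++demo_entry_calls;
}
```

On the target the call would be `jump_to_image(0xFFE30040u)`, and a real image entry point would normally never return.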

 

After going through the various C library initialisation, it will happily do the first read from the GPI register perfectly fine.

But then on the second attempt to read the same register address, the processor locks up and the debugger is unable to communicate with it. I get the following from the console:

 

interrupt
ERROR(TAD9-NAL30): 
! Unable to stop device Cortex-A9_0
! Cannot stop target.

 

 

If I repeat the process, but instead set a breakpoint just before the second attempt to read from the GPI register, I can use the Memory view in ARM Development Studio to read the GPI register, and it reads fine:

TomCarpenter_1-1721231212147.png

Then if I let the processor run again, it has no problems accessing the memory address itself and continues running. But it will then randomly lock up later in the program when accessing other register addresses (e.g. 0xC0000000 or 0xFF200000).

The trick of viewing the value of the register in the debugger before stepping over the processor line that accesses it doesn't work for accessing registers on the far side of the HPS bridge.

 

---

It's hard to explain because everything to do with the HPS on the Arria 10 seems to be so incredibly flakey.

I use a very similar approach on a Cyclone V device and don't recall having any of these issues, everything just works as you might expect.

JingyangTeh
Employee

Hi


After comparing the connections in the screenshot you shared, I found that the h2f reset input to the HPS is different from the reset to the other IPs.

I would suggest connecting them to the same reset source.

The error you are seeing could be due to the HPS and the FPGA fabric not resetting at the same time.


The reset output of the h2f from the hps is connected together with the clock reset for the other IPs.

You could try taking a look how it is connected in our GSRD.

https://www.rocketboards.org/foswiki/Documentation/Arria10SoCGSRD


Regards

Jingyang, Teh


JingyangTeh
Employee

Hi


Any update on this case?

Did you manage to solve the reset issue?


Regards

Jingyang, Teh


JingyangTeh
Employee

Hi


As we have not received any response from you to the previous question/reply/answer that we provided, please log in to ‘https://supporttickets.intel.com/s/?language=en_US’, view the details of the desired request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support, where community users will be able to help you with your follow-up questions.


Regards

Jingyang, Teh

