Data Communication Time too high

MF28 · ‎05-25-2022

Hello!

I am currently using HPS and the FPGA together. I am communicating between them via the lightweight AXI Bridge.
On the HPS I also have LXDE Linux running.
In order to test the maximum speed I can achieve, I toggle one bit of the address via C-Code.

This is what my question is about.

I toggle the LSB of the 32-bit address (ctrl_hps_to_fpga register in the soc file) in a while(1) loop in the C program, which is executed in the console of LXDE.
The FPGA simply routes the signal to a GPIO pin, which I am observing with an oscilloscope.
However, the maximum speed is only about 500 kHz, while the FPGA is running at 50 MHz and the HPS CPU should also be way faster than this.

Where could my problem possibly be?
It works 100% as intended, but it is way too slow in my opinion.
I am pretty certain that the C code is correct and working fine, so it must be either the FPGA or the Linux installation.
I attached my Platform Designer File.
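
For readers following along: a toggle loop like the one described is typically built on a /dev/mem mapping of the lightweight bridge. The sketch below is not the attached program. It assumes a Cyclone V SoC address map (bridge window at 0xFF200000, 2 MB span), which the post does not state, and a hypothetical register offset of 0; the function names are made up for illustration.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* ASSUMPTION: Cyclone V SoC address map (the post does not name the device). */
#define LW_BRIDGE_BASE 0xFF200000UL
#define LW_BRIDGE_SPAN 0x00200000UL
/* HYPOTHETICAL: offset of ctrl_hps_to_fpga inside the bridge window.
 * The real value comes from the header generated for the Qsys system. */
#define CTRL_HPS_TO_FPGA_OFFSET 0x0UL

/* Map the bridge window and return a pointer to the control register,
 * or NULL on failure. */
volatile uint32_t *map_ctrl_hps_to_fpga(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    void *base = mmap(NULL, LW_BRIDGE_SPAN, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, (off_t)LW_BRIDGE_BASE);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return NULL;
    return (volatile uint32_t *)((char *)base + CTRL_HPS_TO_FPGA_OFFSET);
}

/* Flip bit 0. On a volatile register this costs one bus read
 * followed by one bus write. */
void toggle_lsb(volatile uint32_t *reg)
{
    *reg ^= 1u;
}

/* Toggle the register n times (the post uses an endless while(1) instead). */
void toggle_n(volatile uint32_t *reg, unsigned long n)
{
    while (n--)
        toggle_lsb(reg);
}
```

In use, the pointer returned by map_ctrl_hps_to_fpga() would be checked for NULL and then handed to toggle_n(). Opening /dev/mem normally requires root.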

EBERLAZARE_I_Intel · ‎05-29-2022

Hi,

I am looking into the issue. Could you provide a screenshot of the performance difference that you observed between the FPGA and HPS frequencies?

MF28 · ‎06-01-2022

Hello!

I included the Qsys file that I am using again.
In the Qsys File I declared four Parallel IOs. Those are connected to the HPS lightweight AXI bridge via a memory bridge.

The so-called "data_..." PIOs simply hold data which is set either by the FPGA or by the HPS.
The "ctrl_...." PIOs should handle the communication between the HPS and the FPGA.
This is handled like this (simple explanation):
HPS toggles the ctrl_hps_... signal
Meanwhile FPGA waits for a state change on this ctrl_hps_... signal
When the FPGA registers the state change by the HPS it reads data from the data PIO and sets its own data to the other data PIO
After this it toggles its own ctrl_fpga_... signal
The HPS also (after toggling its own PIO) waits for this toggle by the FPGA
When it registers the state change it again starts the same communication flow
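
The steps above, seen from the HPS side, can be sketched roughly as follows. This is an illustration under assumptions, not the code from the attachment: the struct, its field names, and link_exchange() are hypothetical, and a spin limit is added so the wait cannot hang forever.

```c
#include <stdint.h>

/* HYPOTHETICAL view of the four PIOs as seen from the HPS. */
struct link {
    volatile uint32_t *ctrl_hps;        /* toggled by the HPS */
    const volatile uint32_t *ctrl_fpga; /* toggled by the FPGA */
    volatile uint32_t *data_hps;        /* data written by the HPS */
    const volatile uint32_t *data_fpga; /* data written by the FPGA */
    uint32_t hps_state;  /* shadow of the last value written to ctrl_hps */
    uint32_t fpga_state; /* last LSB value seen on ctrl_fpga */
};

/* One exchange: publish tx, toggle the HPS control bit, wait for the FPGA
 * to toggle its control bit, then fetch the FPGA's data word.
 * Returns 0 on success, -1 if the FPGA did not answer within max_spins polls. */
int link_exchange(struct link *l, uint32_t tx, uint32_t *rx,
                  unsigned long max_spins)
{
    *l->data_hps = tx;            /* data first, so it is valid at the toggle */
    l->hps_state ^= 1u;           /* toggle the shadow copy, no bus read */
    *l->ctrl_hps = l->hps_state;  /* one bus write */

    for (unsigned long i = 0; i < max_spins; i++) {
        uint32_t now = *l->ctrl_fpga & 1u; /* one bus read per poll */
        if (now != l->fpga_state) {
            l->fpga_state = now;
            *rx = *l->data_fpga;
            return 0;
        }
    }
    return -1;
}
```

Keeping a shadow copy of the HPS control bit avoids reading the register back before every toggle, so each exchange needs two writes, at least one poll read, and one data read across the bridge.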

It is somewhat like this:

This is the communication that is happening.
This is working exactly as intended! So everything is working without any error.

Now to my problem:
The FPGA is running at 50 MHz. As far as I know, the HPS processor is running at almost 1 GHz. (Is this the case? I cannot find any information about it.)

The FPGA is currently only waiting for a state change and toggles its signal accordingly. I can see on the oscilloscope that this is done correctly, because the FPGA toggles 20 ns ( --> 50 MHz) after the HPS toggles.

But the HPS is not working the way it should. I am running the LXDE Linux OS on it. It is slightly faster when controlling it via UART instead of the desktop GUI, but the problem occurs both times.
THE HPS CPU IS WAY TOO SLOW.
When I am just toggling the signals (no other calculations done in the C code that I am executing on Linux) I only reach a frequency of around 300 kHz. When transmitting information and doing some normal C operations (bit shifts, ...) the frequency drops to around 200 kHz.

The FPGA always toggles correctly after 20 ns. However, the HPS takes WAY longer: about 3 us.
I know that doing this with while(1) loops is not the most efficient way of waiting in C. But the code is like this:

while(fpga has not toggled) {
    wait...
}
//fpga has toggled
toggle hps signal

So it is really not that resource demanding.
If the HPS really runs with almost 1GHz why is it taking so long? Is the CPU just not capable of running this code faster?
Is there any other restriction?
Is there a way that is more efficient?

If the HPS really runs internally at 1 GHz, does it really take almost 3000 clock cycles to check one 32-bit address and write to another 32-bit address?
It is not somehow stuck in the while loop or waiting for too long; I checked this.

More examples with pictures from the oscilloscope:

The highest frequency I can achieve is around 1.25 MHz. But this is only when toggling and not doing anything else, which is not applicable to other projects.

1.25 MHz Toggle Signal

You can see that I achieve 1.25 MHz for the signal toggles (I routed them to a GPIO pin, that's why there are spikes, besides the oscilloscope quality of course).

FPGA toggles 20ns after the HPS Processor

It can clearly be seen that the FPGA is toggling its own signal (the blue one) 20ns after the HPS has toggled (yellow signal). According to my C Code the HPS should toggle its signal now as soon as it has detected that the FPGA toggled its own signal. But this takes WAY longer than expected. It takes about 400ns for the processor to do so. But the CPU clock is ~1GHz ??

The processor takes way too long

According to htop when running the code one of the two CPU cores is maxed out at 100% load.

HOWEVER, when checking for the toggle and also doing data manipulation (only bit shifting, nothing complex) and setting/reading some 32-bit addresses, the data rate drops. I transmitted a data frame and according to the CPU time it took 760 us to transmit the 1024 values.

This is the transmission of the data frame.

As before, the blue signal (FPGA) only takes 20 ns to register the toggle of the yellow signal (HPS). The HPS processor, however, takes "years" to do so. I do not calculate anything too complex. The CPU is only running at around 10% when executing this program.
If the CPU is really running at 900 MHz to 1 GHz, it should be way faster than this.

Why is the CPU limiting the speed of the C code so drastically? I also uploaded the C code where I achieved 1.25 MHz transmission speed. There is nothing too complex going on, I think. Where is my problem?

Best regards

EBERLAZARE_I_Intel · ‎06-09-2022

Hi,

Linux is not a real-time OS, and the CPU is also used for other things in the background at the same time.

So for real-time behavior, you could use a Nios II processor instead.

MF28 · ‎06-12-2022

Hello!
I of course know that Linux is not running in real time and that some resources are needed for things in the background and for the graphical interface.
However, the speed remains the same when, for example, I connect to Linux via UART, do not use the graphical interface, and have it disconnected.

Still, I am wondering why I can only achieve about 300 kHz when the CPU itself should be running at 900 MHz?
This feels way too slow in my opinion.

Also, I do not have a Nios II processor or any possibility to get one, so this is not an option.

EBERLAZARE_I_Intel · ‎06-19-2022

Hi,

I see. You do not necessarily have to use a Nios II processor in your design.

Better yet, you could use Signal Tap on bare metal or in U-Boot rather than Linux. For example, tap the signal while reading/writing the registers from U-Boot with the "md" and "mw" commands.

EBERLAZARE_I_Intel · ‎06-23-2022

Hi,

The access latency from lw_h2f bridge: for write access, it is about ~360ns, for read access, it is about ~420ns.

You can verify the result using a bare-metal code.

You can also use Signal Tap to monitor the real bus transaction and get the exact result.
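
As a rough budget, the two latencies quoted above put a ceiling on what a polling loop can reach. The helper below is only a back-of-the-envelope sketch: the function names are hypothetical, and it assumes that accesses run strictly back to back, that nothing overlaps, and that the CPU adds no time of its own.

```c
/* Latencies quoted above for the lw_h2f bridge, in nanoseconds. */
#define LW_WRITE_NS 360.0
#define LW_READ_NS  420.0

/* Millions of events per second if each event costs period_ns. */
double mega_events_per_s(double period_ns)
{
    return 1000.0 / period_ns;
}

/* MByte/s (1 MByte = 1e6 bytes) if `bytes` are moved every period_ns. */
double mbyte_per_s(double bytes, double period_ns)
{
    return bytes * 1000.0 / period_ns;
}

/* ASSUMPTION: the cheapest possible handshake step is one status read
 * followed by one write, with no overlap between the two. */
double handshake_step_ns(void)
{
    return LW_READ_NS + LW_WRITE_NS;
}
```

Under these assumptions handshake_step_ns() is 780 ns, so mega_events_per_s(780.0) gives roughly 1.28 million steps per second and mbyte_per_s(4.0, 780.0) about 5.1 MByte/s for one 32-bit word per step. A real loop needs more than two bridge accesses per word, so lower figures are to be expected.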

MF28 · ‎06-24-2022

Hello!
Thank you very much for this information! These timings were very helpful, because they explain a lot about my timing.
I was able to optimize everything to the point that the data rate is pretty much maxed out under Linux at 3.3 MByte/s.
Thanks for all the help!!!!

EBERLAZARE_I_Intel · ‎06-24-2022

Hi,

Glad your issue has been addressed. If you have further questions, you can just open a new forum case.

Thank you for using Intel Forum Support. To strive for continuous improvement in our support, please take a few minutes to complete a survey if you happen to receive one. Have a nice day!