FPGA-to-HPS bridge, too slow?

Altera_Forum
Honored Contributor II
5,168 Views

Hi,

My co-worker is using the Cyclone V with the embedded ARM cores. I often stop by his cube and ask his opinion of the SoC. His biggest complaint is the FPGA-to-HPS bridge, which he says is a big bottleneck. He admits there may be other knobs available to increase the throughput and make data transfers more deterministic, but he hasn't found them. I thought I would post the question here and see the responses.

I assume that when the Stratix 10 is released there won't be such a bottleneck, but I may be wrong. I'm still undecided about the embedded cores and am curious which applications they are best suited for. Can these cores, on either the Cyclone V or the Stratix 10, compete with the multi-core DSPs from TI such as the TMS320C6678? In all of my designs I have used a high-performance FPGA and a high-performance DSP. I'm waiting to see how things turn out before putting an SoC on my next design. I admit it is very exciting!

Please let me know your comments.

Thanks,
joe

18 Replies
Altera_Forum

Could you say a little more about what "too slow" means? Maybe some other configuration setting is slowing things down?

Altera_Forum

Hi Joe, what does your application look like?

Altera_Forum

Hi, and thanks for responding to my post. The FPGA will interface with an Analog Devices TigerSHARC through its Link Ports. The FPGA side implements the Link Port interface and then passes the received data to the HPS side. Passing data from the FPGA side to the HPS side takes a very long time and cannot keep up with the Link Port data rate.

Is anyone else experiencing this type of behavior over the HPS bridge?

Thanks

Altera_Forum

Hi, how did you confirm whether the congestion comes from the side entering the bridge or the side exiting it? How did you measure the simulated data flow?

Altera_Forum

Hi. I'm using an mSGDMA in the Cyclone V FPGA to read and write directly to shared memory. In between I plan to do the data processing using Avalon-ST interfaces. With this solution I can now process 350 MB/s. The CPU is only used to control the DMAs. For comparison, a simple memcpy uses 100% of the CPU for less bandwidth. If you plan to use floating point in your DSP application, then the Arria 10 might be more suitable because it has floating point in hardware.

Altera_Forum

 

--- Quote Start ---

Hi. I'm using an mSGDMA in the Cyclone V FPGA to read and write directly to shared memory. In between I plan to do the data processing using Avalon-ST interfaces. With this solution I can now process 350 MB/s. The CPU is only used to control the DMAs. For comparison, a simple memcpy uses 100% of the CPU for less bandwidth. If you plan to use floating point in your DSP application, then the Arria 10 might be more suitable because it has floating point in hardware.

--- Quote End ---

Is your application also using the bridges?

Altera_Forum

Hi, I use only the lightweight bridge, and only for configuring the DMAs through their Avalon-MM interface. I transfer data from the FPGA directly to DDR SDRAM without going through the CPU. The CPU only waits for the transfer to complete.

If you add the "Cyclone V Hard Processor System" component, you can enable the "FPGA-to-HPS SDRAM interface", which gives you the port f2h_sdram0_data on the hps component. I then connect the DMAs' mm_write/mm_read ports to f2h_sdram0_data.
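
For readers who have not driven an mSGDMA from software, here is a minimal sketch of pushing one descriptor through a mapped Avalon-MM descriptor port. It is not the poster's code. It assumes the standard (16-byte) descriptor layout as I understand it from the mSGDMA documentation: read address, write address, length, then a control word whose bit 31 is the "go" bit. The poster mentions roughly 20 bytes per descriptor, so their setup may differ; check the layout against your IP version. The names `msgdma_push_descriptor` and `MSGDMA_DESC_*` are hypothetical.

```c
#include <stdint.h>

/* ASSUMPTION: standard mSGDMA descriptor layout, as 32-bit word indices. */
#define MSGDMA_DESC_READ_ADDR   0u
#define MSGDMA_DESC_WRITE_ADDR  1u
#define MSGDMA_DESC_LENGTH      2u
#define MSGDMA_DESC_CONTROL     3u
#define MSGDMA_DESC_CTRL_GO     (1u << 31)  /* ASSUMPTION: "go" bit position */

/*
 * Write one descriptor to the descriptor port (hypothetical helper).
 * desc_port points at the mapped descriptor slave, e.g. somewhere inside
 * a lightweight-bridge mapping. The control word is written last so the
 * descriptor is only committed once all other fields are in place.
 */
static inline void msgdma_push_descriptor(volatile uint32_t *desc_port,
                                          uint32_t read_addr,
                                          uint32_t write_addr,
                                          uint32_t length_bytes,
                                          uint32_t control_flags)
{
    desc_port[MSGDMA_DESC_READ_ADDR]  = read_addr;
    desc_port[MSGDMA_DESC_WRITE_ADDR] = write_addr;
    desc_port[MSGDMA_DESC_LENGTH]     = length_bytes;
    desc_port[MSGDMA_DESC_CONTROL]    = control_flags | MSGDMA_DESC_CTRL_GO;
}
```

The point of the design described above is that only these few words per transfer cross the lightweight bridge; the payload itself goes through the FPGA-to-HPS SDRAM port.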

Altera_Forum

Did you see any data congestion on the lightweight bridge?

Altera_Forum

No, but my traffic on the lightweight bridge is low: about four hundred DMA descriptors per second, each 20 bytes or so. Is your congestion permanent, or is it a small lag?

Altera_Forum

There is a small lag during peak transfers, but it is mainly due to the post-processing of the data on the ARM cores, since my data-manipulation task has a fairly high priority.

Altera_Forum

Are you running Linux or bare metal? According to the technical documentation, the lightweight bridge should have lower latency than the normal bridges. Have you tried both?

Altera_Forum

I'm using Linux only, so that I can assign a priority to the thread that does the data processing.
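
As an illustration of what giving the processing thread a priority can look like on Linux, here is a minimal sketch using the POSIX scheduling calls. It is an assumption about the approach, not the poster's code; the helper names `clamp_fifo_priority` and `make_realtime` are hypothetical, and switching to SCHED_FIFO normally requires root or CAP_SYS_NICE.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>

/* Clamp a requested priority into the range the system allows for SCHED_FIFO. */
static inline int clamp_fifo_priority(int wanted)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);
    if (wanted < lo) return lo;
    if (wanted > hi) return hi;
    return wanted;
}

/*
 * Put the caller under the SCHED_FIFO real-time policy.
 * Returns 0 on success, or the errno value on failure
 * (typically EPERM when the process lacks the privilege).
 */
static inline int make_realtime(int wanted_priority)
{
    struct sched_param sp;
    sp.sched_priority = clamp_fifo_priority(wanted_priority);
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
        return errno;
    return 0;
}
```

A processing thread would call something like `make_realtime(50)` once at startup and fall back to normal scheduling if it returns a nonzero error code.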

Altera_Forum

OK, I also use Linux. I have not measured latency yet. My test runs for 10 seconds and I get very reproducible bandwidth on each run. How do you write to the bridge? Is it an MM interface that you mmap in Linux and then read/write through the virtual address? Do you use interrupts?

Altera_Forum

Oh, I am using mmap in Linux and then accessing the bridge via the virtual address. I am using interrupts as well.

Altera_Forum

Do you use the Altera macros (alt_read_word, alt_write_word) to read and write these virtual addresses? These macros cast to volatile pointers, which tells the compiler that every access must actually be performed and must not be optimized away or kept in a register. This is not the same as just dereferencing ordinary pointers.

My memory map call looks like this; your flags might be different:

void *vbase = mmap(NULL, map_size, PROT_READ | PROT_WRITE, MAP_SHARED, mem_fd, (unsigned int)adress & ~(unsigned int)map_mask);
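
A fuller sketch of this kind of mapping, for anyone who wants to try it: the helpers below split a physical address into a page-aligned base plus offset, map it through /dev/mem, and access registers through volatile pointers. The helper names (`page_base`, `page_offset`, `reg_read32`, `reg_write32`, `map_phys`) are hypothetical, and this is an illustration under those assumptions rather than the code used in this thread. On the Cyclone V the lightweight HPS-to-FPGA bridge is normally at physical address 0xFF200000, but verify that against your own address map.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Page-aligned part of a physical address (page_size must be a power of two). */
static inline uint32_t page_base(uint32_t phys, uint32_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1u)) == 0);
    return phys & ~(page_size - 1u);
}

/* Offset of a physical address within its page. */
static inline uint32_t page_offset(uint32_t phys, uint32_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1u)) == 0);
    return phys & (page_size - 1u);
}

/* Volatile accessors: the compiler must perform every access exactly as written. */
static inline void reg_write32(volatile void *base, uint32_t offset, uint32_t value)
{
    *(volatile uint32_t *)((volatile uint8_t *)base + offset) = value;
}

static inline uint32_t reg_read32(volatile void *base, uint32_t offset)
{
    return *(volatile uint32_t *)((volatile uint8_t *)base + offset);
}

/*
 * Map `span` bytes starting at the page containing `phys`.
 * Returns MAP_FAILED on error. The caller adds page_offset() to reach
 * the exact register and must make `span` large enough to cover it.
 */
static inline void *map_phys(uint32_t phys, size_t span)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return MAP_FAILED;
    void *p = mmap(NULL, span, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                   (off_t)page_base(phys, (uint32_t)page));
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p;
}
```

Note that volatile only constrains the compiler. Whether the accesses are cached by the hardware depends on how the kernel maps the region, which is a separate question from the pointer type.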

Altera_Forum

This is an area I had never thought about before. I think I need to take a look at my project files.

Altera_Forum

Hi, to answer the question: we had a FOR loop running on the HPS that toggled a bit in a register on the FPGA side. There was no delay in the loop; it executed as fast as it could. We tied the register bit to a test point where we could look at it on an oscilloscope. We measured a square wave with a 50% duty cycle, but the frequency was only about 16 MHz. The HPS clock was 100 MHz. Remember that I did not work on this design; I'm only a bystander relaying information. My main goal with this post is to get feedback from the community on their experience with the bridge and on how fast one can write to a register on the FPGA side from the HPS side. I hope this helps, and if anyone has comments, please send them.
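
The measurement above can be turned into a per-write cost. The sketch below shows a toggle loop of the kind described and the arithmetic that goes with it. It assumes each loop iteration performs exactly one write across the bridge (no read-modify-write), which the post does not confirm; the function names are hypothetical.

```c
#include <stdint.h>

/*
 * Toggle bit 0 of a memory-mapped register n_writes times with no delay.
 * The value is kept in a local variable, so each iteration issues a
 * single write and never reads the register back.
 */
static inline void toggle_bit(volatile uint32_t *reg, uint32_t n_writes)
{
    uint32_t v = 0;
    for (uint32_t i = 0; i < n_writes; i++) {
        v ^= 1u;
        *reg = v;
    }
}

/* One square-wave period needs two writes: one rising and one falling edge. */
static inline double writes_per_second(double square_wave_hz)
{
    return 2.0 * square_wave_hz;
}

/* Average time spent per write, in nanoseconds. */
static inline double ns_per_write(double square_wave_hz)
{
    return 1e9 / writes_per_second(square_wave_hz);
}
```

Under that assumption, a 16 MHz square wave corresponds to 32 million register writes per second, or roughly 31 ns per write.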

Altera_Forum

100 MHz? So slow? What about the clock of the memory that you are using?
