Re: Data Transfer from FPGA-to-HPS

Altera_Forum · ‎06-12-2018

Hi,

I am very new to the Altera FPGA suite (and FPGAs in general), and I'm looking for advice on how to transfer data from the FPGA to the HPS. I've done a fair amount of research, but I'm not sure which method is best. Each block of data will be around 37 kilobits. Would the FPGA-to-HPS bridge be the best option? Or should I limit the amount of RAM the HPS uses, have the FPGA write to the remaining section, and have the HPS read from that memory space? The data will be coming in practically continuously, so I was thinking of a double-buffer scheme: the FPGA writes to one buffer while the HPS reads the other, then they swap, and so on. I appreciate any advice! I'm using the DE0-Nano-SoC with a Cyclone V 5CSEMA4U23C6, Revision C. Thanks!

EDIT: I was also imagining that the FPGA could send an interrupt to the HPS whenever it finishes writing a chunk of data, telling it that it's time to read. For the most part I have the HPS side figured out (I'm more of a software guy); it's the FPGA side where I'm completely lost.

EDIT2: In terms of speed, it only needs to handle around 80 Mbit/s, since that's as fast as I'll be receiving data into the FPGA.
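
The double-buffer idea in the post above can be sketched as plain address arithmetic in C. This is only an illustration under assumed numbers: the base address and buffer size passed in are placeholders, and every name here (`buffer_addr`, `fpga_write_index`, `hps_read_index`) is hypothetical rather than part of any Altera API.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of buffer 0 or 1 when both sit back to back at `base`.
   `buf_bytes` is the size reserved per buffer (an assumed value). */
uint32_t buffer_addr(uint32_t base, uint32_t buf_bytes, unsigned index) {
    assert(index < 2u);
    return base + index * buf_bytes;
}

/* Frame n is written by the FPGA into buffer n % 2 ... */
unsigned fpga_write_index(uint32_t frame_number) {
    return frame_number % 2u;
}

/* ... while the HPS reads the other buffer, which holds frame n - 1. */
unsigned hps_read_index(uint32_t frame_number) {
    return (frame_number + 1u) % 2u;
}
```

With this layout the FPGA and the HPS never touch the same buffer during one frame, as long as the HPS finishes reading before the FPGA finishes writing the next frame.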

Altera_Forum · ‎06-13-2018

There are basically two options.

First, you could hook up the FPGA-to-HPS bridge and carve out some DDR3 RAM to be used for storage. The FPGA-to-HPS bridge has its master on the FPGA side and its slave on the HPS side. So the FPGA side is the one issuing write requests; those requests have to target some address in the HPS address space, and it might as well be RAM (unless it is an HPS-side peripheral). You could also write into RAM through the FPGA->SDRAM bridge, but that is not guaranteed to maintain cache coherency (so, as I understand it, HPS-side memory reads may return old data even after the transfer, if that data happens to be in the cache). FPGA->HPS is slower than FPGA->SDRAM, but should be plenty fast for your task.

Alternately, you could go the other way around. In Qsys/Platform Designer, instantiate an altera_hps, an altera_avalon_onchip_memory2, and an altera_avalon_mm_bridge. Hook them up like so:

altera_hps.h2f_axi_master -> altera_avalon_onchip_memory2.s1

altera_avalon_mm_bridge.m0 -> altera_avalon_onchip_memory2.s1

Export altera_avalon_mm_bridge.s0.

Now you have some on-chip RAM (the amount configurable through altera_avalon_onchip_memory2's properties) that you can write on FPGA side through the pins you just exported and read on HPS side starting at the address 0xC0000000. The downside is that you don't have a whole lot of RAM to work with (your entire chip has something like 300 kbytes worth of M10Ks) so you need to be frugal.

P.S. I'm basically learning this as I go along myself, so take this with a grain of salt.

Altera_Forum · ‎06-13-2018

First off thanks!

Going off what you said, I think the F2H bridge is the best option for me, so I don't have to worry about the cache coherency issues of the SDRAM bridge. I don't believe speed should be an issue, since I only need 80 Mbit/s: I saw in another forum post that the bridge runs at 133 MHz, and I know it supports a 128-bit width, so that would give about 2.13 gigabytes per second? If I use the F2H bridge to set up DDR3 writes, how would that look? I have found plenty of examples of H2F and the lightweight H2F bridge, but none of F2H except for the Altera one, which I can't seem to get to work on my DE0-Nano-SoC since it was written for the Cyclone V dev board.
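
The bandwidth estimate above is just clock rate times bus width, which a few lines of C can check. This is a sketch of the arithmetic only: the 133 MHz and 128-bit figures are the ones quoted in the post, the function names are made up for illustration, and the result is a theoretical peak that assumes one full-width word is accepted on every clock cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical peak: one full-width word transferred on every clock cycle. */
uint64_t peak_bytes_per_sec(uint64_t clock_hz, uint32_t width_bits) {
    assert(width_bits % 8u == 0u);
    return clock_hz * (width_bits / 8u);
}

/* How many times the peak exceeds the required rate (integer part). */
uint64_t headroom_factor(uint64_t peak_bytes, uint64_t required_bits_per_sec) {
    assert(required_bits_per_sec >= 8u);
    return peak_bytes / (required_bits_per_sec / 8u);
}
```

For 133 MHz and 128 bits this gives 2,128,000,000 bytes per second, a little over 200 times the 10 MB/s that an 80 Mbit/s stream needs, so even a bridge running far below its peak leaves a wide margin.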

Thanks again for all your help!

Altera_Forum · ‎06-13-2018

Re: bandwidth, 2 GB/s would be only achievable with burst transfers. When writing one word at a time, it may be less (possibly much less). Not sure.

As in my first example, instantiate an altera_hps and an altera_avalon_mm_bridge. Connect bridge.m0 -> hps.f2h_axi_slave. Export bridge.s0. Now you have access to signals:

mm_bridge_0_s0_waitrequest

mm_bridge_0_s0_address

mm_bridge_0_s0_writedata

mm_bridge_0_s0_write

When you have data to send, essentially, put your data into 'writedata', put the physical address of RAM where you want to write into 'address', put 1 into 'write', and hold all three steady until 'waitrequest' is low on a clock edge; that is the cycle in which the write is accepted. See https://www.altera.com/content/dam/altera-www/global/en_us/pdfs/literature/manual/mnl_avalon_spec.pdf (it's rather complicated).
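
The Avalon-MM write handshake (hold 'write', 'address' and 'writedata' steady while 'waitrequest' is high, move on only when it is low) can be illustrated with a small cycle-level model in C. This is not HDL and not anything generated by Qsys; the types and functions (`avmm_writer`, `avmm_drive`, `avmm_clock`) are hypothetical, and 32-bit words with byte addressing are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Master-side state for a stream of single-word writes. */
typedef struct {
    const uint32_t *words;  /* data to send */
    size_t count;           /* total number of words */
    size_t next;            /* index of the word currently on the bus */
    uint32_t base;          /* byte address of the first word */
} avmm_writer;

/* Signals the master drives during one clock cycle. */
typedef struct {
    bool write;
    uint32_t address;
    uint32_t writedata;
} avmm_signals;

/* What the master drives this cycle: the current word, or idle when done. */
avmm_signals avmm_drive(const avmm_writer *w) {
    avmm_signals s = {false, 0u, 0u};
    assert(w != NULL);
    if (w->next < w->count) {
        s.write = true;
        s.address = w->base + (uint32_t)(w->next * sizeof(uint32_t));
        s.writedata = w->words[w->next];
    }
    return s;
}

/* Clock edge: a word is accepted only if write is high and waitrequest is low.
   Returns true when a word was accepted; otherwise nothing changes, so the
   same signals are driven again on the next cycle. */
bool avmm_clock(avmm_writer *w, bool waitrequest) {
    assert(w != NULL);
    if (w->next < w->count && !waitrequest) {
        w->next++;
        return true;
    }
    return false;
}
```

Calling `avmm_drive` before each `avmm_clock` shows the key property: a stalled cycle leaves address and data unchanged, and the address only advances after an accepted write.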

Altera_Forum · ‎06-13-2018

OK, I got that. Since the channel is only 128 bits wide, would I write 128 bits to mm_bridge_0_s0_writedata multiple times, advancing the physical RAM address by 128 each time? Or could I write the full, say, 37 kbit into writedata at once and let the mm bridge take care of the streaming? Sorry for my lack of knowledge on this; I really do appreciate the assistance.

Altera_Forum · ‎06-13-2018

You can adjust the width of the bridge through its parameters. Looks like it won't go higher than 4096 though (and I'm not sure if it would even work if you tried to set it wider than the physical channel). It's really just an adapter to let you deal with the Avalon-MM protocol instead of the considerably more complex AXI protocol. (You could in principle just export hps.f2h_axi_slave and write into that.) So, yes, you write X bits at a time, advance the pointer and repeat.

Why would you even want to write 37 kilobit at once, anyway? That means 37000 physical wires on the FPGA, all feeding into some IP component that tries to squeeze all that data through a 128-wire bus. You can probably figure out a way to do that, but I can't see it being very efficient. On the FPGA side, most of the data should be stored in M10Ks, which are relatively "deep" but "narrow" (each M10K can store 256 32-bit words, and produce at most 2 of them per clock.) That works well if you want to pull them out one by one and send them on their way. If you want to access all your data at once, the compiler can't use M10Ks, it has to use regular registers, so your logic utilization goes through the roof. Your chip only has 60K registers across its entire surface.
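
The "write X bits at a time, advance the pointer and repeat" scheme comes down to two small calculations, sketched here in C. The assumptions are that 37 kilobits means 37,000 bits and that the bridge address counts bytes, so a 128-bit word advances the address by 16 rather than 128; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bus words needed to carry total_bits, rounding up. */
uint32_t beats_needed(uint32_t total_bits, uint32_t width_bits) {
    assert(width_bits != 0u && width_bits % 8u == 0u);
    return (total_bits + width_bits - 1u) / width_bits;
}

/* Byte address of the given word, assuming byte addressing on the bridge. */
uint32_t beat_address(uint32_t base, uint32_t beat, uint32_t width_bits) {
    assert(width_bits % 8u == 0u);
    return base + beat * (width_bits / 8u);
}
```

Under these assumptions a 37,000-bit packet takes 290 writes of 128 bits, occupying 4,640 bytes, with the last word only partly filled.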

Altera_Forum · ‎06-13-2018

Yeah, that's what I figured. I just have 80 Mbit/s coming into the FPGA and need to send 80 Mbit/s from the FPGA to the HPS; hopefully I can achieve that with this implementation of the FPGA-to-HPS bridge. Now I just need to work on the Verilog logic to break that 37 kbit up and repeatedly write it to the bridge.

Altera_Forum · ‎06-18-2018

Eugene, or anyone else who happens to be looking at this: I've hit a wall and was wondering if you could offer some advice. I think I have it set up the right way, so that for now the FPGA should be writing a 128-bit value to an address in memory, and I am trying to read it from the HPS, but I am having zero luck. Here is my Qsys setup: https://alteraforum.com/forum/attachment.php?attachmentid=15591&stc=1

I have also attached the top-level code for my Verilog modules: https://alteraforum.com/forum/attachment.php?attachmentid=15592&stc=1 https://alteraforum.com/forum/attachment.php?attachmentid=15593&stc=1 https://alteraforum.com/forum/attachment.php?attachmentid=15594&stc=1 . In this case I am just trying to write the debug value 128'b1. My C code is taken from https://digibird1.wordpress.com/playing-with-the-cyclone-v-soc-system-de0-nano-soc-kitatlas-soc/ . I use his SGDMA code to try to read from 0x32000000, but I haven't had any luck. Any advice?

Altera_Forum · ‎06-18-2018

Make sure that you have mm_bridge_0_s0_byteenable set to all 1's.

If that does not help, make sure that you have the right preloader in the HPS.

Altera_Forum · ‎06-19-2018

Thanks Eugene, I didn't have the byteenable set, so that's one issue. Also, this may be a stupid question, but when do you need to rebuild the preloader and U-Boot? Every time you change anything in your Quartus project, or only when you change something in Qsys? Thanks

Altera_Forum · ‎06-19-2018

I'm not sure what the correct answer is, but it's definitely less frequent than that. I've been testing my own project for the last several days without changing the preloader. It may just need to match the version of Quartus (the preloader that comes with the board might have been generated by 16.0 or an even earlier version.)

Altera_Forum · ‎06-19-2018

Alright, well, I'm having some issues generating preloaders right now, so I'm working on getting that figured out. In the meantime, have you successfully written to the HPS DDR and then read the value back from it? If so, what address are you writing to and reading from?

Altera_Forum · ‎06-19-2018

// Bus widths are inferred from the constants used below (32'h..., 4'hf, {5'b0, ...}).
wire        mm_bridge_2_s0_waitrequest;
wire [31:0] mm_bridge_2_s0_readdata;
wire        mm_bridge_2_s0_readdatavalid;
reg  [31:0] mm_bridge_2_s0_writedata = 0;
reg  [31:0] mm_bridge_2_s0_address = 0;
reg         mm_bridge_2_s0_write = 0;
reg         mm_bridge_2_s0_read = 0;
//reg mm_bridge_2_s0_read_requested=0;
//reg mm_bridge_2_s0_write_requested=0;
reg  [31:0] fifo_read_offset = 32'h20000000;
reg  [26:0] fifo_read_address = 0;
reg  [31:0] fifo_write_offset = 32'h28000000;
reg  [26:0] fifo_write_address = 0;

// Issue a read from the input buffer whenever the bridge is ready.
if (!source_empty && !mm_bridge_2_s0_waitrequest)
begin
    mm_bridge_2_s0_read <= 1;
    mm_bridge_2_s0_address <= fifo_read_offset | {5'b0, fifo_read_address};
    fifo_read_address <= fifo_read_address + 4;
    source_empty <= ...;
end
else if (!mm_bridge_2_s0_waitrequest)
    mm_bridge_2_s0_read <= 0;

// Hand returned read data to the processing core.
if (mm_bridge_2_s0_readdatavalid && !core_full)
    ... <= mm_bridge_2_s0_readdata;
else
    mm_bridge_2_s0_read <= 0;

// Write each result from the core into the output buffer.
if (core_readdatavalid && !mm_bridge_2_s0_waitrequest)
begin
    mm_bridge_2_s0_write <= 1;
    mm_bridge_2_s0_writedata <= ...;
    mm_bridge_2_s0_address <= fifo_write_offset | {5'b0, fifo_write_address};
    fifo_write_address <= fifo_write_address + 4;
end
else if (!mm_bridge_2_s0_waitrequest)
    mm_bridge_2_s0_write <= 0;

....

// FPGA->HPS access wires
.mm_bridge_2_s0_waitrequest(mm_bridge_2_s0_waitrequest),     // mm_bridge_1_s0.waitrequest
.mm_bridge_2_s0_readdata(mm_bridge_2_s0_readdata),           // .readdata
.mm_bridge_2_s0_readdatavalid(mm_bridge_2_s0_readdatavalid), // .readdatavalid
.mm_bridge_2_s0_burstcount(1'b1),                            // .burstcount
.mm_bridge_2_s0_writedata(mm_bridge_2_s0_writedata),         // .writedata
.mm_bridge_2_s0_address(mm_bridge_2_s0_address),             // .address
.mm_bridge_2_s0_write(mm_bridge_2_s0_write),                 // .write
.mm_bridge_2_s0_read(mm_bridge_2_s0_read),                   // .read
.mm_bridge_2_s0_byteenable(4'hf),                            // .byteenable
.mm_bridge_2_s0_debugaccess(1'b0),                           // .debugaccess

Altera_Forum · ‎06-20-2018

Hi Eugene,

Just a few questions on your code.

On this section here:

if(!source_empty && !mm_bridge_2_s0_waitrequest)
    begin
    mm_bridge_2_s0_read<=1;
    mm_bridge_2_s0_address<=fifo_read_offset|{5'b0,fifo_read_address};
    fifo_read_address<=fifo_read_address+4;
    source_empty<=...;
    end
else if(!mm_bridge_2_s0_waitrequest)
    mm_bridge_2_s0_read<=0;
if(mm_bridge_2_s0_readdatavalid && !core_full)
    ... <= mm_bridge_2_s0_readdata;
else
    mm_bridge_2_s0_read<=0;

I am pretty lost on what this read section here is for. I don't think I need to do any reads so for now I am commenting it out assuming that it is regarding some specific act you are trying to do. I'm not exactly sure. And I do not know what source_empty or core_full are either.

I am pretty sure I understand the write section found here:

if(core_readdatavalid & !mm_bridge_0_s0_waitrequest)
    begin
    mm_bridge_0_s0_write<=1;
    mm_bridge_0_s0_writedata <= data_input;
    mm_bridge_0_s0_address<=fifo_write_offset|{5'b0,fifo_write_address};
    fifo_write_address<=fifo_write_address+4;
    end
else if(!mm_bridge_0_s0_waitrequest)
    mm_bridge_0_s0_write<=0;

Except I am not sure what core_readdatavalid is for, I'm assuming though that it would help if I understood what the read section before that is for?

Given that I followed along with everything above correctly, I modified it into my module here:

module send_to_hps(data_input, mm_bridge_0_s0_waitrequest, mm_bridge_0_s0_writedata, mm_bridge_0_s0_address, mm_bridge_0_s0_write);
    //Inputs
    input data_input;
    input mm_bridge_0_s0_waitrequest;

    //Wires
    //wire mm_bridge_0_s0_waitrequest;
    //wire mm_bridge_0_s0_readdata;
    //wire mm_bridge_0_s0_readdatavalid;

    //Outputs
    output mm_bridge_0_s0_writedata;
    output mm_bridge_0_s0_write;
    output mm_bridge_0_s0_address;

    //Registers
    reg mm_bridge_0_s0_writedata=0;
    reg mm_bridge_0_s0_address=0;
    reg mm_bridge_0_s0_write=0;
    //reg mm_bridge_0_s0_read=0;
    //reg fifo_read_offset= 32'h20000000;
    //reg fifo_read_address=0;
    reg fifo_write_offset=32'h28000000;
    reg fifo_write_address=0;

    /*if(!source_empty && !mm_bridge_2_s0_waitrequest)
        begin
        mm_bridge_2_s0_read<=1;
        mm_bridge_2_s0_address<=fifo_read_offset|{5'b0,fifo_read_address};
        fifo_read_address<=fifo_read_address+4;
        source_empty<=...;
        end
    else if(!mm_bridge_2_s0_waitrequest)
        mm_bridge_2_s0_read<=0;
    if(mm_bridge_2_s0_readdatavalid && !core_full)
        ... <= mm_bridge_2_s0_readdata;
    else
        mm_bridge_2_s0_read<=0;*/

    always @ (data_input)
    begin
        if(/*core_readdatavalid &&*/ !mm_bridge_0_s0_waitrequest)
            begin
            mm_bridge_0_s0_write<=1;
            mm_bridge_0_s0_writedata <= data_input;
            mm_bridge_0_s0_address<=fifo_write_offset|{5'b0,fifo_write_address};
            fifo_write_address<=fifo_write_address+16;
            end
        else if(!mm_bridge_0_s0_waitrequest)
            mm_bridge_0_s0_write<=0;
    end
endmodule

The Inputs/Outputs to this module are all connected to the MM_Bridge for the HPS except for data_input. So I think that's correct.

I have two final questions:

One, how did you arrive at the address 32'h28000000 to start writing at?

And two, would I then try to read from that address in Linux using C code something like this:

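#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Render the low `precision` bits of value (at most 8) as a string of 0s and 1s.
char* toBinary(int value, int precision){
    static char buf[9] = {0};
    if (precision > 8) precision = 8;
    buf[precision] = 0;
    for(; value && precision ; --precision, value /= 2){
        buf[precision - 1] = "01"[value % 2];
    }
    // Pad the remaining high-order positions with zeros.
    for(; precision ; --precision){
        buf[precision - 1] = '0';
    }
    return buf;
}

int main(int argc, char *argv[]) {
    if (argc != 3) {
        printf("usage: %s <address> <#bytes>\n", argv[0]);
        return 0;
    }
    // Base 0 lets strtoul accept hex, but only with a 0x prefix (e.g. 0x28000000).
    off_t offset = strtoul(argv[1], NULL, 0);
    size_t len = strtoul(argv[2], NULL, 0);
    // Truncate offset to a multiple of the page size, or mmap will fail.
    size_t pagesize = sysconf(_SC_PAGE_SIZE);
    off_t page_base = (offset / pagesize) * pagesize;
    off_t page_offset = offset - page_base;
    int fd = open("/dev/mem", O_RDONLY | O_SYNC);
    if (fd < 0) {
        perror("Can't open /dev/mem");
        return -1;
    }
    // Read-only shared mapping: we only inspect what the FPGA wrote.
    unsigned char *mem = mmap(NULL, page_offset + len, PROT_READ, MAP_SHARED, fd, page_base);
    if (mem == MAP_FAILED) {
        perror("Can't map memory");
        close(fd);
        return -1;
    }
    size_t i;
    // First pass: each byte in binary, 16 bytes per line.
    for (i = 0; i < len; i++){
        int temp = (int)mem[page_offset + i];
        printf("%s ", toBinary(temp, 8));
        if((i + 1) % 16 == 0 && i != len - 1){
            printf("\n");
        }
    }
    printf("\n");
    // Second pass: printable ASCII, with '_' for everything else.
    for (i = 0; i < len; i++){
        int temp = (int)mem[page_offset + i];
        if(temp >= 32 && temp <= 126){
            printf("%c", temp);
        }else{
            printf("_");
        }
        if((i + 1) % 143 == 0 && i != len - 1){
            printf("\n");
        }
    }
    printf("\n");

    munmap(mem, page_offset + len);
    close(fd);
    return 0;
}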

Where the address would be 0x28000000 and the byte count in my case would be 16, for 128 bits?

Altera_Forum · ‎06-20-2018

core_readdatavalid and core_full are signals from the part of my code that does the processing. source_empty signals the code to stop reading after a certain number of bytes has been read.

I am confused by your "always @ (data_input)". Either I misunderstand what you're trying to do, or this is wrong (or both). It should be "always @(posedge clk)" where clk is the clock that is associated with mm_bridge_0_s0. If the write succeeds (you come into the block, you have data to write, and mm_bridge_0_s0_waitrequest is low), you do whatever has to be done to replace data_input with the next value to write.

I use the entire top 512M for transferring data in and out. 2000_0000 to 2800_0000 is input buffer (written by the HPS and read by the FPGA) and 2800_0000 onwards is the output buffer (written by the FPGA and read by the HPS).
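
The buffer split described above can be written down as a small address map in C. The two base addresses are the ones given in the post; the end address 0x40000000 simply follows from "top 512M" starting at 0x20000000, and the names (`region_t`, `classify`, `region_size`) are hypothetical. Note also that Linux has to be kept out of this range for the scheme to be safe; one common way to do that, not mentioned in the thread, is to limit the kernel's memory with a boot argument such as `mem=512M`.

```c
#include <assert.h>
#include <stdint.h>

#define INPUT_BASE  0x20000000u  /* written by the HPS, read by the FPGA */
#define OUTPUT_BASE 0x28000000u  /* written by the FPGA, read by the HPS */
#define REGION_END  0x40000000u  /* 0x20000000 + 512 MB */

typedef enum { REGION_NONE, REGION_INPUT, REGION_OUTPUT } region_t;

/* Which transfer buffer, if any, a physical address falls into. */
region_t classify(uint32_t addr) {
    if (addr >= INPUT_BASE && addr < OUTPUT_BASE) return REGION_INPUT;
    if (addr >= OUTPUT_BASE && addr < REGION_END) return REGION_OUTPUT;
    return REGION_NONE;
}

/* Size in bytes of a region (0 for REGION_NONE). */
uint32_t region_size(region_t r) {
    assert(r == REGION_NONE || r == REGION_INPUT || r == REGION_OUTPUT);
    if (r == REGION_INPUT) return OUTPUT_BASE - INPUT_BASE;
    if (r == REGION_OUTPUT) return REGION_END - OUTPUT_BASE;
    return 0u;
}
```

With these constants the input buffer is 128 MB and the output buffer is 384 MB, far more than a 37 kbit packet needs, so a pair of small ping-pong buffers fits comfortably at the start of the output region.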

Yes, the resulting data should be visible to a code like that at address 2800_0000.

Altera_Forum · ‎06-21-2018

What I am trying to do is send the data every time I get new data, i.e. every time it changes. The overall idea of how this project will work, and what I need it to do:

1) I get some data over FPGA GPIO.

2) The FPGA does some formatting with the data.

3) As soon as it's ready, the FPGA sends it to the HPS DDR memory.

4) Once the FPGA has written an entire packet, it interrupts the HPS to let it know the packet is ready.

5) HPS reads it and sends it out over the network.

So my idea was that every time data_input changes, it would launch the write block. But maybe, like you said, I should change the always block to @(posedge clk) and then check inside it whether data_input has changed AND whether waitrequest is low? I'll edit this post once I've worked on that; hopefully that always block is what is causing my issue, because I still wasn't able to see anything at the 2800_0000 address. What I don't want is for the system to write the same data multiple times: if no new data has arrived, the block would still execute on the next clock cycle and write a single data point twice.

Maybe always @(posedge clk AND data_input) would work? I'm going to try to run some tests, but it's hard since I can't even get anything to write to the DDR right now.
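
One way to get "write each new value exactly once" in a clocked design is to track a pending flag that is set by a data-valid strobe and cleared when the write is accepted, rather than comparing data values. The C model below is only a sketch of that idea, evaluated once per clock edge; the `new_valid` strobe and the `write_once` type are assumptions, not signals from the posted design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool pending;   /* a sample is waiting to be written (drives 'write') */
    uint32_t data;  /* the sample offered on 'writedata' */
} write_once;

/* One rising clock edge. Returns true if a write completed on this edge.
   A sample that arrives while an older one is still stalled replaces it,
   so a real design would need a FIFO if that can happen. */
bool write_once_clock(write_once *s, bool new_valid, uint32_t new_data,
                      bool waitrequest) {
    bool accepted;
    assert(s != NULL);
    accepted = s->pending && !waitrequest;
    if (accepted) {
        s->pending = false;  /* written once; do not repeat it */
    }
    if (new_valid) {
        s->data = new_data;  /* latch the next sample */
        s->pending = true;
    }
    return accepted;
}
```

Because the flag follows the strobe and not the data, two identical samples in a row still produce two writes, while idle clock cycles produce none.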

EDIT: I can now successfully write! However, for some reason I can only write 32 bits at a time instead of 128; not sure why yet.