How is data transferred over a PCIe link?

Altera_Forum · ‎03-27-2013

I am using the Altera S2GX for a project that uses PCIe to transfer data.

We now have the hardware working with the PC, but since Altera didn't provide the source code for the PC app, I am not sure how the data is transferred to or from the FPGA.

For example, what is a BAR used for? If I am not using DMA to transfer the data, does that mean I don't need the BAR?

I just want to do a simple read and write to the FPGA. Which address should I put the data on? Where do I specify the address I want to write to on the FPGA?

If you know of any document that covers this, please point me to it.

Thanks

Altera_Forum · ‎03-27-2013

So after some research, I found out:

Assume there is only one RC and one EP device. The BAR address range is basically mapped one-to-one onto the endpoint device, right?

So does that mean that if my endpoint device needs a 4G address range, the BAR will be 4G in size?

Next question: when I use the PCIe Compiler, BAR0 and BAR1 together form a 64-bit memory space. What does this mean? To me, it looks like a 32-bit memory address, but with a size equal to the sum of BAR0 and BAR1.

Last question: I am using the descriptor packet inside the FPGA, and the data bus is 64 bits wide. So my question is:

On the RC (PC) side, how many bytes of data does each BAR address point to? 1 byte?

If it's 1 byte and I want to transfer 2k bytes to the endpoint device, do I just write 2k bytes of BAR memory space?

Do I need to trigger the send? Or is the packet sent automatically once max_payload is reached?

Appreciated.

Altera_Forum · ‎03-27-2013

Last question, and it may sound stupid: my teammate is doing the software side of the work, and I am trying to help...

So in the case of reading, how do you read from the BAR? Or don't you?

We use WinDriver, and there is a dataread function... I guess that will handle the job, right?

Altera_Forum · ‎03-28-2013

Nobody??? why just no replies....for every post.....

Altera_Forum · ‎03-28-2013

--- Quote Start ---

Nobody??? why just no replies....for every post.....

--- Quote End ---

You're probably not getting any replies because you're jumping into PCIe without actually spending any time to understand what PCI and PCIe are.

Take a look at the document I posted on this thread:

http://www.alteraforum.com/forum/showthread.php?t=35678

Now, to answer a few basic questions for you.

1. A PCIe BAR is a "base address register"; it's basically a window into the address map of your board. Let's say your board has 4kB of registers that the host needs to access; then the BAR size would be 4kB. The physical address of the BAR in the address map of the RC (the host PC) is determined by the BIOS at boot time. The guy writing the device driver on the host needs to know that address, but that is easy, as the device driver just asks the operating system.

2. A PCIe end-point does not usually use big BARs. Why? Well, a host like a PC does not usually have a DMA controller, and performing reads and writes as individual 32-bit or 64-bit operations is really inefficient (because each one gets packed into a serial packet with headers etc.). A PCIe end-point usually has a DMA controller, and that DMA controller is what your host device driver programs to move data around, i.e., the host can use a small number of inefficient writes to set up the DMA controller, and then say "go", and the DMA controller will move large volumes of data efficiently.
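
To make the BAR description concrete, here is a minimal Python sketch of how a memory BAR register value is decoded and how its size is found. The bit layout is the standard PCI one (bit 0 selects memory or I/O space, bits 2:1 give the type, bit 3 is the prefetchable flag); the function names and example values are made up for illustration and do not come from any Altera tool.

```python
def decode_memory_bar(low, high=0):
    """Decode a memory BAR from its raw 32-bit register value(s)."""
    if low & 0x1:
        raise ValueError("bit 0 is set: this is an I/O BAR, not a memory BAR")
    is_64bit = ((low >> 1) & 0x3) == 0x2    # type field: 0b00 = 32-bit, 0b10 = 64-bit
    prefetchable = bool(low & 0x8)
    base = low & 0xFFFFFFF0                 # the low 4 bits are flags, not address bits
    if is_64bit:
        base |= (high & 0xFFFFFFFF) << 32   # the next BAR register holds address bits 63:32
    return {"base": base, "is_64bit": is_64bit, "prefetchable": prefetchable}


def bar_size_from_probe(readback):
    """Size in bytes of a 32-bit memory BAR, given the value read back
    after the host has written all 1s to the register."""
    mask = readback & 0xFFFFFFF0
    if mask == 0:
        return 0                            # BAR not implemented
    return ((~mask) & 0xFFFFFFFF) + 1
```

For example, `bar_size_from_probe(0xFFFFF000)` gives 4096, which is the 4kB case above. A 64-bit BAR uses two consecutive BAR registers (such as BAR0 and BAR1) to hold one 64-bit address, so it is a single window rather than two windows added together.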

Writing Linux device drivers for PCIe devices is very simple. In fact, you don't even need a device driver for simple accesses. The thread above has an example program that allows you to read or write from the PCIe BARs.
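
As a rough idea of what such a driverless access looks like, the sketch below maps a BAR through the Linux sysfs `resourceN` file and does 32-bit reads and writes at byte offsets. It is written in Python for brevity and is not the utility from the linked thread; the helper names are invented, the device path in the comment is a placeholder, and for real hardware a C program gives tighter control over the width of each bus access.

```python
import mmap
import struct


def open_bar(path, size):
    """Map `size` bytes of a BAR resource file (or any ordinary file) read/write."""
    f = open(path, "r+b")
    return f, mmap.mmap(f.fileno(), size)


def read32(bar, offset):
    """Read a little-endian 32-bit word at a byte offset into the BAR."""
    return struct.unpack_from("<I", bar, offset)[0]


def write32(bar, offset, value):
    """Write a little-endian 32-bit word at a byte offset into the BAR."""
    struct.pack_into("<I", bar, offset, value & 0xFFFFFFFF)


# Example (needs root; the path is a placeholder for your device's bus address):
# f, bar = open_bar("/sys/bus/pci/devices/0000:01:00.0/resource0", 4096)
# write32(bar, 0x10, 0x12345678)
# print(hex(read32(bar, 0x10)))
```

Offsets into the BAR are byte offsets, so writing 2k bytes this way means touching 2k bytes of BAR space, one small access at a time.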

Cheers,

Dave

Altera_Forum · ‎03-28-2013

First, thanks for your reply.

I had spent some time simulating PCIe packets on the FPGA side, so I thought I had some understanding of PCIe TLPs, at least without going too deep into the PCIe core.

I know how the interface signals work. Anyway...

I am playing with the PCIe testbench simulation today. Now I understand that the BAR is mapped one-to-one between the RC and EP. I think in my case BAR0/1 is used for simple memory read/write

and BAR2 is used for DMA.

It looks like the BAR is also mapped into the system memory address space with the same size. I understand the basics of DMA, which frees the CPU from handling all the data-moving work. (Doesn't it still require a method to tell the EP which address to start at?)

At the EP, I have 20 Gbytes of NAND flash. Should I make the BAR bigger, or is there another way around it? lol, I don't think I can make the BAR 20G, right?

Altera_Forum · ‎03-28-2013

--- Quote Start ---

I am playing with the PCIE testbench simulation today, Now I understood that BAR is one to one mapped between the RC and EP. I think in my case BAR0/1 is used for simple memory read/write

--- Quote End ---

Think of the BAR as the interface needed by the host (the RC). The only way the host can talk to the PCIe EP is via the BAR.

--- Quote Start ---

and BAR2 is used for DMA.

--- Quote End ---

Not quite. You need to be precise in your descriptions, so it's clear that you understand what is going on.

The DMA controller must be a PCIe bus master and so it generates its own 64-bit addresses. If the RC needs to program the DMA controller, then the DMA control registers might live in BAR2.

--- Quote Start ---

I understand the basics of DMA which frees the CPU from handling all the data moving work. (isn't still require a method to tell the EP which address to start?)

--- Quote End ---

The RC needs to program the EP DMA controller registers.

For a simple DMA controller, the control registers consist of a source address, a destination address, the data length to transfer, a control register (with a "go!" bit), and a status register (with a "done!" bit).
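
A minimal sketch of how a host might program such a register set is shown below. The register offsets and bit positions are hypothetical (every DMA controller defines its own), and `regs` stands for any writable buffer, such as a memory-mapped BAR or a plain `bytearray` for testing.

```python
import struct

# Hypothetical register offsets (in bytes) inside the BAR holding the DMA registers.
REG_SRC = 0x00      # 64-bit source address
REG_DST = 0x08      # 64-bit destination address
REG_LEN = 0x10      # 32-bit transfer length in bytes
REG_CTRL = 0x14     # 32-bit control register
REG_STATUS = 0x18   # 32-bit status register

CTRL_GO = 1 << 0        # hypothetical "go!" bit
STATUS_DONE = 1 << 0    # hypothetical "done!" bit


def start_dma(regs, src, dst, length):
    """Fill in the transfer parameters, then set the go bit last."""
    struct.pack_into("<Q", regs, REG_SRC, src)
    struct.pack_into("<Q", regs, REG_DST, dst)
    struct.pack_into("<I", regs, REG_LEN, length)
    struct.pack_into("<I", regs, REG_CTRL, CTRL_GO)


def dma_done(regs):
    """Return True once the controller has set the done bit."""
    return bool(struct.unpack_from("<I", regs, REG_STATUS)[0] & STATUS_DONE)
```

The go bit is written last so that the controller never starts with half-written addresses. A real driver would either poll `dma_done` or wait for an interrupt.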

"Real" DMA controllers are more complicated than that. They have scatter-gather buffers, which are basically linked lists of data transfers to perform. The DMA controller will consume the linked list based on other register settings, eg., "do this every time you get an interrupt". This relieves the processor of doing anything ... other than the original setup of the scatter-gather lists. The host (RC) can optionally be interrupted as DMA events occur.

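To illustrate the linked-list idea, here is a small sketch that packs a list of transfers into a descriptor table and walks it the way a controller would. The descriptor layout (source, destination, length, offset of the next descriptor, with 0 marking the end) is an assumption made for this example and is not the format of any particular core.

```python
import struct

# Hypothetical descriptor: src (64-bit), dst (64-bit), length (32-bit),
# byte offset of the next descriptor (32-bit, 0 = end of list).
DESC_FMT = "<QQII"
DESC_SIZE = struct.calcsize(DESC_FMT)


def build_chain(transfers):
    """Pack (src, dst, length) tuples into one contiguous descriptor table."""
    table = bytearray(DESC_SIZE * len(transfers))
    for i, (src, dst, length) in enumerate(transfers):
        is_last = i + 1 == len(transfers)
        next_off = 0 if is_last else (i + 1) * DESC_SIZE
        struct.pack_into(DESC_FMT, table, i * DESC_SIZE, src, dst, length, next_off)
    return table


def walk_chain(table):
    """Yield (src, dst, length) by following the next-descriptor offsets."""
    if not table:
        return
    off = 0
    while True:
        src, dst, length, next_off = struct.unpack_from(DESC_FMT, table, off)
        yield src, dst, length
        if next_off == 0:
            break
        off = next_off
```

The host only builds the table and hands its address to the controller; the controller then follows the chain on its own.
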
--- Quote Start ---

at the EP, I have a 20Gbytes nand flashes, should I make the BAR bigger? or there is other way around? lol I dont think I can make BAR 20G, right?

--- Quote End ---

No, if you make your BAR 20G, you will not be able to boot your PC. The BIOS will choke. If you read the notes I linked to in the other thread, I could not boot my EliteBook laptop if I made the BAR too big.

If you want the RC to get data from the 20G drive, then it has to program the DMA controller with source addresses that correspond to the 20G drive (or memory buffers that drive creates), and then transfer those buffers using DMA to the host.

If you dig into the filesystem for your OS, you'll find that it works in pages/sectors, i.e., blocks of bytes. Your DMA controller needs to move a requested page of bytes from the PCIe EP over the PCIe bus to the host memory (where it is possibly copied into a page that the filesystem driver gave you).
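
As a sketch of that page-by-page idea, the helper below splits a byte range on the device into pieces that never cross a page boundary, so each piece could become one DMA request. The 4096-byte page size and the function name are assumptions for illustration.

```python
def page_requests(start, length, page_size=4096):
    """Yield (device_offset, nbytes) pieces that never cross a page boundary."""
    end = start + length
    while start < end:
        # Bytes left in the current page, capped by the bytes left overall.
        nbytes = min(end - start, page_size - (start % page_size))
        yield start, nbytes
        start += nbytes
```

This way a window into the 20G device is never needed; the host just asks for the pages it wants.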

Something like that anyway ...

Cheers,

Dave

Altera_Forum · ‎03-28-2013

Really appreciated, I was hoping to avoid using DMA and just make a simple read and write function.

I guess I still have lots of readings to do....T_T.....

So many stuff....

Altera_Forum · ‎03-28-2013

--- Quote Start ---

Really appreciated, I was hoping to avoid using DMA and just make a simple read and write function.

--- Quote End ---

That has already been done. As I said, look at the thread I linked to: there is a PCIe debug utility that does not require a driver. You can use that to read or write registers. The code is written for Linux. Under Windows, you can probably do something similar with the Jungo tools.

--- Quote Start ---

I guess I still have lots of readings to do....T_T.....

So many stuff....

--- Quote End ---

If it was easy, it wouldn't be fun, right? :)

Cheers,

Dave

Altera_Forum · ‎03-28-2013

The basic problem is that although PCIe is high bandwidth, it is also high latency.

So if a CPU tries to read a location over PCIe, the instruction will stall for a long time, probably on the order of 10us (think ISA bus speeds).

This is the same for an x86 host reading FPGA memory, or a Nios CPU reading host memory.

Writes can be a little faster since they can be 'posted' (address + data latched and the initiating cycle terminated).

This is not too bad for diagnostics, but horrid for anything that requires any amount of throughput.
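
A quick back-of-the-envelope calculation shows why. It assumes the rough 10us per read mentioned above and that the CPU waits for each read to complete before issuing the next one; both numbers are estimates, not measurements.

```python
def stalled_read_throughput(bytes_per_read, latency_us):
    """Bytes per second when every read stalls the CPU for latency_us."""
    return bytes_per_read * 1_000_000 / latency_us


def time_to_read_us(total_bytes, bytes_per_read, latency_us):
    """Total time in microseconds to read total_bytes one access at a time."""
    reads = -(-total_bytes // bytes_per_read)   # ceiling division
    return reads * latency_us
```

With 4-byte reads at 10us each, that works out to 400,000 bytes per second, and reading 2k bytes takes about 5 ms. A DMA engine moving the same data in large packets avoids paying that latency on every word.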