
How Can I set up DMA operation with my own PC software application?

Altera_Forum
Honored Contributor II
8,039 Views

Hi All: 

 

I want to re-use the pcie_highperformancedesign example provided with the Arria GX Development Kit, but I am confused by the PC software application altpcie_demo.exe.

I am trying to make the FPGA initiate DMA read and write operations from my own PC software application, just as altpcie_demo does, but so far I have failed.

 

First, I used Jungo WinDriver to generate a PCIe driver. With the API functions provided by the driver I can read and write the configuration registers, memory BAR 1:0 (the syncram), and BAR2 (the DMA control registers).

 

Second, I create a Read Descriptor Table (a header plus 2 descriptors) and fill in the descriptor fields (length, EP memory address, RC memory address). The header has four DWORDs (DW0, DW1, DW2, DW3). For DMA Read, I set DW0=0x00040002, DW1=0, DW2=address of the header, DW3=0x1. Then I write DW0 to BAR2+0x10, DW1 to BAR2+0x14, DW2 to BAR2+0x18, and DW3 to BAR2+0x1c.

 

My first question: where (at which memory address) can I poll the RCLast value to detect the completion of the DMA read?

 

Third, I want to transfer the DMA Read data back to the PC. I create a Write Descriptor Table (a header plus 2 descriptors). In each descriptor I set the PC memory address for the write-back data and the EP memory address correctly. The header has four DWORDs (DW0-DW3). For DMA Write, I set DW0=0x00050002, DW1=0, DW2=address of the header, DW3=0x1. Then I write DW0 to BAR2+0x0, DW1 to BAR2+0x4, DW2 to BAR2+0x8, and DW3 to BAR2+0xc.

 

Finally, I checked the write-back data and found that it is all zeros. It seems that the FPGA does nothing at all.

 

What are the detailed steps I should follow to set up the DMA operation correctly? I read the PCI Express Compiler documentation but did not find enough information about the software application side.

 

Thanks a lot for any help.
0 Kudos
33 Replies
Altera_Forum
Honored Contributor II
2,780 Views

What I found confusing in the 'PCI Express Compiler Users Guide' on pg 6-19 was the relation of the 5 step process to kick off the DMA and the Chaining DMA Descriptor Table. The 5 step process sounds like the implementation of the Simple DMA. If it is not, what do the terms PCI Express address (step 1) and master memory block (step 2) refer to? 

 

Does master memory block refer to Chaining DMA Descriptor Table's offset in BAR2? 

 

A few other questions I had about the example are: 

  • Are the Descriptor Tables supposed to be written into the shared memory assigned to BAR2?
  • Figure 6-3 shows the descriptor tables in RC memory. If this is the case, how does the Arria access these data structures? Is the RC memory in Figure 6-3 an implicit shared memory block?
  • pg 6-19 says 'The software application writes the descriptor header into the endpoint header descriptor register'. Table 6-7 maps the descriptor headers to endpoint addresses 0x00 through 0x20. These memory spaces conflict with the 5 step process on pg 6-19 to kick off the DMA. It looks like I am confusing something here. Does anyone know?
Any help would be appreciated. Thanks.
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

I think the correct method is for the software application to write the Descriptor Table Header into the endpoint header descriptor registers mapped at BAR2 (or BAR3), offsets 0x00-0x1c.

 

The PCIe Compiler 7.2 User Guide, page 6-17, says: "altpcie_dma_prg_reg - This module contains the descriptor header table registers which get programmed by the software application. This module collects PCI Express transaction layer packets from the software application with the TLP type Mwr on Bar2 or 3" and "Header register module - RC programs the descriptor header (4 DWORDS) at the beginning of the DMA".

 

The next paragraph says: "altpcie_dma_descriptor - This module retrieves the DMA read or write descriptor from the root port memory, and stores it in descriptor FIFO. This module issues PCI Express transaction layer packets to the BFM shared memory with the TLP type MRd".

 

In the simulation model, I think the Root Port BFM sources the data (descriptors) for completions in response to read transactions received from the PCIe link. But with a real software application, what responds to the MRd TLPs issued by altpcie_dma_descriptor? Does the Jungo PCIe driver respond automatically, or do I have to write code in my software application to handle those MRd TLPs?

 

On page 6-18, Table 6-4 describes the BAR/address map. Should I use BAR4 (or 5) if I want to use the rc_slave module in the example to bypass the chaining DMA? But BAR0 (or 1) is also described as being used for the rc_slave module. Is that a mistake?

 

I have so many questions about the chaining DMA example. I wonder why Altera does not release the source code of the PCIe software application, such as altpcie_demo.exe.

 

Thanks for reply.
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

  • You said: I have so many questions about the chaining DMA example. I wonder why Altera does not release the source code of the PCIe software application, such as altpcie_demo.exe.
You are correct, all of this confusion would be eliminated if they would release this source code (driver source might be needed as well). Do you think we need to start a new thread to explicitly ask for this? 

  • You said: I think the correct method is for the software application to write the Descriptor Table Header into the endpoint header descriptor registers mapped at BAR2 (or BAR3), offsets 0x00-0x1c.
 

After reading PCI Express Compiler Users Guide, I thought the same thing, but then I started looking at the bus function model (BFM) driver source code to see how a DMA simulation is performed, and now I think otherwise. For example, look at the file 'C:\altera\72\kits\ArriaGX_PCIe\Examples\PCIe_HighPerformanceDesign\Quartus\top_x4_examples\chaining_dma\testbench\altpcietb_bfm_driver_chaining.v' 

I believe this file is one of the higher level bfm driver routines. If you look at the file you can find the following (some parts omitted for brevity): 

########## BEGIN CODE ##########
// Run the chained DMA write
task dma_wr_test(...);
begin
    // Write the 'write' descriptor table in the RC memory
    dma_set_wr_desc_data(bar_table, setup_bar);

    // Write the descriptor header in EP memory PRG
    dma_set_header( ... )
end
########## END CODE ##########

 

If you look at the called functions, dma_set_wr_desc_data() claims to write the descriptor table in root complex (PC / host) memory.

Also, the comments above the dma_set_header() function show descriptor header tables for endpoint and root complex memory. The documentation almost reads as if there is one descriptor header table mapped to BAR2. All in all, the code is not clear enough for porting to an actual implementation, because I can't tell whether shared memory means BFM driver memory or memory mapped by a BAR (or whether they are the same thing in an actual implementation).

 

I wish there were a document that explained the reference design a little more from the vantage point of someone who wants to modify the existing design, and not from the BFM vantage point. The BFM blurs what needs to be done by a PC and what is contained in the reference design.

 

Best of luck, it appears we both need some right now. :cool:
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

Maybe I can clear up a few things... 

 

The descriptor tables are located in the system's host memory (also known as root complex memory or BFM Shared Memory). See figure 7-2 of the 8.0 PCIe Compiler User Guide. I'm not a software guy, but you will need to get the system to lock down that memory and give you the real physical memory address of it (not the virtual address the application would use). Same thing you need to do with the actual memory buffer data you want to transfer via DMA. The addresses in the descriptor table point to the data buffers to be transferred. The descriptor table entries are described by tables 7-6, 7-7 and 7-8.  

 

Then you must write the real physical address of the descriptor table to the Descriptor Table Header registers which are offset from BAR2 (or BAR3:2) by the values shown in table 7-5. The Descriptor Table Header format is shown in tables 7-3, 7-4, and 7-5.  
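A minimal sketch of that register write, assuming BAR2 has already been mapped into the driver's address space as an array of 32-bit registers. The offsets 0x00 (write DMA header) and 0x10 (read DMA header) are the ones used in this thread. The split of the table address into an upper DWORD in DW1 and a lower DWORD in DW2, and RCLAST in DW3, follows how the first post uses the registers and should be checked against tables 7-3 to 7-5. The function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offsets of the two descriptor header register sets in BAR2,
 * as used in this thread: 0x00 for DMA write, 0x10 for DMA read. */
#define DMA_WR_HEADER 0x00u
#define DMA_RD_HEADER 0x10u

/* Hypothetical helper: program one 4-DWORD descriptor table header.
 * bar2       - BAR2 mapped as 32-bit registers (assumed already mapped)
 * hdr_off    - DMA_WR_HEADER or DMA_RD_HEADER
 * dw0        - control bits and descriptor count, built by the caller
 * table_phys - physical (bus) address of the descriptor table in host memory
 * rclast     - value for DW3 (assumption: index of the last descriptor) */
static void dma_write_header(volatile uint32_t *bar2, uint32_t hdr_off,
                             uint32_t dw0, uint64_t table_phys,
                             uint32_t rclast)
{
    assert(hdr_off == DMA_WR_HEADER || hdr_off == DMA_RD_HEADER);
    volatile uint32_t *reg = bar2 + hdr_off / 4;

    reg[0] = dw0;                          /* DW0: control + size         */
    reg[1] = (uint32_t)(table_phys >> 32); /* DW1: upper 32 address bits  */
    reg[2] = (uint32_t)table_phys;         /* DW2: lower 32 address bits  */
    reg[3] = rclast;                       /* DW3: same order as post #1  */
}
```

The key point is that `table_phys` is the physical address of the descriptor table in host memory, not the address of the header itself and not a virtual address.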

 

The Chaining DMA hardware will then read the Descriptor Table from the system host memory using MRd TLPs, with the address taken from the Descriptor Table Header register. The root complex hardware will automatically respond to the MRd TLP and return the data from the memory address. (huzj_ecc - your driver doesn't need to respond to the MRd TLP; in fact there is no way to do that. You just have to have the descriptor table locked down in memory and put the correct address in the Descriptor Table Header register.)

 

It does appear that the PCIe Compiler user guide is missing an important piece of information on how this is all set up: the organization of the actual descriptor table.

 

Byte Offset   Field
0-13          Reserved
14-15         EPLAST
16-31         Descriptor #1 (following the format of tables 7-6, 7-7, and 7-8)
32-47         Descriptor #2 (ditto)
48-63         Descriptor #3 (ditto)
.....         and so on, for as many descriptors as specified by the "Size" field in the descriptor table header register

 

I think the Descriptor Table must also be no more than 4KB in total size and can't cross a 4KB boundary.  
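The layout and the 4KB constraint can be sketched in C as follows. This is an illustration under assumptions, not the reference design's own definition: only the 16-byte leading area and the 16-byte descriptor size come from the layout above, while the descriptor field order is assumed and must be checked against tables 7-6 to 7-8. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One 16-byte descriptor. Field order is an assumption. */
struct dma_desc {
    uint32_t ctl_len;    /* control bits and transfer length (assumed) */
    uint32_t ep_addr;    /* endpoint memory address (assumed)          */
    uint32_t rc_addr_hi; /* host address, upper 32 bits (assumed)      */
    uint32_t rc_addr_lo; /* host address, lower 32 bits (assumed)      */
};

/* Descriptor table: 16 bytes holding the reserved area and EPLAST,
 * then the descriptors back to back. */
struct dma_desc_table {
    uint32_t reserved[3];   /* bytes 0-11                              */
    uint32_t eplast_dword;  /* bytes 12-15; contains the EPLAST field  */
    struct dma_desc desc[]; /* descriptor #1 starts at byte 16         */
};

_Static_assert(sizeof(struct dma_desc) == 16, "descriptor is 16 bytes");
_Static_assert(offsetof(struct dma_desc_table, desc) == 16,
               "first descriptor at byte 16");

/* Total table size in bytes for n descriptors. */
static size_t dma_table_bytes(size_t n)
{
    return sizeof(struct dma_desc_table) + n * sizeof(struct dma_desc);
}

/* Returns 1 if a table of n descriptors at physical address phys is at
 * most 4 KB long and does not cross a 4 KB boundary, otherwise 0. */
static int dma_table_placement_ok(uint64_t phys, size_t n)
{
    size_t bytes = dma_table_bytes(n);
    if (bytes > 4096)
        return 0;
    uint64_t first_page = phys >> 12;
    uint64_t last_page = (phys + bytes - 1) >> 12;
    return first_page == last_page;
}
```

A driver could run a check like `dma_table_placement_ok()` on the physical address it gets back from the allocator before programming the header register, and allocate again (or allocate a page-aligned buffer) if it fails.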

 

The EPLAST field in the Descriptor Table is updated by the Chaining DMA hardware with the number of the last descriptor that was completed, when the hardware is enabled to do so by the EPLAST_ENA bit in the Descriptor Table Header register or the EPLAST_ENA bit in the actual descriptor.  
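Completion polling on EPLAST might then look like the sketch below. It assumes the caller passes a pointer to the 16-bit EPLAST field inside the locked-down descriptor table, and that software has set that field beforehand to a value different from the expected final index (for example 0xFFFF), so that a stale value is not mistaken for completion. The helper name and the bounded poll count are illustrative choices, not part of the reference design.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical polling helper. eplast points at the 16-bit EPLAST field
 * in the descriptor table in host memory. The hardware updates it by
 * DMA, so it is read through a volatile pointer on every iteration.
 * Returns 0 once EPLAST equals last_index, or -1 after max_polls reads. */
static int dma_wait_eplast(const volatile uint16_t *eplast,
                           uint16_t last_index, unsigned long max_polls)
{
    assert(eplast != 0);
    for (unsigned long i = 0; i < max_polls; i++) {
        if (*eplast == last_index)
            return 0;   /* hardware reported the last descriptor done */
    }
    return -1;          /* gave up: EPLAST never reached last_index  */
}
```

A real driver would normally sleep or yield between polls instead of spinning, or use the MSI interrupt path.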

 

heppermann - Yes, it looks like those steps you mentioned in the user guide are leftover from the previous simple DMA description.  

 

I think I answered most of the questions with the above description. Please post any followups here. I will try to answer if I know the answer and when I can.
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

Hello Hey Hey, 

Thank you for your response, that was very informative and cleared up a lot. 

 

I have at least one more point of confusion. There are two Chaining DMA Descriptor Headers, at offsets 0x00 and 0x10: the first for write and the other for read. Why is there a Direction bit in the Control Fields (Table 7-4 of the PCI Express Compiler User Guide 8.0)? Is it redundant, or does this bit have some significance? I would assume that the registers at 0x00 and 0x10 already specify the direction.

 

Thanks.
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

The direction bit is redundant. I'm not 100% sure, but I think it is not even necessary to set it to the correct value.

0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

Hello, 

 

on what board do you guys exercise this chaining DMA design? 

 

I would like to write an (open source) device driver for the Altera Chaining DMA Example, but I need to get the soft core working first. 

 

Regards, 

 

Leon.
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

Here are some Altera boards that it will work on: 

 

PCI Express Development Kit, Stratix II GX Edition (http://www.altera.com/products/devkits/altera/kit-pciexpress_s2gx.html)

 

Arria GX FPGA Development Kit (http://www.altera.com/products/devkits/altera/kit-arriagx.html)

 

Altera also has many partners with PCI Express boards that it will work on. Look here (http://www.altera.com/products/devkits/kit-dev_platforms_partner.jsp). The chaining DMA design example requires no off-FPGA resources (besides the PCIe interface of course) so it should work on pretty much any board with an Altera device connected to the PCIe link, via either an internal or external PHY.
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

Hello Hey Hey, 

Can I use the chaining programming files (C:\altera\72\kits\ArriaGX_PCIe\Examples\PCIe_HighPerformanceDesign\Quartus\top_x4.sof) that are installed with the Arria GX dev kit, or do I have to do a complete build of the FPGA build files? 

 

If that is OK, I am seeing some unexpected results. Every time I write to a BAR and read it back, I always get 0xffff. Do you have any idea what I might be doing wrong?
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

That .sof should be okay. I'm not a software guy so I'm not sure what could be wrong. Do you see the device with PCI bus scanning software? On Linux systems I have seen something called "lspci" that does that. If you are not seeing the device with that, then the board is probably not configuring correctly. But if the scan finds it, then there is something wrong in the software at your end.

0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

At least in Linux, -1 (or 0xffff...) is the default value when you are reading non-existent I/O or memory mapped locations. Not sure if this goes for PCI as well. 

 

However, we cannot guess what you might be doing wrong without seeing source code.
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

TO_BE_DONE

0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

Hello Hey Hey, 

The Linux command lspci does show my device, and furthermore the kernel registers this device with my driver, so I am confident that the PnP features of PCI are operating correctly on my Arria dev board. It seems the board is configuring correctly; hopefully member 'likewise' has a clue about what I am doing wrong.

 

Thanks. 

 

PS. I bumped your rep points. I appreciate the continued help.
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

Hmm....You might not be doing anything wrong, this all might be 'features' of the chaining DMA hardware.  

 

Whether there is actually anything behind the BAR0 is controlled by the USE_RCSLAVE Verilog generic on the altpcierd_example_app_chaining module in the example design. Depending on which version of the PCIe compiler you are using this may be set to either 0 or 1 by default. You can change it to a 1 and recompile your design to enable the memory behind BAR0.  

 

When USE_RCSLAVE == 0 reads to BAR0 will not generate a completion on the PCIe link. The Root Complex (motherboard chipset) will timeout and probably return all FF's to the CPU.  

 

Now the .sof file that comes with the dev kit should have this set to a 1, so the memory should be there.  

 

But... I think the hardware may not respond completely correctly to a single byte read. The completion would probably still be for a Dword (4 bytes). The root complex may not like this and still return all FF's to the CPU.

 

Even though I'm not a software guy, with a little help from Google it looks like the readb() function you are using is just a single byte read. So if your design has USE_RCSLAVE == 1 (like the development kit .sof), I suggest trying to use writel() and readl() instead of writeb() and readb(), to see if that works for accesses to BAR0.  

 

Now as far as accesses to BAR2 go, it turns out those registers are write-only. The PCIe Compiler User Guide is just plain wrong on that, so reads to those will fail. The chaining DMA was designed to provide all of its status through interrupts and writes to the host memory. Those registers are also never changed by the hardware, so they always have the same value that was written by software. So there was no real functional need to have read-back; you just have to trust the hardware. Though I admit read-back would be nice for the "trust but verify" mindset. :)
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

Another suggestion, when setting USE_RCSLAVE to 1 and trying to Write and Read from BAR0, use an offset above 32 (and less than 32KB) when accessing it. That should be some just plain old Read/Write memory. Offsets 0-31 from BAR0 have some undocumented internal testing features.
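A small write/read-back check along those lines could look like this sketch. It assumes BAR0 is already mapped and accessed only as whole DWORDs, and that the usable window is byte offsets 32 up to (but not including) 32KB, as described in the two posts above. The function names and return codes are hypothetical. In a Linux kernel driver the two accesses would go through writel() and readl() rather than a plain pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Usable BAR0 window per the posts above: byte offsets 32 up to, but
 * not including, 32 KB, accessed as whole DWORDs. */
static int bar0_offset_ok(uint32_t off)
{
    return off >= 32 && off < 32 * 1024 && (off % 4) == 0;
}

/* Write a DWORD and read it back.
 * Returns  0 if the value read back matches,
 *         -1 if the offset is outside the usable window,
 *         -2 if the read returns all ones (typical when a read gets
 *            no completion and the root complex times out),
 *         -3 on any other mismatch. */
static int bar0_selftest(volatile uint32_t *bar0, uint32_t off,
                         uint32_t pattern)
{
    assert(pattern != 0xFFFFFFFFu); /* all ones would hide a failure */
    if (!bar0_offset_ok(off))
        return -1;
    bar0[off / 4] = pattern;        /* 32-bit write */
    uint32_t got = bar0[off / 4];   /* 32-bit read  */
    if (got == pattern)
        return 0;
    return got == 0xFFFFFFFFu ? -2 : -3;
}
```

Distinguishing the all-ones case from other mismatches helps separate "nothing is answering behind BAR0" (for example USE_RCSLAVE == 0) from a genuine data error.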

0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

Hello, 

 

As a first step I would like to mimic what the testbench does when starting a device driver. I am reading the PCIe Compiler 8.0 UG, May 2008.

 

Page 7-16 describes the Test Module: 

 

The Descriptor Header for write DMA is at 0x00-0x10, and the one for read DMA is at 0x10-0x20. In the Descriptor Header, the descriptor table base address field is set to 0x800.

 

However, the first Descriptor is placed at 0x810, the next at 0x820, etc. 

 

Now, my question is, what must be at 0x800?? 

 

The only reference to the 0x800 region is on page 7-18, which says that the DMA engine writes the number of its last completed DMA descriptor to 0x80c.

 

My suspicion is that a copy of the header must be at 0x800, which the DMA engine writes to and where EPLAST is stored.

 

Any ideas? 

 

Regards, Leon.
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

Leon,  

 

The chaining DMA hardware does write to the EPLAST field at 0x80c, as you suspected. Nothing else needs to be in the 0x800 to 0x80b range. (The testbench BFM driver may store a copy of the descriptor header there for its own bookkeeping purposes; I can't remember for sure off the top of my head, but that is not needed for the actual hardware operation.)

 

See my post# 5 (http://www.alteraforum.com/forum/showpost.php?p=10857&postcount=5) above where I said the same basic thing.
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

I have a Linux driver that at least triggers some DMA transfer. I will post it online once I have the last issues worked out. I need some insight though; can someone help me see what I am missing?

 

Table 7-13 on page 7-19 of PCIe UG 8.0: 

 

82 DWORDS (32-bit words) are transferred by the End Point DMA engine from Root Complex (or BFM) memory from (bus) address 0x8EF0 to End Point memory address 0x3. 

 

The table also specifies that "Data" is initialized with incrementing values in the address range 0x8900-0x8940. 

 

But this address range is not used in the DMA transfer at all??! 

What did I miss? 

 

The same question applies to descriptor [1] and [2]. 

 

Second point that is unclear:

 

In the header, no control bits are set. In the descriptors, no control bits are set.  

 

How then can the Root Complex (BFM) poll for EPLAST??! It is only updated if either EPLAST_ENA is set in the descriptor control bits or header control bits.
0 Kudos
Altera_Forum
Honored Contributor II
2,780 Views

Am I correct that DMA operates on the End Point memory mapped to BAR[0], or is the memory involved in DMA located elsewhere?

 

Also, for individual DWORD read/writes to RC_SLAVE memory, must I set USE_RC_DIRECT_MEM to 1? 

 

altpcierd_rc_slave.vhd: 

 

USE_EP_MWR := 0;-- Allow EP to issue MemWr to RC on command 

USE_RC_MWR_MRD := 1; -- Allow RC access to EP MEM thru opcode regs 

USE_INIT_MEM: INTEGER := 0; 

USE_RC_DIRECT_MEM: INTEGER := 0;-- Allow RC direct access to EP MEM 

USE_EP_IO_RDWR:= 0; -- Allow EP to issue IO Rd/Wr to RC on command 

 

Thanks for any insights.
0 Kudos
Altera_Forum
Honored Contributor II
2,717 Views

 

--- Quote Start ---  

I have a Linux driver that at least triggers some DMA transfer. I will post it online once I have the last issues worked out. I need some insight though; can someone help me see what I am missing?

 

Table 7-13 on page 7-19 of PCIe UG 8.0: 

 

82 DWORDS (32-bit words) are transferred by the End Point DMA engine from Root Complex (or BFM) memory from (bus) address 0x8EF0 to End Point memory address 0x3. 

 

The table also specifies that "Data" is initialized with incrementing values in the address range 0x8900-0x8940. 

 

But this address range is not used in the DMA transfer at all??! 

What did I miss? 

 

The same question applies to descriptor [1] and [2]. 

 

Second point that is unclear:

 

In the header, no control bits are set. In the descriptors, no control bits are set.  

 

How then can the Root Complex (BFM) poll for EPLAST??! It is only updated if either EPLAST_ENA is set in the descriptor control bits or header control bits. 

--- Quote End ---  

 

 

Regarding the first issue, unfortunately it appears the User Guide documentation is out of sync with the actual altpcietb_bfm_driver_chaining.v descriptor setup. The Verilog localparam statements that define all of the WR_DESCxxx and RD_DESCxxx values are by definition the correct values.  

 

I don't see the problem with the second issue. The first two times the "chained_dma_test" task is called, the input "use_eplast" is set to 1. Through some sub-task calls we end up in task "dma_set_header", where "use_eplast" sets dt_dw0[18]. dt_dw0 is then written to the descriptor header register in the Endpoint. Setting this bit in the EP register is what causes EPLAST to be updated during the DMA operation.
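As a worked check of that bit position, here is a sketch of how DW0 could be composed in driver code. It assumes the descriptor count occupies the low 16 bits of DW0, which is consistent with the 0x00040002 value in the first post but is not confirmed here against table 7-4. The macro and function names are hypothetical, and all other control bits are left at zero.

```c
#include <assert.h>
#include <stdint.h>

#define DT_DW0_EPLAST_ENA (1u << 18) /* dt_dw0[18], per the BFM driver */

/* Hypothetical helper: build DW0 of the descriptor table header from a
 * descriptor count and the EPLAST enable flag. */
static uint32_t dma_make_dw0(uint16_t num_desc, int use_eplast)
{
    assert(num_desc > 0);          /* a table needs at least one entry */
    uint32_t dw0 = num_desc;       /* assumed: size in bits [15:0]     */
    if (use_eplast)
        dw0 |= DT_DW0_EPLAST_ENA;  /* ask hardware to update EPLAST    */
    return dw0;
}
```

With two descriptors and `use_eplast` set, this yields 0x00040002, the same DW0 value the original poster wrote for the read header.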

 

The next two calls to the "chained_dma_test" use MSI instead of updating EPLAST to report the status.
0 Kudos