
Help: Windows didn't assign BAR0 to PCIe (address mapping is not correct)

Altera_Forum
Honored Contributor II
4,864 Views

Dear all, 

 

We adapted the Altera Cyclone PCIe DDR2 reference design to our needs, but it doesn't work. The address mapping is not correct; it doesn't agree with the address assignments in Qsys. 

 

In WinDriver, BAR0 doesn't appear; only BAR2 appears under the "Memory" tab (see attached figure). The address range of BAR2 is not correct either, and the data we read back from BAR2 does not match what we wrote. 

 

Can anyone give any suggestions? Why is there no BAR0, as assigned in Qsys? 

 

The Windows operating system is Windows XP. 

 

I have attached figures of the Qsys block diagram, the Qsys address map, and the IP Compiler. 

 

Thank you very much!
27 Replies
Altera_Forum

 

--- Quote Start ---  

1) It is a Cyclone IV GX, speed grade 7. I modified the design at 

http://www.alterawiki.com/wiki/pci_express_in_qsys_example_designs 

and changed the DDR2 to our 128 M 16 bit DDR2 SDRAM. 

 

--- Quote End ---  

 

 

Ok. 

 

 

--- Quote Start ---  

 

2) The memory is to be used for storing some parameters/commands for signal processing 

 

--- Quote End ---  

 

 

Why use the DDR memory for this? It would make more sense to me to use on-chip memory for parameters that the DSP logic will be using. Of course, that assumes there are only a few parameters. 

 

 

--- Quote Start ---  

 

3) There is a lot of RF data from ADCs for the FPGA to process 

 

--- Quote End ---  

 

 

That data should be going directly into the DSP logic. What actually needs to be stored in the DDR? A power spectrum? A cross-correlation? 

 

 

--- Quote Start ---  

 

CPU needs to send commands/parameters 

 

--- Quote End ---  

 

Again, this likely should go to on-chip RAM and registers. 

 

 

--- Quote Start ---  

 

It also obtains the processed data stored in DDR2 SDRAM over PCIe. I don't know whether the CPU needs to see that memory or not. If it does not need to see that memory, how can it get the data from DDR2 SDRAM? 

 

--- Quote End ---  

 

You will never want to use the host CPU to transfer anything but a few simple parameters. The performance of a CPU issuing a write or read command to a PCIe device is slow. It's fine for setting up a few registers, or initializing a DMA controller, but it's ultimately just slow. 

 

You need to talk to a device driver developer. They will explain that devices that transfer data, e.g., network cards, video cards, and data processing cards, do not use the CPU for moving the data; they use a DMA controller on each of those respective cards. 

 

You should design the hardware to match the requirements of the device driver developer. 

 

 

--- Quote Start ---  

 

4) The FPGA processes the ADC data; the CPU sends commands/parameters, accesses data in on-chip memory or in DDR2 SDRAM, and displays the processed data as real-time (30 frames/s) images on screen. 

--- Quote End ---  

 

 

30 frames per second of what? An HD image, or a 1024-point power spectrum? This data defines your sustained data rate from your board to the CPU. Calculate it. This is your design target!!! 
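As a rough sketch of that calculation, the snippet below works out the sustained rate for the two cases mentioned. The frame sizes and sample widths (a 1920x1080 image with 16-bit pixels, 32-bit spectrum bins) are assumptions chosen for illustration, not numbers from the original post; substitute your own.

```python
def sustained_rate_bytes_per_s(samples_per_frame, bytes_per_sample, frames_per_s=30):
    """Sustained board-to-host data rate in bytes per second."""
    return samples_per_frame * bytes_per_sample * frames_per_s

# Assumed example payloads (not from the original post):
hd_image = sustained_rate_bytes_per_s(1920 * 1080, 2)  # 16-bit pixels, assumed
spectrum = sustained_rate_bytes_per_s(1024, 4)         # 32-bit bins, assumed

print(f"HD image:            {hd_image / 1e6:.1f} MB/s")
print(f"1024-point spectrum: {spectrum / 1e3:.1f} kB/s")
```

Under these assumptions the two cases differ by about three orders of magnitude, which is why the answer to "30 frames of what?" decides between CPU reads and DMA.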

 

If your data rate is low, perhaps a simple CPU-based read of that data will be sufficient. However, in most applications it would not be, or it would be a waste of CPU time, and DMA will be your only option. If you do need DMA, then you can discuss with your device driver developer whether or not the Altera DMA controller has sufficient functionality for your requirements. 

 

For example, in the Qsys PCIe example I sent a link to, the PCIe bridge can be configured with a 1MB outgoing translation window. If your device driver developer can guarantee that the host data used for DMA is located in a single 1MB region, then you can just use that DMA controller directly. 
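The "single 1MB region" condition can be sketched as a simple check that a driver developer might apply to a candidate host buffer. This is a minimal sketch: the function name is hypothetical, and it assumes the translation window is naturally aligned to its own size (the low address bits pass through untranslated), which you should confirm against the core's documentation.

```python
WINDOW_SIZE = 1 << 20  # 1MB outgoing translation window

def fits_in_one_window(host_addr, length, window_size=WINDOW_SIZE):
    """Return True if [host_addr, host_addr + length) lies inside a single
    size-aligned window (alignment is an assumption, see lead-in)."""
    if host_addr < 0 or length <= 0 or length > window_size:
        return False
    first = host_addr // window_size
    last = (host_addr + length - 1) // window_size
    return first == last

# Hypothetical host physical addresses, for illustration only:
print(fits_in_one_window(0x1000_0000, 0x10_0000))  # exactly one aligned window
print(fits_in_one_window(0x100F_F000, 0x2000))     # straddles a 1MB boundary
```

A buffer that fails this check would need to be split across windows, or the window reprogrammed between transfers.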

 

Cheers, 

Dave
Altera_Forum

Dave,  

 

The data rate is high; DDR2 SDRAM is necessary to store large amounts of temporary data to ensure real-time operation. The system was designed by my supervisor, not me. 

 

My task is to make both DDR2 and on-chip memory work, not to avoid the problem by getting rid of DDR2 SDRAM or on-chip memory. 

 

Could you tell me where the link to your example is? 

" in the Qsys PCIe example I sent a link to, the PCIe bridge can be configured with a 1MB outgoing translation window." 

 

Thank you very much! 

 

 

Altera_Forum

 

--- Quote Start ---  

 

The data rate is high; DDR2 SDRAM is necessary to store large amounts of temporary data to ensure real-time operation 

 

--- Quote End ---  

 

 

Then you have no choice but to use DMA. You cannot efficiently transfer data using the host CPU. 

 

 

--- Quote Start ---  

 

it is designed by my supervisor, not me. 

 

--- Quote End ---  

 

 

And is your supervisor aware of how to design PCI and PCIe systems and how to write Linux device drivers? Is he aware of the differences in address maps? Has he been reading these forum messages? (Please ask him to). 

 

 

--- Quote Start ---  

 

My task is to make both DDR2 and on-chip memory work, not to avoid the problem by getting rid of DDR2 SDRAM or on-chip memory. 

 

--- Quote End ---  

 

 

I never indicated you had to get rid of it. However, it is unlikely you can ever have the host CPU see all the memory on your board via a BAR. As I commented earlier, I have a 64-bit HP EliteBook with 16GB of RAM, and if I use a PCIe BAR of 256MB, that machine will not boot. So this issue is not solved by simply using a 64-bit host CPU; it's only solved by avoiding it altogether, by using DMA between the Qsys address map and the PCIe address map. 

 

 

--- Quote Start ---  

 

Could you tell me where the link to your example is? 

" in the Qsys PCIe example I sent a link to, the PCIe bridge can be configured with a 1MB outgoing translation window." 

 

--- Quote End ---  

 

 

It's part of the PCIe core configuration. The core gets configured with two 1MB outgoing translation windows, e.g., see p8. Each window can be used to access any location in the PCIe address map, so using the Qsys DMA controller you can access two 1MB regions of the host memory. This is not as flexible as a Qsys-to-PCIe bridge with a DMA controller embedded inside it. However, since that component does not exist, you need to figure out whether you can live with the existing solution. 
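To make the window idea concrete, here is a simplified model of how an offset in the Qsys address map could be mapped onto the PCIe address map through two 1MB windows. It is only a sketch under stated assumptions: the function name and table values are hypothetical, the upper offset bit is assumed to select the window while the lower 20 bits pass through, and the real core's table-entry format (which carries extra control bits) is not modeled.

```python
WINDOW_BITS = 20                       # 1MB per window
WINDOW_MASK = (1 << WINDOW_BITS) - 1

def avalon_to_pcie(avalon_offset, table):
    """Translate an offset into the bridge's outgoing slave port to a PCIe address.

    table holds one PCIe base address per window; each base is treated as
    aligned to the window size (simplifying assumption).
    """
    index = avalon_offset >> WINDOW_BITS   # which window the offset falls in
    if avalon_offset < 0 or index >= len(table):
        raise ValueError("offset outside the translation windows")
    base = table[index] & ~WINDOW_MASK     # upper bits come from the table entry
    return base | (avalon_offset & WINDOW_MASK)  # lower bits pass through

# Hypothetical host buffer bases that a driver would program into the table:
table = [0x3F40_0000, 0x7A10_0000]
print(hex(avalon_to_pcie(0x0000_0010, table)))  # lands in window 0
print(hex(avalon_to_pcie(0x0010_0020, table)))  # lands in window 1
```

In this model the DMA controller only ever sees Qsys addresses; pointing it at a different region of host memory means rewriting a table entry, not changing the DMA descriptor format.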

 

Cheers, 

Dave
Altera_Forum

Dave, p8 of which document? I cannot find your information on page 1-8 of ug_pci_express.pdf 

 

Thank you very much! 

 

 

Altera_Forum

 

--- Quote Start ---  

Dave, p8 of which document? I cannot find your information on page 1-8 of ug_pci_express.pdf 

 

--- Quote End ---  

Wrong document. Earlier in this discussion I directed you to read the document I posted in this thread: 

 

http://www.alteraforum.com/forum/showthread.php?t=35678 

 

The Cyclone IV GX designs meet timing if you turn on multi-corner timing analysis. I'm still working through an Altera SR to get the Stratix IV GX design to meet timing. 

 

Cheers, 

Dave
Altera_Forum

 

--- Quote Start ---  

 

The Cyclone IV GX designs meet timing if you turn on multi-corner timing analysis. 

Cheers, 

Dave 

--- Quote End ---  

 

 

I need to reply about this in that other thread but I have a design which is... 

 

CycloneIV '7 

Hard IP PCIe core 

Avalon MM interface with Qsys design flow 

A fair bit of custom logic, but a fairly simple Avalon interface 

My logic drives DMA to the host memory using the address translation tables as above 

125MHz core clock. 

 

This doesn't meet timing (with a pcie core clock to pcie core clock path setup failure), although the PCIe user guide says this configuration should work in all '7 devices. 

 

I raised an SR but a project archive doesn't include the *.qsys file, and as whoever handled the SR couldn't open the qsys project they simply marked the SR as closed. :eek: 

 

I don't actually need the 125MHz clock, so I am using the 62.5MHz option, but I could have had my fingers burnt here. 

 

 

Nial.
Altera_Forum

Hi Nial, 

 

 

--- Quote Start ---  

 

This doesn't meet timing (with a pcie core clock to pcie core clock path setup failure), although the PCIe user guide says this configuration should work in all '7 devices. 

 

--- Quote End ---  

 

 

I've updated the PDF and zip file linked to in the original thread: 

 

http://www.alteraforum.com/forum/showthread.php?p=147114 

 

I edited the C4GXSK constraints.tcl script to use a -7 speed grade, and I can confirm that timing fails for that speed grade. I agree that this is inconsistent with the PCIe Compiler Users Guide. I still can't get the Stratix IV GX x8 design to meet timing either. 

 

Cheers, 

Dave