
Problems with SGDMA through PCIe in several motherboards

Altera_Forum
Honored Contributor II
1,468 Views

Hi All, 

 

We have finalized a project for high-speed video transfer. It uses a PCI Express Gen1 x4 interface between the digital HD video input and the main RAM of an x86-based computer.

 

The design is based on a Cyclone IV GX device with the Altera PCIe Compiler and SGDMA IP cores. The cards work well in our three-year-old Asus test motherboard and in a newer Supermicro board, but in the newest motherboards they work only in a few particular PCIe slots, although the systems always recognize our cards.

 

The debugger showed that the driver can reach and set the registers of the device, but the SGDMA core does not transfer a single byte through the PCIe Compiler.

 

Has anybody experienced something similar? 

 

Best regards, 

Istvan
Altera_Forum
Honored Contributor II
212 Views

Hi Guys, 

 

Can anyone help with this issue? We still have it :(

 

Now I am testing the card in a nice Asus P6T7WS Supercomputer motherboard but it does not work in any of the 7 PCIe slots, although it still works well in the old Asus. The software can see it, read and write the internal registers but the data stops in the SGDMA. Maybe some configuration is missing but where? In the software? In the HW/SOPC Builder? 

Or what's going on? 

 

Any idea? 

 

Istvan
Altera_Forum
Honored Contributor II
212 Views

I have no practical experience with the SGDMA example, except for looking at the generated code and … deciding to write my own application code connecting to the PCIe IP on the Avalon ST layer. 

 

I cannot be of much help here, but such issues typically arise when one chipset or bridge chip (the old one) is forgiving about some irregularities in the PCIe transactions while another (the new one) is stricter.

 

For example, PCIe requires the application to issue requests with 4 DW headers only for addresses where the upper 32 bits are nonzero. If the upper address bits are zero, a 3 DW header request must be used. Now, some systems work without an issue with just 4 DW header requests, even for addresses in the lowest 4 GiB range. You change the motherboard and, bang, nothing works because the ill-formed requests are dropped.
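
To make the 3 DW / 4 DW rule concrete, here is a minimal sketch in C of how application logic might choose the header format for a memory request. The helper names `tlp_needs_4dw` and `tlp_mem_fmt` are hypothetical and not part of any Altera IP or driver API; only the Fmt[1:0] encodings are taken from the PCIe specification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fmt[1:0] encodings for memory requests (PCIe Base Specification). */
enum tlp_fmt {
    TLP_FMT_3DW_NO_DATA = 0x0, /* memory read,  32-bit address */
    TLP_FMT_4DW_NO_DATA = 0x1, /* memory read,  64-bit address */
    TLP_FMT_3DW_DATA    = 0x2, /* memory write, 32-bit address */
    TLP_FMT_4DW_DATA    = 0x3  /* memory write, 64-bit address */
};

/* Hypothetical helper: true only if the upper 32 address bits are nonzero,
 * i.e. the request must carry a 4 DW header. */
static bool tlp_needs_4dw(uint64_t addr)
{
    return (addr >> 32) != 0;
}

/* Hypothetical helper: choose the Fmt field for a memory read or write. */
static enum tlp_fmt tlp_mem_fmt(uint64_t addr, bool is_write)
{
    unsigned fmt = is_write ? 0x2u : 0x0u; /* bit 1: TLP carries data   */
    if (tlp_needs_4dw(addr))
        fmt |= 0x1u;                       /* bit 0: 4 DW header in use */
    return (enum tlp_fmt)fmt;
}
```

A DMA engine that hard-codes the 4 DW format would return `TLP_FMT_4DW_DATA` for every write, including writes to buffers below 4 GiB, which is exactly the ill-formed case described above.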

 

Other sources of trouble, when changing the system, are changes in the timing that uncover race conditions that were hidden due to limitations of the old system. 

 

For example, consider an MSI that should push out all written data so that the interrupt service routine (or bottom half) can read a value and act on the new value accordingly. Now, if the application is wrongly designed so that the MSI can overtake the final write request, the old system might react slowly enough to the MSI that the data is already fresh when the ISR reads it. But the new system could be faster in activating the ISR or slower in updating the written data, so the ISR reads the old value and probably finishes without having done anything. Of course, in such a case neither the old nor the new chipset is at fault; the application was buggy.

 

Similar race conditions can be present in the driver as well: reads and writes are not issued strictly in the correct order, and the moment the system takes the chance to reorder them within spec, things start to fail mysteriously. Memory barriers are something most developers don't care about much; instead they think they can rely on volatile memory locations.
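
As an illustration of the ordering point above, here is a small sketch using portable C11 atomics rather than any particular kernel API. The `mailbox` structure and its functions are hypothetical. It only shows the general pattern: publish the data first, then set the flag with release semantics, and read the flag with acquire semantics before touching the data. `volatile` alone does not give this guarantee.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mailbox shared between a producer and a consumer. */
struct mailbox {
    uint32_t    payload; /* plain data, protected by the flag below */
    atomic_bool ready;   /* set once the payload is valid           */
};

static void mailbox_init(struct mailbox *mb)
{
    mb->payload = 0;
    atomic_init(&mb->ready, false);
}

/* Producer: the release store keeps the payload write from being
 * reordered after the flag write. */
static void mailbox_publish(struct mailbox *mb, uint32_t value)
{
    mb->payload = value;
    atomic_store_explicit(&mb->ready, true, memory_order_release);
}

/* Consumer: the acquire load keeps the payload read from being
 * reordered before the flag read. Returns false if nothing is ready. */
static bool mailbox_consume(struct mailbox *mb, uint32_t *out)
{
    if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
        return false;
    *out = mb->payload;
    atomic_store_explicit(&mb->ready, false, memory_order_relaxed);
    return true;
}
```

In a real driver the flag would typically be a descriptor status word written by the device, and the OS-specific barrier primitives would be used instead (in Linux, for example, `dma_rmb()` and `dma_wmb()`), but the ordering requirement is the same.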

 

Another source of trouble when changing the system architecture might be the setting of the memory BARs, e.g. the difference between prefetchable and non-prefetchable. Each system might handle this differently, e.g. by prefetching data for reads or by collecting multiple write requests into a single request.
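
To check how a BAR is actually declared, its low bits can be decoded from the raw configuration-space value. The sketch below follows the standard PCI BAR layout (bit 0: I/O space indicator, bits 2:1: type, bit 3: prefetchable). The `bar_info` structure and `bar_decode` function are hypothetical names used for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the low dword of a Base Address Register. */
struct bar_info {
    bool is_io;        /* bit 0 set: I/O space BAR                     */
    bool is_64bit;     /* memory BAR, type field (bits 2:1) equals 10b */
    bool prefetchable; /* memory BAR, bit 3 set                        */
};

static struct bar_info bar_decode(uint32_t bar)
{
    struct bar_info info = { false, false, false };

    if (bar & 0x1u) {  /* I/O BARs have no type or prefetchable bits */
        info.is_io = true;
        return info;
    }
    info.is_64bit     = ((bar >> 1) & 0x3u) == 0x2u;
    info.prefetchable = (bar & 0x8u) != 0;
    return info;
}
```

For example, a raw value of `0xF000000C` decodes as a 64-bit prefetchable memory BAR, while `0xFE000000` is a 32-bit non-prefetchable one.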

 

As I said, not much concrete help, but maybe something to consider. 

 

– Matthias
Altera_Forum
Honored Contributor II
212 Views

We had a lot of problems with (I think) the PCIe physical layer repeatedly resetting. Single slave transfers worked, but very, very slowly (even slower than when it is working 'properly'); under the same conditions a longer burst transfer could quite possibly be retried almost indefinitely.

I don't know whether this was a hardware SERDES(?) issue or whether it was fixed by changing some of the PCIe parameters at one of the ends (an on-PCB link to a small PPC).
Altera_Forum
Honored Contributor II
212 Views

Hi, 

 

Thanks for the responses! 

The chaining DMA example design was adapted to our hardware for a short trial, and it worked well in the same motherboard where our design doesn't. So I don't think this is a PCB or a SERDES issue.

 

Regards, 

Istvan
Altera_Forum
Honored Contributor II
212 Views

Hello Again, 

 

I have rebuilt the PCIe/SGDMA block in Qsys instead of SOPC Builder. Now the design is doing more or less the same in the old motherboards as in the new ones. So it does not work anywhere. 

 

Is there anything to be done with other parts of the system, e.g. with the root port? Should the SW/driver configure something through the endpoint configuration registers? Maybe something is missing in this area.
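
One endpoint configuration setting that relates to this question is the standard PCI Command register (configuration offset 0x04): bit 1 enables memory-space decoding (register access through the BARs) and bit 2 enables bus mastering, which an endpoint needs before it may issue DMA requests of its own. The sketch below only shows the bit tests. The macro and function names are hypothetical, and whether this setting has anything to do with the problem in this thread is not established.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI Command register layout; the names are hypothetical. */
#define CFG_COMMAND_OFFSET 0x04u
#define CMD_MEMORY_SPACE   (1u << 1) /* device responds to memory accesses */
#define CMD_BUS_MASTER     (1u << 2) /* device may issue its own requests  */

/* True if BAR-mapped registers are reachable. */
static bool cmd_memory_enabled(uint16_t cmd)
{
    return (cmd & CMD_MEMORY_SPACE) != 0;
}

/* True if the endpoint is allowed to master the bus (needed for DMA). */
static bool cmd_bus_master_enabled(uint16_t cmd)
{
    return (cmd & CMD_BUS_MASTER) != 0;
}

/* Value a driver would write back to enable both register access and DMA. */
static uint16_t cmd_enable_dma(uint16_t cmd)
{
    return (uint16_t)(cmd | CMD_MEMORY_SPACE | CMD_BUS_MASTER);
}
```

A Command value of `0x0002` would allow register reads and writes while blocking any transfer initiated by the device. Under Linux, for instance, a driver normally sets the bus master bit by calling `pci_set_master()`.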

 

Regards, 

Istvan
Altera_Forum
Honored Contributor II
212 Views

Hello, 

 

We restarted this project. A small new project has been set up containing only the PCIe Compiler and SGDMA cores. The latter is fed continuously with a fixed value for test purposes. A completely new and clean driver has been written too. The chip is the same as before (Cyclone IV GX 22).

 

The card behaves exactly the same as the original design so there should be a fault somewhere in it. I do not think these two Altera IP cores are so bad. I think we have to examine the SW/driver side also. 

 

Maybe this is just a small initialization or configuration mistake. Does the PCIe bus require a fixed configuration procedure on the SW side? Where could we learn about this procedure?

Our SW developer examined the Jungo driver supplied with the PCIe demo board, but he said that its source was not enough.
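
As one example of what system software sets up during configuration, the Device Control register of the PCI Express capability (offset 0x08 within that capability) holds the Max_Payload_Size field in bits 7:5 and the Max_Read_Request_Size field in bits 14:12, each encoded as 128 bytes shifted left by the field value. The sketch below decodes both fields from a raw register value. The function names are hypothetical, and this is only one piece of the configuration, not a complete procedure.

```c
#include <stdint.h>

/* Decode a 3-bit PCIe size field: 0 -> 128 bytes ... 5 -> 4096 bytes.
 * Encodings 6 and 7 are reserved, so 0 is returned for them. */
static unsigned pcie_size_field_bytes(unsigned field)
{
    return (field <= 5u) ? (128u << field) : 0u;
}

/* Max_Payload_Size: Device Control bits 7:5. */
static unsigned devctl_max_payload_bytes(uint16_t devctl)
{
    return pcie_size_field_bytes((devctl >> 5) & 0x7u);
}

/* Max_Read_Request_Size: Device Control bits 14:12. */
static unsigned devctl_max_read_request_bytes(uint16_t devctl)
{
    return pcie_size_field_bytes((devctl >> 12) & 0x7u);
}
```

A register value of `0x2810`, for instance, decodes to a 128-byte maximum payload and a 512-byte maximum read request. An endpoint that sends TLPs larger than the programmed payload size can have them rejected by the receiver, so comparing these values between a working and a non-working motherboard may be informative.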

 

Regards, 

Istvan
Altera_Forum
Honored Contributor II
212 Views

Hi, Istvan!

Did you solve your problem?

What other Qsys components did you use in your project besides the SGDMA and the PCIe Compiler?

Did you try your project on the Cyclone IV GX FPGA Development Kit to see how it behaves in modern motherboards?
Altera_Forum
Honored Contributor II
212 Views

Hi Aphraton, 

 

The design was also tried on the Cyclone IV GX dev board, with the same results.

It used two pipeline bridges and 10 PIOs in total. The SGDMA was fed via a special self-made SOPC Builder/Qsys block.

 

The whole design was rebuilt in Quartus II v12.0 when it was released. Surprisingly, it then worked very well, so the issue has been solved. I would be interested to know what was changed in that Quartus version...

 

Regards, 

Istvan
Altera_Forum
Honored Contributor II
212 Views

 

--- Quote Start ---  

 

The SGDMA was fed via a special self-made SOPC Builder/Qsys block. 

--- Quote End ---  

 

It seems you used this self-made block instead of the Clocked Video Input to feed the SGDMA with HD video?

If so, why didn't you use the Clocked Video Input?
Altera_Forum
Honored Contributor II
212 Views

It was not a video input. It was a data input from an external SDRAM. CVI could not be used in that case.
