125MHz PCIe application clock on a large Cyclone V

Altera_Forum · ‎09-30-2016

We are trying to get a large Avalon bus based design to pass timing using the 125MHz clock from the PCIe block.

The design passed at 62.5MHz (just), but we want to use more than PCIe 1.0 x1 - which requires the faster clock.

I'm slowly adding in components, but don't get very far before timing fails badly.

Just connecting one of the BAR masters to two simple slaves is enough to cause problems - particularly if one of the slaves also writes to the Txs and Cra ports of the PCIe block.

There seem to be far too many logic levels inside the pcie_cv_hip_avmm block to allow for even a small amount of Avalon 'goop'.

None of this is helped by the BAR master generating 64-bit Avalon burst transfers to slaves which are all 32-bit non-burst (I'm changing them to 1 cycle setup and 1 cycle read wait - fairly generous).
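To illustrate what the width and burst adapters have to do for such a pairing, here is a rough Python sketch (not Qsys-generated logic; `split_burst` and its parameters are hypothetical names, and byte enables are ignored). It lists the 32-bit single transfers that cover one 64-bit burst.

```python
def split_burst(base_addr, beats, master_bytes=8, slave_bytes=4):
    """List the byte addresses of the narrow single transfers that
    cover a wide burst of `beats` beats starting at `base_addr`.

    Assumption: every byte of every beat is transferred (no byte enables).
    """
    if master_bytes % slave_bytes != 0:
        raise ValueError("master width must be a multiple of slave width")
    total_bytes = beats * master_bytes
    return [base_addr + offset for offset in range(0, total_bytes, slave_bytes)]


# A 2-beat 64-bit burst becomes four 32-bit accesses.
print([hex(a) for a in split_burst(0x100, 2)])
```

Each wide beat turns into two narrow accesses, and each of those also pays the slave's setup and wait cycles, so the adapter logic sits on every BAR access even though none of the slaves needs bursts.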

We don't care about the performance of these BAR master accesses - well not given how slow they are guaranteed to be, and given that the x86 host has difficulty generating PCIe requests for more than 8 bytes.

We do need to generate long PCIe TLPs into host memory - I have a DMA controller to do that.

Has anyone else had a similar issue?

At the moment I've thrown together an Avalon bridge/buffer that forces the bus width and burst adapters to be placed between a single master and slave. This adds two clocks of extra delay to read/write and waitrequest (which aren't a problem to us).

I'm going to have to test it eventually - unless someone has something equivalent lurking.
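As a sanity check on the "two clocks of extra delay", here is a minimal behavioural sketch in Python. It is not the bridge's RTL: the class name and the request strings are made up for illustration, and it models only the register delay on one bundle of signals, not the width or burst adaptation.

```python
from collections import deque


class TwoStageDelay:
    """Shift register that presents each input two clock edges later."""

    def __init__(self, stages=2, idle=None):
        # Pre-fill with the idle value so the first outputs are idle.
        self._pipe = deque([idle] * stages, maxlen=stages)

    def clock(self, value):
        """Advance one clock: return the oldest value, store the new one."""
        out = self._pipe[0]
        self._pipe.append(value)  # maxlen drops the oldest entry
        return out


bridge = TwoStageDelay()
inputs = ["read 0x10", None, None, None]
outputs = [bridge.clock(x) for x in inputs]
print(outputs)  # [None, None, 'read 0x10', None]
```

The request presented in cycle 0 shows up on the far side in cycle 2. The same delay applies in the other direction to waitrequest, which is why each access gets slower while the critical path is cut in two places.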

Hopefully I won't have to make its slave side support 64bit burst transfers!

Altera_Forum · ‎09-30-2016

Avalon "goop" happens whenever you've got dissimilar masters/slaves. Where possible, I put a Clock Crossing Bridge wherever the bus width or clock domain changes, and a Pipeline Bridge wherever anything else [bursting] changes.

So from your description, I'd at least try a configuration like:

(BAR master) -> (Avalon-MM Clock Crossing Bridge [x64]) -> (Avalon-MM Pipeline Bridge x32 non-bursting) -> (your slaves)

(your slaves) -> (Avalon-MM Clock Crossing Bridge) -> PCIe ports.

If you really don't care about performance, I'd consider using the clock crossing bridge to put your slaves in a different clock domain if possible.

Another thing to consider is that if you've just plain got too many nodes connecting to the same arbiter / "goop", split up your buses with Pipeline Bridges possibly in a tree topology.

Altera_Forum · ‎10-04-2016

I did try adding pipeline bridges, but with more than 3 or 4 slaves per bridge it still fails timing.

With that many bridges it is hard to see what is what.
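To put a rough number on "that many bridges", the following Python sketch counts the bridges and bridge levels needed when no master or bridge may drive more than a given number of nodes. The function name and the slave counts in the example are hypothetical, not taken from the actual design.

```python
import math


def bridge_tree(num_slaves, fanout):
    """Return (levels, bridges) for a tree in which the master and every
    bridge drive at most `fanout` downstream nodes."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    levels, bridges, nodes = 0, 0, num_slaves
    while nodes > fanout:
        # Group the current nodes under a new layer of bridges.
        nodes = math.ceil(nodes / fanout)
        bridges += nodes
        levels += 1
    return levels, bridges


print(bridge_tree(16, 4))  # (1, 4)
print(bridge_tree(20, 3))  # (2, 10)
```

Every level adds the latency of one more bridge to each access, so a limit of 3 slaves per bridge gets expensive quickly as the slave count grows.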

Once I get away from the PCIe BAR Avalon master (which has deep levels of logic on its Avalon outputs and waitrequest input) the problems seem to be associated with the address comparators and the slave arbiters generating waitrequest.

I'm wondering whether asserting read/write a cycle after the address and giving TimeQuest a multicycle constraint on the address might help?

Telling Qsys to add 1 cycle of additional latency just makes things worse!
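The arithmetic behind that idea, as a small Python sketch (the function name is hypothetical, and clock uncertainty and register setup time are ignored):

```python
def setup_budget_ns(freq_mhz, multicycle=1):
    """Time allowed between launch and capture edges, in nanoseconds,
    for a path constrained to `multicycle` clock periods."""
    return multicycle * 1000.0 / freq_mhz


print(setup_budget_ns(125.0))     # 8.0  (ordinary single-cycle path)
print(setup_budget_ns(125.0, 2))  # 16.0 (address held for two cycles)
print(setup_budget_ns(62.5))      # 16.0 (single cycle at the old clock)
```

If the address really is stable for a full cycle before read/write is asserted, the address decode gets the same 16 ns it had at 62.5MHz. In TimeQuest this would be expressed with set_multicycle_path (a setup value of 2, normally paired with a hold value of 1); the -from/-to collections depend on the design and are not shown here. It only helps the address paths; read, write and waitrequest remain single-cycle.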

I'm not sure where it adds the latch, but it isn't anywhere near the right place.

Altera_Forum · ‎10-04-2016

I think "Limit interconnect pipeline" might be the parameter you can tweak and possibly see an improvement.

https://www.altera.com/en_us/pdfs/literature/hb/qts/qts-qps-5v1.pdf

Search for "Limit interconnect pipeline" and you'll find a couple paragraphs in different sections for tweaking that parameter, which might be all you need to put this topic behind you.

Altera_Forum · ‎10-06-2016

I've read those bits.

Setting "Limit interconnect pipeline" to 1 actually makes things worse.

While I'm not worried about the performance of the BAR masters, I don't want to add a lot of extra clocks to some other master-slave pairs.

One 'problem' signal seems to be the waitrequest signal that is generated when the arbiter finds the slave busy.

I think that needs a 'non-pipelined' bridge that always asserts waitrequest for one cycle.
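A behavioural sketch of what I mean, in Python rather than HDL. The class and function names are invented for illustration, and the model ignores the downstream side: a real bridge would hold waitrequest high for longer while the slave is busy.

```python
class OneWaitBridge:
    """Slave-side handshake whose waitrequest comes straight from a register.

    waitrequest idles high, so every transfer sees at least one wait
    cycle and the signal never depends combinationally on the arbiter.
    """

    def __init__(self):
        self.waitrequest = True  # registered output, idle value
        self._held = None        # request captured on the first cycle
        self.accepted = []       # requests handed downstream

    def clock(self, request):
        """One rising edge; `request` is None when the master is idle."""
        if request is not None and self.waitrequest:
            # First cycle: capture the request, release the master next cycle.
            self._held = request
            self.waitrequest = False
        elif request is not None:
            # Master sampled waitrequest low on this edge: transfer done.
            self.accepted.append(self._held)
            self.waitrequest = True
        else:
            self.waitrequest = True


def run_transfer(bridge, request):
    """Hold `request` until waitrequest is sampled low; return cycles used."""
    cycles = 0
    while True:
        sampled_wait = bridge.waitrequest  # value the master sees at this edge
        bridge.clock(request)
        cycles += 1
        if not sampled_wait:
            return cycles


bridge = OneWaitBridge()
print(run_transfer(bridge, "write 0x20"))  # 2
```

Every access costs at least two cycles, but the master only ever sees a registered waitrequest.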

Altera_Forum · ‎10-11-2016

This is getting silly.

I have to set "Limit interconnect pipeline" to 4 to get any latches in the main Avalon interconnect.

Even then they don't seem to appear in the right place. It is difficult to see where they are because all the lines are crossing at the point where the latches are added.

I'm feeding the BAR master into a simple non-pipelined 32-bit Avalon bridge that latches all the signals and issues a master read/write transfer two clocks later. By putting a 'false path' on the addresses (only latched once) I get past most of the address-related timing errors.

I've replaced some of the memory slaves with one that has an additional latch on the slave address (which gets rid of the setup-time issues on WE).

But I've still got errors where (I think) waitrequest is looped back through the arbiter.

I've still got more stuff to add as well.