Lack of capability with Qsys conduits

Altera_Forum · ‎02-01-2012

Hello all,

My apologies in advance, but this is a long-winded one... my inner monologue seems to be broken today. :D

I have a bit of an outstanding issue with what, to me anyways, seems like a gaping hole in the capabilities offered by Qsys (and, prior to that, SoPC Builder). To illustrate it, I'll provide the specifics of the use case in which it is presently affecting me.

I have a design which produces a number of different packet streams, bound for transmission over a 1G Ethernet interface. The MAC and the arbitration stage are custom-designed - this is *not* using the ATSE, nor can it. So the suggestion "you should just use the ATSE" is a non-starter. The arbiter provides N "client" interfaces, and uses a couple of handshaking signals in addition to the data. Each client asserts a "valid" signal when it has a packet to send, along with the first byte of the packet header. This amounts to an arbitration request. Once the Tx arbiter has selected a client, the arbiter asserts an "advance" signal to that client for a few cycles - this instructs the client to feed the next few bytes into the arbiter's pipeline while the arbiter is enforcing the IFG and beginning to produce the packet's preamble and SFD sequence. At exactly the right moment in the sequence, another signal, "ack", is pulsed for one clock, which directs the client to proceed with the remainder of the packet all the way to its end. The client lowers its "valid" signal to end the packet. All of these signals are also qualified by a "clockenable" signal, allowing the interface to throttle the data rate for 100M links as well as 1G.
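To make the handshake above concrete, here is a minimal behavioural sketch of one client in Python. It is not the actual RTL; the class name `TxClient`, the helper `run`, and the choice that every enabled "advance" or post-"ack" cycle consumes exactly one byte are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TxClient:
    """Hypothetical model of one arbiter client (illustration only)."""
    packet: bytes
    index: int = 0        # position of the byte currently driven on "data"
    acked: bool = False   # True once the arbiter has pulsed "ack"

    @property
    def valid(self) -> bool:
        # "valid" is the arbitration request and stays high until the
        # last byte has been taken; lowering it ends the packet.
        return self.index < len(self.packet)

    @property
    def data(self) -> int:
        return self.packet[self.index] if self.valid else 0

    def step(self, clock_enable: bool, advance: bool, ack: bool) -> Optional[int]:
        """One clock edge; returns the byte taken by the arbiter, if any."""
        if not clock_enable or not self.valid:
            return None   # every signal is qualified by "clockenable"
        if ack:
            self.acked = True
        if advance or self.acked:
            byte = self.packet[self.index]
            self.index += 1
            return byte
        return None       # still waiting: keep holding the current byte


def run(client: TxClient, schedule: List[Tuple[bool, bool, bool]]) -> bytes:
    """Apply (clock_enable, advance, ack) tuples and collect the bytes taken."""
    taken = (client.step(*signals) for signals in schedule)
    return bytes(b for b in taken if b is not None)


# Request pending for two cycles, two "advance" cycles, a one-cycle "ack",
# one throttled cycle (clockenable low), then the rest of the packet.
schedule = (
    [(True, False, False)] * 2
    + [(True, True, False)] * 2
    + [(True, False, True)]
    + [(False, False, False)]
    + [(True, False, False)] * 3
)
sent = run(TxClient(bytes(range(6))), schedule)
```

The point of the sketch is that three distinct signals flow back from the arbiter to the client, each with its own meaning, and all of them are gated by "clockenable".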

My first thought was to implement the Qsys component wrapper by adapting the interface described above to an Avalon-ST link. However, it pretty quickly became apparent that this wasn't really a good fit. First, an Avalon-ST sink has only one signal to send back to the source, "ready" - I would somehow need to multiplex "advance", "ack", and even "clockenable" into this one signal. There's no clean way to do that - I'm not saying it's impossible, but it would be ugly and would probably necessitate changes to the existing core logic being wrapped for Avalon, which is not really acceptable anyways since it is already running in a lot of designs on different targets. Second, the "valid" signal in Avalon-ST can't be used as an arbitration request signal; per the Avalon-ST spec, data symbols are always transferred on every cycle in which valid is asserted, and it's merely up to the Source to deassert valid within the appropriate ready latency. I could, of course, abuse the spec and just do whatever I want with the signals, but... well, I'll get to that.

So instead I opted to use the free-form "conduit" interface type. This worked fine for my initial design, which had only one packet source and consequently no arbitration stage. Then I added the arbiter along with another packet source, and trouble ensued. As I added a Qsys wrapper to the arbiter, I created several conduit interfaces, giving them names like "client_tx_0", "client_tx_1", etc. Since the signals down in the HDL module also require unique names, they similarly received names like "Client_Tx_Data_0", "Client_Tx_Valid_0", etc. But when I tried to bolt this up to my transmitting clients, everything compiled but... nothing worked!

After a little investigation, I found that the conduits weren't actually connecting anything together. In hindsight, I should have realized they wouldn't - the conduit interface has no real "roles" for its signals; they are all just uniformly given the pseudo-role of "export". The only scheme used to make the logical connections between the two ends of a connection made in Qsys is an *exact name match*. This just plain old does not scale; clearly, I can't design my FIFO-based Avalon-MM Ethernet peripheral for the NIOS2 to have the signal names "Client_Tx_Valid_3", "Client_Tx_Ack_3", etc. with the intent that it will *only* ever be used in a design where it is bolted up to the fourth interface of our Tx arbiter block! In fact, I might have two NIOS2s, each with one of these peripherals, and both sharing the same Ethernet port through an arbiter - how would that work?

As a result, I have to resort to a very, very silly workaround - I export these interfaces from the Qsys design, and then hand-wire things in my top-level HDL. Very ugly, as there is no way to visualize these connections in the Qsys GUI (in fact, they just look "unconnected" since they're exported).

Really, the simple solution for Altera to implement would be to allow the use of free-form signal roles within conduit interfaces. Instead of:

    add_interface client_0_tx conduit end
    add_interface_port client_0_tx Client_Tx_Enable_0 export Output 1
    add_interface_port client_0_tx Client_Tx_Lock_0 export Input 1
    add_interface_port client_0_tx Client_Tx_Data_0 export Input MAC_PORT_WIDTH
    add_interface_port client_0_tx Client_Tx_Valid_0 export Input MAC_PORT_WIDTH/8
    ...

what I would *really* like to be able to do is this:

    add_interface client_0_tx conduit end
    add_interface_port client_0_tx Client_Tx_Enable_0 enable Output 1
    add_interface_port client_0_tx Client_Tx_Lock_0 lock Input 1
    add_interface_port client_0_tx Client_Tx_Data_0 data Input MAC_PORT_WIDTH
    add_interface_port client_0_tx Client_Tx_Valid_0 valid Input MAC_PORT_WIDTH/8
    ...

What the latter implies is the ability to assign roles to each of the signals, using whatever words I want. When Qsys generation occurs, all it needs to do is look for the matching role to connect the signals up. If a signal's role is "export", it can instead fall back to the legacy behavior, which is to look for an exact name match. To keep things clean, Qsys could perform a DRC on each of your interfaces to make sure you're either using the "new" semantics, i.e. ascribing a unique role to each of your signals, or calling them *all* "export". Mixing user-defined roles with export *could* be figured out, but it would be sort of sloppy IMHO.
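As a rough sketch of what that matching rule and DRC could look like, here is a small Python model. Nothing in it is Qsys code: `check_roles`, `connect`, and the peripheral-side signal names (`Tx_Enable` and so on) are hypothetical, and port direction and width are ignored to keep it short.

```python
from typing import Dict, List, Tuple

Port = Tuple[str, str]  # (HDL signal name, role)


def check_roles(ports: List[Port]) -> str:
    """DRC: an interface is either all "export" or uses unique custom roles."""
    roles = [role for _, role in ports]
    if all(role == "export" for role in roles):
        return "legacy"
    if "export" in roles:
        raise ValueError("interface mixes 'export' with user-defined roles")
    if len(set(roles)) != len(roles):
        raise ValueError("user-defined roles must be unique within an interface")
    return "roles"


def connect(a: List[Port], b: List[Port]) -> Dict[str, str]:
    """Return {signal on a: signal on b} for one conduit connection."""
    mode_a, mode_b = check_roles(a), check_roles(b)
    if mode_a != mode_b:
        raise ValueError("both ends must use the same matching scheme")
    if mode_a == "legacy":
        # Legacy behaviour: signals pair up only on an exact name match.
        names_b = {name for name, _ in b}
        return {name: name for name, _ in a if name in names_b}
    # Proposed behaviour: signals pair up by role, whatever their names.
    by_role_b = {role: name for name, role in b}
    return {name: by_role_b[role] for name, role in a if role in by_role_b}


arbiter = [("Client_Tx_Enable_0", "enable"), ("Client_Tx_Lock_0", "lock"),
           ("Client_Tx_Data_0", "data"), ("Client_Tx_Valid_0", "valid")]
peripheral = [("Tx_Enable", "enable"), ("Tx_Lock", "lock"),
              ("Tx_Data", "data"), ("Tx_Valid", "valid")]
wiring = connect(arbiter, peripheral)

# The same two ends with every role set to "export" connect nothing,
# because no signal names match exactly.
legacy = connect([(n, "export") for n, _ in arbiter],
                 [(n, "export") for n, _ in peripheral])
```

With roles, the peripheral never needs to know which numbered arbiter interface it will be attached to; with "export" everywhere, the result is the silent non-connection described earlier.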

I would think this should be pretty easy to implement; without it, there's no good solution. Probably the biggest bit of evidence in my mind that this is, as I've so carelessly characterized it, a "gaping hole" is the fact that some of Altera's own IP blocks with similar requirements solve the issue by sort of abusing the Avalon-ST interface standard. Take the altera_vectored_interrupt_controller for example. It supports up to 32 IRQ inputs, each of which can come from different modules within the system. Clearly, they can't expect these other modules to name their IRQ lines something like "vic_irq_5", using precognition to decide which input to the VIC someone is going to use (or, for that matter, the fact that they even intend to use the VIC!) So they make each interrupt client interface an Avalon-ST interface. But... it only uses two signal roles, valid and data. The valid signal is, actually, the IRQ pulse, and the data signal concatenates an ID and a configuration vector. Does it work? Yes, of course - but is it really being used as a streaming data source? No, not at all.

What would actually be an even better solution to the general problem would be to allow free-form declaration of different interface types. These could even fundamentally behave exactly the same as conduit interfaces do, with the extension of signal roles as I described earlier. What if the Altera vectored interrupt controller had multiple interfaces of type "vectored_irq", and had the roles "irq", "id", and "config"? Wouldn't that be a much more literate way to do this, and much more illuminating to whoever looks at a design in the Qsys GUI than to see a bunch of Avalon-ST connections? And to the point of difficulty in implementation - again - this hardly needs to behave much differently than the generation / elaboration of conduits today.

Just my three cents. ;)

Altera_Forum · ‎02-01-2012

--- Quote Start ---

per the Avalon-ST spec, data symbols are always transferred on every cycle in which valid is asserted, it's merely up to the Source to deassert valid within the appropriate ready latency

--- Quote End ---

In my experience, with '0 ready latency' the data only gets transferred when both 'ready' and 'valid' are asserted. I have never used anything other than 0 ready latency, though. I don't see the use for it (in what I design) either.

In my view you could achieve what you want (to some extent) using the two ST signals at hand: the source valid goes high to signal the request, the arbiter keeps ready de-asserted until it is ready to accept and then asserts ready for one cycle only, and the source in turn de-asserts its valid and then waits for ready to be re-asserted by the arbiter before starting the rest of the stream. Again, this assumes 0 ready latency.

valid 000011111000000111111111111
ready 000000001000001111111111111
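For anyone who wants to check the waveform above, here is a small Python sketch that lists the cycles in which a beat is transferred, assuming ready latency 0 (a beat moves only when valid and ready are both high in the same cycle). The function name `transfers` is just for illustration.

```python
from typing import List


def transfers(valid: str, ready: str) -> List[int]:
    """Cycle indices where valid and ready are both '1' (ready latency 0)."""
    if len(valid) != len(ready):
        raise ValueError("waveforms must have the same length")
    return [i for i, (v, r) in enumerate(zip(valid, ready))
            if v == "1" and r == "1"]


VALID = "000011111000000111111111111"
READY = "000000001000001111111111111"

beats = transfers(VALID, READY)
# beats[0] is the single request/grant beat; the remaining beats are the
# continuous stream after the arbiter re-asserts ready.
```

Read this way, the first beat (cycle 8) plays the part of the grant, and the stream proper starts at cycle 15, one cycle after ready comes back.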

But you are quite right: Qsys is a bit of a straitjacket, and it certainly would be an enhancement if we could add the kind of signalling you propose, as well as other out-of-band signals. I personally had a hard time with the enforced Data Width / Symbol Width match between sinks and sources. For example, when an ST-splitter is used to divide a data stream into two processing chains, each operating on part of the data, Qsys doesn't allow it. So I had to write my own 'adapter' for this. I can't deny it was fun and I learned a bit of Tcl.

Altera_Forum · ‎02-02-2012

Ah, you're correct about ready latency of zero - I'm just so accustomed to ready latency of one since that's what's employed with the video "profile" atop Avalon-ST. The reason for that is just for the sake of pipelining: it gets difficult when you have a long pipe that has to stop *on a dime* as soon as ready deasserts from your sink. Latency of one allows you to have a little bit of inertia, and in video designs we're often pushing the limits of devices' clock rates, even if that's just doing wimpy old 1080p on a Cyclone-class device :)
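To illustrate the "inertia" that a non-zero ready latency buys, here is a hedged Python sketch based on my reading of the Avalon-ST rules: with ready latency N > 0, a cycle is a "ready cycle" if ready was high N cycles earlier, and the source may assert valid only in ready cycles. The function name `beats_with_latency` and the example waveforms are made up, and treating the first N cycles as not ready is an assumption.

```python
from typing import List


def beats_with_latency(valid: str, ready: str, ready_latency: int) -> List[int]:
    """Cycle indices where a beat is transferred for a given ready latency."""
    beats = []
    for i, v in enumerate(valid):
        # Assumption: cycles before the latency has elapsed are not ready.
        ready_cycle = i >= ready_latency and ready[i - ready_latency] == "1"
        if v != "1":
            continue
        if ready_cycle:
            beats.append(i)
        elif ready_latency > 0:
            # With non-zero latency, valid outside a ready cycle is illegal.
            raise ValueError(f"valid asserted in non-ready cycle {i}")
        # With latency 0 the source simply holds its data until ready rises.
    return beats


READY = "1111000111"
VALID = "0111100011"

latency_one = beats_with_latency(VALID, READY, 1)
latency_zero = beats_with_latency(VALID, READY, 0)
```

In the latency-one case the sink drops ready in cycle 4 yet the source still delivers a beat in that cycle, which is the one cycle of notice the pipeline gets; with latency zero that same beat would have to be held back immediately.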

Sadly though, the same calculus has to apply in my case - no sooner do I have standard-def video happily bridging across a 1G network than I'm off to hot-rodding everything to run 3G-SDI across 10G Ethernet, and all the clock rates go up for both the video and network clock domains. So I'm reluctant to predicate everything on being able to stretch across parts of the chip and still meet timing with the ability to throttle the pipeline with zero cycles of "notice" from the sinks. I'll stick with my workaround for now and see if I can get anyone to pay attention to me.

For what it's worth, I can do *exactly* what I want to do with X**inx Platform Studio, have been for several years...

...maybe that will do it :D

Altera_Forum · ‎02-02-2012

My pipelined designs are fully 'systolic' so they can stop on every clock. This sometimes creates long combinatorial ready-to-valid paths, but I devised a component to break that combinatorial chain. (Altera could add this circuit in silicon; it would let the pipeline run at very near the speed limit of the device.)

I had developed my own streaming framework which turned out to be a sub-set of the Avalon ST, but with an inherent ready latency of 0. So I never even tried to understand how a non-0 ready latency works. (Don't have to ...).

I switched to using ST (upgrading most of my library) as connecting in Qsys is a lot nicer (and less error-prone) than doing that in the text editor. The drawback is the conduits. If I were faced with your 'issue' I might try to develop a custom ST-adapter to translate a 'numbered conduit' into an 'unnumbered conduit' which you can then connect further down or up. The downside is that it fills up the Qsys connection diagram. But as you rightfully deplore, we're then doing the job Qsys should do on its own initiative ...

Actually I have an ST component with two conduits, each just a simple std_logic_vector of info needed by two other components. I just checked the RTL diagram and they are connected, although the outputs are named LineCounterA and LineCounterB and the two receivers are both named LineCounter. Maybe I'll have to try a more complex connection to see when it breaks.