Altera_Forum
Honored Contributor I
3,166 Views

My AXI4 Lite slave hangs CPU after read. Write transactions work correctly

Hi, 

I have ported my AXI4-Lite to IPbus bridge from Zynq to Altera Cyclone V. 

Unfortunately, it behaves in a strange way in Cyclone V. 

Tests were done with the "devmem" tool in a Buildroot-compiled Linux. 
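
For readers unfamiliar with devmem: it maps the page containing the given physical address from /dev/mem and performs one read or write of the requested width. Below is a minimal Python sketch of that idea. The function name mem_access and its parameters are made up for illustration, and because struct copies bytes, this does not guarantee a single bus access of the requested width the way the C tool does.

```python
import mmap
import os
import struct

# Width in bits -> struct format for a little-endian unsigned integer.
_FORMATS = {8: "<B", 16: "<H", 32: "<I", 64: "<Q"}


def mem_access(path, address, width=32, value=None):
    """Read (value is None) or write one item at `address` (hypothetical helper)."""
    fmt = _FORMATS[width]
    page = mmap.ALLOCATIONGRANULARITY
    base = address & ~(page - 1)          # mmap offsets must be page aligned
    offset = address - base
    fd = os.open(path, os.O_RDWR | getattr(os, "O_SYNC", 0))
    try:
        with mmap.mmap(fd, page, offset=base) as window:
            if value is None:
                return struct.unpack_from(fmt, window, offset)[0]
            # Write only; no read-back, so no read transaction is issued.
            struct.pack_into(fmt, window, offset, value)
            return None
    finally:
        os.close(fd)
```

On the target (as root), mem_access("/dev/mem", 0xff200124, 32, 0xdad98123) would roughly correspond to the first write command shown further down.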

 

There is also an Altera sys-id component connected to the same bridge, using the same clock and reset signals, and it works perfectly. So the clock and reset signals are correct and the lwhps2fpga bridge is enabled (at the U-Boot level). 

 

Below are the Signal Tap recordings for write transactions (32-bit write, and 64-bit write). 

 

devmem 0xff200124 32 0xdad98123 

http://www.alteraforum.com/forum/attachment.php?attachmentid=12186&stc=1  

devmem 0xff200000 64 0x12345678fedcba98 

http://www.alteraforum.com/forum/attachment.php?attachmentid=12187&stc=1  

 

Unfortunately, for read accesses the CPU hangs just after the first transaction (in the case of a 64-bit read) or the only transaction (in the case of a 32-bit read) is finished: 

 

devmem 0xff200000 64  

http://www.alteraforum.com/forum/attachment.php?attachmentid=12188&stc=1  

devmem 0xff20001c 

http://www.alteraforum.com/forum/attachment.php?attachmentid=12189&stc=1  

 

The transaction is completed correctly at the slave level; however, it looks like the information about completion doesn't reach the CPU. 

I attach the sources of the component. It contains a few IPbus slaves connected to the bridge. One of the registers drives 8 lines connected to LEDs via the "leds" conduit. 

 

The verified Xilinx version is almost the same. It only does not use the WPROT and RPROT ports and uses a narrower LEDs conduit (3 LEDs). 

 

What could be the reason for such strange behaviour on the Cyclone V platform? 

 

TIA & Regards, 

Wojtek
14 Replies
Altera_Forum
Honored Contributor I
188 Views

Sorry, I have forgotten to attach the promised sources. 

Unfortunately, I'm not able to add them now via "edit post/manage attachments". 

Therefore I'm attaching them to this quick reply. 

 

BR, Wojtek
Altera_Forum
Honored Contributor I
188 Views

Hi, 

 

I really don't know how this forum works. I successfully posted a message explaining the problem, but when I edited it to remove an unnecessary attachment, it disappeared. 

 

OK. So the problem was caused by the strange behavior (bug?!) of the Qsys interconnect, which during the read transaction kept RREADY asserted in the cycle in which it received ARREADY, but was apparently not able to accept RVALID in that cycle. As my slave (and bridge) produced RDATA in the same cycle, the bridge asserted RVALID and assumed the transaction to be finished. It seems that the Qsys interconnect ignored RVALID='1' in that cycle and waited for it starting from the next cycle, causing the bus to lock. 
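
The lock-up can be illustrated with a tiny cycle-level model in Python. This is only a sketch under stated assumptions, not a model of the real interconnect: the master is assumed to sample the R channel only in cycles after the AR handshake (which is what the waveforms suggest), and the slave is assumed to raise RVALID for exactly one cycle because it sees RREADY high. The function name simulate_read and the parameter rvalid_delay are invented for this example.

```python
def simulate_read(rvalid_delay, max_cycles=16):
    """Return the cycle in which the master captures RDATA, or None on a hang."""
    ar_done_cycle = None
    for cycle in range(max_cycles):
        arvalid = ar_done_cycle is None
        arready = arvalid                 # slave accepts the address at once
        if arvalid and arready:
            ar_done_cycle = cycle
        # Slave: one-cycle RVALID pulse, rvalid_delay cycles after the AR handshake.
        rvalid = cycle == ar_done_cycle + rvalid_delay
        rready = True                     # master keeps RREADY high all the time
        # Assumed master behaviour: R beats are only sampled after the AR cycle.
        master_listening = cycle > ar_done_cycle
        if rvalid and rready and master_listening:
            return cycle
    return None
```

In this model simulate_read(0) never completes, because the single RVALID pulse falls into the cycle the master ignores, while simulate_read(1) completes in the cycle after the address handshake.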

 

In the previous (now lost) post, I sent waveforms and sources of a bridge that simply delays RVALID by at least one cycle. This makes read transactions on the Altera SoC less efficient than on the Xilinx Zynq, which does not suffer from this problem. 

 

Now I'd like to present another solution, which should work both with Altera and Xilinx SoCs with the same efficiency. 

ARREADY is asserted in the same cycle in which ARVALID is asserted. The address is latched, and in the following cycles it is sent to the IPbus from the bridge's internal register (in the previous solution I forced the master to keep the address constant, so I could avoid multiplexing the address lines). 

 

The IPbus slave performs the read in the next cycle, and as RREADY is still high, RVALID is asserted in that cycle (of course, if the IPbus slave requires additional wait states, the assertion of RVALID is delayed appropriately). 

I have also added a similar optimization for write accesses. 
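
The read sequence described above can be written down as a short Python trace generator. It is a behavioural sketch only, not the VHDL of the bridge: read_trace, memory and wait_states are illustrative names, and the IPbus slave is reduced to a dictionary lookup with a configurable number of wait states.

```python
def read_trace(address, memory, wait_states=0, max_cycles=16):
    """Trace one read through a bridge that latches the address before using it."""
    trace = []
    latched = None      # bridge-internal address register
    waited = 0
    for cycle in range(max_cycles):
        if latched is None:
            # Address phase: ARVALID and ARREADY are both high, address is latched.
            trace.append((cycle, "AR handshake", hex(address)))
            latched = address
        elif waited < wait_states:
            # IPbus slave is not ready yet, so RVALID stays low.
            trace.append((cycle, "IPbus wait", hex(latched)))
            waited += 1
        else:
            # IPbus slave acknowledges: RVALID is raised while RREADY is high.
            trace.append((cycle, "R handshake", hex(memory[latched])))
            return trace
    return trace
```

With no wait states the data handshake lands in the cycle right after the address handshake; each IPbus wait state pushes it one cycle later.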

 

Below are the waveforms showing operation of that version of the bridge: 

devmem 0xff200000 64 0xdeadbeefabba1234 

https://www.alteraforum.com/forum/attachment.php?attachmentid=12235  

devmem 0xff200000 32 0x12345678 

https://www.alteraforum.com/forum/attachment.php?attachmentid=12236  

# devmem 0xff200000 64  

0xABCDFEDCDEADBEEF 

https://www.alteraforum.com/forum/attachment.php?attachmentid=12237  

# devmem 0xff200000 32 

0xDEADBEEF 

https://www.alteraforum.com/forum/attachment.php?attachmentid=12238  

 

The sources are also attached. 

 

Regards, 

Wojtek
Altera_Forum
Honored Contributor I
188 Views

 

--- Quote Start ---  

OK. So the problem was caused by the strange behavior (bug?!) of the Qsys interconnect, which during the read transaction kept RREADY asserted in the cycle in which it received ARREADY, but was apparently not able to accept RVALID in that cycle. As my slave (and bridge) produced RDATA in the same cycle, the bridge asserted RVALID and assumed the transaction to be finished. It seems that the Qsys interconnect ignored RVALID='1' in that cycle and waited for it starting from the next cycle, causing the bus to lock. 

--- Quote End ---  

 

 

Do you have any other devices in the interconnect, such as an AXI bridge?
Altera_Forum
Honored Contributor I
188 Views

 

--- Quote Start ---  

Do you have any other devices in the interconnect, such as an AXI bridge? 

--- Quote End ---  

 

 

In that version of the project there was only standard sysid_qsys block connected as an "Avalon Memory Mapped Slave" and my axi2ipb bridge. 

 

With best regards, 

Wojtek
Altera_Forum
Honored Contributor I
188 Views

 

--- Quote Start ---  

In that version of the project there was only standard sysid_qsys block connected as an "Avalon Memory Mapped Slave" and my axi2ipb bridge. 

 

With best regards, 

Wojtek 

--- Quote End ---  

 

 

Could you put an SR in on this? I had a discussion with them just recently about the Qsys interconnect components, and bad behavior, and I think they are only recently really debugging AXI. I could be wrong, but anything to get a more robust Qsys -> AXI interconnect is good in my book.
Altera_Forum
Honored Contributor I
188 Views

 

--- Quote Start ---  

Could you put an SR in on this? I had a discussion with them just recently about the Qsys interconnect components, and bad behavior, and I think they are only recently really debugging AXI. I could be wrong, but anything to get a more robust Qsys -> AXI interconnect is good in my book. 

--- Quote End ---  

 

 

OK. I have created Altera SR#: 11234425: Qsys interconnect incorrectly handles AXI4-Lite read transactions (New Service Request)
Altera_Forum
Honored Contributor I
188 Views

My colleague Adrian Byszuk has found an incompatibility of my bridge with the AXI4 specification. 

According to the AXI4 specification (http://www.gstitt.ece.ufl.edu/courses/fall15/eel4720_5721/labs/refs/axi4_specification.pdf, pages A3-36 and A3-37): "On master and slave interfaces there must be no combinatorial paths between input and output signals." 

The current implementation boosts performance by violating that requirement.  

 

At the moment I'm not sure if that incompatibility justifies the behaviour of the Qsys interconnect.
Altera_Forum
Honored Contributor I
188 Views

 

--- Quote Start ---  

At the moment I'm not sure if that incompatibility justifies the behaviour of the Qsys interconnect. 

--- Quote End ---  

 

 

I guess the question is whether or not removing that works better? What would be interesting is taking a look at the optimization section of the fitter to see if it is optimizing out part of the logic.
Altera_Forum
Honored Contributor I
188 Views

 

--- Quote Start ---  

I guess the question is whether or not removing that works better? What would be interesting is taking a look at the optimization section of the fitter to see if it is optimizing out part of the logic. 

--- Quote End ---  

 

 

The question is not about resource consumption but about the speed of the interface. I've sped up transactions by combinational control of ARREADY, AWREADY and some other signals. However, that contradicts the statement that "On master and slave interfaces there must be no combinatorial paths between input and output signals." 

 

So it works, it is fast, but it does not comply with the AXI4 specification :cry:. 

By the way, the whole design is available at http://opencores.org/project,ax4lbr
Altera_Forum
Honored Contributor I
188 Views

 

--- Quote Start ---  

You may want to peruse this document: https://silica.avnet.com/wps/wcm/connect/e2612871-ffdd-470e-b355-f12e0f570395/silica_xilinx_designin... 

Page 3. 

--- Quote End ---  

 

 

That's exactly what I used to design my bridge. See http://opencores.org/websvn,filedetails?repname=ax4lbr&path=%2fax4lbr%2ftrunk%2frtl%2faxil2ipb.vhd ;) 

 

However this link is a "moving target"... 

 

BTW. In this document there is nothing about avoiding combinational connections between input and output signals. That's why I have run into this incompatibility. 

 

Thanks, 

Wojtek
Altera_Forum
Honored Contributor I
188 Views

 

--- Quote Start ---  

That's exactly what I used to design my bridge. See http://opencores.org/websvn,filedetails?repname=ax4lbr&path=%2fax4lbr%2ftrunk%2frtl%2faxil2ipb.vhd ;) 

 

However this link is a "moving target"... 

 

BTW. In this document there is nothing about avoiding combinational connections between input and output signals. That's why I have run into this incompatibility. 

 

Thanks, 

Wojtek 

--- Quote End ---  

 

 

Can you show the combinatorial connections?

I have already perused your code. At first glance I can't make much of it, as it is rather complicated. I had expected to see a state machine ... a state machine is worth a thousand equations. 

 

Regards, 

 

Jos
Altera_Forum
Honored Contributor I
188 Views

 

--- Quote Start ---  

Can you show the combinatorial connections?

I have already perused your code. At first glance I can't make much of it, as it is rather complicated. I had expected to see a state machine ... a state machine is worth a thousand equations. 

 

--- Quote End ---  

 

 

Yes, the state machine is much clearer. However, this design was based on an attempt to make the bridge as fast as possible. 

Therefore the main part is a combinational process, which translates signals between the AXI4-Lite side and the IPbus (or Wishbone in the second bridge) as fast as possible. 

So the combinational connections are just a result of the combinational process used here. For example, in a write transaction: 

 

If S_AXI_AWVALID = '1' and S_AXI_WVALID = '1' and there is no uncompleted transaction, 

then if S_AXI_WSTRB = "1111", finally S_AXI_AWREADY and S_AXI_WREADY are set to '1'. 

 

The above is done in a fully combinational way, creating a combinational connection between the input signals (AWVALID, WVALID, WSTRB) and the output signals (AWREADY, WREADY) on the AXI4-Lite side of the bridge. 
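
To show the difference in plain terms, here is a small Python sketch that contrasts the combinational decision described above with a registered variant. It is not taken from the bridge sources: the names combinational_ready, RegisteredReady and the busy flag (standing for "there is an uncompleted transaction") are assumptions for illustration, and strobes other than "1111" are simply never accepted here.

```python
def combinational_ready(awvalid, wvalid, wstrb, busy):
    """AWREADY/WREADY as a pure function of the inputs of the same cycle."""
    ready = bool(awvalid and wvalid and not busy and wstrb == 0b1111)
    return ready, ready


class RegisteredReady:
    """Same decision, but the outputs come from a register updated on the clock edge."""

    def __init__(self):
        self.ready = False                # reset value of the output register

    def outputs(self):
        # Outputs depend only on the register, never directly on the inputs.
        return self.ready, self.ready

    def clock(self, awvalid, wvalid, wstrb, busy):
        accept = bool(awvalid and wvalid and not busy and wstrb == 0b1111)
        # Raise ready for one cycle, then drop it after the handshake cycle.
        self.ready = accept and not self.ready
```

In the registered variant the handshake happens one cycle after the master raises AWVALID and WVALID, which is the price paid for removing the input-to-output path.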

 

Probably, to comply with the requirement of avoiding combinational connections, I'll have to rewrite it as a state machine (hopefully, for the translated signals, it will be a Mealy one). 

However, it will almost certainly result in slightly slower (but safer) operation... 

 

Regards, 

Wojtek
Altera_Forum
Honored Contributor I
188 Views

 

--- Quote Start ---  

Yes, the state machine is much clearer. However this design was based on an attempt to make the bridge as fast as possible. 

Therefore the main part is a combinational process, which translates signals between the AXI4 Lite side and the IPbus (or WishBone in the second bridge) as fast as possible. 

So the combinational connections are just results of the combinational process used here. An example may be, that in the write transaction: 

 

If S_AXI_AWVALID = '1' and S_AXI_WVALID = '1' and there is no uncompleted transaction, 

then if S_AXI_WSTRB = "1111", finally S_AXI_AWREADY and S_AXI_WREADY are set to '1'. 

 

The above is done in a fully combinational way, creating the combinational connection between the input signals (AWVALID, WVALID, WSTRB ) and output signals (AWREADY, WREADY) at the AXI4-Lite side of the bridge. 

 

Probably, to comply with the requirement of avoiding combinational connections, I'll have to rewrite it as a state machine (hopefully, for the translated signals, it will be a Mealy one). 

However, it will almost certainly result in slightly slower (but safer) operation... 

 

Regards, 

Wojtek 

--- Quote End ---  

 

 

Now your original code has hidden state! Whereas a proper state machine clearly indicates what your intentions are, with a bunch of equations one has to decode all the equations and draw a timing diagram to understand what is going on. I even write state machines with 2 (yes, only 2) states! 

Now whether it is Mealy or Moore or a hybrid form doesn't really matter; that's for the academics. I divide my state machines into three parts:  

  1. a combinatorial part to decide on the next state, plus combinatorial outputs 

  2. a registered part with a reset to register the state, and to register possible output signals that require a known reset state 

  3. a registered part for the dataflow-type signals 
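
The three-part split listed above could look roughly like this when modelled in Python. This is only an illustrative sketch with invented names (State, next_state_logic, TwoStateMachine); in HDL the three parts would be one combinational process and two clocked processes.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    BUSY = 1


def next_state_logic(state, start, done):
    """Part 1: combinational next-state decision plus combinational outputs."""
    if state is State.IDLE:
        nxt = State.BUSY if start else State.IDLE
    else:
        nxt = State.IDLE if done else State.BUSY
    outputs = {"busy": state is State.BUSY}
    return nxt, outputs


class TwoStateMachine:
    def __init__(self):
        self.state = State.IDLE   # part 2: state register with a known reset value
        self.data = None          # part 3: dataflow register, no reset value needed

    def reset(self):
        self.state = State.IDLE   # only the control register is reset

    def clock(self, start, done, data_in):
        """One clock edge; returns the outputs seen during the cycle before it."""
        nxt, outputs = next_state_logic(self.state, start, done)
        if self.state is State.IDLE and start:
            self.data = data_in   # capture data when a transfer starts
        self.state = nxt
        return outputs
```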

 

I happened to work on an AXI4-Lite to Avalon MM bridge too :) 

A state machine will make it easier to follow what the AXI4 master does, e.g. for a write transaction there are 3 transfers: address, write data and response. These 3 may arrive in some sequence: A->W->R, A->WR, AW->R or even AWR.  
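
The orderings above can be checked mechanically. The following Python sketch parses the arrow notation used in this post and verifies that the response is never sent before both address and data have been transferred. The function names parse and write_completes are made up for this example, and allowing the response in the same step as the last transfer follows the notation above rather than any particular implementation.

```python
def parse(sequence):
    """Split e.g. 'A->WR' into per-step sets: [{'A'}, {'W', 'R'}]."""
    return [set(step) for step in sequence.split("->")]


def write_completes(sequence):
    """True if address and data have both been transferred when the response is sent."""
    seen = set()
    for step in parse(sequence):
        seen |= step
        if "R" in step:
            return {"A", "W"} <= seen
    return False                      # no response at all: the write never completes
```

All four orderings listed above pass this check, and so does "W->A->R", since AXI also permits the write data to arrive before the address.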

 

Regards, 

 

Josy