
altera_msgdma with unaligned access: lost data and stuck busy

etome
Beginner

Hello,

I've developed an application on a Terasic DE10-Lite board, which has a MAX 10 FPGA and an SDRAM chip on board, using Quartus 18.1. The design uses a Nios II/f core (without cache) and the altera_msgdma IP, for which I have written a small driver. While testing it I discovered that the device sometimes never clears the busy bit, so it appears stuck, and the written data is corrupted. I've provided a patch that solves it, along with the .sof, .elf, .qsys and .stp files; all the details are below.

The altera_msgdma is configured with mode MM-to-MM and transfer_type set to unaligned access, no burst.

I'll use the pdf UG-01085 | 2018.09.24 as reference for everything: https://www.intel.com/content/www/us/en/docs/programmable/683130/18-1/introduction.html

Test application: create an array of bytes (uint8_t) in an on-chip memory (altera_avalon_onchip_memory2) and fill it with increasing values, starting from 0. Use the DMA device to transfer a given number of bytes to SDRAM (altera_avalon_new_sdram_controller) 500 times, meaning that the same descriptor is placed 500 times in the descriptor FIFO. Wait for the DMA to finish the transfers by polling the status register, check that the copied data matches, then increase the number of bytes to transfer and repeat. The number of bytes starts from 1 and goes up to the array size.

To make it clearer: the src array holds the values (uint8_t) 0, 1, 2, ..., 255, 0, 1, 2, ...; the DMA copies 1 byte from src to dest 500 times, then 2 bytes from src to dest 500 times, and so on. The test never ends: when the number of bytes to transfer is 25, it waits forever for the DMA to assert the Descriptor Buffer Empty bit (table 298, p. 330). This is repeatable, but I think it is specific to my setup.
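The test procedure described above can be sketched on a host machine as follows. This is only an illustration, not the attached test: `dma_copy` is a hypothetical stand-in that uses `memcpy` instead of queueing a descriptor, the array size of 1000 is inferred from the GDB dump further down, and pre-filling dest with 0xba before each length is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kArraySize = 1000;  // assumed from the GDB dump (25 + 975)
constexpr int kRepeats = 500;             // same descriptor queued 500 times

// Hypothetical stand-in for "queue one descriptor and let the DMA run".
// On the real board this would write to the mSGDMA descriptor FIFO.
void dma_copy(const std::uint8_t* src, std::uint8_t* dst, std::size_t len) {
    assert(len <= kArraySize);
    std::memcpy(dst, src, len);
}

// Runs the sweep over all transfer lengths.
// Returns the first length whose copy fails verification, or 0 if all pass.
std::size_t run_test(std::array<std::uint8_t, kArraySize>& src,
                     std::array<std::uint8_t, kArraySize>& dest) {
    // Fill src with 0, 1, 2, ..., 255, 0, 1, ...
    for (std::size_t i = 0; i < src.size(); ++i) {
        src[i] = static_cast<std::uint8_t>(i);
    }
    for (std::size_t len = 1; len <= src.size(); ++len) {
        dest.fill(0xba);  // assumed fill pattern for untouched bytes
        for (int r = 0; r < kRepeats; ++r) {
            dma_copy(src.data(), dest.data(), len);
        }
        // On hardware, the status register would be polled here before checking.
        if (std::memcmp(src.data(), dest.data(), len) != 0) {
            return len;
        }
    }
    return 0;
}
```

With the `memcpy` stand-in every length passes; on the hardware described here the sweep stalls at a length of 25.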

I run this test under GDB without breakpoints and pause it after more than enough time. The DMA status register (table 297 on p. 329 and table 298 on p. 330) reads 1, and it stays like that indefinitely. This is what I can see from GDB when I stop it:

 

52        while (dma_jolly.csr_port.status.get().descriptor_empty != 1) {
1: /x dest = {_M_elems = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xba <repeats 975 times>}}
(gdb) p /x src
$6 = {_M_elems = {0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e,
    0x1f, 0x20...}}

 

As you can see, dest does not hold the same values as src: it should start with 0x0, but the DMA has "changed" the order of the written data.
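A check like the one below can report where the two buffers first diverge. It is a minimal sketch, not the code from the attached test; `first_mismatch` is a hypothetical helper name and it uses `std::vector` for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the index of the first byte where dest differs from src within the
// first len bytes, or std::nullopt if the two ranges are identical.
std::optional<std::size_t> first_mismatch(const std::vector<std::uint8_t>& src,
                                          const std::vector<std::uint8_t>& dest,
                                          std::size_t len) {
    assert(len <= src.size() && len <= dest.size());
    const auto src_end = src.begin() + static_cast<std::ptrdiff_t>(len);
    const auto it = std::mismatch(src.begin(), src_end, dest.begin()).first;
    if (it == src_end) {
        return std::nullopt;
    }
    return static_cast<std::size_t>(it - src.begin());
}
```

Applied to the first 25 bytes of the dump above, it would report index 0, since dest starts with 0x10 where src starts with 0x0.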

After many trials I captured the behavior with the Signal Tap Logic Analyzer in Quartus: there is a hardware bug that "hangs" the device, and it exists only if the IP is configured with TRANSFER_TYPE set to unaligned access (in Platform Designer).

[Screenshot: Signal Tap capture, etome_0-1715008100867.png]

In the screenshot, the last node, [...]read_master:read_mstr_internal|scfifo:the_master_to_st_fifo|usedw[4..0], is shown as a bar chart. It is the fill level of a FIFO in the read_master of the altera_msgdma, and that FIFO overflows. The issue is in the read_master module. Its task is to read from the Avalon-MM bus and forward the data to the write_master; it uses a FIFO and has to stop requesting data from the bus early enough not to overflow that FIFO. When the module is configured to allow unaligned access, it creates this signal (line 751 of read_master.v):

assign too_many_pending_reads = (({fifo_full,fifo_used} + pending_reads_counter) > (FIFO_DEPTH - (maximum_burst_count * 3)));

It assumes a pipeline depth of 3, but the pipeline is really 4 deep, because a barrel shifter in module MM_to_ST_Adapter.v (lines 289-300) needs an extra cycle to empty itself after all the reads from the Avalon bus have completed.
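To make the effect of the margin concrete, here is a small software model of the quoted throttle expression. It is a sketch under assumptions, not the patch itself: the `stages` parameter is introduced for illustration, a FIFO depth of 32 is only inferred from the 5-bit usedw[4..0] signal, and a maximum burst count of 1 is assumed because the IP is configured without bursts.

```cpp
#include <cassert>

// Software model of:
//   too_many_pending_reads =
//     ({fifo_full,fifo_used} + pending_reads_counter) > (FIFO_DEPTH - (maximum_burst_count * 3))
// 'stages' replaces the hard-coded 3 so that different margins can be compared.
// Precondition: max_burst * stages <= fifo_depth (otherwise the unsigned
// subtraction would wrap around).
constexpr bool too_many_pending_reads(unsigned fifo_used, unsigned pending_reads,
                                      unsigned fifo_depth, unsigned max_burst,
                                      unsigned stages) {
    return (fifo_used + pending_reads) > (fifo_depth - max_burst * stages);
}

// Assumed example values: depth 32, burst count 1, 29 words used or in flight.
// With a margin of 3 stages the master is still allowed to issue reads...
static_assert(!too_many_pending_reads(20, 9, 32, 1, 3), "29 > 29 is false");
// ...while a margin of 4 stages would already throttle it.
static_assert(too_many_pending_reads(20, 9, 32, 1, 4), "29 > 28 is true");
```

In this model the two margins differ by exactly one word of headroom per burst beat, which matches the single extra cycle attributed to the barrel shifter.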

The .stp file shows the first time the overflow happens at sample -2472.

src is at address 0x3e90, dest is at 0x4000000.

I have no problem uploading every source file, but the code requires an upstream copy of gcc 13.2 to compile, so I'm attaching only my main .cpp file to show how the test works.

ShengN_Intel
Employee

Hi,


Could you provide the whole design so that I can reproduce the issue?

Some software library files are missing.


Thanks,

Regards,

Sheng


etome
Beginner

Hello Sheng,

Sorry for the delay. I've cleaned up the software files and left only what is needed for the test to compile.

My setup is a bit "twisted": you will likely need a copy of the toolchain, which I've attached. I had to modify the Makefile to make it work under WSL (to escape paths), and there is a patch for the BSP Makefile (it removes 2 source files) under the buffering folder.

If you start with a clean bsp, then I'd start with

 

patch ../buffering_fast_bsp/Makefile bsp_makefile.patch

 

The toolchain is compiled to use picolibc and is used for both the BSP and the test application, so you'll have to copy the file Software/buffering/picolibc_custom.specs to the BSP directory (as I've done under buffering_fast_bsp).

I've run make clean_all to remove all the build files. I think you can use the BSP I've provided and skip all of the steps above.

The toolchain was compiled under Ubuntu 22.04 and is not statically linked.

Cheers,

etome

ShengN_Intel
Employee

Hi etome,


May I know where is the attached file?

Could you attach the whole software folder and the .qsf as well? Thanks.


Regards,

Sheng


etome
Beginner

@ShengN_Intel wrote:

May I know where is the attached file?


I'd like to know myself... Sorry for the delay, I'm attaching all you've requested to this reply.

Thanks,

etome

etome
Beginner

Here is the toolchain.

ShengN_Intel
Employee

Hi etome,


Could you also provide the .sof and .elf files that work fine after applying the patch?

I'll report this to the internal team. Please stick with the workaround in the meantime.


Thanks,

Regards,

Sheng


etome
Beginner

Hello Sheng,

Sure. This is what the test outputs when it is working fine:

nios2-terminal: connected to hardware target using JTAG UART on cable
nios2-terminal: "USB-Blaster [USB-0]", device 1, instance 0
nios2-terminal: (Use the IDE stop button or Ctrl-C to terminate)

DMA bugger

=== continuous loop ===

[dma_bugger] success = 999, fail = 0, setup time = 12517544, transfer time = 129804940
[dma_bugger] success = 1998, fail = 0, setup time = 12517515, transfer time = 129805544
[dma_bugger] success = 2997, fail = 0, setup time = 12521287, transfer time = 129805067
[dma_bugger] success = 3996, fail = 0, setup time = 12517509, transfer time = 129805073
[dma_bugger] success = 4995, fail = 0, setup time = 12521300, transfer time = 129804962
[dma_bugger] success = 5994, fail = 0, setup time = 12517506, transfer time = 129804818
[dma_bugger] success = 6993, fail = 0, setup time = 12518670, transfer time = 129805148
[dma_bugger] success = 7992, fail = 0, setup time = 12517509, transfer time = 129805062
[dma_bugger] success = 8991, fail = 0, setup time = 12517509, transfer time = 129805089
[dma_bugger] success = 9990, fail = 0, setup time = 12517512, transfer time = 129806820
[dma_bugger] success = 10989, fail = 0, setup time = 12518671, transfer time = 129804916
[dma_bugger] success = 11988, fail = 0, setup time = 12521309, transfer time = 129804786
[dma_bugger] success = 12987, fail = 0, setup time = 12517512, transfer time = 129805243
[dma_bugger] success = 13986, fail = 0, setup time = 12517509, transfer time = 129805216

Use the same .elf as before; I have attached the Quartus project with the patch applied and built.

Thanks,

etome
