
ALT_AVALON_DMA_NSLOTS - how to increase FIFO size for DMA scheduler

Altera_Forum

Looking into the HAL driver code for the DMA, I found that the scheduler array is limited by the ALT_AVALON_DMA_NSLOTS macro, which is defined in altera_avalon_dma.h as (4).

 

This apparently means that no more than 4 RX and TX DMA records can be queued by consecutive calls to the TX send and RX prepare functions.

 

Worse, one cannot schedule ALT_AVALON_DMA_NSLOTS transactions, only ALT_AVALON_DMA_NSLOTS-1.

Calling alt_dma_rxchan_depth and alt_avalon_dma_space correctly returns '3' in both cases.
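The NSLOTS-1 limit is what you would expect from a ring buffer that always keeps one slot empty, so that "head == tail" unambiguously means "empty" rather than "full". The sketch below is a generic C illustration of that technique, not the HAL driver's code: the DEMO_* macros, demo_ring and the demo_* functions are hypothetical, and the power-of-two requirement is an assumption inferred from the existence of ALT_AVALON_DMA_NSLOTS_MSK (masked index wraparound only works for power-of-two sizes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for ALT_AVALON_DMA_NSLOTS / _MSK (not the HAL source). */
#define DEMO_NSLOTS     4u
#define DEMO_NSLOTS_MSK (DEMO_NSLOTS - 1u)

/* Wrapping indices with "& MSK" is only correct for power-of-two sizes. */
static_assert((DEMO_NSLOTS & DEMO_NSLOTS_MSK) == 0, "DEMO_NSLOTS must be a power of two");

typedef struct {
    unsigned head;                   /* next slot to fill (producer: send/prepare) */
    unsigned tail;                   /* next slot to retire (consumer: completion) */
    const void *slots[DEMO_NSLOTS];  /* queued transfer descriptors */
} demo_ring;

/* Free slots. One slot always stays empty so head == tail means "empty",
   hence at most DEMO_NSLOTS - 1 entries can be queued at once. */
static unsigned demo_space(const demo_ring *r)
{
    return (r->tail - r->head - 1u) & DEMO_NSLOTS_MSK;
}

/* Queue one entry; fails when the ring is full. */
static bool demo_push(demo_ring *r, const void *p)
{
    if (demo_space(r) == 0u)
        return false;                /* full: wait for a completion first */
    r->slots[r->head] = p;
    r->head = (r->head + 1u) & DEMO_NSLOTS_MSK;
    return true;
}

/* Retire the oldest entry; fails when the ring is empty. */
static bool demo_pop(demo_ring *r, const void **out)
{
    if (r->head == r->tail)
        return false;                /* empty */
    *out = r->slots[r->tail];
    r->tail = (r->tail + 1u) & DEMO_NSLOTS_MSK;
    return true;
}
```

Under this scheme, queueing 8 transfers at once needs at least 9 slots; if the size must also be a power of two, the next valid value is 16, which yields 15 usable entries.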

 

My real problem is that I want to schedule 8 of these transactions in one go. So I thought I could simply redefine ALT_AVALON_DMA_NSLOTS in one of my own source files and that it would do the job.

 

Well, it does not. Printing ALT_AVALON_DMA_NSLOTS and ALT_AVALON_DMA_NSLOTS_MSK shows the new values, but the depth and space functions still return only '3', as if the define had not been taken into account.

 

Looking further, I found that alt_sys_init.c already includes altera_avalon_dma.h, so presumably that is where I should put my '#define ALT_AVALON_DMA_NSLOTS (8)', before the inclusion of altera_avalon_dma.h.

 

Well, this works only partially: alt_dma_rxchan_depth now correctly returns '7', but alt_avalon_dma_space still returns '3'.

 

I have no idea what's going on here. Has anyone already tried to set up a different FIFO depth? How should it be done correctly? The problem with alt_sys_init.c is that it gets overwritten every time a new BSP is generated, so I presume this should somehow be set up by a TCL command when running create-this-bsp.

 

Any help is kindly appreciated.
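A plausible explanation for the 7-versus-3 result, not confirmed in this thread: a #define only affects the translation unit in which it appears. Assuming that the value returned by alt_dma_rxchan_depth is a data field filled in by the device-instance initializer that alt_sys_init.c expands, while alt_avalon_dma_space is a function compiled in the driver's own source file as part of the BSP library (which never sees your #define), the two would bake in different NSLOTS values. The model below reproduces that split; demo_chan, DRIVER_NSLOTS, SYS_INIT_NSLOTS and driver_space_when_empty are hypothetical names, not HAL code.

```c
#include <assert.h>

/* Hypothetical model, NOT the real HAL structures: one channel object whose
   data fields come from an initializer expanded in alt_sys_init.c, while its
   function pointer points at code compiled in the driver source. */
typedef struct {
    unsigned depth;           /* data: fixed where the initializer is expanded */
    unsigned (*space)(void);  /* code: fixed where the driver was compiled */
} demo_chan;

/* "Driver" translation unit: built in the BSP library with the header default. */
#define DRIVER_NSLOTS 4u
static unsigned driver_space_when_empty(void)
{
    return DRIVER_NSLOTS - 1u;  /* one slot is always kept free */
}

/* "alt_sys_init.c" translation unit: the user's "#define ... (8)" is in effect here. */
#define SYS_INIT_NSLOTS 8u
static const demo_chan demo_dma_chan = {
    SYS_INIT_NSLOTS - 1u,     /* depth */
    driver_space_when_empty   /* space */
};

/* Both values are valid powers of two, so no compile-time check flags the mismatch. */
static_assert((DRIVER_NSLOTS & (DRIVER_NSLOTS - 1u)) == 0, "power of two");
static_assert((SYS_INIT_NSLOTS & (SYS_INIT_NSLOTS - 1u)) == 0, "power of two");
```

If the real device structure also contains arrays sized by ALT_AVALON_DMA_NSLOTS, letting different files see different values gives them different layouts for the same object, which is undefined behavior. The safe approach is a single value seen by every file in the BSP build, such as a -D flag on the BSP compiler command line.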

Altera_Forum

I'll answer my own question:

 

After quite a lot of messing around: ALT_AVALON_DMA_NSLOTS must be set up in the BSP Editor, in bsp_cflags_optimization.

 

So in my case this contains: 

 

-Os -DALT_AVALON_DMA_NSLOTS=16 

 

 

This can also be set in the TCL script used to generate the BSP:

 

set_setting hal.make.bsp_cflags_optimization "-Os -DALT_AVALON_DMA_NSLOTS=16" 

 

 

Recompile the BSP and you get support for 15 DMA records (one fewer than ALT_AVALON_DMA_NSLOTS). Apparently this works.
Altera_Forum

As an aside, it is worth checking that -O3 doesn't actually generate smaller code overall than -Os.

Altera_Forum

Well, I need faster rather than smaller. I found that the Iniche TCP stack is quite demanding in terms of CPU power.

Altera_Forum

-O3 is likely to generate faster code than -Os, whether or not it is smaller. 

It is quite likely that the source code can be changed to significantly improve the performance. 

On the other hand, that is quite hard work on something as large as a TCP/IP stack.

Typically it involves: 

1) Stopping the compiler spilling registers to stack (may involve reducing the number of 'live' values in the code). 

2) Assigning intermediate values to locals if they are used multiple times and a memory write could alias the source. 

3) Forcing values to be read from memory early to avoid pipeline stalls.

4) Putting as much data (and I/O) as possible where it can be referenced relative to %gp (reduces code size and register pressure).

5) Getting the static branch prediction right for every branch (and then disabling the dynamic branch predictor on the hidden menu).

6) Being willing to modify gcc.
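Items 2 and 5 can be illustrated in plain C. This is a minimal sketch assuming GCC (__builtin_expect is a GCC/Clang builtin); the rx_stats structure, the sum_slow/sum_fast/frame_ok functions and the LIKELY/UNLIKELY macros are hypothetical and not taken from the Iniche code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Item 5: static branch hints (macro names are our own). */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical running-sum counter. */
struct rx_stats {
    uint32_t byte_sum;
};

/* Item 2, before: buf is unsigned char, which may alias any object (including
   s->byte_sum), so the compiler must store and reload s->byte_sum every iteration. */
static void sum_slow(struct rx_stats *s, const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        s->byte_sum += buf[i];
}

/* Item 2, after: accumulate in a local (kept in a register) and write back once.
   Only equivalent when the caller guarantees buf does not overlap *s. */
static void sum_fast(struct rx_stats *s, const unsigned char *buf, size_t n)
{
    uint32_t acc = s->byte_sum;
    for (size_t i = 0; i < n; i++)
        acc += buf[i];
    s->byte_sum = acc;
}

/* Item 5: mark the short-frame error path as the unlikely one. */
static int frame_ok(size_t len, size_t min_len)
{
    if (UNLIKELY(len < min_len))
        return 0;
    return 1;
}
```

__builtin_expect mainly steers GCC's block layout; whether that lines up with the Nios II static predictor exactly as item 5 intends is an assumption worth confirming in the generated assembly.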

 

At a guess, you can get a 30%+ improvement, unless the code has already been treated that way!
Altera_Forum

All right then. I'll give it a try now (and let it run through the night to see whether it hangs :)
