Looking into the HAL driver code for DMA, I've found that the scheduler array is limited by the ALT_AVALON_DMA_NSLOTS macro, which is defined in altera_avalon_dma.h as (4). This apparently means that no more than 4 records for RX and TX DMA can be scheduled by consecutive calls to tx send and rx prepare. What is even worse is that one cannot schedule ALT_AVALON_DMA_NSLOTS transactions, but only ALT_AVALON_DMA_NSLOTS-1. Calling alt_dma_rxchan_depth and alt_avalon_dma_space correctly returns '3' in both cases.

Now, my big trouble is that I want to schedule 8 of those transactions in one go. So I thought I would just increase ALT_AVALON_DMA_NSLOTS in one of my files and that it would do the job. Well, it does not. Printing ALT_AVALON_DMA_NSLOTS and ALT_AVALON_DMA_NSLOTS_MSK gives the correct values, but the depth and space functions still return only '3', as if the define had not been taken into account.

Looking further, I've found that alt_sys_init.c already includes altera_avalon_dma.h, so this is probably the place where I should put my '#define ALT_AVALON_DMA_NSLOTS (8)', before the inclusion of altera_avalon_dma.h. Well, this works only partially: whereas alt_dma_rxchan_depth now correctly returns '7', alt_avalon_dma_space again returns '3'. No idea what's going on here.

Has anyone already tried to set up a different FIFO depth? How should that be done correctly? The issue with alt_sys_init.c is that it gets overwritten every time a new BSP is generated. Hence I presume this should somehow be set up by a TCL command when using create-this-bsp... Any help kindly appreciated.
5 Replies
I'll answer myself:
After quite a lot of messing around: ALT_AVALON_DMA_NSLOTS must be set up in the BSP Editor, in bsp_cflags_optimization. In my case this contains: -Os -DALT_AVALON_DMA_NSLOTS=16
This can also be set up in the TCL script used to generate the BSP: set_setting hal.make.bsp_cflags_optimization "-Os -DALT_AVALON_DMA_NSLOTS=16"
Recompile the BSP and you get support for 15 DMA records. Apparently this works, presumably because the flag reaches every file the BSP compiles, including the driver source, whereas a #define in alt_sys_init.c only affects that one file.
As an aside: it is worth checking whether -O3 actually generates smaller code overall than -Os.
Well, I need faster rather than smaller. I've found that the Iniche TCP/IP stack is quite demanding in terms of CPU power.
-O3 is likely to generate faster code than -Os, whether or not it is smaller.
It is quite likely that the source code can be changed to significantly improve the performance. OTOH, that is quite hard work on something as large as a TCP/IP stack. Typically it involves:
1) Stopping the compiler from spilling registers to the stack (this may involve reducing the number of 'live' values in the code).
2) Assigning intermediate values to locals if they are used multiple times and a memory write could alias the source.
3) Forcing values to be read from memory early to avoid pipeline stalls.
4) Putting as much data (and I/O) as possible where it can be referenced relative to %gp (this reduces code size and pressure on registers).
5) Getting the static branch prediction right for every branch (and then disabling the dynamic branch predictor on the hidden menu).
6) Being willing to modify gcc.
At a guess, you can get a 30%+ improvement, unless the code has already been treated that way!
All right then. I'll give it a try now (and let it run through the night to see whether it hangs :)