
DMA reading multiple PIOs

Altera_Forum
Honored Contributor II
1,824 Views

Hi, 

 

My system contains a total of twenty-two (22) 16-bit I/O ports (Altera PIOs) that are used to read external 10-bit ADCs. My system runs at 85 MHz, and my sample period needs to be around 1 µs. I am currently reading these I/O ports individually (using the IORD_ALTERA_AVALON_PIO_DATA macro), and I am not getting fast enough throughput.

 

The PIOs are sequential in the memory map, and each PIO spans 16 memory addresses, i.e.:

 

PIO#1 memory space is 0x0060a100 through 0x0060a10f 

PIO#2 memory space is 0x0060a110 through 0x0060a11f 

PIO#22 memory space is 0x0060a250 through 0x0060a25f 

 

I am not certain why there is a 16-address span associated with this device, but it may be that each address represents a single bit of the 16-bit value. Since I don’t know how the PIO core is structured, and since the IORD_ALTERA_AVALON_PIO_DATA macro returns a 16-bit value, I feel there must be some manipulation within this macro that fetches/extracts the 16 bits and places them into a ‘short’.
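For reference, here is a sketch of the PIO address arithmetic. As I understand the PIO core's documented register map, each instance has four 32-bit registers at word offsets 0 to 3 (data, direction, interruptmask, edgecapture), which accounts for the 16-byte span; an address does not correspond to a single bit. IORD_ALTERA_AVALON_PIO_DATA(base) reads the 32-bit data register at word offset 0, of which a 16-bit PIO uses the low 16 bits. The macro and function names below are hypothetical stand-ins (not the ones in altera_avalon_pio_regs.h), and the layout assumes the 22 PIOs are packed back to back from 0x0060a100, as the listed addresses suggest.

```c
#include <stdint.h>

/* Hypothetical names for the PIO core's register word offsets. Each register
 * is 32 bits wide, so word offset k lives at byte offset 4 * k. */
#define PIO_REG_DATA       0u  /* data: what IORD_ALTERA_AVALON_PIO_DATA reads */
#define PIO_REG_DIRECTION  1u  /* direction (bidirectional ports only)          */
#define PIO_REG_IRQ_MASK   2u  /* interruptmask                                 */
#define PIO_REG_EDGE_CAP   3u  /* edgecapture                                   */
#define PIO_REG_COUNT      4u
#define PIO_SPAN_BYTES     (PIO_REG_COUNT * 4u)   /* 16 bytes per PIO instance */

/* Assumed from the addresses in the post: PIO#1 at 0x0060a100, the rest
 * packed back to back, one 16-byte span each. */
#define PIO_FIRST_BASE     0x0060a100u

/* Byte address of register `reg` of PIO number n (1-based, as in the post). */
static uint32_t pio_reg_addr(uint32_t n, uint32_t reg)
{
    return PIO_FIRST_BASE + (n - 1u) * PIO_SPAN_BYTES + reg * 4u;
}
```

With this layout, pio_reg_addr(22, PIO_REG_DATA) gives 0x0060a250, and the last byte of PIO#22's edgecapture register is 0x0060a25f, matching the ranges listed above.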

 

My question is: is there a way to use the Altera DMA core to read the Altera PIOs? What I really need is a way for the DMA to read the PIOs sequentially and transfer the data to SRAM. I believe that if this is possible, I could improve my sampling rate by offloading the reads to the DMA.

 

Thanks in advance for any and all help. 

 

Fred  

12 Replies
Altera_Forum

You could use a scatter-gather DMA controller and program the descriptors to continuously read each PIO in a round-robin fashion. If you have a little HDL knowledge, you could write your own block to do this for you and ditch the PIOs altogether. 
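A rough model of the round-robin idea, to show what "programming the descriptors" amounts to: one descriptor per PIO, each reading two bytes from a fixed PIO data-register address and writing to the next 16-bit slot of an SRAM buffer. The desc_t struct and build_round_robin function are hypothetical and much simpler than the real SGDMA descriptor (alt_sgdma_descriptor). As far as I know, the real core clears an ownership bit on each descriptor it completes, so software would generally have to re-arm the chain each pass rather than rely on a literal loop; check the SGDMA documentation before depending on a circular chain.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor: NOT the real alt_sgdma_descriptor layout. */
typedef struct desc {
    uint32_t     read_addr;   /* fixed source: one PIO data register  */
    uint32_t     write_addr;  /* destination slot in the SRAM buffer  */
    uint16_t     length;      /* bytes to move                        */
    struct desc *next;        /* next descriptor in the chain         */
} desc_t;

#define NUM_PIOS        22u
#define PIO_FIRST_BASE  0x0060a100u   /* assumed from the post */
#define PIO_SPAN_BYTES  16u
#define SAMPLE_BYTES    2u

/* Build a circular chain: descriptor i reads PIO i+1 and writes the i-th
 * 16-bit slot of the buffer at buf_addr; the last one links back to the
 * first, which is the round-robin order described above. */
static void build_round_robin(desc_t d[NUM_PIOS], uint32_t buf_addr)
{
    for (size_t i = 0; i < NUM_PIOS; i++) {
        d[i].read_addr  = PIO_FIRST_BASE + (uint32_t)i * PIO_SPAN_BYTES;
        d[i].write_addr = buf_addr + (uint32_t)i * SAMPLE_BYTES;
        d[i].length     = SAMPLE_BYTES;
        d[i].next       = &d[(i + 1u) % NUM_PIOS];
    }
}
```

For example, build_round_robin(d, 0x00100000u) (an arbitrary buffer address) leaves d[21] reading 0x0060a250 and writing 0x0010002a, with d[21].next pointing back to d[0].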

 

Jake
Altera_Forum

Yes, I agree with Jake. You most likely want to write your own custom I/O component.

 

At the very least, you can ensure your ADC sampling is consistent. Given the number of ADCs you're talking about, you could probably do some of your number crunching there and pass the data to the Nios only when it really needs it.

 

Pete
Altera_Forum

anakha and jakobjones, 

 

Thank you both for your comments. I have still been trying to use the basic Altera DMA core, but when I ask questions about using it with a series of PIOs, I tend to get a bunch of references to literature that does not answer my questions. 

 

I do not have much (if any) experience with HDL, so writing my own code is unlikely, and I have decided to keep trying the Altera DMA core a little while longer before I switch over to the SG-DMA core.

 

I have a couple of questions about the basic DMA core and its use with the PIO cores. In trying to use the DMA to read the PIOs, my software is arranged as follows:

 

0) Assign Handles 

 

1) I first set up the data size by calling alt_dma_rxchan_ioctl with the ALT_DMA_SET_MODE_16 parameter 

 

2) I also set it up for RX-only by calling alt_dma_rxchan_ioctl with the ALT_DMA_RX_ONLY_ON parameter

 

3) In preparing the DMA to receive data, would I assign the BASE address of the first PIO in memory to the DMA? 

 

4) The PIOs have been set up as 16-bit, so the 'size' value that I pass to the alt_dma_rxchan_prepare routine would be 44 (i.e. 2 bytes for each of the 22 PIOs)

 

While the DMA is doing its thing, I have a flag set to prevent me from placing another call to the DMA until it has finished. 

 

Does this look valid? Thanks 

 

Fred
Altera_Forum

Well, it's been a while since I used the Altera DMA core. I assume you've read this:

http://www.altera.com/literature/hb/nios2/n2cpu_nii51006.pdf 

and the “Using DMA Devices” on page 6–24 of this: 

http://www.altera.com/literature/hb/nios2/n2sw_nii52004.pdf 

 

When I used the core, I chose to access its registers directly rather than use the higher-level software functions. However, what you have looks good to me, except:

 

In step 3, you're going to want to give the address of the PIO's data register. As it happens, it is located at the base address of the PIO peripheral. If it weren't, however, you would obtain it using the following macro from "altera_avalon_pio_regs.h":

IOADDR_ALTERA_AVALON_PIO_DATA(base). 

 

Now what you want to do is have the DMA controller read from the base address of each PIO in turn. That means it needs a list of addresses to read from, reading two bytes from each one. You can't have it read from consecutive addresses, because the data registers of your PIOs are not at consecutive addresses: each PIO occupies a 16-byte span, so each data register sits 16 bytes past the previous one. So you can't give it the base address of the first PIO, tell it to do 22 consecutive reads, and expect to get the data. Does that make sense? That's why I think you need the SGDMA core for what you are trying to accomplish: it lets you specify a list of addresses to read from.
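To make the address problem concrete, this sketch walks the addresses a plain incrementing 44-byte transfer would touch if it started at PIO#1's base and advanced 2 bytes per 16-bit read (an assumption about how the transfer steps; real Avalon width adaptation may differ), and counts how many of those reads land on a PIO data register. Only 3 of the 22 do (PIO#1, PIO#2 and PIO#3); the rest hit the unused upper half of a data register or the direction, interruptmask and edgecapture registers. The names are hypothetical.

```c
#include <stdint.h>

#define PIO_FIRST_BASE  0x0060a100u   /* assumed from the post */
#define PIO_SPAN_BYTES  16u
#define NUM_PIOS        22u
#define XFER_BYTES      (NUM_PIOS * 2u)   /* 44 bytes, as in step 4 */

/* Nonzero if a 16-bit read at addr hits the low half of some PIO's data
 * register (word offset 0), i.e. returns ADC data. */
static int hits_data_register(uint32_t addr)
{
    uint32_t off = addr - PIO_FIRST_BASE;   /* only used when addr >= base */
    return addr >= PIO_FIRST_BASE
        && off < NUM_PIOS * PIO_SPAN_BYTES
        && (off % PIO_SPAN_BYTES) == 0u;
}

/* Count how many of the halfword reads made by an incrementing 44-byte
 * transfer from PIO#1's base actually land on a data register. */
static unsigned linear_reads_that_hit_data(void)
{
    unsigned hits = 0;
    for (uint32_t a = PIO_FIRST_BASE; a < PIO_FIRST_BASE + XFER_BYTES; a += 2u)
        hits += (unsigned)hits_data_register(a);
    return hits;
}
```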

 

I should add that in your case it would be faster to just read the values directly from the Nios and not use DMA. There is a certain amount of overhead associated with DMA, and it is only efficient when you are moving large chunks of data. So if you are just going to read each PIO once, do some processing on the Nios, and then repeat, drop the DMA, as it's not going to buy you anything. You'll spend far more time setting up the DMA, servicing the IRQ, and waiting for the DMA core to cycle through descriptors than you would just reading the PIO ports directly.
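A minimal sketch of the direct approach: a tight loop that reads each PIO's data register and keeps the low 16 bits. The function and macro names are hypothetical. On the Nios II the read would normally go through IORD(base, 0) from io.h (which is what IORD_ALTERA_AVALON_PIO_DATA expands to), because that uses an I/O load that bypasses the data cache; the plain volatile load used as the default here is only there so the sketch can be exercised on a host.

```c
#include <stddef.h>
#include <stdint.h>

/* How one 32-bit PIO data register is read. On the target, define this
 * before the function as IORD((addr), 0); the default is a plain volatile
 * load for host testing only (it would go through the data cache on a
 * Nios II with one). */
#ifndef PIO_READ32
#define PIO_READ32(addr) (*(volatile const uint32_t *)(addr))
#endif

/* Read n PIO data registers, one per base address, into out[], keeping
 * the low 16 bits of each (the PIOs are 16 bits wide). */
static void read_pios(const uintptr_t *bases, size_t n, uint16_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (uint16_t)(PIO_READ32(bases[i]) & 0xFFFFu);
}
```

On the target, bases[] would hold the 22 PIO base addresses (for instance the base constants generated in system.h).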

 

Jake
Altera_Forum

Jake, 

 

Yes, that is what I suspected all along: you cannot just pass in the base address of the first PIO and expect the DMA to read the other PIOs. When I saw the address range for each PIO, I did not think the approach I presented would work, but I wanted someone to confirm that it was for the reason I suspected.

 

The reason for trying to use the DMA in the first place is to speed up the reading cycle, but from what you have indicated, this will not get me there… 

 

I have also been thinking about increasing my system/memory clock frequency (being generated by the Altera PLL). I have actually tried to do just this, but my system will not accept the program from the debugger… 

 

My input clock is 50 MHz, and I am using the PLL to generate 85 MHz. Everything works fine. I am using a Cyclone II EP2C70F896C6 FPGA and the Cypress CY7C1380D SRAM, which will go up to 167 MHz, so unless I have some layout problems on my board, I should be able to increase the PLL output frequency, but so far I have had no luck. My current settings are:

 

Freq In – 50 MHz 

PLL multiplier = 17 

PLL divisor = 10 

Phase Shift = -4.80 ns 

 

I believe that I should be able to go up to 141.667 MHz using:

 

Freq In – 50 MHz 

PLL multiplier = 17 

PLL divisor = 6 

Phase Shift = -?.?? ns 

 

After talking with the Altera My Support folks, I have tried several phase shift values. Altera said it should be the 90-degree value for the frequency you are trying to generate (e.g. at 141.667 MHz, -(1 / 141.667 MHz) × 0.25 ≈ -1.76 ns), but this did not work. Any suggestions for another phase shift value, or another area to look at? Thanks much for your help.
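The PLL numbers in this post can be checked with a few one-line helpers: f_out = f_in × M / D, and θ degrees of phase at frequency f corresponds to (θ / 360) × (1 / f). They reproduce 85 MHz for M = 17, D = 10 and about 141.667 MHz for M = 17, D = 6, and -90° at 141.667 MHz comes out to about -1.765 ns. For comparison, the working -4.80 ns at 85 MHz is about -147°. The helper names are hypothetical, and they only do the arithmetic; they say nothing about which shift this board actually needs.

```c
/* Output frequency of a PLL counter setting: f_out = f_in * M / D. */
static double pll_out_mhz(double f_in_mhz, unsigned mult, unsigned div)
{
    return f_in_mhz * (double)mult / (double)div;
}

/* Time shift in ns for `degrees` of phase at f_mhz (period = 1000 / f_mhz ns).
 * A negative angle gives a negative shift. */
static double phase_shift_ns(double f_mhz, double degrees)
{
    return (degrees / 360.0) * (1000.0 / f_mhz);
}

/* Inverse: phase in degrees for a shift of `ns` at f_mhz. */
static double phase_degrees(double f_mhz, double ns)
{
    return ns / (1000.0 / f_mhz) * 360.0;
}
```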

 

Fred 

 

Altera_Forum

Are you adjusting your timing constraints to match the changes you're making? Are you using TimeQuest?

 

Jake
Altera_Forum

Jake, 

 

No, I have never used this tool. It amazes me that I have this system working while knowing so little. It is not until you start tweaking that you get into the tools you should have used up front. Thanks for pointing me in this direction.

 

Fred
Altera_Forum

Unfortunately, FPGA design is a lot more involved than microcontroller design. You gain a wealth of flexibility and power, but at the cost of added design investment.

 

Jake
Altera_Forum

I have given up on DMA reading, as it looks like it will indeed have too much overhead. I also believe that reading the PIO ports using the IORD_ALTERA_AVALON_PIO_DATA macro is too slow. Is there another way to read the PIO data? Thanks and have a great weekend!!

 

Fred 

Altera_Forum

It seems like it typically takes only about 3 clock cycles to do an IOWR or IORD, so for your 22 PIOs that's 66 clock cycles. Is that too slow? By my calculation that will take about 0.78 µs at your current clock speed (66 cycles / 85 MHz). That is cutting it close. What are you going to do with the data after you get it? Will you have enough time to do whatever processing you're going to do?

 

Maybe I'll write a little HDL module that will read them all for you and store them into memory. 

 

If you could do one read per clock cycle, that would get your total sample time down to about 0.26 µs (22 cycles).

 

In reality, you could do two reads per clock cycle in HDL and get it down to about 0.13 µs (11 cycles).
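These estimates are plain cycle arithmetic at 85 MHz, reproduced below. The 3-cycles-per-read figure is the estimate given above, not a measured value, and the helper names are hypothetical. 66 cycles come to about 0.776 µs, 22 cycles to about 0.259 µs and 11 cycles to about 0.129 µs.

```c
/* Clock cycles to read n_reads ports, given the cost of one read and how
 * many reads can happen in parallel per cycle. Rounds up, since a partly
 * used cycle still costs a full cycle. */
static unsigned cycles_needed(unsigned n_reads, unsigned cycles_per_read,
                              unsigned reads_per_cycle)
{
    return (n_reads * cycles_per_read + reads_per_cycle - 1u) / reads_per_cycle;
}

/* Convert cycles to microseconds (a frequency in MHz is cycles per us). */
static double cycles_to_us(unsigned cycles, double f_mhz)
{
    return (double)cycles / f_mhz;
}
```

For example, cycles_needed(22, 3, 1) is the software case, cycles_needed(22, 1, 1) one read per clock, and cycles_needed(22, 1, 2) two reads per clock in HDL.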

 

Jake
Altera_Forum

I think you'd better design a custom component that either maps your 22 inputs to 22 consecutive addresses, or that provides an Avalon Streaming (Avalon-ST) source that can be fed to an SGDMA.

 

Edit: sorry Jake, I didn't see your answer before posting mine.
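From the software side, the difference a custom component that maps the 22 inputs to consecutive addresses makes is that channel i sits at base + 4 × i instead of base + 16 × i, so one linear pass (or a simple incrementing DMA of 22 words) covers every channel exactly once. The adc_block_t layout below is a hypothetical register map for such a component, not an existing Altera core; on the Nios II the reads would again want to go through IORD rather than a cached load.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_CH 22

/* Hypothetical register layout of a custom Avalon slave that presents the
 * 22 ADC inputs as 22 consecutive 32-bit words (byte offsets 0x00..0x54). */
typedef struct {
    volatile uint32_t ch[NUM_CH];
} adc_block_t;

/* One linear pass: channel i lives at base + 4*i, so a plain incrementing
 * copy reads every channel exactly once. */
static void read_adc_block(const adc_block_t *blk, uint16_t out[NUM_CH])
{
    for (size_t i = 0; i < NUM_CH; i++)
        out[i] = (uint16_t)(blk->ch[i] & 0xFFFFu);
}
```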
Altera_Forum

Jake, 

 

Thank you for your reply. Hope you had a great weekend. We had some colder weather in our forecast (Bozeman, Montana), so we were busy preparing our garden over the weekend. The first frost of the season was on the car windows this morning… 

 

From what I was seeing on the scope, I felt there was a longer delay in reading the PIOs than you indicated. I will take a closer look. The PIO reads are driven by a timer (RAW_Read). The timer is disabled when it generates an interrupt, the Nios II program then reads the 22 PIOs, and the timer is then restarted. This introduces a small gap between readings, but it should be acceptable since we will only rarely be sampling at the 1 µs rate. Still, I would like to get as close as possible to this rate.

 

The PIOs are read into a buffer, and after the RAW_Read timer has been restarted, another routine pulls the data from the buffer and places it into a sequential ring buffer in memory. This routine keeps track of the number of reads, and once we have 11 sets of readings (484 bytes), a header and footer are attached to the data to create a 512-byte block. This 512-byte block is then transferred into a USB ring buffer in another memory space, which frees up the reading memory for the high-sample-rate reads. The data accumulates in the USB ring buffer until we have (??, currently 10) complete 512-byte blocks to send to the host over USB in ‘BLOCK Mode’ (DMA controlled).
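A sketch of the block assembly described above, with the sizes checked at compile time: 11 sets × 22 PIOs × 2 bytes = 484 bytes of samples, which leaves 28 bytes for the header and footer of a 512-byte block. The post doesn't say how those 28 bytes are split, so the 16-byte header and 12-byte footer here are assumptions, as are the function and macro names. Samples are copied in the processor's native byte order.

```c
#include <stdint.h>
#include <string.h>

#define NUM_PIOS        22u
#define SET_BYTES       (NUM_PIOS * 2u)                 /* 44 bytes per reading of all PIOs */
#define SETS_PER_BLOCK  11u
#define PAYLOAD_BYTES   (SETS_PER_BLOCK * SET_BYTES)    /* 484 bytes */
#define BLOCK_BYTES     512u
#define FRAMING_BYTES   (BLOCK_BYTES - PAYLOAD_BYTES)   /* 28 bytes for header + footer */
#define HEADER_BYTES    16u                             /* assumed split, not from the post */
#define FOOTER_BYTES    (FRAMING_BYTES - HEADER_BYTES)  /* 12 bytes under that assumption */

_Static_assert(PAYLOAD_BYTES == 484u, "11 sets of 22 16-bit samples");
_Static_assert(HEADER_BYTES + PAYLOAD_BYTES + FOOTER_BYTES == BLOCK_BYTES,
               "header + payload + footer must fill one 512-byte block");

/* Assemble one block as header | 11 sample sets | footer. `samples` holds
 * SETS_PER_BLOCK * NUM_PIOS values in read order. */
static void pack_block(uint8_t block[BLOCK_BYTES],
                       const uint8_t header[HEADER_BYTES],
                       const uint16_t *samples,
                       const uint8_t footer[FOOTER_BYTES])
{
    memcpy(block, header, HEADER_BYTES);
    memcpy(block + HEADER_BYTES, samples, PAYLOAD_BYTES);
    memcpy(block + HEADER_BYTES + PAYLOAD_BYTES, footer, FOOTER_BYTES);
}
```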

 

I feel the balance between the size of the USB ring buffer and the rate at which we are reading data is something I need to play with.  
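One way to frame this trade-off is to compute how fast blocks are produced. Assuming back-to-back sampling at the full 1 µs rate (the timer restart gap described above makes the real rate somewhat lower), a 512-byte block is completed every 11 µs, which is about 46.5 MB/s, and a 10-block ring fills in about 110 µs. The helpers below just parametrize that arithmetic; the names are hypothetical.

```c
#define SETS_PER_BLOCK 11u    /* sample sets per 512-byte block (from the post) */
#define BLOCK_BYTES    512u

/* Microseconds to fill one block, assuming one sample set every
 * sample_period_us with no gaps between reads. */
static double block_interval_us(double sample_period_us)
{
    return SETS_PER_BLOCK * sample_period_us;
}

/* Block data rate in MB/s (1 MB = 10^6 bytes), i.e. bytes per microsecond. */
static double block_rate_mb_s(double sample_period_us)
{
    return BLOCK_BYTES / block_interval_us(sample_period_us);
}

/* Microseconds to fill a ring of `blocks` blocks at that rate. */
static double ring_fill_us(unsigned blocks, double sample_period_us)
{
    return blocks * block_interval_us(sample_period_us);
}
```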

 

In addition, I am generating signals to be sent to the 12 DACs. The majority of these signals are relatively slow (the fastest one is >= ~100 µs), and I will normally only be using half of the DACs to generate signals.

 

Sorry for being long-winded, but I wanted to give you a little more information on what else is going on. Because of the nature of the system and its requirements, any increase in reading speed or decrease in processing time needs to be implemented. If you have some pointers I can use to do this, I would appreciate it greatly, and of course if you have some HDL code that could speed up the reading, that would be super.

 

As always, thank you very much for your help. Have a great day and take care.  

 

Fred