Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

User programmable DMA controller in the system with Xeon E3-1275v3 and C226 PCH?

Slavisa_Z_
Beginner
1,569 Views

We are using a Xeon E3-1275 v3 and a C226 PCH on our board. I am aware that this system does not support I/OAT or NetDMA.

Is there any user-programmable DMA controller?

We want to perform DMA transfers from main memory (source) to PCIe devices (destination).

Note: The PCIe devices do not have a DMA controller. The destination can be either a BAR in a PCIe endpoint or a multicast BAR residing inside a PCIe switch.

I previously posted this question here: https://communities.intel.com/thread/58913 and was advised (by Intel) to ask here instead.

0 Kudos
5 Replies
Slavisa_Z_
Beginner
More details about this question: I am asking about a DMA controller that can essentially transfer data from memory to memory. The DMA controllers built into the PCH (audio, USB, SATA, legacy 8237, etc.) do not count, because they can only transfer data between system memory and their own peripheral device.

Thanks,
Slavisa
0 Kudos
McCalpinJohn
Honored Contributor III

Most PCIe devices have DMA controllers -- processors don't have them.  (They have been invented numerous times, but not implemented in high-volume commercial products.)

Do you have a specific PCIe device in mind that you know does not have a DMA controller?

If the PCIe device does not have its own DMA controller, then the fastest way to copy data from system memory to that IO device is to use a processor core.   You would need to set up a memory-mapped IO range for the device with the write-combining attribute, then use a processor core (or thread) to read from (cacheable) system memory and write to the MMIO range using streaming stores.
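
As a rough illustration of the copy loop described above (not code from this thread), the sketch below uses the SSE2 intrinsics `_mm_load_si128` and `_mm_stream_si128` from C, so no hand-written assembly is needed. The function name `wc_stream_copy`, the 16-byte alignment requirement, and the 64-byte length granularity are assumptions made for the example. In a Linux driver the destination pointer would typically come from a write-combining mapping such as `ioremap_wc()`; here it is just a pointer, so the routine can be tried against ordinary memory first.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in _mm_sfence */
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from cacheable memory (src) to a write-combining
 * destination (dst) using non-temporal (streaming) 16-byte stores.
 *
 * Assumptions for this sketch: dst and src are 16-byte aligned and
 * len is a multiple of 64 bytes.
 * Returns 0 on success, -1 if those preconditions are not met.
 */
static int wc_stream_copy(void *dst, const void *src, size_t len)
{
    if ((((uintptr_t)dst | (uintptr_t)src) & 15u) != 0 || (len & 63u) != 0)
        return -1;

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t chunks = len / 16;

    /* Four 16-byte stores per iteration fill one 64-byte line. */
    for (size_t i = 0; i < chunks; i += 4) {
        __m128i a = _mm_load_si128(s + i);
        __m128i b = _mm_load_si128(s + i + 1);
        __m128i c = _mm_load_si128(s + i + 2);
        __m128i e = _mm_load_si128(s + i + 3);
        _mm_stream_si128(d + i, a);
        _mm_stream_si128(d + i + 1, b);
        _mm_stream_si128(d + i + 2, c);
        _mm_stream_si128(d + i + 3, e);
    }

    /* Order the streaming stores before any later stores. */
    _mm_sfence();
    return 0;
}
```

Writing a full 64-byte line per iteration is meant to let each write-combining buffer fill completely before it is flushed. Note that inside the Linux kernel, SSE code has to be bracketed with `kernel_fpu_begin()` and `kernel_fpu_end()`; whether that overhead is acceptable would need to be measured on the target system.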

For the Xeon E3-1275 v3, there should be no trouble saturating any outbound PCIe interface using a single thread. HyperThreading may be useful if you need all four cores to be doing other work at the same time -- the other thread on the core that you use to push the data to the IO device will have reduced effective memory bandwidth, but should retain most of its performance capability for computation, and much of its performance capability for memory accesses that hit in cache.

0 Kudos
Slavisa_Z_
Beginner
John D. McCalpin wrote:

Most PCIe devices have DMA controllers -- processors don't have them.  (They have been invented numerous times, but not implemented in high-volume commercial products.)

My comment is that, within the Xeon family, E3 CPUs do not have a DMA controller, but the E5-1600/2400/2600/4600 families have a multi-channel DMA controller (Intel QuickData Technology). I was hoping that the Xeon E5-1275 v3 might have something similar...

John D. McCalpin wrote:

Do you have a specific PCIe device in mind that you know does not have a DMA controller?

We are designing our own boards (both CPU and I/O). In our application, the Xeon E5-1275 v3 crunches data and sends the results (samples) to many PCIe devices that sit as endpoints behind a PCIe switch. You can think of each PCIe endpoint as a simple I/O card with digital-to-analog converters. The real beauty is that we could initiate just one DMA transfer from system memory to the multicast BAR residing inside the PCIe switch, and all PCIe endpoints would receive the data from that single transfer. The PCIe switch would detect that the destination address of an incoming packet hits the multicast BAR and redistribute the packet to the endpoints. In other words, the CPU and memory controller would not need to know what happens to the data beyond the PCIe switch. However, we cannot use this method, because there is no DMA controller in the Xeon E5-1275 v3 / C226.

John D. McCalpin wrote:

If the PCIe device does not have its own DMA controller, then the fastest way to copy data from system memory to that IO device is to use a processor core.   You would need to set up a memory-mapped IO range for the device with the write-combining attribute, then use a processor core (or thread) to read from (cacheable) system memory and write to the MMIO range using streaming stores.

This sounds promising! In our case, we are reading from memory and writing to the multicast BAR in the PCIe switch. So we first have to make the multicast BAR appear as MMIO? All hardware access is done under Linux with drivers written in C. To implement the method described above, can we still use C and write our own Linux drivers, or do we have to use assembly for the streaming stores? Could you please provide a link where we can learn more about this method?

Thank you very much for your response. It seems that there is a solution to the problem.

Regards,
Slavisa
0 Kudos
Thomas_W_Intel
Employee

I tried to check whether your processor supports Intel QuickData, but I must confess that I'm puzzled by the processor name "Xeon E5-1275 v3". Are you sure that this is not a typo?

0 Kudos
Slavisa_Z_
Beginner
Thomas Willhalm (Intel) wrote:

I tried to check whether your processor supports Intel QuickData, but I must confess that I'm puzzled by the processor name "Xeon E5-1275 v3". Are you sure that this is not a typo?

It was mistyped in the last reply and then propagated by copy and paste. The correct name is Xeon E3-1275 v3.

Regards,
Slavisa
0 Kudos