I have a use case where the x86 CPU has to write 64 bytes of data to a PCIe slave device whose memory has been mmap'ed into user space. Currently I use memcpy to do that, but it turns out to be very slow. Can we use Intel SSE intrinsics like _mm_stream_si128 to speed it up? Or is there any other mechanism, short of using DMA?
The objective is to pack all 64 bytes into one TLP and send it over the PCIe bus to reduce the per-transaction overhead.
The system config is: dual-socket Haswell with a custom NIC connected on a x16 PCIe bus.
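Roughly, the current approach looks like the following sketch (the device node "/dev/custom_nic", the BAR offset, and the mapping size are placeholders for whatever the driver actually exposes):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* "/dev/custom_nic" is a hypothetical char device exported by
     * the user-space driver's kernel stub. */
    int fd = open("/dev/custom_nic", O_RDWR | O_SYNC);
    if (fd < 0)
        return 1;

    /* Map one page of the device BAR into user space. */
    void *bar = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED)
        return 1;

    unsigned char payload[64] = { 0 };

    /* The slow path in question: with an uncacheable mapping this
     * memcpy is broken into a series of small writes (and small
     * TLPs) instead of one 64-byte transaction. */
    memcpy(bar, payload, sizeof(payload));

    munmap(bar, 4096);
    close(fd);
    return 0;
}
```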
This will work if the memory is properly mapped as Write-Combining (WC).
The Linux kernel folks keep changing the interfaces, but on one system the required kernel call was ioremap_wc(). This set up the combination of MTRR and PAT settings required to get the write-combining memory type. It can be done in two different ways (as shown in Table 11-7 of Volume 3 of the SW Developer's Manual, document 325384-055), but I don't remember which approach was used. Performance was fine -- about 73% of peak, which is what I expected from a back-of-the-envelope estimate of the packet header overhead.
In every version of the Linux kernel that I have looked at, the ioremap_cache() call is silently converted to ioremap_nocache() for memory-mapped IO space. This is usually the correct thing to do, but it makes it difficult to experiment....
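A rough kernel-side sketch of the idea (the helper is hypothetical, and the exact interface varies by kernel version, as noted above):

```c
#include <linux/pci.h>
#include <linux/io.h>

/* Hypothetical helper: map a PCI BAR with the write-combining
 * memory type.  ioremap_wc() requests WC via the MTRR/PAT
 * machinery, and may fall back to UC if WC is unavailable. */
static void __iomem *map_bar_wc(struct pci_dev *pdev, int bar)
{
    resource_size_t start = pci_resource_start(pdev, bar);
    resource_size_t len   = pci_resource_len(pdev, bar);

    return ioremap_wc(start, len);
}
```

A driver that exposes the BAR to user space would similarly apply pgprot_writecombine() to vma->vm_page_prot in its mmap handler.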
More details are available at:
Thanks, John, for your suggestion. Is there a way all of this can be done from user space? Mine is a user-space driver. I believe these ioremap_* functions are only available in kernel code.
Thanks
-Anil
Anil A. wrote:
Thanks, John, for your suggestion. Is there a way all of this can be done from user space? Mine is a user-space driver. I believe these ioremap_* functions are only available in kernel code.
Thanks
-Anil
Hi Anil,
Please take a look at the Data Plane Development Kit (DPDK) library: http://dpdk.org/doc/guides/prog_guide/env_abstraction_layer.html#pci-access
DPDK contains a set of optimized C libraries that accelerate packet processing on IA. It uses Linux's UIO framework to map the required memory space from kernel space into user space, so the user can simply open the UIO device to communicate with the NIC.
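A minimal sketch of that UIO pattern (not DPDK code itself; the device node /dev/uio0 and map index 0 are assumptions about how the kernel-side driver registered the BAR):

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/uio0", O_RDWR);
    if (fd < 0)
        return 1;

    long pagesz = sysconf(_SC_PAGESIZE);

    /* UIO convention: memory map N of the device is selected by
     * passing an mmap offset of N * page size.  The caching type of
     * the mapping is still whatever the kernel-side driver set up. */
    void *bar = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0 * pagesz);
    if (bar == MAP_FAILED)
        return 1;

    printf("BAR 0 mapped at %p\n", bar);
    munmap(bar, pagesz);
    close(fd);
    return 0;
}
```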
Best Regards,
Patrick
The ioremap_*() calls do have to be done in the kernel -- I was assuming that you would be able to modify the existing device-driver code that maps the PCIe device into user space.
Table 11-7 of Volume 3 of the SW Developer's Manual (document 325384) shows how the combination of MTRRs and PATs controls the caching mode. If the existing driver sets up any combination of MTRR and PAT values that maps to UC, then you will not be able to perform a 64-byte store. (The same should be true for WP or WT, though I have never seen them used in Linux. WB mode should not be used for MMIO.)
If you really only need to write 64 bytes, you could try 128-bit or 256-bit stores. Intel cautions against using stores larger than 64 bits to MMIO, but (if I recall correctly) it is not guaranteed *not* to work, so you might get lucky. This would not give you a single 64-byte store, but it might let you get away with fewer than eight 8-byte stores.
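A hedged sketch of that idea with AVX 256-bit non-temporal stores (compile with -mavx; this assumes the BAR is mapped WC and dst is 32-byte aligned, and nothing here architecturally guarantees a single 64-byte TLP):

```c
#include <immintrin.h>

/* Write 64 bytes to a WC-mapped MMIO address using two 32-byte
 * streaming stores.  On a WC mapping the two stores can fill one
 * 64-byte write-combining buffer, which the hardware may then emit
 * as a single 64-byte write transaction. */
static inline void wc_store_64B(void *dst, const void *src)
{
    __m256i lo = _mm256_loadu_si256((const __m256i *)src);
    __m256i hi = _mm256_loadu_si256((const __m256i *)((const char *)src + 32));

    _mm256_stream_si256((__m256i *)dst, lo);
    _mm256_stream_si256((__m256i *)((char *)dst + 32), hi);

    /* SFENCE flushes the WC buffers so the write is not left
     * sitting partially combined in the core. */
    _mm_sfence();
}
```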
Anyway, the point is that the hardware controls the size of the write transactions, and this control is via a combination of the MTRR and PAT values -- both of which can only be set in the kernel. The Data Plane Development Kit and the Linux UIO infrastructure don't change this -- they just make it easier to write a kernel device driver that allows the user to make the desired mapping requests.
