Hi,
Section 4.1 of the SCIF user guide says the following:
“Lower performance will likely be realized if the source and destination base addresses are not cacheline aligned but are
separated by some multiple of 64.”
This sentence confuses me. What does "separated" mean here? An example would really help.
Could somebody clarify this case?
Thank you in advance,
Aram
Hmm...that phrasing doesn't make sense to me either. Rather than explaining the documentation, I'll explain the actual hardware.
The DMA engine on Xeon Phi cards can only operate on entire 64-byte cachelines. It can copy the 64 bytes at physical addresses 0x1000-0x103f to physical addresses 0x1080-0x10bf, but it cannot copy the 64 bytes at physical addresses 0x1004-0x1043 to 0x1080-0x10bf, because that copy cannot be broken down into a set of whole cachelines copied over a different set of whole cachelines. The SCIF driver implements a software fallback path to work around this when necessary (the driver does the loads and stores itself, using the CPU), but of course it's slower than the hardware path.
Now, if one instead wanted to copy 0x1004-0x2003 to 0x8004-0x9003, then the DMA engine could be used to copy 0x1040-0x1fff to 0x8040-0x8fff and the software fallback would only be necessary for the partial cachelines at the beginning and end (0x1004-0x103f and 0x2000-0x2003).
If, instead, the destination were 0x8002-0x9001, then it would be impossible to use the DMA engine at all, and the entire 4KiB block would have to use the software fallback.
I suspect that the documentation you cited is trying to convey this last point, and failing. It might also mean that the fallback path is used more often than absolutely required, and more often than I remember; it has been a while for me.
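To make the three cases above concrete, here is a minimal sketch of the address arithmetic. It only illustrates the reasoning in this post and is not the SCIF driver's actual logic; the function name `split_copy` and its return shape are made up for the example, and addresses are treated as plain integers.

```python
CACHELINE = 64  # bytes; the DMA engine only moves whole cachelines


def split_copy(src, dst, length):
    """Split a copy into (fallback_head, dma_body, fallback_tail) byte counts.

    Hypothetical helper for illustration only. The head and tail would go
    through the CPU fallback path; the body is the whole-cacheline part
    that the DMA engine could handle.
    """
    if length <= 0:
        return (0, 0, 0)
    # If source and destination sit at different offsets within a cacheline,
    # no whole source cacheline lands on a whole destination cacheline,
    # so the entire copy is reported as fallback (in the head slot).
    if src % CACHELINE != dst % CACHELINE:
        return (length, 0, 0)
    # Bytes needed to reach the next cacheline boundary (0 if already aligned).
    head = min((-src) % CACHELINE, length)
    # Largest whole number of cachelines that fits after the head.
    body = (length - head) // CACHELINE * CACHELINE
    tail = length - head - body
    return (head, body, tail)


# The examples from this post:
print(split_copy(0x1000, 0x1080, 64))      # (0, 64, 0): pure DMA
print(split_copy(0x1004, 0x1080, 64))      # (64, 0, 0): all fallback
print(split_copy(0x1004, 0x8004, 0x1000))  # (60, 4032, 4): DMA for 0x1040-0x1fff
print(split_copy(0x1004, 0x8002, 0x1000))  # (4096, 0, 0): all fallback
```

The deciding test is whether `src % 64 == dst % 64`, that is, whether the two base addresses differ by a multiple of 64. When they do, only the partial cachelines at the ends need the fallback, even if neither address is itself cacheline aligned.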
Hi Evan,
Thank you for taking the time to answer my question. Now I understand what the user guide is trying to explain.
Have a nice day.
I'm adding a documentation reference that may help others with questions about alignment and DMA transfers.
MPSS performance guide
https://software.intel.com/sites/default/files/managed/72/db/mpss-performance-guide.pdf
Section 2.2.1.2 Buffer Alignment
Cheers
