performance of SCIF RMA

Jun_Z_2 · 01-05-2014

I tried to use SCIF RMA to exchange a large amount of data between two MICs, but I find it's almost impossible to align the memory addresses on both cards to a 4K page boundary.

Exchanging memory through the host gives me a 1.7x speedup over a single card. How much more performance would SCIF RMA give if I can get it to work? I want to decide whether I should continue working in this direction.

Thanks a lot.

Loc_N_Intel · 04-02-2014

Hi Jun,

A recent paper, posted at http://software.intel.com/en-us/forums/topic/507126 , discusses the scif0 virtual InfiniBand adapter. Hope this helps.

Evan_P_Intel · 04-02-2014

Jun Z. wrote:

it's almost impossible to align the memory addresses on both cards to a 4K page boundary.

You don't need to align SCIF transfers to page boundaries (4KB)--only cacheline (64B) boundaries. (Technically, you can transfer between arbitrary byte ranges using the SCIF API; it's just that a slow software fallback is used for the part(s) of the transfer where source and destination are not both cacheline aligned. In the worst case, that would be the entire transfer.)
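As a rough illustration of the 64-byte rule, here is a minimal C sketch that checks and allocates cacheline-aligned buffers. It deliberately leaves out the SCIF calls themselves; the helper names (is_cacheline_aligned, round_up_to_cacheline, alloc_cacheline_aligned) are hypothetical and not part of the SCIF API, and it assumes a C11 toolchain that provides aligned_alloc.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHELINE_BYTES 64u  /* cacheline size quoted in the reply above */

/* Returns 1 if x (an address or an offset) is a multiple of 64 bytes. */
static int is_cacheline_aligned(uintptr_t x)
{
    return (x % CACHELINE_BYTES) == 0;
}

/* Hypothetical helper: round a length up to a whole number of cachelines. */
static size_t round_up_to_cacheline(size_t len)
{
    return (len + CACHELINE_BYTES - 1) / CACHELINE_BYTES * CACHELINE_BYTES;
}

/* Hypothetical helper: allocate a buffer that starts on a cacheline
 * boundary. aligned_alloc (C11) wants a size that is a multiple of the
 * alignment, hence the rounding. Release the buffer with free(). */
static void *alloc_cacheline_aligned(size_t len)
{
    return aligned_alloc(CACHELINE_BYTES, round_up_to_cacheline(len));
}
```

Checking both the source and the destination with is_cacheline_aligned before issuing a transfer is one way to tell whether any part of it would land on the slower path described above.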

Jun Z. wrote:

Exchanging memory through the host gives me a 1.7x speedup over a single card. How much more performance would SCIF RMA give if I can get it to work?

When you say "memory exchange through host", what do you mean, and what exactly are you comparing it to? SCIF achieves >6GB/s for large transfers between host and card on the system configuration discussed in http://www.intel.com/content/dam/www/public/us/en/documents/performance-briefs/xeon-phi-product-family-performance-brief.pdf; the possible transfer rate between two cards depends on the system chipset, but I would not expect it to differ from the host/card rate by a factor as large as 1.7....
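To put the >6GB/s figure in perspective, a back-of-the-envelope estimate of transfer time can be sketched as below. This is only arithmetic, not a measurement: the function name transfer_seconds is hypothetical, 1 GB is assumed to mean 1e9 bytes, and setup latency and chipset effects are ignored.

```c
/* Rough estimate: seconds needed to move `bytes` at a sustained rate of
 * `gb_per_s` gigabytes per second (assumption: 1 GB = 1e9 bytes).
 * Ignores per-transfer setup cost, so it is only meaningful for large
 * transfers. */
static double transfer_seconds(double bytes, double gb_per_s)
{
    return bytes / (gb_per_s * 1e9);
}
```

For example, transfer_seconds(6e9, 6.0) models moving 6e9 bytes at the 6 GB/s lower bound quoted above. Comparing such an estimate against the measured exchange time is one way to see how far the current host-mediated path is from the link rate.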