I wanted to look at using InfiniBand verbs on the MIC card for transfers between two different compute nodes (two different MICs, the two system memories, or the system memory on one node and the MIC on the other). You are supposed to be able to do something like this with GPUDirect for Nvidia cards, but the GPUDirect documentation I could find is currently too sketchy to be usable. I wrote a program to do some transfers using RDMA and found that host-to-host transfers are much faster than MIC-to-MIC transfers.
My code was excised from a larger program and is hard to understand and build, but I realized that the standard OFED program, ibv_rc_pingpong, can be used to demonstrate the issue. It is already installed on the compute nodes and on the MIC processors on Stampede, and the data in the attached plot comes from it. If you try it, you need to specify the device with '-d mlx4_0', because there are two devices and the other one, 'scif0', produces nonsense when trying to connect two separate compute nodes. I also increased the maximum transfer unit to 2048 with '-m 2048', which improved the rates somewhat. Depending on where in the network your two compute nodes are placed there can be some variability in the rates, but all data in that plot was taken with the same two nodes, and after trying several pairs of nodes I found the rates to be typical.
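For anyone who wants to reproduce this, the invocation looks roughly like the following; the hostname is a placeholder for whichever second node you land on, and '-s' (message size in bytes) is just one value I tried:

```shell
# on the first node (server side), listening on the default port:
ibv_rc_pingpong -d mlx4_0 -m 2048 -s 1048576

# on the second node (client side), pointing at the first node:
ibv_rc_pingpong -d mlx4_0 -m 2048 -s 1048576 c123-456
```

Run the same pair of commands natively on the two MIC cards to get the MIC-to-MIC numbers.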
From looking at the source code for ibv_rc_pingpong, it doesn't use RDMA: the send work requests use the IBV_WR_SEND opcode (with receives posted through ibv_post_recv), not IBV_WR_RDMA_WRITE or IBV_WR_RDMA_READ. Regardless, the rates look the same as in my RDMA code, where the best host-to-host transfer rate is around 5.77 GB/s and the best MIC-to-MIC transfer rate is around 0.92 GB/s.
Details of how ibv_rc_pingpong works aside, the important thing is that the program is doing the same thing whether it runs on the host or the MIC, and whenever the MIC is involved the transfer rates are much lower. The plot also includes host-to-MIC rates, which are a little better than MIC-to-MIC. Surely this is just a driver issue, and there is no limit in the underlying hardware that makes the rates involving the MIC so low. In fact, from the SCIF transfer rates I posted elsewhere in this forum, it would be about 2x faster to do a three-step process: send the data from the MIC to the local host, then out to the remote host, then up to the remote MIC.
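To put a rough number on that: if each of the three steps waits for the previous one to finish (no overlap), the total time is the sum of the per-hop times, so the effective rate is the harmonic combination of the per-hop rates. Using the measured 5.77 GB/s host-to-host rate and an assumed SCIF rate of about 6 GB/s (a placeholder; the measured figures are in my other post):

```shell
# effective rate of MIC -> local host -> remote host -> remote MIC,
# where each step waits for the previous one (no overlap);
# scif=6.0 is an assumed stand-in for the measured SCIF rate
awk 'BEGIN { scif = 6.0; ib = 5.77;
             printf "%.2f GB/s\n", 1 / (1/scif + 1/ib + 1/scif) }'
# prints 1.97 GB/s
```

That is about 2x the 0.92 GB/s direct MIC-to-MIC rate, even before overlapping the hops.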
I am working on a large sparse eigensolver for a materials science problem where, roughly speaking, an eigenvector is needed for each electron in the system. A very large collection of vectors is produced and dense matrix operations are performed on those vectors, but sparse matrix operations produce the vectors in the first place. Because system memory is so much larger than accelerator memory, I had been thinking it might make sense to do the dense operations on the host (where the vectors must eventually end up anyway) and the sparse operations on the MIC. That is counterintuitive, since accelerators are typically thought of as poor at sparse matrix computations. The point of this test, and of the SCIF tests in another post, is to decide whether that will actually work. At 0.92 GB/s, forget about it. Does anyone at Intel think this is just a driver issue that can be improved through future software updates?
I am going to pass this issue to someone more knowledgeable to look at, but for the sake of completeness when I do, could you tell me what Linux distribution you are using on the host and what version of the MPSS?
Hey Frances,
This is on Stampede. The operating system is CentOS 6.3. The only thing I could find on the MPSS version is in the file /etc/issue on the MIC card. It said this:
Intel MIC Platform Software Stack release 2.1
Kernel 2.6.34.11-g65c0cd9 on an k1om
Is there another way to check the MPSS version that will get the third number in the version string?
Hi Grady,
You can run the micinfo command to retrieve the MPSS information:
% /opt/intel/mic/bin/micinfo
Thanks loc. This is the output.
MicInfo Utility Log
Created Wed Feb 27 15:07:12 2013
System Info
Host OS : Linux
OS Version : 2.6.32-279.el6.x86_64
Driver Version : 4346-16
MPSS Version : 2.1.4346-16
Host Physical Memory : 32836 MB
CPU Family : GenuineIntel Family 6 Model 45 Stepping 7
CPU Speed : 2701.000
Threads per Core : 1
Device No: 0, Device Name: Intel(R) Xeon Phi(TM) coprocessor
Version
Flash Version : 2.1.01.0375
UOS Version : 2.6.34.11-g65c0cd9
Device Serial Number : ADKC23000348
Board
Vendor ID : 8086
Device ID : 225c
SubSystem ID : 2500
MIC Processor Stepping ID : 1
PCIe Width : Insufficient Privileges
PCIe Speed : Insufficient Privileges
PCIe Max payload size : Insufficient Privileges
PCIe Max read req size : Insufficient Privileges
MIC Processor Model : 0x01
MIC Processor Model Ext : 0x00
MIC Processor Type : 0x00
MIC Processor Family : 0x0b
MIC Processor Family Ext : 0x00
MIC Silicon Stepping : B0
Board SKU : ES2-P1750
ECC Mode : Enabled
SMC HW Revision : Product 300W Passive CS
Core
Total No of Active Cores: 61
Voltage : 1074000 uV
Frequency : 1090909 kHz
Thermal
Fan Speed Control : N/A
SMC Firmware Version : 1.6.3983
FSC Strap : 14 MHz
Fan RPM : N/A
Fan PWM : N/A
Die Temp : 41 C
GDDR
GDDR Vendor : Elpida
GDDR Version : 0x1
GDDR Density : 2048 Mb
GDDR Size : 7936 MB
GDDR Technology : GDDR5
GDDR Speed : 5.500000 GT/s
GDDR Frequency : 2750000 kHz
GDDR Voltage : 1000000 uV
I've passed this on - I will let you know when I hear something.
Grady,
A few more questions -
The version of the MPSS you are using on Stampede is not the latest. Do you have admin privileges on Stampede, or have you been working with an admin there who could try installing the newest version (2.6.38) to see if the problem is reproducible with that release? Alternatively, is there another system there with the latest release installed?
What version of OFED are you using? Is it the version from the www.openfabrics.org site, and is that where your copy of the ping pong code comes from?
Hey Frances,
They are going to skip 2.6.38 and install the next version. I'll try this again when they do. The version of OFED does come from openfabrics.org.