Hi,
looking at the MPSS 3.3 release notes, I discovered that there is now support for RHEL 7. I wanted to try that and hijacked one of our cluster nodes to try installing CentOS 7 and the MPSS stack. The installation of MPSS was completely painless (and I like the fact that there are even service files for systemd), but I haven't managed to install OFED. The MPSS User Manual states that the only supported option for RHEL 7 is OFED-3.5-2-mic, but running its installation script after the MPSS installation failed with a compilation error while building the intel-mic-ofed-compat-rdma RPM. The build logs seem to imply that the OFED sources don't support the 3.10-based kernel in RHEL 7:
make -f scripts/Makefile.build obj=/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/compat
gcc -Wp,-MD,/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/compat/.main.o.d -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/4.8.2/include \
-D__OFED_BUILD__ \
-DCOMPAT_BASE="\"compat-2012-07-02-13-gde310fa\"" -DCOMPAT_BASE_TREE="\"unknown\"" -DCOMPAT_BASE_TREE_VERSION="\"v3.5\"" -DCOMPAT_PROJECT="\"Compat-rdma\"" -DCOMPAT_VERSION="\"a5bbb76-np\"" \
-include /lib/modules/3.10.0-123.4.2.el7.x86_64/build/include/generated/autoconf.h \
-include /var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/autoconf.h \
-include /lib/modules/3.10.0-123.4.2.el7.x86_64/build/include/linux/kconfig.h \
-include /var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.h \
\
\
\
\
-I/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include \
-I/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/drivers/infiniband/debug \
-I/usr/local/include/scst \
-I/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/drivers/infiniband/ulp/srpt \
-D__XEN_INTERFACE_VERSION__= \
-I/usr/src/kernels/3.10.0-123.4.2.el7.x86_64/arch/x86/include/mach-xen \
-I/usr/src/kernels/3.10.0-123.4.2.el7.x86_64/arch/x86/include \
-Iarch/x86/include/generated -Iinclude \
\
-I/usr/src/kernels/3.10.0-123.4.2.el7.x86_64/arch/x86/include \
-D__KERNEL__ -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -Werror-implicit-function-declaration -Wno-format-security -fno-delete-null-pointer-checks -O2 -m64 -mno-sse -mpreferred-stack-boundary=3 -mtune=generic -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -Wframe-larger-than=2048 -fstack-protector-strong -Wno-unused-but-set-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -pg -mfentry -DCC_USING_FENTRY -fno-inline-functions-called-once -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fconserve-stack -DCC_HAVE_ASM_GOTO -DMODULE -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(main)" -D"KBUILD_MODNAME=KBUILD_STR(compat)" -c -o /var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/compat/.tmp_main.o /var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/compat/main.c
In file included from <command-line>:0:0:
/var/tmp/OFED_topdir/BUILD/intel-mic-ofed-compat-rdma-3.5/include/linux/compat-2.6.h:6:27: fatal error: linux/version.h: No such file or directory
 #include <linux/version.h>
                           ^
compilation terminated.
I fixed the version.h problem, but other missing headers keep coming up: asm/types.h, asm/bitsperlong.h,...
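To keep track of which headers the build could not find, the missing names can be pulled out of the saved build output. The following is only a minimal sketch: it assumes the log uses gcc's "fatal error: ...: No such file or directory" wording shown above, and the function name `missing_headers` is made up for illustration.

```python
import re

# Matches gcc's "fatal error: <header>: No such file or directory" message,
# as it appears in the build log quoted above.
MISSING_HEADER = re.compile(r"fatal error: (\S+): No such file or directory")


def missing_headers(log_text):
    """Return the distinct missing header names, in order of first appearance."""
    seen = []
    for name in MISSING_HEADER.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen
```

Because gcc stops at the first fatal error in a compilation unit, each build attempt is likely to reveal only one new header, which matches the "keep coming up" experience described here.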
As the most recent version of OFED-3.5-2-mic is -beta1, which was released at the beginning of May, is there perhaps a set of patches that need to be applied to the OFED distribution before attempting an installation? For some reason, the MPSS user guide explicitly states that RHEL 7 users should use that OFED stack, and the relevant portion of the guide doesn't list any additional steps apart from a straightforward installation.
Is there anyone out there who has managed to get MIC + InfiniBand running on RHEL 7?
Best regards,
Steffen
linux/version.h should be available under /usr/src/ if kernel sources (optional) are installed (and maybe configured) and their path correctly selected. I don't know if the latter is a step in OFED preparation. It doesn't look like a gcc version problem.
Hi Tim,
thanks for the comment! I have installed both kernel-devel and kernel-headers for the currently running kernel (to be able to build kernel modules on RHEL, you need kernel-devel).
To me, the problem seems to be that a lot of kernel headers have been moved around, e.g. version.h used to be in /lib/modules/$VERSION/include/linux/version.h, but is now in /lib/modules/$VERSION/include/generated/uapi/linux/version.h. There is a similar story for a bunch of other headers, e.g. asm/types.h has to be changed to uapi/asm-generic/types.h, I think.
Unfortunately, I'm really not up to that task, as there are probably more severe problems lurking around the corner, and I don't know enough about the kernel to fix those...
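One way to check where a given header ended up is to search the kernel build tree for it. This is a minimal sketch rather than anything from the OFED or MPSS tooling; the function name `find_header` is hypothetical, and the root directory to search (for example the kernel-devel tree for the running kernel) is an assumption you would need to adjust.

```python
import os


def find_header(root, relative_name):
    """Return every path under root that ends with relative_name,
    e.g. 'linux/version.h'. Symlinked subdirectories are not followed,
    because os.walk skips them by default."""
    parts = relative_name.split("/")
    wanted_file = parts[-1]
    wanted_suffix = os.sep + os.path.join(*parts)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if wanted_file in filenames:
            full = os.path.join(dirpath, wanted_file)
            if full.endswith(wanted_suffix):
                matches.append(full)
    return sorted(matches)
```

Calling `find_header("/usr/src/kernels/3.10.0-123.4.2.el7.x86_64", "linux/version.h")` would list candidate locations on a system where that tree exists; an empty list means the header is not present under that root at all.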
I've passed the question on to the OFED developers here. We'll see what they have to say.
Frances
And the developer got right back to me. There is a slight release timing mismatch here. A new version of OFED-3.5-2-mic, the one that was tested against RHEL 7, is due out shortly. The developer said he expects "shortly" to be a few days. I will add a note when I hear it is out there.
Hi Frances,
thanks for clearing up the issue! I'll just wait for an update from you.
Steffen
Boy, this was faster than I had expected. The following announcement was sent to the ewg mail list at openfabrics.org this morning:
OFED-3.5-2-MIC-rc1 is available at:
https://www.openfabrics.org/downloads/ofed-mic/ofed-3.5-2-mic/OFED-3.5-2-MIC-rc1.tgz
OFED-3.5-2-MIC requires the Intel(R) MPSS 3.x (YOCTO) release for Linux to be
installed on your system. MPSS 3.x for Linux can be downloaded from:
http://software.intel.com/mic-developer
Changes from OFED-3.5-2-beta1 include:
- added support for RHEL 7.0
- updated DAPL package to release 2.0.42.2
- updated PSM package to intel-mic-psm-3.3
- updated ib_qib driver for mpss-3.3
- script and documentation updates
- bug fixes
Yes, that really was fast!
Unfortunately, I didn't manage to get my setup up and running. The new version of the OFED-MIC stack installs cleanly, and the initial setup was straightforward - IPoIB works (tried doing NFS over it, no problems there). The basic InfiniBand diagnostic tools are also telling me that my network setup is fine.
But then I tried mounting NFS over RDMA and got strange errors. I can browse the mounted filesystem without any problem, but trying to read any file larger than about 800 bytes results in errors like this:
cat: README: Input/output error
I then tried to verify whether there is a compatibility problem between CentOS 6 and 7, so I installed CentOS 7 on a second node and tried to mount an NFS share from there using RDMA. Unfortunately, I keep getting the same error.
I then tested lower-level connectivity using ib_write_bw (installed as part of the OFED stack). After fiddling around with transfer sizes a little, I noticed that transfers of more than 64k fail for some reason. Here are the outputs (I ran "ib_write_bw -F -R" on the server side):
[root@node01 ~]# ib_write_bw --size=65536 -F -R node02.ib
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF
Device : mlx4_0
Number of qps : 1
Transport type : IB
Connection type : RC
TX depth : 128
CQ Moderation : 100
Mtu : 2048
Link type : IB
Max inline data : 0
rdma_cm QPs : ON
Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
local address: LID 0x05 QPN 0x0070 PSN 0x537e81
remote address: LID 0x02 QPN 0x006b PSN 0xde3ac7
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
Conflicting CPU frequency values detected: 1680.109000 != 1397.593000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1705.156000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1797.359000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1424.828000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1519.656000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1736.875000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1398.468000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1708.546000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1730.859000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1302.765000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1436.093000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1398.687000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1337.437000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1357.671000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1381.953000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1360.406000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1371.234000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1314.687000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1414.109000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1469.781000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1564.171000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1381.406000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 2800.000000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1499.968000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1499.750000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1532.453000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1704.609000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1494.171000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 2365.343000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1354.171000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1429.203000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1332.296000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1333.390000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1317.640000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1341.375000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1330.875000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1328.687000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1315.781000 Test integrity may be harmed !
Conflicting CPU frequency values detected: 1680.109000 != 1316.109000 Test integrity may be harmed !
Warning: measured timestamp frequency 2800.04 differs from nominal 1680.11 MHz
65536 5000 3631.36 3631.36 0.058102
---------------------------------------------------------------------------------------
Going to a larger size always fails like this:
[root@node01 ~]# ib_write_bw --size=131072 -F -R node02.ib
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs1
---------------------------------------------------------------------------------------
RDMA_Write BW Test
Dual-port : OFF
Device : mlx4_0
Number of qps : 1
Transport type : IB
Connection type : RC
TX depth : 128
CQ Moderation : 100
Mtu : 2048
Link type : IB
Max inline data : 0
rdma_cm QPs : ON
Data ex. method : rdma_cm
---------------------------------------------------------------------------------------
local address: LID 0x05 QPN 0x0072 PSN 0xd8d988
remote address: LID 0x02 QPN 0x006d PSN 0xa6f8b6
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
Problems with warm up
Failed to complete run_iter_bw function successfully
Here, I doubled the transfer size, but any value larger than 65536 will trigger the same behavior.
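For anyone who wants to reproduce the threshold on their own fabric, the search for the largest working size can be automated with a bisection. This is a rough sketch under several assumptions: that `ib_write_bw` exits with a non-zero status when a run fails, that failures are monotonic (once a size fails, every larger size fails), and that a matching `ib_write_bw -F -R` server is started on the peer before each probe. The function names are made up for illustration.

```python
import subprocess


def transfer_ok(size, server, timeout=120):
    """Run one ib_write_bw client pass with the flags used in this thread.

    Returns True if the command exits with status 0. Assumes a matching
    'ib_write_bw -F -R' server is already waiting on `server`.
    """
    cmd = ["ib_write_bw", f"--size={size}", "-F", "-R", server]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def largest_passing_size(probe, low, high):
    """Binary search for the largest size in [low, high] where probe(size) is True.

    Assumes probe(low) is True and that failures are monotonic in size.
    """
    if not probe(low):
        raise ValueError("probe fails even at the lower bound")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low
```

A call such as `largest_passing_size(lambda s: transfer_ok(s, "node02.ib"), 1, 1 << 20)` would then narrow down the limit in about twenty probes, provided the server side is restarted between runs if it exits after each test.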
At this point, I'm rather lost. Does someone with more experience with InfiniBand have any ideas? Our hardware setup is ConnectX-3 QDR cards (mlx4 driver) with a QDR Mellanox switch. I'd be very grateful!
BTW, I'll be on holidays until the beginning of August, so I won't be able to answer any further questions until then.
Hi Steffen!
I'm wondering whether you have solved your problem already. It seems I have the same issue with OFED 3.5.2 rc3.
Pavel Lavrenko wrote:
Hi Steffen!
I'm wondering whether you have solved your problem already. It seems I have the same issue with OFED 3.5.2 rc3.
Hi Pavel,
no, I asked on the OFED side, and apparently this won't be fixed in OFED-3.5. But judging by the discussion on the OFED mailing list, OFED-3.12 should be released next week or so, and it includes MIC support in the mainline distribution. We'll test whether that works with CentOS 7 and RDMA once it is out.
Steffen
So I am also working on this issue and have had no luck with the Mellanox version of OFED.
I hit the version.h error; here is a fix, which may not be correct.
Notes: the instructions in the guide are wrong. Step 4 states:
rpmbuild --rebuild --define “MOFED 1”
The correct command is:
rpmbuild --rebuild --define 'MOFED 1' ofed-driver*.src.rpm
The fatal error still occurs:
fatal error: linux/version.h: No such file or directory #include
I worked around it with:
export C_INCLUDE_PATH=$C_INCLUDE_PATH:/usr/src/kernels/3.10.0-229.el7.x86_64/arch/x86/include/generated/
Now I get a lot of errors like:
error: dereferencing pointer to incomplete type
entry->write_proc = ibscif_stats_write;
I'm a bit surprised that /usr/include wasn't in your default include path. The file /usr/include/linux/version.h was there, so I am taking it as a given that you did install kernel-headers and kernel-devel.
I will put in a documentation bug report about the use of double quotes when single quotes should be used.
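To illustrate why the quoting matters, the snippet below uses Python's `shlex` module as an approximation of POSIX shell word splitting. It assumes the guide really contained the typographic quotes reproduced in the post above; the helper name `define_argument` and the package file name in the usage note are hypothetical.

```python
import shlex


def define_argument(command_line):
    """Return the word that follows --define after POSIX-style word splitting."""
    words = shlex.split(command_line)
    return words[words.index("--define") + 1]
```

With straight single quotes, `define_argument("rpmbuild --rebuild --define 'MOFED 1' pkg.src.rpm")` yields `MOFED 1` as one argument, and straight double quotes behave the same way. With typographic quotes, which are ordinary characters to the splitter, the argument is cut at the space and only `“MOFED` reaches `--define`.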
As for the other problems, you don't say exactly which Linux release and which MPSS release you are using. If you are using CentOS 7 or later, as the originator of this thread was, then I would suggest using Open Fabrics OFED 3.12 or 3.18 rather than the Mellanox OFED. What the MPSS 3.4 documentation said, but the MPSS 3.5 and 3.6 documentation does not explicitly say, is that with RHEL 7.0 and later you should use the Open Fabrics OFED. What the current documentation does say is:
Each OFED distribution supports a subset of the Intel® MPSS supported OS distros; most support SLES* 11 SP3 and RHEL* 6.2/3/4/5/6. Newer distros may not be officially supported by any released OFED (at the time of this writing: RHEL* 6.7, SLES* 11 SP4). Check the respective release notes for the exact supported distros.
So it sounds like another documentation bug report I should put in.
The way I got it to run is by using OFED-3.18 from http://downloads.openfabrics.org/OFED/
Yes, CentOS 7.1.
I tried OFED-3.18 and no dice. I'll try a clean install and try again.