Hello,
For some reason, the kernel modules built from the intel-mic-ofed-kmod source package fail to load on CentOS 6.3. For example:
[root@node001 ~]# modprobe ib_umad
FATAL: Error inserting ib_umad (/lib/modules/2.6.32-279.22.1.el6.x86_64/updates/drivers/infiniband/core/ib_umad.ko): Unknown symbol in module, or unknown parameter (see dmesg)
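The modprobe error only says "see dmesg"; the kernel log names the symbols it rejected. A minimal sketch of a filter for those lines, assuming the message wording printed by 2.6.32-era kernels ("<module>: Unknown symbol <name>" and "<module>: disagrees about version of symbol <name>"). The function name `mod_symbol_errors` is hypothetical:

```shell
#!/bin/sh
# Sketch: pull module symbol-resolution failures out of kernel log text.
# Reads the log on stdin and prints only the relevant lines.
# "Unknown symbol" is printed for every symbol that could not be resolved.
# A "disagrees about version of symbol" line for the same name means the
# module was built against different symbol versions (CRCs) than the ones
# exported by the running kernel or the already-loaded modules.
mod_symbol_errors() {
    grep -E 'Unknown symbol|disagrees about version of symbol'
}

# Typical use after a failed load (output depends on the system):
#   modprobe ib_umad || dmesg | mod_symbol_errors
```

If the filtered output contains "disagrees about version" lines, the failure is a symbol-version mismatch rather than a missing module.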
It seems that CentOS 6.3 and the intel-mic-ofed-kmod sources are not compatible (or I am doing something wrong). We are using the latest publicly available MPSS stack (Update 1), and we build the intel-mic-ofed-* packages at boot time. Could you please answer the following two questions (or point me to the relevant documentation)?
- There are at least three widely used OFED versions: OFA OFED, Mellanox OFED and QLogic OFED. Which MPSS versions work with which OFED versions on which Linux distributions? I suspect the answer changes quickly, but I would greatly appreciate it if somebody could describe at least the current state.
- As far as I understand, the main reason to set up OFED on a host is to emulate an HCA so that IB-aware applications can communicate between the host and the card via "InfiniBand" (using RDMA). Is it possible to use several MICs (installed in *different* hosts physically connected to an IB switch) and get all the advantages of IB communication between them? For example, it would be nice to run a native MVAPICH2 application on several MICs (in different hosts) in a cluster using IB only. (HCA emulation probably makes IB communication slower, but I am not sure.)
Quick answer to question 1 - only OFED version 1.5.4.1 is currently supported. I think the version that comes with Red Hat is different. The recommended location to get the file from is OFA: http://www.openfabrics.org/downloads/OFED/ofed-1.5.4/OFED-1.5.4.1.tgz
Hi Frances,
Frances Roth (Intel) wrote:
Quick answer to question 1 - only OFED version 1.5.4.1 is currently supported. I think the version that comes with Red Hat is different. The recommended location to get the file from is OFA: http://www.openfabrics.org/downloads/OFED/ofed-1.5.4/OFED-1.5.4.1.tgz
Thanks for your answer. We are not using the base distribution OFED. We can use QLogic OFED 1.5.4.1, Mellanox OFED 1.5.3 or OFA OFED 1.5.4.1, so the QLogic (Intel True Scale) and OFA versions look good. Perhaps my first question was not clear enough. I was asking whether QLogic (and OFA) OFED + RHEL 6.2 (and 6.3, and also 6.4) + MPSS Update 1 (since yesterday, Update 2) should work. It would also be useful to know the same for SLES 11 SP2 and SP3. Our software depends on all three (OFED + distribution + MPSS), so we need to know which combinations are supposed to work properly and which are not.
The "unknown symbol" error is a result of mismatched kernel symbol versions. Please check:
(1) Does the file "/lib/modules/`uname -r`/build/Module.symvers.mic" exist before building from the source rpm? This is to ensure the new modules have the correct symbol versions.
(2) After the building, have you run "sudo service openibd restart" (or just reboot the machine) before trying to load the newly built modules? This is to ensure that the old IB modules (with the wrong symbol versions) have been unloaded.
Only OFA OFED 1.5.4.1 is officially supported, and it is the recommended choice.
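The two checks above can be sketched as a pair of shell functions. This is a minimal sketch, not an MPSS tool: the function names `symvers_mic_present` and `reload_ib_stack` and the optional modules-root argument (added so the check can be pointed at another tree) are hypothetical.

```shell
#!/bin/sh
# Check (1): succeed if Module.symvers.mic exists for the given kernel
# release (default: the running kernel, i.e. `uname -r`) under the given
# modules root (default: /lib/modules). Run this before rebuilding the
# source rpm; if it fails, the rebuilt modules will get the wrong symbol
# versions.
symvers_mic_present() {
    [ -f "${2:-/lib/modules}/${1:-$(uname -r)}/build/Module.symvers.mic" ]
}

# Check (2): after the rebuild, restart openibd so that the old IB modules
# (with the old symbol versions) are unloaded before the new ones are
# loaded. Needs root; rebooting achieves the same thing.
reload_ib_stack() {
    service openibd restart
}

# Typical sequence (as root):
#   symvers_mic_present || echo "Module.symvers.mic missing" >&2
#   # ... rebuild the intel-mic-ofed-kmod source rpm here ...
#   reload_ib_stack && modprobe ib_umad
```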
Hi Jianxin,
Jianxin Xiong (Intel) wrote:
(2) After the building, have you run "sudo service openibd restart" (or just reboot the machine) before trying to load the newly built modules? This is to ensure that the old IB modules (with the wrong symbol versions) have been unloaded.
Right, openibd was not restarted. We build the intel-mic-ofed-* packages at node boot, which means we have to restart openibd after the build on every boot (probably from /etc/init.d/ofed-mic). Thanks for the help.
Hello,
Do I understand correctly that, with the latest version of MPSS, two OFED versions are now officially supported for a MIC host: OFA OFED and the base distribution OFED in RHEL? And that all other OFED versions are not supported (and will not work)?
Thanks.
The current state of OFED -
You can replace the OFED from your Linux distribution with OFA OFED 1.5.4.1 from http://www.openfabrics.org/ and add in the MPSS support for OFED. This will give you the ability to directly communicate between a native application on the coprocessor and a Mellanox* InfiniBand Adapter.
You can replace the OFED from your Linux distribution with the OFED for Intel TrueScale InfiniBand adapters and add in the MPSS support for OFED. This will also give you the ability to communicate directly between a native application on the coprocessor and your Intel TrueScale InfiniBand adapter.
You can use the OFED from your Linux distribution (which will NOT allow you to add in any MPSS support for OFED). Communication from native applications on the coprocessor will go through the regular virtual network to the host before it reaches the InfiniBand adapter.
If you are using RHEL 6.4 and Intel TrueScale InfiniBand adapters, you will need to use this last solution. Because it gives up the ability to communicate directly between a native application and the adapter, this is not the solution you will want in general.
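The options above can be summarized as a small lookup, which may help when checking an OFED + distribution + adapter combination. This is a rough sketch that only restates the combinations named in the post; the argument values and output labels are hypothetical, not MPSS terminology, and any combination the post does not mention is reported as not covered rather than guessed.

```shell
#!/bin/sh
# Sketch: the OFED options from the post above, encoded as a lookup.
# Argument values and output labels are hypothetical.
#   $1: OFED on the host: ofa-1.5.4.1 | truescale | distro
#   $2: distribution release, e.g. rhel-6.3 or rhel-6.4
#   $3: adapter vendor: mellanox | truescale
# Prints how a native coprocessor application reaches the InfiniBand adapter.
native_ib_path() {
    # RHEL 6.4 with TrueScale adapters has to keep the distribution OFED.
    if [ "$2" = rhel-6.4 ] && [ "$3" = truescale ] && [ "$1" != distro ]; then
        echo "RHEL 6.4 + TrueScale: only the distribution OFED applies" >&2
        return 1
    fi
    case $1:$3 in
        ofa-1.5.4.1:mellanox|truescale:truescale)
            # Replacement OFED plus the MPSS support for OFED
            echo direct ;;
        distro:*)
            # No MPSS support for OFED: traffic goes over the virtual network to the host
            echo via-host-virtual-network ;;
        *)
            echo "combination not covered by the post: $1 + $3" >&2
            return 1 ;;
    esac
}
```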
Dear Frances,
Thank you for the answer.
> You can use the OFED from your Linux distribution (which will NOT allow you to add in any MPSS support for OFED.)
Could you please explain in more detail what exactly "will NOT allow you to add in any MPSS support for OFED" means?
Hello --
Quick question: will OFED-2.0-3.0.0 be supported at some point?
Thanks
Taras - Rereading what I wrote, I think I was perhaps too emphatic. After installing OFA OFED or TrueScale OFED, you then install the rpm files from the ofed directory in the MPSS release. If you are using the OFED that came with your Linux distribution, there is no guarantee that those rpm files will install correctly. In the case of RHEL 6.4, you definitely cannot install them and get direct coprocessor-to-adapter communication for native applications. I don't know the details of what doesn't work or why; I can ask the MPSS team for more details.
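The install step described above (rpm files from the MPSS release's ofed directory, installed after OFA OFED or TrueScale OFED) can be sketched as a helper that lists those rpms. This is a sketch under assumptions: the function name and the release-directory argument are hypothetical, the rpms are assumed to sit directly under `<release>/ofed`, and source rpms (such as the intel-mic-ofed-kmod package mentioned at the top of the thread, which is rebuilt rather than installed directly) are skipped.

```shell
#!/bin/sh
# Sketch: print the binary rpm files in the "ofed" directory of an unpacked
# MPSS release, one per line. Source rpms (*.src.rpm) are skipped.
#   $1: path to the unpacked MPSS release (hypothetical layout: $1/ofed/*.rpm)
mpss_ofed_rpms() {
    [ -n "$1" ] || { echo "usage: mpss_ofed_rpms MPSS_RELEASE_DIR" >&2; return 1; }
    [ -d "$1/ofed" ] || { echo "no ofed directory in $1" >&2; return 1; }
    for f in "$1"/ofed/*.rpm; do
        case $f in
            *.src.rpm) ;;                             # rebuilt, not installed directly
            *) [ -e "$f" ] && printf '%s\n' "$f" ;;   # skip an unmatched glob
        esac
    done
    return 0
}

# Typical use, as root, after OFA OFED or TrueScale OFED is installed
# (the path is an example only; paths with spaces would need more care):
#   rpm -Uvh $(mpss_ofed_rpms /path/to/mpss-release)
```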
Christian - that is a question I will need to ask the MPSS team.
Dear Frances,
Thanks for the answer.
Frances Roth (Intel) wrote:
If you are using the OFED that came with your Linux distribution, there is no guarantee that those rpm files will install correctly. In the case of RHEL 6.4, you definitely cannot install them and get the direct coprocessor to adapter communication for native applications.
I am a bit confused now. According to http://registrationcenter.intel.com/irc_nas/3529/readme-en.txt:
"Infiniband support for RHEL 6.4 is provided through the RDMA/infiniband packages that come wih the distribution."
This is stated in section "3.2 Steps to Install Intel(R) MPSS with OFED Support using Intel(R) True Scale InfiniBand Adapters". Maybe I am misunderstanding "Infiniband support"? Could you please explain again: should I install the base distribution OFED (as stated in readme-en.txt) on RHEL 6.4 with:
a. True Scale (QLogic) HCA
b. Mellanox HCA
Frances Roth (Intel) wrote:
I don't know the details of what and why it doesn't work. I can ask the MPSS team for more details.
Yes, I would appreciate any details. If it does not work, it would be useful to know when it will.
Thanks.
Hello Frances -
Any update on MPSS support for OFED 2.0?
Thanks!
Christian
Hi Christian,
I'm trying to find a more definitive answer for you. Since things are relatively quiet around here due to the US holiday, you probably won't receive an answer until the latter half of next week.
Regards,
--
Taylor
We are currently running into issues installing OFED on our hosts and are looking for any help. We have:
- kernel 2.6.32-573.7.1.el6.x86_64
- mpss-3.6
I have tried OFED-3.5-MIC and OFED-3.12-1. In the case of OFED-3.12-1, running the installer with './install.pl -vvv --all --with-xeon-ph' results in:
Build ofed-scripts RPM
	Running  rpmbuild --rebuild -D 'PSM_HAVE_SCIF 1' --define '_topdir /var/tmp//OFED_topdir' --define 'dist %{nil}' --target x86_64 --define '_prefix /usr' --define '_exec_prefix /usr' --define '_sysconfdir /etc' --define '_usr /usr' /tmp/OFED-3.12-1/SRPMS/ofed-scripts-3.12-1.1.g7bdbeee.src.rpm
	TMPRPMS /var/tmp//OFED_topdir/RPMS/x86_64
	Created /var/tmp//OFED_topdir/RPMS/x86_64/ofed-scripts-3.12-1.1.g7bdbeee.x86_64.rpm
	Install ofed-scripts RPM:
	Running rpm -iv -D 'PSM_HAVE_SCIF 1' /tmp/OFED-3.12-1/RPMS/centos-release-6-7.el6.centos.12.3/x86_64/ofed-scripts-3.12-1.1.g7bdbeee.x86_64.rpm
	Build compat-rdma RPM
	Running rpmbuild --rebuild -D 'PSM_HAVE_SCIF 1' --define '_topdir /var/tmp//OFED_topdir' --nodeps --define '_dist .unsupported' --define 'configure_options   --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mthca-mod --with-mlx4-mod --with-mlx4_en-mod --with-mlx5-mod --with-cxgb3-mod --with-cxgb4-mod --with-nes-mod --with-qib-mod --with-ocrdma-mod --with-ipoib-mod --with-srp-mod --with-iser-mod --with-nfsrdma-mod --with-ibscif-mod --with-ibp-server-mod --with-ibp-debug-mod' --define 'build_kernel_ib 1' --define 'build_kernel_ib_devel 1' --define 'KVERSION 2.6.32-573.7.1.el6.x86_64' --define 'K_SRC /lib/modules/2.6.32-573.7.1.el6.x86_64/build' --define '_release 1.1.g561c555.2.6.32_573.7.1.el6.x86_64' --define 'network_dir /etc/sysconfig/network-scripts' --define '_prefix /usr' --define '__arch_install_post %{nil}' /tmp/OFED-3.12-1/SRPMS/compat-rdma-3.12-1.1.g561c555.src.rpm
	Failed to build compat-rdma RPM
	See /tmp/OFED.3929.logs/compat-rdma.rpmbuild.log
The log mentioned above contains:
<snip to end of output>
make[1]: Entering directory `/usr/src/kernels/2.6.32-573.7.1.el6.x86_64'
	  CC 
	In file included from /var/tmp/OFED_topdir/BUILD/compat-rdma-3.12/include/linux/compat-2.6.h:63,
	                 from <command-line>:0:
	/var/tmp/OFED_topdir/BUILD/compat-rdma-3.12/include/linux/compat-2.6.37.h:15: error: redefinition of 'proto_ports_offset'
	include/linux/in.h:292: note: previous definition of 'proto_ports_offset' was here
	In file included from /var/tmp/OFED_topdir/BUILD/compat-rdma-3.12/include/linux/compat-2.6.h:65,
	                 from <command-line>:0:
	/var/tmp/OFED_topdir/BUILD/compat-rdma-3.12/include/linux/compat-2.6.39.h:196:1: warning: "PTR_RET" redefined
	In file included from /usr/src/kernels/2.6.32-573.7.1.el6.x86_64/arch/x86/include/asm/processor.h:31,
	                 from include/linux/prefetch.h:14,
	                 from include/linux/list.h:7,
	                 from include/linux/mm_types.h:7,
	                 from include/linux/kmemcheck.h:4,
	                 from include/linux/skbuff.h:18,
	                 from include/linux/if_ether.h:136,
	                 from include/linux/netdevice.h:29,
	                 from /var/tmp/OFED_topdir/BUILD/compat-rdma-3.12/include/linux/compat-2.6.29.h:5,
	                 from /var/tmp/OFED_topdir/BUILD/compat-rdma-3.12/include/linux/compat-2.6.h:55,
	                 from <command-line>:0:
	include/linux/err.h:64:1: warning: this is the location of the previous definition
	In file included from /var/tmp/OFED_topdir/BUILD/compat-rdma-3.12/include/linux/compat-2.6.h:67,
	                 from <command-line>:0:
	/var/tmp/OFED_topdir/BUILD/compat-rdma-3.12/include/linux/compat-3.1.h:32: error: redefinition of 'ip_is_fragment'
	include/net/ip.h:249: note: previous definition of 'ip_is_fragment' was here
	make[3]: *** [/var/tmp/OFED_topdir/BUILD/compat-rdma-3.12/compat/main.o] Error 1
	make[2]: *** [/var/tmp/OFED_topdir/BUILD/compat-rdma-3.12/compat] Error 2
	make[1]: *** [_module_/var/tmp/OFED_topdir/BUILD/compat-rdma-3.12] Error 2
	make[1]: Leaving directory `/usr/src/kernels/2.6.32-573.7.1.el6.x86_64'
	make: *** [kernel] Error 2
	error: Bad exit status from /var/tmp/rpm-tmp.jERoeH (%build)
	RPM build errors:
	    user vlad does not exist - using root
	    group vlad does not exist - using root
	    user vlad does not exist - using root
	    group vlad does not exist - using root
	    Bad exit status from /var/tmp/rpm-tmp.jERoeH (%build)
Can anyone help us get past this?
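Both hard errors in the excerpt are redefinitions: compat-rdma 3.12's compatibility headers (compat-2.6.37.h, compat-3.1.h) define `proto_ports_offset` and `ip_is_fragment`, and gcc reports that the 2.6.32-573.7.1 kernel headers (include/linux/in.h, include/net/ip.h) already define them, presumably because they were backported into this kernel. A small sketch that lists every such clash from the full rpmbuild log, so the complete set is visible at once. The function name is hypothetical, and it assumes gcc's plain-ASCII quoting as seen in this log (newer gcc versions in UTF-8 locales print typographic quotes instead).

```shell
#!/bin/sh
# Sketch: read gcc output on stdin and print "<symbol> <file>:<line>" for
# each "note: previous definition of '<symbol>' was here" line, i.e. each
# symbol that a header already defined before the compat header redefined it.
kernel_predefined_symbols() {
    sed -n "s/^[[:space:]]*\([^ :]*:[0-9]*\): note: previous definition of '\([^']*\)' was here.*/\2 \1/p" |
        sort -u
}

# Typical use:
#   kernel_predefined_symbols < /tmp/OFED.3929.logs/compat-rdma.rpmbuild.log
```

On the excerpt above, this would report `proto_ports_offset` (include/linux/in.h:292) and `ip_is_fragment` (include/net/ip.h:249).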