Hi,
We have an issue with a QLogic InfiniBand card on MIC.
We use Red Hat 6.3, the latest MPSS version, and two Phi cards on each node.
OFED 1.5.4.1 has been installed.
The IB card is OK on the head node.
We ran rpm -U on all the intel-mic-ofed packages and rebooted the head node,
but when we try to start ofed-mic,
we get the following error message, and only scif0 has started on mic0 and mic1:
"
ibpd: pid 4137 /dev/ibp1 started 4 threads
ibpd: pid 4159 /dev/ibp2 started 4 threads
kernel: ------------[ cut here ]------------
kernel: WARNING: at fs/sysfs/dir.c:512 sysfs_add_one+0xc9/0x130() (Not tainted)
kernel: Hardware name: S2600GZ
kernel: sysfs: cannot create duplicate filename '/devices/pci0000:80/0000:80:01.0/0000:81:00.0/infiniband/qib0/knx_node'
kernel: Modules linked in: ibscif(U) ibp_server(U) autofs4 nfs lockd fscache nfs_acl auth_rpcgss sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf bridge stp llc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables rdma_ucm(U) ib_sdp(U) rdma_cm(U) iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ib_sa(U) ipv6 ib_uverbs(U) ib_umad(U) mlx4_ib(U) mlx4_en(U) mlx4_core(U) ib_mthca(U) sg microcode ib_qib(U) mic(U) ib_mad(U) ib_core(U) sb_edac edac_core i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support ioatdma igb dca shpchp ext3 jbd mbcache sd_mod crc_t10dif isci libsas scsi_transport_sas ahci wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ibp_server]
kernel: Pid: 1908, comm: qib/mic0 Not tainted 2.6.32-279.el6.x86_64 #1
kernel: Call Trace:
kernel: [<ffffffff8106b747>] ? warn_slowpath_common+0x87/0xc0
kernel: [<ffffffff8106b836>] ? warn_slowpath_fmt+0x46/0x50
kernel: [<ffffffff811f3329>] ? sysfs_add_one+0xc9/0x130
kernel: [<ffffffff811f1612>] ? sysfs_add_file_mode+0x62/0xb0
kernel: [<ffffffff811f1671>] ? sysfs_add_file+0x11/0x20
kernel: [<ffffffff811f16a6>] ? sysfs_create_file+0x26/0x30
kernel: [<ffffffff8134ce79>] ? device_create_file+0x19/0x20
kernel: [<ffffffffa02bf82c>] ? qib_knx_server_listen+0x23c/0x720 [ib_qib]
kernel: [<ffffffffa02bf5f0>] ? qib_knx_server_listen+0x0/0x720 [ib_qib]
kernel: [<ffffffff81091d66>] ? kthread+0x96/0xa0
kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
kernel: [<ffffffff81091cd0>] ? kthread+0x0/0xa0
kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
kernel: ---[ end trace 640779d065f7b165 ]---
"
Any ideas?
Thanks,
Quentin Bouyer
If I shut down one MIC card, I can start ofed-mic
without any problem.
We will try to downgrade to Red Hat 6.2 because we think
that release is more compatible with our QLogic IB cards.
In the install instructions in the readme file that comes with the MPSS, it says to not install the kernel-ib* packages from the standard ofed 1.5.4.1 release and instead install the ones from the MPSS release. In your case, since you brought up ofed on the host before installing the MPSS, that would have meant uninstalling those packages before installing the ofed files from the MPSS. The readme also gives very strict instructions about the order in which things must be brought up: for RHEL 6.4 mpss->rdma->opensmd->ofed-mic; for other releases mpss->openibd->opensmd->ofed-mic. You probably did this, but just in case, I thought I would mention it.
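As a rough illustration of the start order quoted above, here is a minimal Python sketch that returns the service sequence for a given RHEL release. The service names come from the post; the `service <name> start` command form and the function names `start_order` and `start_commands` are assumptions made for this example and are not taken from the MPSS readme. The script only prints the commands and does not run them.

```python
"""Sketch: list host services in the start order quoted from the MPSS readme."""

# Order as quoted in the post: RHEL 6.4 uses rdma, other releases use openibd.
START_ORDER_RHEL_6_4 = ["mpss", "rdma", "opensmd", "ofed-mic"]
START_ORDER_OTHER = ["mpss", "openibd", "opensmd", "ofed-mic"]


def start_order(rhel_release):
    """Return the service names, in start order, for a RHEL release string."""
    if rhel_release.strip() == "6.4":
        return list(START_ORDER_RHEL_6_4)
    return list(START_ORDER_OTHER)


def start_commands(rhel_release):
    """Build the start commands as argument lists, without executing them.

    Assumption: SysV-style `service <name> start`, the usual form on RHEL 6.
    """
    return [["service", name, "start"] for name in start_order(rhel_release)]


if __name__ == "__main__":
    # Dry run for the poster's release (Red Hat 6.3): print, do not execute.
    for command in start_commands("6.3"):
        print(" ".join(command))
```

Printing the sequence first makes it easy to compare against what was actually started on the host before changing anything.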
OK, I am not (yet) very experienced with these drivers but I think (hope) I understand now.
In the readme file that comes with the MPSS, there are two sets of OFED driver install directions. One is for the True Scale InfiniBand drivers, the other is for all others. Mellanox definitely requires the drivers described in the "all others" section. When you said you use the 'QLogic driver from Intel (7.1.0.0.58)', it hit me: that release number is for the Intel(r) True Scale Fabric Suite. For those drivers, you need to use the True Scale install directions from the MPSS readme file. They recommend using a 7.2 release of the True Scale drivers (which is not publicly available yet but should be very shortly). Also, for this software to work with the coprocessor cards, you must install the drivers in the psm directory from the MPSS release. You cannot use these drivers with the RHEL 6.4 release, but you can with 6.2 or 6.3.
Do you think that might be the problem? If so, could you try those other instructions on only the nodes using the QLogic cards?
7.2 is out there now. (Go to https://downloadcenter.intel.com/default.aspx and search on True Scale. I don't know if you will need the serial number from your adapter or not.) Do you want to give it a try?
It looks like a bug. I don't have a QLogic card to try this on, but based on reading the source, the server wants to create a new sysfs file. Since it now gets two MICs, my guess is that the file name gets duplicated. It then gets stuck while trying to put the warning message into the log: "sysfs: cannot create duplicate filename ....".
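To make that guess concrete, here is a toy Python model of the suspected name collision. It is not the ib_qib driver code and does not reproduce real sysfs behavior (the kernel logs a warning rather than raising an exception). The class `ToySysfsDir`, both register functions, and the per-MIC name `knx_node<N>` are hypothetical and exist only for this illustration. Only the directory path and the file name `knx_node` come from the log above.

```python
"""Toy model of the suspected duplicate sysfs file name (not driver code)."""

# Directory path taken from the kernel log in the first post.
QIB0_DIR = "/devices/pci0000:80/0000:80:01.0/0000:81:00.0/infiniband/qib0"


class ToySysfsDir:
    """Stand-in for one sysfs directory: file names in it must be unique."""

    def __init__(self, path):
        self.path = path
        self.files = set()

    def create_file(self, name):
        if name in self.files:
            raise FileExistsError(
                "cannot create duplicate filename '%s/%s'" % (self.path, name)
            )
        self.files.add(name)


def register_fixed_name(directory, mic_ids):
    """Hypothetical pattern behind the guess: one fixed name for every MIC."""
    for _ in mic_ids:
        directory.create_file("knx_node")


def register_per_mic_name(directory, mic_ids):
    """Hypothetical alternative: a distinct file name for each MIC."""
    for mic_id in mic_ids:
        directory.create_file("knx_node%d" % mic_id)


if __name__ == "__main__":
    register_fixed_name(ToySysfsDir(QIB0_DIR), [0])  # one MIC: no collision
    try:
        register_fixed_name(ToySysfsDir(QIB0_DIR), [0, 1])  # two MICs collide
    except FileExistsError as error:
        print("sysfs:", error)
```

In this model a single card registers cleanly and the second card triggers the collision, which would be consistent with the report that ofed-mic starts once one MIC card is shut down.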
Quentin, I looked through the text file you attached and everything looks right up until it goes badly wrong. I think this should be reported at http://premier.intel.com as an MPSS issue. If you do not have access (you need to register through registrationcenter.intel.com), I will submit the issue for you.
No, that document is out of date - do you remember where you saw it? This changed with the update 3 release last month.
I was looking through some very old posts and realized I had left this issue hanging. Ultimately, the issue was a combination of poor documentation (for which I share part of the guilt), differences in the OFED installation process between True Scale and Mellanox adapters, and a bug. The documentation included in the MPSS release is much better these days (for which I cannot take the credit) and provides good directions for installing the True Scale version of the OFED software. The bug was fixed in MPSS 3.0.17461. So that's the story here.
