I bet this is the most common issue. We have installed the card and the MPSS software, but we are getting this message:
# micctrl --initdefaults
[Warning] No Mic cards found or specified on command line
We are able to detect the module:
# find /lib/modules -name mic.ko
/lib/modules/3.10.0-123.el7.x86_64/extra/mic.ko
The OS sees the card:
# lspci | grep coprocessor
04:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor SE10/7120 series (rev 11)
And here are other details:
# lsmod | grep mic
mic 666125 0
# micinfo
MicInfo Utility Log
Created Wed Oct 29 04:36:53 2014
System Info
HOST OS : Linux
OS Version : 3.10.0-123.el7.x86_64
Driver Version : 3.4-1
MPSS Version : 3.4
Host Physical Memory : 131754 MB
What could be the issue? Please help.
Adding some more information:
# miccheck
MicCheck 3.4-r1
Copyright 2013 Intel Corporation All Rights Reserved
Executing default tests for host
Test 0: Check number of devices the OS sees in the system ... pass
Test 1: Check mic driver is loaded ... pass
Test 2: Check number of devices driver sees in the system ... fail
SCIF nodes do not match number of PCI detected devices
Status: FAIL
Failure: SCIF nodes do not match number of PCI detected devices
OS - Red Hat 7
# uname -r
3.10.0-123.el7.x86_64
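The failing miccheck test compares two counts: the cards the OS sees on the PCI bus and the cards the mic driver has registered. A rough way to make the same comparison by hand is sketched below. This is not how miccheck itself is implemented. All three function names are hypothetical, and the sysfs directory used as the default (/sys/class/mic) is an assumption that is not confirmed anywhere in this thread, so it is passed as an argument.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a manual check; not part of MPSS.

# Count Xeon Phi entries in `lspci` output read from stdin.
count_pci_phi() {
    # grep -c prints 0 and exits non-zero on no match; keep the exit status clean.
    grep -ci 'co-processor.*xeon phi' || true
}

# Count mic<N> entries in a driver sysfs directory.
# ASSUMPTION: the default path below is a guess; pass the real one as $1.
count_driver_mics() {
    local dir="${1:-/sys/class/mic}"
    local n=0 entry
    for entry in "$dir"/mic[0-9]*; do
        if [ -e "$entry" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Compare the two counts and report the result.
compare_counts() {
    local pci="$1" drv="$2"
    if [ "$pci" -eq "$drv" ]; then
        echo "match: $pci device(s)"
    else
        echo "mismatch: PCI sees $pci, driver sees $drv"
        return 1
    fi
}
```

On a live host this could be run as `compare_counts "$(lspci | count_pci_phi)" "$(count_driver_mics)"`. A mismatch means the card is visible on the bus but the driver failed to bring it up, which is the situation miccheck reports above.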
Usually when you get the "No Mic cards found" message, either the host system cannot detect the card at all or the mic kernel module did not load, but you have already checked for these two things.
Can you look in /var/log/dmesg or /var/log/messages on the host and see if you find a message like:
mic 0000:83:00.0: device not available (can't reserve [mem 0x00000000-0x1ffffffff 64bit pref])
pci_enable failed board #0
Also run lspci -vvv and check the Co-processor entry for a line like:
LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt
For the first check, search the log for "can't reserve". That message indicates that support for large base address registers is not enabled in the BIOS of your host system. Try enabling support for large BAR (>4GB) in the host BIOS. For the second check, if the speed isn't 5GT/s or the width isn't x16, there was a problem "training" the card's PCIe link. Try reseating the card. If that doesn't make the training issue go away, you may need to talk to your card's supplier.
Let us know if either of these resolves the issue.
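The two checks described above can be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and both functions read text from stdin so they can be fed from dmesg, a log file, or saved lspci output. The match strings are taken from the sample messages quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not part of MPSS.

# Succeed if kernel log text on stdin contains a BAR reservation failure.
has_bar_reserve_error() {
    grep -q "can't reserve"
}

# Succeed only if `lspci -vvv` text on stdin shows a link trained at 5GT/s, x16.
# Feed it the output for a single device (lspci -s <address> -vvv).
link_trained_ok() {
    grep -q 'LnkSta:.*Speed 5GT/s.*Width x16'
}
```

For example, `dmesg | has_bar_reserve_error && echo "check large BAR support in the BIOS"` and `lspci -s 04:00.0 -vvv | link_trained_ok || echo "link not at 5GT/s x16, try reseating"`. Run lspci as root, otherwise the capabilities section that contains LnkSta may not be shown.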
We have the exact same problem. We have a Supermicro 5038A-IL workstation and a Xeon Phi 3120A.
Even with "above 4G decoding" enabled in the BIOS, we still get this message in dmesg:
mic 0000:01:00.0: device not available (can't reserve [mem 0x00000000-0x3ffffffff 64bit pref])
pci_enable failed board #0
mic: probe of 0000:01:00.0 failed with error -22
mic: No MIC boards present. SCIF available in loopback mode
and lspci -s 01:00.0 -vv gives:
01:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 3120 series (rev 20)
Subsystem: Intel Corporation Device 3c98
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 255
Region 0: Memory at <unassigned> (64-bit, prefetchable) [disabled] [size=16G]
Region 4: Memory at de200000 (64-bit, non-prefetchable) [disabled] [size=128K]
Capabilities: <access denied>
Launching the mpss service hangs and writes this to /var/log/mpss:
Tue Nov 11 15:31:46 2014: MPSS Daemon start
Tue Nov 11 15:31:46 2014: mic0: opening sysfs 'state' entry failed No such file or directory
Tue Nov 11 15:31:46 2014: ^A: opening sysfs 'state' entry failed No such file or directory
Tue Nov 11 15:31:46 2014: <<<<<<<< mpssd: segmentation violation - dumping stack >>>>>>>>
Tue Nov 11 15:31:46 2014: /usr/sbin/mpssd(segv_handler+0x1e) [0x40589e]
Tue Nov 11 15:31:46 2014: /lib64/libpthread.so.0() [0x314120f710]
Tue Nov 11 15:31:46 2014: /usr/lib64/libmpssconfig.so.0.0.1(mpss_clear_config+0xe) [0x3141a0474e]
Tue Nov 11 15:31:46 2014: /usr/lib64/libmpssconfig.so.0.0.1(mpss_parse_config+0x35) [0x3141a05ce5]
Tue Nov 11 15:31:46 2014: /usr/sbin/mpssd(boot_mic+0x38) [0x404d48]
Tue Nov 11 15:31:46 2014: /lib64/libpthread.so.0() [0x31412079d1]
Tue Nov 11 15:31:46 2014: /lib64/libc.so.6(clone+0x6d) [0x3140ee89dd]
Tue Nov 11 15:31:46 2014: <<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Starting Intel(R) MPSS: ^C/etc/init.d/mpss: line 61: 4293 Segmentation fault (core dumped) $exec [FAILED]
All of this was with both the 8-pin and the 6-pin PCIe power cables plugged in. In that configuration the Phi is silent but detected by lspci, the unassigned-memory error appears in dmesg, and mpss cannot find the card. We tried again with both PCIe power cables plugged in, using another 850W power supply, with the same results. Next we tried without the 6-pin PCIe power cable. The Phi started to make the usual sound of the fan running at top speed, and the dmesg log was different: it showed only the "No MIC boards present" message, without the "can't reserve" memory error. However, lspci could not find the Phi, and mpss had the same problems and crashed.
More info:
# uname -r
2.6.32-431.el6.x86_64
# micinfo
MicInfo Utility Log
Created Wed Nov 12 11:21:06 2014
System Info
HOST OS : Linux
OS Version : 2.6.32-431.el6.x86_64
Driver Version : 3.3.2-1
MPSS Version : 3.3.2
Host Physical Memory : 32900 MB
# lsmod | grep mic
mic 594588 0
microcode 112685 0
# miccheck
MicCheck 3.3.2-r1
Copyright 2013 Intel Corporation All Rights Reserved
Executing default tests for host
Test 0: Check number of devices the OS sees in the system ... pass
Test 1: Check mic driver is loaded ... pass
Test 2: Check number of devices driver sees in the system ... fail
scif nodes does not match number of PCI detected devices
Status: FAIL
Failure: scif nodes does not match number of PCI detected devices
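The lspci output in this post shows the symptom directly: Region 0, the 16G prefetchable BAR, is reported as unassigned. A small filter for spotting that in saved or live lspci -vv output might look like the sketch below. The function name is hypothetical, and the pattern is taken only from the output quoted in this post.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of MPSS or pciutils.

# Print each memory region that lspci (text on stdin) reports as unassigned.
unassigned_regions() {
    grep -oE 'Region [0-9]+: Memory at <unassigned>'
}
```

For example, `lspci -s 01:00.0 -vv | unassigned_regions` would print one line per unassigned BAR and exit non-zero if there are none.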
Tecnicos,
I apologize for your post sitting so long without a response.
Did you try your card in a PCIe x16 slot?
Frances
Hi Frances
Thanks for the reply. Yes, we tried several PCIe x16 slots, to no avail. It turned out to be a hardware incompatibility with our workstation: we tried the same Phi in a different, certified server and everything worked.
