mpss 3.4 - No Mic cards found or specified on command line

Saood_K_
Beginner
1,458 Views

I bet this is the most common issue. We have installed the card and the MPSS software, but we are getting this message:

# micctrl --initdefaults
[Warning] No Mic cards found or specified on command line

We are able to detect the module:

# find /lib/modules -name mic.ko
/lib/modules/3.10.0-123.el7.x86_64/extra/mic.ko

OS sees the card:

# lspci | grep coprocessor
04:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor SE10/7120 series (rev 11)

And here are other details:

# lsmod | grep mic
mic                   666125  0

# micinfo
MicInfo Utility Log
Created Wed Oct 29 04:36:53 2014

        System Info
                HOST OS                 : Linux
                OS Version              : 3.10.0-123.el7.x86_64
                Driver Version          : 3.4-1
                MPSS Version            : 3.4
                Host Physical Memory    : 131754 MB

What could be the issue? Please help.

5 Replies
Saood_K_
Beginner

Adding some more information:

# miccheck
MicCheck 3.4-r1
Copyright 2013 Intel Corporation All Rights Reserved

Executing default tests for host
  Test 0: Check number of devices the OS sees in the system ... pass
  Test 1: Check mic driver is loaded ... pass
  Test 2: Check number of devices driver sees in the system ... fail
    SCIF nodes do not match number of PCI detected devices

Status: FAIL
Failure: SCIF nodes do not match number of PCI detected devices

OS - Red Hat 7

# uname -r
3.10.0-123.el7.x86_64
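The failing miccheck test compares the number of cards found by PCI enumeration with the number the mic driver exposes. The same comparison can be sketched by hand in shell. This is not how miccheck is implemented. It assumes the driver lists cards as mic0, mic1, ... under /sys/class/mic (the mpssd log later in this thread refers to a sysfs 'state' entry for mic0), and the function names are made up for illustration.

```shell
# Count Co-processor entries in lspci output read from stdin.
# grep -c exits non-zero when the count is 0, so mask that status.
count_pci_mics() {
    grep -c 'Co-processor' || true
}

# Count micN entries the driver exposes in sysfs.
# The default directory is an assumption about the MPSS 3.x driver layout.
# The body runs in a subshell so its variables do not leak.
count_driver_mics() (
    dir=${1:-/sys/class/mic}
    n=0
    for d in "$dir"/mic[0-9]*; do
        if [ -e "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
)

# Usage on the host:
#   pci=$(lspci | count_pci_mics)
#   drv=$(count_driver_mics)
#   [ "$pci" -eq "$drv" ] || echo "PCI sees $pci card(s), driver sees $drv"
```

A PCI count of 1 with a driver count of 0 matches the situation here: the card is enumerated, the module is loaded, but the driver's probe of the device did not succeed.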
Frances_R_Intel
Employee

Usually when you get the "No Mic cards found" message, either the host system cannot detect the card at all or the mic kernel module did not load, but you have already checked for these two things.

Can you look in /var/log/dmesg or /var/log/messages on the host and see if you find a message like:

mic 0000:83:00.0: device not available (can't reserve [mem 0x00000000-0x1ffffffff 64bit pref])
pci_enable failed board #0

Also run lspci -vvv and see if the Co-processor entry has a line like:

LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt

In the first case, search for "can't reserve". That message indicates that large base address registers (BARs) are not enabled in the BIOS of your host system. Try enabling support for large BAR (>4GB) in the host BIOS.

In the second case, if the speed isn't 5GT/s or the width isn't x16, there was a problem with the card's link "training". Try reseating the card. If that doesn't make the training issue go away, you may need to talk to your card's supplier.

Let us know if either of these resolves the issue.
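Both checks can be scripted. The sketch below is only an illustration: the function names are invented, and the LnkSta pattern assumes lspci prints the speed and width in the form shown above (newer lspci versions may add annotations after the values, which the pattern tolerates).

```shell
# Succeeds if the kernel log text on stdin contains the BAR reservation failure.
has_bar_reserve_failure() {
    grep -q "can't reserve"
}

# Succeeds if the lspci -vvv text on stdin has a LnkSta line reporting
# 5GT/s and width x16. The trailing group keeps x16 from matching x160 etc.
link_trained_ok() {
    grep -Eq 'LnkSta:.*Speed 5GT/s.*Width x16([^0-9]|$)'
}

# Usage on the host (run lspci as root, otherwise the capabilities that
# contain LnkSta are not shown; 83:00.0 is the address from the example):
#   dmesg | has_bar_reserve_failure && echo "check large BAR support in the BIOS"
#   lspci -vvv -s 83:00.0 | link_trained_ok || echo "link is not at 5GT/s x16"
```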

Tecnicos_A_
Beginner

We have the exact same problem. We have a Supermicro 5038A-IL workstation and a Xeon Phi 3120A.

Even with "above 4G decoding" enabled in the BIOS, we still get this message in dmesg:

mic 0000:01:00.0: device not available (can't reserve [mem 0x00000000-0x3ffffffff 64bit pref])
pci_enable failed board #0
mic: probe of 0000:01:00.0 failed with error -22
mic: No MIC boards present.  SCIF available in loopback mode

Running lspci -s 01:00.0 -vv gives:

01:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 3120 series (rev 20)
        Subsystem: Intel Corporation Device 3c98
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 255
        Region 0: Memory at <unassigned> (64-bit, prefetchable) [disabled] [size=16G]
        Region 4: Memory at de200000 (64-bit, non-prefetchable) [disabled] [size=128K]
        Capabilities: <access denied>

Launching the mpss service hangs and leaves this in /var/log/mpss:

Tue Nov 11 15:31:46 2014: MPSS Daemon start
Tue Nov 11 15:31:46 2014: mic0: opening sysfs 'state' entry failed No such file or directory
Tue Nov 11 15:31:46 2014: ^A: opening sysfs 'state' entry failed No such file or directory
Tue Nov 11 15:31:46 2014: <<<<<<<< mpssd: segmentation violation - dumping stack >>>>>>>>
Tue Nov 11 15:31:46 2014: /usr/sbin/mpssd(segv_handler+0x1e) [0x40589e]
Tue Nov 11 15:31:46 2014: /lib64/libpthread.so.0() [0x314120f710]
Tue Nov 11 15:31:46 2014: /usr/lib64/libmpssconfig.so.0.0.1(mpss_clear_config+0xe) [0x3141a0474e]
Tue Nov 11 15:31:46 2014: /usr/lib64/libmpssconfig.so.0.0.1(mpss_parse_config+0x35) [0x3141a05ce5]
Tue Nov 11 15:31:46 2014: /usr/sbin/mpssd(boot_mic+0x38) [0x404d48]
Tue Nov 11 15:31:46 2014: /lib64/libpthread.so.0() [0x31412079d1]
Tue Nov 11 15:31:46 2014: /lib64/libc.so.6(clone+0x6d) [0x3140ee89dd]
Tue Nov 11 15:31:46 2014: <<<<<<<<<<<<<<<<<<<<<<<<<<<<>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Starting Intel(R) MPSS: ^C/etc/init.d/mpss: line 61:  4293 Segmentation fault (core dumped) $exec
                                                           [FAILED]

All of this was with both the 8-pin and the 6-pin PCIe power cables plugged in: the Phi is silent but detected by lspci, the unassigned-memory error appears in dmesg, and MPSS cannot find the card. We tried with both PCIe power cables plugged into another 850W power supply, with the same results. Next we tried without the 6-pin PCIe power cable. The Phi's fan started making its usual top-speed sound, and dmesg showed only the "No MIC boards present" message, with none of the "can't reserve" memory errors. However, lspci could not find the Phi at all, and MPSS had the same problems and crashed.

More info:

# uname -r
2.6.32-431.el6.x86_64

# micinfo
MicInfo Utility Log
Created Wed Nov 12 11:21:06 2014

        System Info
                HOST OS                 : Linux
                OS Version              : 2.6.32-431.el6.x86_64
                Driver Version          : 3.3.2-1
                MPSS Version            : 3.3.2
                Host Physical Memory    : 32900 MB

# lsmod | grep mic
mic                   594588  0
microcode             112685  0

# miccheck
MicCheck 3.3.2-r1
Copyright 2013 Intel Corporation All Rights Reserved

Executing default tests for host
  Test 0: Check number of devices the OS sees in the system ... pass
  Test 1: Check mic driver is loaded ... pass
  Test 2: Check number of devices driver sees in the system ... fail
    scif nodes does not match number of PCI detected devices

Status: FAIL
Failure: scif nodes does not match number of PCI detected devices
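The "can't reserve" range in the dmesg output and the Region 0 line in the lspci output above appear to describe the same thing: a 16G prefetchable BAR that was never given an address. The arithmetic and the lspci check can be sketched in shell as follows. The function names are hypothetical, and the pattern assumes lspci -vv prints unassigned BARs in the form shown above.

```shell
# Print the size in GiB of a memory window, given its first and last address.
# Shell arithmetic accepts the hex values as printed in the kernel log.
bar_window_gib() {
    echo $(( ($2 - $1 + 1) / 1073741824 ))
}

# Print the BAR lines that lspci -vv (text on stdin) reports as unassigned.
unassigned_regions() {
    grep -E 'Region [0-9]+: Memory at <unassigned>'
}

# Usage:
#   bar_window_gib 0x00000000 0x3ffffffff     # prints 16
#   lspci -vv -s 01:00.0 | unassigned_regions
```

A 16 GiB window cannot fit below the 4 GiB boundary, which is why the card depends on the platform placing it above 4G. The "Capabilities: <access denied>" line in the lspci output means lspci was run without root privileges, so the link status is not visible there.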
Frances_R_Intel
Employee

Tecnicos,

I apologize for your post sitting so long without a response. 

Did you try your card in a PCIe x16 slot?

Frances

Tecnicos_A_
Beginner

Hi Frances,

Thanks for the reply. Yes, we tried several PCIe x16 slots, to no avail. It turned out to be a hardware incompatibility with our workstation: we tried the same Phi in another, certified server and everything was OK.
