华_黄_
Beginner
118 Views

Troubleshooting about the MIC card

Hello,

I had a Xeon Phi 31S1P running perfectly on CentOS 7.1 (kernel 3.10.0-327.18.2.el7.x86_64). A few days ago another user updated the kernel to 3.10.0-514.2.2.el7.x86_64, and the Xeon Phi stopped working. I therefore uninstalled the old driver and reinstalled MPSS 3.8, following the installation guide. Now some new problems occur. As root I can run micinfo, micsmc -a and miccheck and get the expected information, but when I run these commands as my own user, the results are:

(micsmc -a):Error: mic0: unable to determine device status: check RAS: scif_open failed: Permission denied

micinfo: many fields show NotAvailable

miccheck:

MicCheck 3.8-1
Copyright (c) 2016, Intel Corporation.

Executing default tests for host
  Test 0: Check number of devices the OS sees in the system ... pass
  Test 1: Check mic driver is loaded ... pass
  Test 2: Check number of devices driver sees in the system ... pass
  Test 3: Check mpssd daemon is running ... pass
Executing default tests for device: 0
  Test 4 (mic0): Check device is in online state and its postcode is FF ... pass
  Test 5 (mic0): Check ras daemon is available in device ... fail
    ras daemon is not available

Status: FAIL
Failure: A device test failed

Both root and my user can ssh to mic0.

I tried to run a Xeon Phi offload-mode program as my user and as root; both attempts failed. It seems that the program was unable to communicate with the Xeon Phi.

How can I fix this problem? Thank you very much!

6 Replies
JJK
New Contributor III

What are the permissions on the device /dev/mic/scif? They should be something similar to

ls -al /dev/mic/scif 
crw-rw-rw- 1 root root 243, 1 Dec 22 10:47 /dev/mic/scif
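
The permission check above can also be scripted. This is only a minimal sketch, assuming GNU stat (as shipped with CentOS 7); the helper name world_rw is made up for illustration and is not part of MPSS.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if "other" users have both read and write
# permission on the given path (e.g. mode 666), fails otherwise.
world_rw() {
    mode=$(stat -c '%a' "$1") || return 2   # octal mode, e.g. 600 or 666
    other=$((mode % 10))                    # last octal digit = "other" bits
    [ $((other & 6)) -eq 6 ]                # 4 = read, 2 = write
}

dev=/dev/mic/scif
if [ -e "$dev" ]; then
    if world_rw "$dev"; then
        echo "$dev is readable and writable by all users"
    else
        echo "$dev is NOT readable and writable by all users"
    fi
else
    echo "$dev does not exist on this machine"
fi
```

A mode of 600 (crw-------) makes the check fail, which would be consistent with scif_open returning "Permission denied" for a non-root user.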

 

华_黄_
Beginner

On my device, the result is:

crw------- 1 root root 244, 1 Jan  6 11:37 /dev/mic/scif

 

I tried chmod 666, but offload programs still do not work. Also, micinfo reports "Insufficient Privileges" for the PCIe items when run as my user, but shows the correct data when run with sudo.

JJK
New Contributor III

Sorry for the delay, it took me a while to think about this some more.

Step 1: upgrade the mpss-modules RPM to match the current kernel. This RPM is part of the mpss-3.8-linux.tar tarball and can be found in the mpss-3.8/modules directory. The package you want is mpss-modules-3.10.0-514.el7.x86_64-3.8-1.x86_64.rpm

Then reboot.

Step 2: check that the udev rules for the mic are correctly executed; these udev rules are located in /etc/udev/rules.d/50-udev-mic.rules and they set the permissions.
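
The two steps above can be checked roughly as follows. This is a sketch under assumptions, not an official MPSS procedure: the helper kernel_base is hypothetical, and it assumes that matching on the base kernel build (for example 3.10.0-514) is what matters, as the package name quoted above suggests.

```shell
#!/bin/sh
# Hypothetical helper: extract the base kernel build, e.g. "3.10.0-514",
# from a kernel release string or an mpss-modules package name.
kernel_base() {
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+-[0-9]+' | head -n 1
}

# Step 1: compare the running kernel with the installed mpss-modules package(s).
running=$(kernel_base "$(uname -r)")
echo "running kernel base: $running"
if command -v rpm >/dev/null 2>&1; then
    rpm -qa 'mpss-modules*' | while read -r pkg; do
        if [ "$(kernel_base "$pkg")" = "$running" ]; then
            echo "match:    $pkg"
        else
            echo "MISMATCH: $pkg"
        fi
    done
fi

# Step 2: show the udev rule lines that mention scif, if the rules file exists.
rules=/etc/udev/rules.d/50-udev-mic.rules
if [ -r "$rules" ]; then
    grep -n 'scif' "$rules" || echo "no scif rule found in $rules"
else
    echo "$rules not found or not readable"
fi
```

If the rules file looks right but the permissions on /dev/mic/scif are unchanged, running udevadm control --reload-rules followed by udevadm trigger as root asks udev to re-apply its rules without a reboot; whether that is sufficient here is not confirmed in this thread.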

华_黄_
Beginner

Sorry for being away for so many days. I re-installed MPSS 3.8 for the current kernel, and the permission problem still exists.

华_黄_
Beginner

Now root can run offload programs, but normal users still cannot.

Xue_Y_
Beginner

I am having exactly the same problem.

miccheck passes all tests as root, but fails on Test 5 (check ras daemon) for normal users.

It appears to be a common problem for CentOS 7.x users who recently updated their system (yum update). As pointed out above,

chmod a+rwx /dev/mic/scif

fixes the problem, at least to the extent that miccheck works fine as a normal user. I have not tested my program yet.
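
A slightly narrower variant of this workaround is sketched below. It uses mode 666 rather than a+rwx, since the execute bit has no use on a character device node. The helper name set_rw_all is hypothetical, GNU stat is assumed, and the command must be run as root for a root-owned node.

```shell
#!/bin/sh
# Hypothetical helper: give every user read/write (no execute) access to a
# device node and print the mode before and after the change.
set_rw_all() {
    node=$1
    [ -e "$node" ] || { echo "$node does not exist" >&2; return 1; }
    echo "before: $(stat -c '%a %U:%G' "$node")"
    chmod 666 "$node" || return 1
    echo "after:  $(stat -c '%a %U:%G' "$node")"
}

# Example invocation, as root:
#   set_rw_all /dev/mic/scif
```

Because nodes under /dev are recreated at boot, a manual chmod like this will probably need to be repeated after each reboot unless the udev rules set the mode.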

Hope this helps.

 
