
lspci not recognizing some Xeon Phi's

Storey__Jed
Beginner
674 Views

Hi,

My system:
Motherboard: Asus Z10PA-D8
Processor: Xeon E5 v3
OS: CentOS 7.3

I obtained a large quantity of used Xeon Phis (mostly 5110P, 71S1P, and 7110P). I followed the instructions in the readme and user guides for MPSS 3.8.3. Installation went without problems, and some of the Phis are recognized by lspci and pass miccheck, so the host system itself appears to be working and compatible. However, a large proportion of the cards are not recognized by lspci. They are all adequately powered and cooled (custom fan adapter), and each shows a flashing blue light. I know the cooling is sufficient for two reasons: (1) some of the Phis are working fine, and (2) I measured the flow rate through them with a flow meter and compared it to the spec in the datasheet.

The troubleshooting flow chart just suggests contacting the system manufacturer, but since the system works with some Phis, that's kind of pointless: they'll just claim something is wrong with the Phis.

Is there any other way to troubleshoot these? Some sort of hardware reset or re-flash? I'd like to get them all working.

Thanks

0 Kudos
4 Replies
JJK
New Contributor III

if the host does not recognize the card, then you are mostly out of luck. Here's what I would try:

  1. remove all Phis from the server and insert a single, WORKING card.
  2. boot the box and do an 'lspci -vv'
  3. power off the server completely, then remove the working card and replace it with a "faulty" one.
  4. boot the box and do an 'lspci -vv' again
  5. Check the differences

If the host does not recognize/power up the Phi then there's very little that can be done. Also, see if you have another box to insert a Phi into.
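
The comparison in step 5 can also be scripted. Below is a minimal Python sketch, assuming the two 'lspci -vv' outputs were saved as text (for instance to hypothetical files working.txt and faulty.txt). It relies only on the usual lspci layout, where each device starts with an unindented slot address such as 03:00.0 and the -vv detail lines are indented. The function names are made up for illustration.

```python
import re

# Device header lines in lspci output start at column 0 with a slot address
# such as "03:00.0", optionally prefixed by a PCI domain such as "0000:".
SLOT_RE = re.compile(
    r"^((?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s+(.*)$"
)


def parse_lspci(text):
    """Map slot address -> device description from lspci (or lspci -vv) output."""
    devices = {}
    for line in text.splitlines():
        match = SLOT_RE.match(line)
        if match:  # indented -vv detail lines do not match and are skipped
            devices[match.group(1)] = match.group(2).strip()
    return devices


def missing_devices(working_text, faulty_text):
    """Return devices listed with the working card but absent with the faulty one."""
    working = parse_lspci(working_text)
    faulty = parse_lspci(faulty_text)
    return {slot: desc for slot, desc in working.items() if slot not in faulty}
```

Reading both saved files and passing their contents to missing_devices() lists whatever dropped off the bus between the two boots. If the slot of the Phi is missing entirely, the card is not enumerating on PCIe at all. If it is present in both captures, a plain text diff of the two -vv dumps is the next thing to look at.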

 

Storey__Jed
Beginner

Yeah, this is pretty much exactly what I've tried. The blue blinking light comes on in all of them, but they aren't recognized by lspci. I guess something is fried/corrupted in those.

Anyone know how to repair these? It's probably not worth the time, but just wondering.

JJK
New Contributor III

you could try querying the card using 'ipmitool'; this will only work with the E5 v3 CPUs or with special mobos. Read up on this at

https://software.intel.com/en-us/articles/determining-the-idle-power-of-an-intel-xeon-phi-coprocessor
 

Storey__Jed
Beginner

That's an impressive piece of work, and it looks like it might help with figuring out what is wrong with the ones lspci doesn't recognize. Thanks!
