Hi,
My system:
Motherboard: Asus Z10PA-D8
Processor: Xeon E5 v3 processor
OS: CentOS 7.3
I obtained a large quantity of used Xeon Phis (mostly 5110P, 71S1P, and 7110P). I followed the instructions in the readme and user guides for MPSS 3.8.3. Installation went without problems, and some of the Phis are recognized by lspci and pass miccheck, so the host system itself is working and compatible. However, a large proportion of them are not recognized by lspci. They are all adequately powered, cooled (custom fan adapter), and have a flashing blue light. I know the cooling is sufficient for two reasons: 1. Some of the Phis are working fine. 2. I actually measured the flow rate through them with a flow meter and compared it to the spec in the datasheet.
The troubleshooting flow chart just suggests contacting the system manufacturer, but since the system works with some of the Phis, that's kind of pointless because they'll just claim something is wrong with the Phis.
Is there any other way to troubleshoot these? Some sort of hardware reset or re-flash? I'd like to get them all working.
Thanks
If the host does not recognize the card, then you are mostly out of luck. Here's what I would try:
- Remove all Phis from the server and insert a single, WORKING card.
- Boot the box and do an 'lspci -vv'.
- Power off the server completely, then remove the working card and replace it with a "faulty" one.
- Boot the box and do an 'lspci -vv' again.
- Check the differences between the two outputs.
If the host does not recognize/power up the Phi, then there's very little that can be done. Also, see if you have another box to insert a Phi into.
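If you save the two 'lspci -vv' outputs to files, the comparison can be scripted. Below is a minimal Python sketch of that idea, not an official MPSS tool. The file names `working.txt` and `faulty.txt` and the helper names `pci_devices` and `compare_captures` are made up for illustration, and it assumes the usual lspci layout where each device starts with an unindented `slot description` line followed by indented detail lines.

```python
import difflib
import sys


def pci_devices(lspci_text):
    """Map each PCI slot (e.g. '03:00.0') to its description line."""
    devices = {}
    for line in lspci_text.splitlines():
        # Assumption: device header lines start in column 0,
        # while the per-device detail lines are indented.
        if line and not line[0].isspace():
            slot, _, description = line.partition(" ")
            devices[slot] = description
    return devices


def compare_captures(working_text, faulty_text):
    """Return (missing, added, diff) between two lspci -vv captures."""
    working = pci_devices(working_text)
    faulty = pci_devices(faulty_text)
    # Devices seen with the working card but absent with the faulty one.
    missing = {s: d for s, d in working.items() if s not in faulty}
    # Devices that only show up with the faulty card installed.
    added = {s: d for s, d in faulty.items() if s not in working}
    # Full line-by-line diff, useful for link speed/width changes.
    diff = list(difflib.unified_diff(
        working_text.splitlines(), faulty_text.splitlines(),
        "working", "faulty", lineterm=""))
    return missing, added, diff


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fh:
        working_capture = fh.read()
    with open(sys.argv[2]) as fh:
        faulty_capture = fh.read()
    missing, added, diff = compare_captures(working_capture, faulty_capture)
    for slot, desc in missing.items():
        print("missing with faulty card:", slot, desc)
    for slot, desc in added.items():
        print("only with faulty card:", slot, desc)
    print("\n".join(diff))
```

Capture with something like `lspci -vv > working.txt` after the first boot and `lspci -vv > faulty.txt` after the second, then pass both files to the script. If the faulty card's slot is simply missing, the card never enumerated on the bus; if it shows up with different link or capability lines, the full diff shows where.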
Yeah, this is pretty much exactly what I've tried. The blue blinking light comes on in all of them, but they aren't recognized by lspci. I guess something is fried/corrupted in those.
Anyone know how to repair these? It's probably not worth the time, but just wondering.
You could try querying the cards using 'ipmitool'; this will only work with the E5 v3 CPUs or with special motherboards. Read up on this at
That's an impressive piece of work, and it looks like it might help with figuring out what is wrong with the cards that lspci doesn't recognize. Thanks!
