103 Views

Xeon Phi 7120A fails to reset on Linux, works fine on Windows

Greetings.

I have a Xeon Phi 7120A and am out of ideas on how to make it work.

The device itself is functional and works fine under Windows (MPSS 3.8.4, Win 7): it is detected, boots up, I can SSH into it, and so on.

However, it is useless to me on Windows, so I have been trying to get it to work under Linux. More precisely, I put a blank SSD into the same machine, changing nothing else, installed CentOS 7.3 on it, and then installed MPSS 3.8.4. Trying to insert the module or reset the card with micctrl gives "mic0: reset failed", and nothing gets it past that into a ready state.

[   16.394237] mic0: Resetting (Post Code \xffffffff\xffffffff)
[   16.394241] mic0: Transition from state resetting to reset failed

I followed the troubleshooting flowchart, with no result. I also tried rebooting, reinstalling, checking BIOS settings (above-4G decoding is enabled), and searching forums (the noapic and pci=realloc kernel parameters do nothing), all to no effect.

Another important observation: micinfo on Windows reported correct information, but on Linux it reports odd values and errors (as if it were decoding a block of 0xFF instead of proper data). Logs and info attached.
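For anyone wanting to check the "block of 0xFF" symptom for themselves, here is a minimal Python sketch. It assumes the card's PCI address is known (the "0000:03:00.0" in the comment is a hypothetical placeholder, not the address from this machine) and that the standard sysfs `config` file is readable. The helper names are my own. Note that config space may read fine even when the driver's reads from the card's memory regions come back as all ones, so a healthy result here does not rule out the problem.

```python
from pathlib import Path


def looks_like_dead_reads(data: bytes) -> bool:
    """Return True if a non-empty buffer consists solely of 0xFF bytes.

    Reads from a PCIe device that is not responding commonly come back
    as all ones, which is what decoding "a block of 0xFF" would look like.
    """
    return len(data) > 0 and all(b == 0xFF for b in data)


def config_header_is_all_ff(bdf: str) -> bool:
    """Check the first 64 bytes of a device's PCI config space via sysfs.

    `bdf` is the PCI address, e.g. "0000:03:00.0" (hypothetical example).
    """
    config = Path("/sys/bus/pci/devices") / bdf / "config"
    return looks_like_dead_reads(config.read_bytes()[:64])
```

The pure `looks_like_dead_reads` helper can also be pointed at any other dump you have, such as a saved register block.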

Does anyone know what this could be and how to fix it?


Figured it out.

What fixed it was going into the BIOS and changing "PCIE1 Link Speed" from "Auto" to Gen1 or Gen2. Setting it to Gen3 breaks things just as before. So apparently, if your 7120A does not work and most fields and device parameters read as some sort of ffffffff, lowering the PCIe link speed does the trick.

I wonder what Windows does differently that keeps it from being affected, and whether setting the link to Gen2 slows anything down.
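To see what link speed the card actually negotiated after changing the BIOS setting, one option is to parse the LnkCap (what the device supports) and LnkSta (what was negotiated) lines from `lspci -vv`. The sketch below is only an illustration: the function names are my own, the exact lspci output layout can vary between pciutils versions, and the text is assumed to describe a single device. It maps the standard PCIe transfer rates (2.5, 5 and 8 GT/s) to Gen1, Gen2 and Gen3.

```python
import re
import subprocess

# Standard PCIe transfer rates in GT/s mapped to link generation.
_GEN_BY_RATE = {"2.5": 1, "5": 2, "8": 3}

# Matches e.g. "LnkSta: Speed 5GT/s, Width x16"; the colon right after the
# field name keeps LnkCap2/LnkSta2/LnkCtl2 lines from matching.
_LNK_RE = re.compile(r"^\s*(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s", re.MULTILINE)


def link_generations(lspci_text: str) -> dict:
    """Return {"LnkCap": gen, "LnkSta": gen} parsed from `lspci -vv` text.

    Only the first LnkCap and first LnkSta line are used, so pass the
    output for one device. Unknown rates map to None.
    """
    result = {}
    for field, rate in _LNK_RE.findall(lspci_text):
        # Normalise "5.0" to "5" while leaving "2.5" untouched.
        key = rate.rstrip("0").rstrip(".") if "." in rate else rate
        result.setdefault(field, _GEN_BY_RATE.get(key))
    return result


def query_device(bdf: str) -> dict:
    """Run lspci for one PCI address (hypothetical example: "03:00.0")."""
    out = subprocess.run(
        ["lspci", "-vv", "-s", bdf],
        capture_output=True, text=True, check=True,
    ).stdout
    return link_generations(out)
```

lspci usually needs root to show the capability lines. If LnkCap and LnkSta both report Gen2, the card is running at its full rated speed.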

JJK
New Contributor III

The 7100 series coprocessor is a PCI Express 2.0 device, so setting the link explicitly to Gen2 will not slow things down. I do wonder why Linux fails to detect that it's a Gen2 device, but Windows does.

Sirius
Beginner

I think the difference is in the way the device driver is implemented on each operating system.
