
Server not recognizing Intel Xeon Phi Coprocessor 7120P

Vikrant1
Novice

Hello everyone,

I'm a PhD student and have two Xeon Phi 7120P coprocessors that I want to use for computations. However, I'm having trouble getting the 7120P coprocessor recognized by my server.

My server is a Huawei RH2288 V2-8S with 2 x Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz. The board details are below.

The card was detected once in CentOS 7 when I manually set the PCIe mode to Gen1 (I don't know why), but after I rebooted the server, the card was gone. After that I tried several PCIe settings (Auto/Gen1/Gen2/Gen3), but the server simply doesn't detect the card. The blue light on the card keeps blinking while the server is on, but the card doesn't show up in lspci.
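When a card appears only at some PCIe settings, it can help to record which link speed was actually negotiated. The following is a minimal sketch, assuming `lspci -vv` output that contains an `LnkSta:` line. The helper names `link_speed` and `pcie_gen` and the sample line are illustrative and were not captured from the server in this thread.

```shell
# Map a negotiated PCIe transfer rate to its generation name.
pcie_gen() {
    case $1 in
        2.5GT/s) echo Gen1 ;;
        5GT/s)   echo Gen2 ;;
        8GT/s)   echo Gen3 ;;
        *)       echo unknown ;;
    esac
}

# Extract the first negotiated speed from `lspci -vv` text on stdin.
# Only the LnkSta line (current status) is matched, not LnkCap (capability).
link_speed() {
    sed -n 's/.*LnkSta:[[:space:]]*Speed \([0-9.]*GT\/s\).*/\1/p' | head -n 1
}

# Illustrative sample line (an assumption, not real output from this thread).
sample='    LnkSta: Speed 2.5GT/s, Width x16'
speed=$(printf '%s\n' "$sample" | link_speed)
echo "negotiated: $speed ($(pcie_gen "$speed"))"

# On a live system (root is usually needed to see link status),
# with the bus address taken from plain `lspci` output:
#   lspci -vv -s 04:00.0 | link_speed
```

If the card is missing from `lspci` entirely, there is no link status to read, so this only helps in the cases where the card does enumerate.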

After that I even tried different operating systems (Windows Server 2019, Ubuntu, CentOS 8), but none of them shows the card.

To check the card, I installed it in a Dell Precision Tower 5810 with an Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz, Dell baseboard 0K240Y, and the Intel Haswell-E X99 chipset. At first the PC did not boot, but after I went into the BIOS and set PCIe to Gen2, it detected the 7120P coprocessor. However, when I then tried to set the card up, it would not respond. I noticed that the card was getting a bit hot in the Dell system, because the 7120P has no cooling fan and can be used only in a server with active cooling. The problem is that the server doesn't detect it.

Any help would be really appreciated.

My email is VS00350@surrey.ac.uk

Vikrant1_0-1605704845145.png

Vikrant1_1-1605705308952.png

 

JoseH_Intel
Moderator

Hello Vikrant1,

 

Thank you for joining the community

 

You stated that you own 2 x Intel Xeon Phi Coprocessors 7120P. Is this issue happening with both cards or just one of them? Does the card at least show up in the BIOS? Are you booting the system in legacy mode or UEFI mode?

Besides that, have you by any chance checked with the server manufacturers (Huawei and/or Dell) whether this card is compatible with their systems?

We look forward to your updates.

 

Regards

 

Jose A.

Intel Customer Support

For firmware updates and troubleshooting tips, visit:

https://intel.com/support/serverbios

 

Vikrant1
Novice

Hi Jose,

Thank you for your reply.

Both cards are 7120P and both show the same behaviour. They are detected in my Dell Precision Tower T5810, but since it has no cooling mechanism for passive cards, they become quite hot at first and then go cold. I assume they shut down automatically because of the lack of cooling.

Unfortunately the Huawei servers that we have no longer have any support available from Huawei, so we are completely on our own there.

The card was detected only once in the BIOS and the OS, but after I rebooted the server it was gone from both. There is a blue light at the back of the card that keeps blinking; I'm not sure what this means.

The BIOS is currently set to dual EFI/Legacy (Auto select) mode. Can this have any effect on the Phi Coprocessor?

 

 

Regards,

Vikrant Singh

JoseH_Intel
Moderator

Hello Vikrant1,


At this point it is difficult to tell what could be wrong with these cards. It seems unlikely that both cards are damaged, which makes me suspect a compatibility issue. Do you by any chance have access to a different server (ideally an Intel server) that you could try the cards in, so we can rule out an actual hardware failure? I think you could also request compatibility information for this card from both Dell and Huawei. The following page lists compatibility with Intel servers only, though: Intel® Xeon Phi™ Coprocessor 7120P (16GB, 1.238 GHz, 61 core) Product Specifications


Regards


Jose A.

Intel Customer Support Technician


Vikrant1
Novice

Hi Jose,

Unfortunately, at the moment I only have access to these servers. I'm quite disappointed; I was really looking forward to accelerating my computations with these cards.

 

 

Regards,

Vikrant

JoseH_Intel
Moderator

Hello Vikrant1,


Let me check with our senior team if there is something else that we can try.


Regards


Jose A.

Intel Customer Support Technician


Vikrant1
Novice

Hi Jose,

Many thanks for your help. Much appreciated.

I have added some snapshots of my BIOS, just in case you might want to have a look at them.

HyperThreading is Disabled and TurboMode is Enabled

Vikrant1_0-1606657764139.png

 

The server has only one PCIe x16 slot linked to CPU0 (the first CPU in the server), which is Port 3, and the Xeon Phi 7120P is installed in it.

Vikrant1_1-1606657847500.png

The respective PCIe port (3a) is set at Gen2 and PCI-E port max payload is set to 256B.

Vikrant1_2-1606657907096.png

Intel VT is disabled

Vikrant1_3-1606657957277.png

PCIe 64-bit Decode is enabled

Vikrant1_4-1606658028095.png

 

Regards,

Vikrant

JoseH_Intel
Moderator

Hello Vikrant1,


After some research we found the following. The issue could be insufficient power from the PSU.

Could you please confirm whether the Xeon Phi cards are installed together, and whether each one works when installed on its own (one at a time)?

Each Xeon Phi 7120 is rated at 300W, so both together can draw up to 600W. We suggest checking whether the PSU can support two Xeon Phi cards.
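The power arithmetic above can be sketched as a quick headroom check. This is only an illustration: the 300W per card comes from this thread, while the PSU rating and host baseline in the example are placeholder numbers, and `psu_headroom` is a hypothetical helper name.

```shell
# usage: psu_headroom <psu_watts> <host_watts> <card_count> [card_watts]
# Prints the watts left over; a negative value means the supply is too small.
psu_headroom() (
    psu_watts=$1
    host_watts=$2
    cards=$3
    card_watts=${4:-300}   # 300 W per card is the figure quoted in this thread
    echo $(( $psu_watts - $host_watts - $cards * $card_watts ))
)

# Placeholder numbers (assumptions): 750 W supply, 350 W for the rest of
# the host, two coprocessor cards.
headroom=$(psu_headroom 750 350 2)
if [ "$headroom" -lt 0 ]; then
    echo "short by $(( 0 - $headroom )) W"
else
    echo "headroom: $headroom W"
fi
```

Substitute the real PSU rating from its label and a realistic figure for the rest of the host; a result near zero leaves little margin for load peaks.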


Besides that, you also need to check whether the host system supports Xeon Phi; in this case, with Huawei and Dell.

Once the power requirements are addressed, you can run the lspci command in Linux to check whether both cards are detected.
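The lspci check can be sketched as a small count of matching entries. The sample line is the one reported later in this thread; the helper name `count_phi` and the expected count of two are assumptions for illustration.

```shell
# Count Xeon Phi coprocessor entries in `lspci` text read from stdin.
# grep -c exits non-zero when nothing matches, so `|| true` keeps the
# function usable in scripts that stop on errors; the count is still printed.
count_phi() {
    grep -c 'Co-processor.*Xeon Phi' || true
}

# Sample line as reported later in this thread.
sample='04:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor SE10/7120 series (rev 20)'
expected=2
found=$(printf '%s\n' "$sample" | count_phi)

if [ "$found" -eq "$expected" ]; then
    echo "all $expected cards visible"
else
    echo "only $found of $expected cards visible"
fi

# On a live system:
#   lspci | count_phi
```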


The following information is worth a look:

Hardware requirements for Xeon Phi - https://www.intel.com/content/www/us/en/support/articles/000023229/processors/intel-xeon-processors.html

Here is a good troubleshooting flow chart - https://software.intel.com/sites/default/files/managed/fa/ac/Intel_MPSS_Debugging_Linux_MPSS3.pdf

 

Best regards,


Jose A.

Intel Customer Support Technician

JoseH_Intel
Moderator

Hello Vikrant1,


I am just following up to check whether you found the provided information useful. If you have further questions, please don't hesitate to ask. If you consider the issue resolved, please let us know so we can mark this ticket as closed. If I have not heard from you by then, I will reach out again next Monday, the 7th.


Regards


Jose A.

Intel Customer Support Technician

Vikrant1
Novice

Hi Jose,

Many thanks for the follow up.

At the moment, I have given up on using the Huawei servers, as we have absolutely no support for them.

But I have managed to get two Dell Precision workstations (T5810) that detect the Xeon Phi cards and have active cooling as well. I have physically installed the Xeon Phi 7120P cards and they are being detected. However, I'm having a problem installing and configuring them for use.

Here is a snapshot of the card and the software platform that I'm using.

[root@XeonPhi00 ~]# lspci | grep "Co-processor"
04:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor SE10/7120 series (rev 20)
[root@XeonPhi00 ~]# uname -r
3.10.0-1127.el7.x86_64
[root@XeonPhi00 ~]# cat /etc/centos-release
CentOS Linux release 7.8.2003 (Core)

 

But I'm getting stuck at section 3.3.3.3, Install Base Intel MPSS, with the output below:

Vikrant1_2-1607086653036.png

Output:

[root@XeonPhi00 mpss-3.5.1]# modprobe mic
modprobe: FATAL: Module mic not found.
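This error means modprobe found no `mic` module for the running kernel. One possible cause, not confirmed in this thread, is that the MPSS kernel-module package was built for a different kernel than 3.10.0-1127.el7.x86_64. A minimal sketch for checking whether any such module file exists under the running kernel's module tree; `has_module` is a hypothetical helper name.

```shell
# Succeeds if a module file named "$2" (.ko, .ko.xz, ...) exists under tree "$1".
has_module() (
    tree=$1
    name=$2
    [ -d "$tree" ] || exit 1
    find "$tree" -type f -name "${name}.ko*" | grep -q .
)

# Module tree of the kernel that is currently running.
tree="/lib/modules/$(uname -r)"
if has_module "$tree" mic; then
    echo "mic module present for $(uname -r)"
else
    echo "no mic module under $tree"
fi
```

If the module shows up only under a different kernel's directory in /lib/modules, the installed packages do not match the running kernel.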

 

Any suggestion would be really helpful.

 

Regards,

Vikrant Singh

 

JoseH_Intel
Moderator

Hello Vikrant1,


It's good to hear that you are making progress with these cards, and to know that both cards are functional.


Can you tell me where you got the MPSS files? The ones officially hosted by Intel are located in the Developer Zone: Intel® Manycore Platform Software Stack (Intel® MPSS). You may want to take a look, because I don't see version 3.5.1 available to download.


Regards


Jose A.

Intel Customer Support Technician

Vikrant1
Novice

Hi Jose,

Thanks for the link.

I tried a bit of troubleshooting myself and found that Windows 10 detects the card properly. I installed the latest Windows drivers from the link that you provided; the snapshots are below.

Looking at the control panel, I can see that the card is running pretty hot. The reason is that the cooling in my workstation is not controlled by the Xeon Phi card, so the card cannot ramp up the fans based on its load/temperature. I do see a 4-pin fan connector on the Xeon Phi 7120P card, but I don't have its pinout, so I don't know which pin is +V, Gnd, PWM and Sense. Can you advise me on this?

 

7120P.PNG

7120P_ethernet.PNG

Vikrant1_0-1607431511255.png

 

JoseH_Intel
Moderator

Hello Vikrant1,


Let me see if I can find this pinout information for you. I will let you know as soon as I have an update.


Regards


Jose A.

Intel Customer Support Technician

JoseH_Intel
Moderator

Hello Vikrant1,


We were able to get the pinout information regarding the J9E1 Fan Connector:


Pin 1 - Ground

Pin 2 - 12V power

Pin 3 - Fan Tach

Pin 4 - Fan control input


Hope it helps


Jose A.

Intel Customer Support Technician

Vikrant1
Novice

Hey Jose,

Thank you so much. You have been such a great help. Thanks a ton.

I will be getting in touch with a team to 3D print some parts so I can attach a fan to the 7120P card.

I think this will take a while, so you can either close this ticket or put it on hold, whichever suits you. I will post an update as soon as I have some progress.

Many many thanks once again.

Regards,

Vikrant

JoseH_Intel
Moderator

Hello Vikrant1,


I am glad to hear that you were able to make progress on this issue, to the point that it is almost fully resolved. I can keep the case on hold for about a week. If you think it might take longer, and considering the coming holidays, let me know if you would rather close the thread for now and open a new one next year.


I look forward to hearing from you.


Regards


Jose A.

Intel Customer Support Technician

Vikrant1
Novice

Hi Jose,

Considering the Christmas holidays, I think it would be better to close this thread for now, as I'm not sure when the required parts will be ready.

If I need any further support, I can always open another thread.

Once again, many thanks for your help and support.

 

Regards,

Vikrant

JoseH_Intel
Moderator

Hello Vikrant1,


It's been my pleasure to assist. We will mark this thread as closed for now. If you have further issues or questions, just go ahead and create a new topic.


Regards


Jose A.

Intel Customer Support Technician
