
Strange behavior of Xeon Phi 31s1p

Mihail_C_
Beginner

Hi!

I am experiencing problems with a Xeon Phi 31s1p.

The motherboard is an Asus P9X79WS (BIOS version 4802, the most recent), the processor is an Intel i7 3820 @3.6GHz, the video card is a Sapphire Radeon HD 7990 and the operating system is Windows 10.

The option for Xeon Phi is activated in the BIOS.

The MPSS version is 3.6.

The problem is that the card shows up in Device Manager both as a Coprocessor and as a Virtual Ethernet Card, but after about 10 minutes it disappears.  Even worse, micctrl -s hangs even while the card is still showing in Device Manager.  By hanging I mean that no output is returned and the cursor _ just keeps blinking.  During all this time, the blue LED of the coprocessor blinks continuously and very fast.

To make it reappear I need to turn off the computer and let it cool off for a while.  The thermal and power behavior is as follows (using a thermal sensor from a Zalman ZM-MFC2 fan controller and its power readings):

- when the PC is turned on after cooling off, the power goes around 300W and the temperature of the Xeon Phi is somewhere around 70 Celsius.

- when it disappears from the Device Manager, the power falls to 150W and the temperature of the Xeon Phi drops to 32 Celsius.

Even when the Xeon Phi is no longer visible in Device Manager, its blue LED continues to blink very fast.

I would be grateful for any suggestions.

 

Thanks!

6 Replies
jimdempseyatthecove
Honored Contributor III

The "p" versions of the Xeon Phi are "passive".

Meaning: No cooling fan supplied (but required by system integrator)

NOT meaning: No fan required

These cards are designed so that the fan(s) must push or pull air through the card from either end. The Xeon Phi Coprocessor Datasheet (.pdf) will give you information on cooling requirements.

Look on this forum for other threads containing "cooling" and/or "fan". You will find some examples of user created cooling solutions.

Jim Dempsey

Mihail_C_
Beginner

Thank you very much for the tips.

I have made an air duct from a plastic tofu box and secured it with screws at the rear of the coprocessor.  It funnels the air from a 92mm fan rated at 55 CFM.  Problem solved.  However, I think there is room for improvement, since the temperature at rest is 62 Celsius.

What would be a good temperature at rest?

jimdempseyatthecove
Honored Contributor III

Make sure your fan controller can handle a high capacity fan. The first one I tried had a maximum output of 6W/channel. This did not work. I replaced it with one that had 10W per channel. This worked.

Duct design makes a big difference too.

Jim Dempsey

Mihail_C_
Beginner

The fan controller is giving the fan adequate power, as shown by the fact that it spins at its maximum speed (2500 rpm).

I can confirm duct design makes a difference.  I tweaked the tofu box by adding a "ramp" inside, leading the air straight from the fan to the board's intake.  The temperature dropped to 58 °C (room temperature is 27 °C) when idle.  I suppose that before adding the ramp, a vortex was forming in the box corners opposite the fan, interfering with the intended flow.

So now my question is: what should be the target temperature for the idle state, so I know when to stop experimenting with duct designs?

Thanks!

jimdempseyatthecove
Honored Contributor III

The most critical question is whether your fans can keep the board at 100C or less under full load.

As the temperature rises, you will have to determine at what point you crank the fan up to full speed. I do not know if your fan controller has a hysteresis loop (that is, whether it responds differently depending on whether the temperature is rising or falling).
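For readers unfamiliar with the term, here is a minimal Python sketch of what a hysteresis loop means for fan control. It is not how any particular fan controller works; the class name `HysteresisFan` and both thresholds (80 and 70 Celsius) are assumptions chosen only for illustration. The fan switches to full speed when the temperature rises past the upper threshold and drops back only once it falls below the lower one, so it does not flip back and forth around a single set point.

```python
class HysteresisFan:
    """Two-threshold fan switch. Thresholds are illustrative assumptions,
    not values from the thread or from any real controller."""

    def __init__(self, on_above_c=80.0, off_below_c=70.0):
        if off_below_c >= on_above_c:
            raise ValueError("off_below_c must be lower than on_above_c")
        self.on_above_c = on_above_c
        self.off_below_c = off_below_c
        self.full_speed = False  # start at reduced speed

    def update(self, temp_c):
        """Feed one temperature reading; return True if the fan should run at full speed."""
        if temp_c >= self.on_above_c:
            self.full_speed = True
        elif temp_c <= self.off_below_c:
            self.full_speed = False
        # Between the two thresholds the previous state is kept.
        return self.full_speed


fan = HysteresisFan()
readings = [65, 75, 82, 75, 72, 69, 75]  # made-up sample readings in Celsius
states = [fan.update(t) for t in readings]
print(states)
```

Note that the reading of 75 gives a different answer depending on whether the card was heating up or cooling down, which is exactly the "rising or falling" sensitivity mentioned above.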

When you have the high end working, then you can experiment with the low end settings.

Your primary interest is:

Whatever you set the low end at, it must not cause the high end to run over 100C when you run under maximum load.

Your secondary interest is in keeping the noise down when not computing. As long as you can keep the primary interest satisfied, you can keep lowering fan speed (and letting rest temperature rise).
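The procedure above can be sketched as a small selection rule: measure the peak temperature under full load for each low-end fan setting you try, then keep the lowest setting that still stays at or under 100C. The function name `lowest_safe_setting` and the table of measurements are hypothetical; the numbers are placeholders to show the shape of the data, not results from any real card.

```python
LIMIT_C = 100.0  # ceiling under full load, as stated above


def lowest_safe_setting(measurements, limit_c=LIMIT_C):
    """Return the lowest fan setting (percent) whose measured peak load
    temperature stays at or under limit_c, or None if no setting qualifies."""
    safe = [setting for setting, peak_c in measurements.items() if peak_c <= limit_c]
    return min(safe) if safe else None


# Hypothetical placeholder data: low-end fan setting (%) -> peak temperature under load (C).
example = {100: 92.0, 80: 95.0, 60: 99.0, 40: 104.0}
chosen = lowest_safe_setting(example)
print(chosen)
```

In practice you would fill the table from your own load tests, and leave some margin below the limit rather than running right at it.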

Jim Dempsey

Mihail_C_
Beginner

I've improved the cooling system even further by placing a 120mm fan outside the case, pulling the air through the accelerator and away.  This fan also helps cool the Sapphire HD 7990, as it covers both the exhaust from the Phi and the exhaust of the video card.

The temperature dropped to 52 °C at idle and a maximum of 92 °C under load during the Intel MKL linpack benchmark tests running in native mode.  Given that it's below 100 °C, I will stop fiddling with the cooling system.
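As a quick summary of the numbers reported in this thread, the short Python sketch below computes the remaining margin to the 100 Celsius ceiling and the total idle improvement across the three duct iterations. The variable names are only illustrative; the temperatures are the ones quoted in the posts.

```python
LIMIT_C = 100.0                # ceiling under full load mentioned in the thread
idle_c = [62.0, 58.0, 52.0]    # idle readings: first duct, duct with ramp, added 120mm exhaust fan
peak_load_c = 92.0             # maximum during the linpack run

headroom_c = LIMIT_C - peak_load_c    # margin left under full load
idle_gain_c = idle_c[0] - idle_c[-1]  # total idle improvement over the experiments

print(f"headroom under load: {headroom_c:.0f} C")
print(f"idle improvement:    {idle_gain_c:.0f} C")
```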

Thank you very much for the tips.
