
Observed CPU frequency of Xeon Phi 5110P

YW
Beginner
735 Views

Hi,

We have a cluster of Xeon Phi 5110P cards. Recently we noticed that most of the cards are running at a lower CPU frequency of 842 MHz instead of 1052 MHz, which makes our application run 15% slower than at full speed.
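As a quick sanity check on those figures, here is a minimal sketch of the arithmetic. It uses only the two clock values quoted above. The interpretation in the comments (that a purely clock-bound job would scale with frequency) is an assumption, not something measured in this thread.

```python
# Clock figures quoted in the post, in MHz.
observed_mhz = 842
nominal_mhz = 1052

# Fraction of the nominal clock the slow cards are running at.
clock_ratio = observed_mhz / nominal_mhz

# How much lower the clock is, as a percentage of nominal.
clock_drop_pct = (1 - clock_ratio) * 100

# Assumption: if runtime scaled purely with clock frequency, a job would
# take nominal/observed times as long on a slow card.
clock_bound_slowdown_pct = (nominal_mhz / observed_mhz - 1) * 100

print(f"clock ratio:              {clock_ratio:.3f}")
print(f"clock drop:               {clock_drop_pct:.1f}%")
print(f"clock-bound extra runtime: {clock_bound_slowdown_pct:.1f}%")
```

The clock is roughly 20% lower, so a reported 15% slowdown would be consistent with an application that is only partly bound by core frequency.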

Does anybody know why our Xeon Phi cards are underutilizing their resources?

Thanks!

5 Replies
Frances_R_Intel
Employee

First, check to make sure your coprocessor cards are not running too hot. Above around 104 degrees C, the coprocessor begins scaling back the frequency to reduce the heat being produced.

With passively cooled boards, you need to be careful that you have adequate airflow for cooling. You can run micsmc on the host to monitor the temperature. To find more information, see https://www-ssl.intel.com/content/www/us/en/processors/xeon/xeon-phi-coprocessor-datasheet.html. If your host system was sold as being compatible with the coprocessor, the solution may be as simple as updating the system firmware so that it can properly monitor the cards and increase the fan speed as needed: https://software.intel.com/en-us/forums/topic/558706. For systems that were not designed with adequate airflow, a number of people have come up with very inventive solutions: https://software.intel.com/en-us/forums/topic/537661.

If you find that the problem is not overheating, make sure you have the latest flash and smc firmware installed. Also make sure the card has sufficient power. 

Let us know what happens.

YW
Beginner

Thanks for your reply, Frances.

I don't think the problem is overheating, as I always keep an eye on the temperature, which is at most around 65 C. We installed the same flash and smc firmware version on all cards. While most of them are running at 842 MHz, some (3 out of 96) are running at 1052 MHz, so I don't think the version matters... Any other possibilities?

Thanks again for your help!

jimdempseyatthecove
Honored Contributor III

Can you experiment by taking one of the slower cards and inserting it into the same slot/same system as one of your faster cards?

This would isolate whether the issue lies with the card itself or with the motherboard BIOS or PCIe x16 slot.
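The logic of the swap test can be sketched as a small decision helper. This is only an illustration: `interpret_swap` and its parameters are hypothetical names, and the frequency has to be read off the card by whatever tool you already use.

```python
def interpret_swap(freq_after_swap_mhz, full_speed_mhz=1052):
    """Interpret the frequency a previously slow card reports after it is
    moved into a slot/system where another card ran at full speed.

    Hypothetical helper for illustration only.
    """
    if freq_after_swap_mhz >= full_speed_mhz:
        # The slowdown stayed behind with the original slot or host.
        return "host: suspect the motherboard BIOS, PCIe slot, or power feed"
    # The slowdown moved with the card.
    return "card: suspect the card itself"


print(interpret_swap(842))
print(interpret_swap(1052))
```

Running the reverse experiment as well (a fast card moved into the slow card's slot) would give a second data point for the same conclusion.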

Jim Dempsey

Frances_R_Intel
Employee

Swapping the cards is a good idea - try it and let us know what happens.

As to what else could cause the frequency drop:

You have already eliminated overheating, which is the big concern. The next most likely culprit is power. It is possible for a well-tuned program to draw more power than normally expected. In this case, the coprocessor will drop the frequency to cut the power requirements.

If the problem does, indeed, follow the slot and not the card, see if for some reason the card may be getting more power at the new location. A funny thing about the 5110P card is that, even though you don't need both the 2x4 and 2x3 power connectors to power your passively cooled cards, giving the card that extra boost will allow it to draw up to 240W before it drops the frequency. So if it looks like it might be a power issue, try using both power connectors - but do keep an eye on the temperature if you do.

 

You might also want to check the power management settings for the different cards, using 'micctrl --pm' or the micsmc GUI. Perhaps some of the cards are not adjusting their frequency because they have been told not to.
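The order of checks suggested in this thread (temperature first, then power, then power management settings) can be sketched as a small triage function. Everything here is illustrative: `CardReading`, its fields, and `next_check` are hypothetical names, the readings would have to be collected by hand or from your own monitoring, and the 104 C figure is the approximate threshold mentioned earlier in the thread.

```python
from dataclasses import dataclass

THROTTLE_TEMP_C = 104   # approximate throttling threshold quoted in this thread
FULL_SPEED_MHZ = 1052   # full-speed clock reported for the 5110P in this thread


@dataclass
class CardReading:
    """Hypothetical record of one card's state, filled in by the reader."""
    name: str
    freq_mhz: int
    temp_c: float
    both_power_connectors: bool


def next_check(card):
    """Return the next thing to look at for a card, in the thread's order."""
    if card.freq_mhz >= FULL_SPEED_MHZ:
        return "ok"
    if card.temp_c >= THROTTLE_TEMP_C:
        return "cooling"       # thermal throttling is the first suspect
    if not card.both_power_connectors:
        return "power"         # try attaching both the 2x4 and 2x3 connectors
    return "pm-settings"       # then compare power management settings


cards = [
    CardReading("mic0", 842, 65.0, False),
    CardReading("mic1", 1052, 63.0, False),
]
for card in cards:
    print(card.name, next_check(card))
```

A card that comes back as "pm-settings" is also a reasonable candidate for the slot-swap experiment and a firmware check.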

Roshan_M_
Beginner

YW wrote:

Thanks for your reply, Frances.

I don't think the problem is overheating, as I always keep an eye on the temperature, which is at most around 65 C. We installed the same flash and smc firmware version on all cards. While most of them are running at 842 MHz, some (3 out of 96) are running at 1052 MHz, so I don't think the version matters... Any other possibilities?

Thanks again for your help!

Were you able to get to the bottom of this issue?

I have bumped into this issue with one of my MIC cards, which is running at 842 MHz. The host has only one card in the system. We have another 7 such nodes, which are all running at 1052 MHz.

As you mentioned, the temperature is normal and the firmware is consistent across the cards too. We still have not figured out why this card is misbehaving.

Thank you.
