Xeon Phi 7120P always runs at lowest frequency

kehw
Beginner
958 Views

I recently installed a 7120P in one of my servers. It seems to be working fine, but I noticed that it always runs at the lowest available frequency. Even when I run the benchmark application that comes with the Intel compiler, the frequency stays at 0.57 GHz.

Any idea about this?

Here is some information about my machine:

        System Info
                HOST OS                 : Linux
                OS Version              : 2.6.32-504.8.1.el6.x86_64
                Driver Version          : 3.4.2-1
                MPSS Version            : 3.4.2
                Host Physical Memory    : 65939 MB

Device No: 0, Device Name: mic0

------------------------------------------------------------------------------------

        Version
                Flash Version            : 2.1.02.0390
                SMC Firmware Version     : 1.16.5078
                SMC Boot Loader Version  : 1.8.4326
                uOS Version              : 2.6.38.8+mpss3.4.2
                Device Serial Number     : ADKC44700618

        Board
                Vendor ID                : 0x8086
                Device ID                : 0x225c
                Subsystem ID             : 0x7d95
                Coprocessor Stepping ID  : 2
                PCIe Width               : x16
                PCIe Speed               : 5 GT/s
                PCIe Max payload size    : 256 bytes
                PCIe Max read req size   : 512 bytes
                Coprocessor Model        : 0x01
                Coprocessor Model Ext    : 0x00
                Coprocessor Type         : 0x00
                Coprocessor Family       : 0x0b
                Coprocessor Family Ext   : 0x00
                Coprocessor Stepping     : C0
                Board SKU                : C0PRQ-7120 P/A/X/D
                ECC Mode                 : Enabled
                SMC HW Revision          : Product 300W Passive CS

        Cores
                Total No of Active Cores : 61
                Voltage                  : 995000 uV
                Frequency                : 571428 kHz

        Thermal
                Fan Speed Control        : N/A
                Fan RPM                  : N/A
                Fan PWM                  : N/A
                Die Temp                 : 43 C

        GDDR
                GDDR Vendor              : Samsung
                GDDR Version             : 0x6
                GDDR Density             : 4096 Mb
                GDDR Size                : 15872 MB
                GDDR Technology          : GDDR5
                GDDR Speed               : 5.500000 GT/s
                GDDR Frequency           : 2750000 kHz
                GDDR Voltage             : 1501000 uV

------------------------------------------------------------------------------------

micsmc -f

mic0 (freq):
   Core Frequency: .......... 0.57 GHz
   Total Power: ............. 100.00 Watts
   Low Power Limit: ......... 315.00 Watts
   High Power Limit: ........ 375.00 Watts
   Physical Power Limit: .... 395.00 Watts

------------------------------------------------------------------------------------

14 Replies
Sunny_G_Intel
Employee

Hi Hongwei,

Can you please verify that all the cores are up and running while the benchmark executes? You can check this by taking a snapshot of the "micsmc" Core Histogram View while the benchmark is running. The power states snapshot you attached looks normal.

Thanks

kehw
Beginner

Hello Sunny,

It seems that all cores are running. I ran two benchmarks on this card: the Linpack benchmark that comes with icc, and helloflops3 from the book Intel Xeon Phi Coprocessor High-Performance Programming. I attached snapshots for both benchmarks. The Linpack benchmark gives me up to 400+ GFLOPS, which is about one sixth of the theoretical value. helloflops3 is supposed to give 2+ TFLOPS according to the book, but it gives me 100+ GFLOPS in this test. I have seen the 2+ TFLOPS result with the same code on a 5110P before, on a different machine.
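
For reference, a rough peak-throughput estimate shows how much the reported clock alone would limit results. This is only a sketch: the per-cycle flop counts (one 512-bit fused multiply-add per core per cycle) and the 1.238 GHz nominal clock are assumptions that do not come from this thread; only the core count and the 571428 kHz reading are taken from the micinfo output above.

```python
# Rough peak-throughput estimate for the card described in this thread.
# Assumptions (NOT stated in the thread): each core retires one 512-bit
# fused multiply-add per cycle, i.e. 16 double-precision or 32
# single-precision flops per cycle, and the nominal clock is 1.238 GHz.

CORES = 61                 # "Total No of Active Cores" from micinfo
DP_FLOPS_PER_CYCLE = 16    # assumed: 8 doubles x 2 (FMA)
SP_FLOPS_PER_CYCLE = 32    # assumed: 16 floats x 2 (FMA)


def peak_gflops(cores: int, freq_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS = cores * clock in GHz * flops per cycle."""
    return cores * freq_ghz * flops_per_cycle


observed_ghz = 571428 / 1e6   # 571428 kHz reported by micinfo
nominal_ghz = 1.238           # assumed nominal clock

for label, ghz in (("observed", observed_ghz), ("nominal", nominal_ghz)):
    dp = peak_gflops(CORES, ghz, DP_FLOPS_PER_CYCLE)
    sp = peak_gflops(CORES, ghz, SP_FLOPS_PER_CYCLE)
    print(f"{label:8s} {ghz:.3f} GHz: DP peak {dp:7.1f} GFLOPS, "
          f"SP peak {sp:7.1f} GFLOPS")
```

Under these assumptions the double-precision peak comes to roughly 558 GFLOPS at the observed 0.57 GHz versus roughly 1208 GFLOPS at the assumed nominal clock, so throughput would be expected to drop by more than half for as long as the clock stays at its minimum.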

Sunny_G_Intel
Employee

Hi 

Can you please share the output of the following commands:

micsmc -f
micsmc --tthrottle mic0
micsmc --pthrottle mic0

The coprocessor frequency is forced down to the minimum supported value (~600 MHz) when the thermal sensors on the coprocessor detect the temperature rising above 105 °C. Once the temperature drops below 105 °C, the coprocessor should return to its normal high frequency. Similar behavior is also seen when the PL0 power level is reached.
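
If it helps to check the throttle reports repeatedly, the dotted "name: .... value" lines can be parsed with a few lines of Python. This is only a sketch: the sample text mirrors the output format that appears in this thread, and the helper names `parse_dotted_fields` and `was_throttled` are made up for illustration. In practice the text would be the captured output of `micsmc --tthrottle mic0` or `micsmc --pthrottle mic0`.

```python
import re

# Sample text in the format micsmc prints in this thread; a real script
# would capture the command output instead of using a literal string.
SAMPLE = """\
mic0 (tthrottle):
   Throttle state: ......... inactive
   Current throttle time: .. 0 msec
   Throttle event count: ... 0
   Total throttle time: .... 0 msec
"""

# Matches lines shaped like "   Name: ...... value".
FIELD_RE = re.compile(r"^\s*([^:]+):\s*\.+\s*(.+?)\s*$")


def parse_dotted_fields(text: str) -> dict:
    """Return {field name: value string} for every dotted line in text."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1).strip()] = match.group(2)
    return fields


def was_throttled(fields: dict) -> bool:
    """True if throttling is active now or has been recorded at least once."""
    active = fields.get("Throttle state", "inactive") != "inactive"
    events = int(fields.get("Throttle event count", "0"))
    return active or events > 0


if __name__ == "__main__":
    fields = parse_dotted_fields(SAMPLE)
    print("throttled:", was_throttled(fields))
```

The header line (`mic0 (tthrottle):`) has no run of dots after the colon, so the pattern skips it and only the four value lines are returned.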

Thanks

kehw
Beginner

Hi Sunny,

Here is the output when the device is idle:

----------------------------------------------------------------

micsmc -f

mic0 (freq):
   Core Frequency: .......... 0.57 GHz
   Total Power: ............. 98.00 Watts
   Low Power Limit: ......... 315.00 Watts
   High Power Limit: ........ 375.00 Watts
   Physical Power Limit: .... 395.00 Watts

----------------------------------------------------------------

micsmc --tthrottle mic0

mic0 (tthrottle):
   Throttle state: ......... inactive
   Current throttle time: .. 0 msec
   Throttle event count: ... 0
   Total throttle time: .... 0 msec

----------------------------------------------------------------

micsmc --pthrottle mic0

mic0 (pthrottle):
   Throttle state: ......... inactive
   Current throttle time: .. 0 msec
   Throttle event count: ... 0
   Total throttle time: .... 0 msec

----------------------------------------------------------------

Sunny_G_Intel
Employee

Hello Hongwei,

Thanks. The output looks OK to me. Can you please also verify that all the temperatures are OK? You can do this using:

micsmc -t

//Temperatures should be in the following ranges
mic0 (temp):
   Cpu Temp: ................ 59.00 C
   Memory Temp: ............. 37.00 C
   Fan-In Temp: ............. 30.00 C
   Fan-Out Temp: ............ 37.00 C
   Core Rail Temp: .......... 36.00 C
   Uncore Rail Temp: ........ 37.00 C
   Memory Rail Temp: ........ 37.00 C

Thanks

Sunny_G_Intel
Employee

Hi Hongwei,

Also, if the temperatures above turn out to be OK, I would recommend updating the flash. This would help rule out any microkernel issue that might be forcing the coprocessor to run at the lowest frequency (~600 MHz). You can find the steps for updating the flash in the latest version of the System Administration Guide for Intel® Xeon Phi™ Coprocessors.

Thanks

kehw
Beginner

Hi Sunny,

Thank you very much for the help. Here are the temperatures:

micsmc -t

mic0 (temp):
   Cpu Temp: ................ 38.00 C
   Memory Temp: ............. 26.00 C
   Fan-In Temp: ............. 22.00 C
   Fan-Out Temp: ............ 29.00 C
   Core Rail Temp: .......... 0.00 C (SMC reports sensor read invalid)
   Uncore Rail Temp: ........ 30.00 C
   Memory Rail Temp: ........ 30.00 C

It seems that the Core Rail Temp is not quite right.

Sunny_G_Intel
Employee

Hi Hongwei,

I am not sure that is the cause of your problem. Do you get the same output when you run "micsmc -t" multiple times? The message you see above only states that the SMC sensor read was invalid; it does not report any temperature problem.
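
To compare several runs, the temperature report can be parsed so that invalid sensor reads are set aside before looking at the values. This is only a sketch: the sample text copies the format posted in this thread, the helper names `parse_temps` and `hottest_valid` are hypothetical, and the 105 °C limit is the throttle threshold mentioned earlier in this discussion.

```python
import re

# Sample text in the format of the "micsmc -t" output posted in this thread.
SAMPLE = """\
mic0 (temp):
   Cpu Temp: ................ 38.00 C
   Memory Temp: ............. 26.00 C
   Fan-In Temp: ............. 22.00 C
   Fan-Out Temp: ............ 29.00 C
   Core Rail Temp: .......... 0.00 C (SMC reports sensor read invalid)
   Uncore Rail Temp: ........ 30.00 C
   Memory Rail Temp: ........ 30.00 C
"""

THROTTLE_LIMIT_C = 105.0  # thermal throttle threshold quoted in this thread

# Matches "   Name: ...... 12.34 C" plus any trailing note in parentheses.
TEMP_RE = re.compile(r"^\s*([^:]+):\s*\.+\s*(-?\d+(?:\.\d+)?)\s*C\b(.*)$")


def parse_temps(text: str) -> dict:
    """Return {sensor: (degrees C, is_valid)} for each temperature line."""
    readings = {}
    for line in text.splitlines():
        match = TEMP_RE.match(line)
        if match:
            name, value, note = match.groups()
            readings[name.strip()] = (float(value), "invalid" not in note)
    return readings


def hottest_valid(readings: dict) -> tuple:
    """Return (sensor, degrees C) for the hottest sensor with a valid read."""
    valid = {name: temp for name, (temp, ok) in readings.items() if ok}
    name = max(valid, key=valid.get)
    return name, valid[name]


if __name__ == "__main__":
    readings = parse_temps(SAMPLE)
    for name, (temp, ok) in readings.items():
        if not ok:
            print(f"{name}: read invalid, ignoring reported {temp:.2f} C")
    name, temp = hottest_valid(readings)
    print(f"hottest valid sensor: {name} at {temp:.2f} C "
          f"(limit {THROTTLE_LIMIT_C:.0f} C)")
```

Treating an invalid read as "no data" rather than as 0 °C keeps a transient sensor error from being mistaken for a real temperature.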

I would still recommend updating the flash. 

Thanks

kehw
Beginner

Hi Sunny,

When I repeat the command 'micsmc -t', I get reasonable output:

mic0 (temp):
   Cpu Temp: ................ 52.00 C
   Memory Temp: ............. 40.00 C
   Fan-In Temp: ............. 32.00 C
   Fan-Out Temp: ............ 45.00 C
   Core Rail Temp: .......... 45.00 C
   Uncore Rail Temp: ........ 46.00 C
   Memory Rail Temp: ........ 46.00 C

I updated the flash as suggested. It seems that nothing changed after the system reboot. The benchmarks still report the same GFLOPS. I watched the micsmc GUI while the benchmarks were running, and the temperature never goes higher than 65 C, even when the device utilization is sometimes at 100%.

Sunny_G_Intel
Employee

Hi Hongwei,

I would like to touch base on the issue you were facing with the Intel Xeon Phi coprocessor running at low frequency. Were you able to resolve this issue?

Thanks

kehw
Beginner

Hello Sunny,

Sorry for the late reply. I did not notice your message.

I have not resolved this problem. It is quite strange: everything looks fine, but the card is just not running at full power.

Thank you.

Edwin_G_
Beginner

Hi Hongwei,

Were you able to resolve the problem? If so, would you please share what the resolution was?

Thank you!

Edwin G.

kehw
Beginner

Hi Edwin,

I did not resolve the problem. I plan to contact the manufacturer later. I will share any news I get on this problem.

Sunny_G_Intel
Employee

One reason the coprocessor may be forced to run at the lowest frequency is that it is not getting enough power. Can you please verify again that the coprocessor is properly seated in the PCIe slot? Also, please verify that all the cables (to/from the coprocessor) are intact.

Thanks 
