
Stability issues for the Xeon Phi card?

Peng_Z_2
Beginner

I have been trying to get the Intel Xeon Phi card to run in my Intel S2600GZ server (with an external power supply, in case you are wondering).

I have gotten the card to run under both Ubuntu 12.04 and CentOS 6.4. The card boots, and I was able to do some initial benchmarking.

Unfortunately, the card hangs intermittently, sometimes after only about 10 minutes. The only way to recover is to power cycle the system. Here is some typical output from the mpssd log file:

Wed May 15 13:20:21 2013: MPSS Daemon start
Wed May 15 13:20:21 2013: Configuration version 0.6
Wed May 15 13:20:21 2013: Overlay /opt/intel/mic/amplxe-userapi /opt/intel/mic/amplxe-userapi/amplxe-userapi.filelist declaration style is deprecated
Wed May 15 13:20:21 2013: Overlay /opt/intel/mic/sep3.10 /opt/intel/mic/sep3.10/k1om/sep.filelist declaration style is deprecated
Wed May 15 13:20:21 2013: mic0: Command line: "quiet root=ramfs console=hvc0 highres=off clocksource=tsc cgroup_disable=memory micpm=cpufreq_on;corec6_off;pc3_on;pc6_on"
Wed May 15 13:20:21 2013: mic0: log_buf_addr: ffffffff832552d0
Wed May 15 13:20:21 2013: mic0: log_buf_len: ffffffff81724c70
Wed May 15 13:20:21 2013: mic0: Booting /lib/firmware/mic/uos.img
Wed May 15 13:20:21 2013: mic0: State ready -> booting
Wed May 15 13:20:23 2013: Wait for download requests
Wed May 15 13:20:42 2013: Configure node 0
Wed May 15 13:20:42 2013: mic0: Configure Connection
Wed May 15 13:20:46 2013: mic0: Set time of day
Wed May 15 13:20:46 2013: mic0: Transfer file system /opt/intel/mic/filesystem/mic0.image
Wed May 15 13:20:48 2013: mic0: Configuration Finished
Wed May 15 13:21:00 2013: mic0: State booting -> online
Wed May 15 16:12:11 2013: mic0: State online -> lost
Wed May 15 16:12:11 2013: mic0: open /proc/mic_vmcore/mic0 failed No such file or directory
Wed May 15 16:13:22 2013: mic0: State lost -> resetting
Wed May 15 16:13:24 2013: mic0: State resetting -> reset failed
Thu May 16 09:48:46 2013: mic0: State reset failed -> resetting
Thu May 16 09:48:48 2013: mic0: State resetting -> reset failed

The MIC card went from the online state to the lost state, the subsequent reset failed, and nothing works from this point on.
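
For anyone who wants to track how long the card stays up before it drops out, the state transitions in the log above can be pulled out with a short script. This is only a minimal sketch: it assumes the line format shown in the excerpt (timestamp, card name, then "State old -> new") and an English/C locale for the date parsing, and the function names `parse_transitions` and `uptime_before_loss` are hypothetical helpers, not part of MPSS.

```python
import re
from datetime import datetime

# Matches lines such as:
#   "Wed May 15 16:12:11 2013: mic0: State online -> lost"
# The format is assumed from the log excerpt in this thread.
STATE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}): "
    r"(?P<card>mic\d+): State (?P<old>.+?) -> (?P<new>.+)$"
)


def parse_transitions(lines):
    """Return (timestamp, card, old_state, new_state) for each state change."""
    transitions = []
    for line in lines:
        match = STATE_RE.match(line.strip())
        if match is None:
            continue  # not a state-transition line
        ts = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S %Y")
        transitions.append(
            (ts, match.group("card"), match.group("old"), match.group("new"))
        )
    return transitions


def uptime_before_loss(transitions):
    """Return (card, timedelta) for each online -> lost episode."""
    online_since = {}  # card name -> time it last went online
    episodes = []
    for ts, card, _old, new in transitions:
        if new == "online":
            online_since[card] = ts
        elif new == "lost" and card in online_since:
            episodes.append((card, ts - online_since.pop(card)))
    return episodes
```

Feeding it the lines of the log file (for example `parse_transitions(open(path))`, with `path` pointing at wherever mpssd writes its log on your system) gives one entry per transition. For the excerpt above, the single online-to-lost episode works out to 2:51:11.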

Does anyone know what's going on? Why is the card so unstable?

Frances_R_Intel
Employee

Are you running micrasd? If not, try starting it up in daemon mode ("micrasd -daemon") and see if it logs any more information when this happens.

Peng_Z_2
Beginner

I was finally able to update both the flash and the bootloader on the card. This seems to have stabilized the card. I have been running benchmark tests for about a day now and it is still going. Thanks, Frances.

Chris_Samuel
Beginner

You may also want to disable the PC3 and PC6 power saving settings; otherwise you might run into other odd stability issues.
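
To check which of these states are currently enabled, one option is to read the `micpm=` option from the card's kernel command line, which mpssd logs at boot (see the "Command line:" entry in the original post). The snippet below is a rough sketch that assumes the `name_on`/`name_off` format seen in that log line; `parse_micpm` and `enabled_package_states` are hypothetical helper names. How to actually change the setting is not shown here and should be taken from the MPSS documentation for your release.

```python
def parse_micpm(cmdline):
    """Parse 'micpm=cpufreq_on;corec6_off;...' into {name: bool}.

    The 'name_on' / 'name_off' format is assumed from the log excerpt
    in this thread.
    """
    for token in cmdline.split():
        if not token.startswith("micpm="):
            continue
        settings = {}
        for item in token[len("micpm="):].split(";"):
            if not item:
                continue  # tolerate a trailing ';'
            name, _, state = item.rpartition("_")
            settings[name] = (state == "on")
        return settings
    return {}  # no micpm option present


def enabled_package_states(settings):
    """Return the package C-states (pc3, pc6) that are switched on."""
    return [name for name in ("pc3", "pc6") if settings.get(name)]


# Command line copied from the mpssd log in the original post.
CMDLINE = (
    "quiet root=ramfs console=hvc0 highres=off clocksource=tsc "
    "cgroup_disable=memory micpm=cpufreq_on;corec6_off;pc3_on;pc6_on"
)

if __name__ == "__main__":
    print(enabled_package_states(parse_micpm(CMDLINE)))
```

With the command line from the original post, both `pc3` and `pc6` show up as enabled, which matches the settings being suggested for change here.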
