I just updated my Phi to the latest MPSS version (3.2.1) and also the OpenCL Runtime (14.1) as well as the SDK (2014 4.4.0).
Since then, every OCL example and every other OCL program crashes when I run it on the Phi; the CPU works fine as always.
I tried rebooting and everything else I could think of, but I cannot figure out what is going wrong.
The MonteCarlo Example gives me this output:
Build program options: "-D__DO_FLOAT__ -cl-denorms-are-zero -cl-fast-relaxed-math -cl-single-precision-constant -DNSAMP=262144"
*** OPENCL MIC DEVICE HW EXCEPTION ***: Segmentation fault (Address not mapped to object [0xfffffffffffffff8])
BACKTRACE:
/tmp/coi_procs/1/4991/mic_server[0x407132]
/lib64/libpthread.so.0(+0xf4d0)[0x7f588b47d4d0]
/tmp/coi_procs/1/4991/mic_server[0x41e8dd]
/tmp/coi_procs/1/4991/mic_server[0x4223b8]
/tmp/coi_procs/1/4991/mic_server[0x41fced]
/tmp/coi_procs/1/4991/mic_server[0x41e59d]
/tmp/coi_procs/1/4991/mic_server[0x41672d]
/tmp/coi_procs/1/4991/mic_server(copy_program_to_device+0x21)[0x4165f1]
/usr/lib64/libcoi_device.so.0(+0x31ef0)[0x7f588bd2bef0]
/usr/lib64/libcoi_device.so.0(+0x322c3)[0x7f588bd2c2c3]
/usr/lib64/libcoi_device.so.0(+0x326d9)[0x7f588bd2c6d9]
/lib64/libpthread.so.0(+0x7bce)[0x7f588b475bce]
/lib64/libc.so.6(clone+0x6d)[0x7f588a89d1cd]
******************
terminate called after throwing an instance of 'std::runtime_error'
  what(): Segmentation fault
Segmentation fault
System status for the Phi seems OK:

MicCheck 3.2.1-r1
Copyright 2013 Intel Corporation All Rights Reserved

Executing default tests for host
  Test 0: Check number of devices the OS sees in the system ... pass
  Test 1: Check mic driver is loaded ... pass
  Test 2: Check number of devices driver sees in the system ... pass
  Test 3: Check mpssd daemon is running ... pass
Executing default tests for device: 0
  Test 4 (mic0): Check device is in online state and its postcode is FF ... pass
  Test 5 (mic0): Check ras daemon is available in device ... pass
  Test 6 (mic0): Check running flash version is correct ... pass

Status: OK

MicInfo Utility Log
Copyright 2011-2013 Intel Corporation All Rights Reserved.

Created Fri May 23 16:13:38 2014

    System Info
        HOST OS                    : Linux
        OS Version                 : 3.0.13-0.27-default
        Driver Version             : 3.2.1-1
        MPSS Version               : 3.2.1
        Host Physical Memory       : 264519 MB

Device No: 0, Device Name: mic0

    Version
        Flash Version              : 2.1.02.0390
        SMC Firmware Version       : 1.16.5078
        SMC Boot Loader Version    : 1.7.4172
        uOS Version                : 220.127.116.11+mpss3.2.1
        Device Serial Number       : ADKC25104125

    Board
        Vendor ID                  : 0x8086
        Device ID                  : 0x2250
        Subsystem ID               : 0x2500
        Coprocessor Stepping ID    : 3
        PCIe Width                 : x16
        PCIe Speed                 : 5 GT/s
        PCIe Max payload size      : 256 bytes
        PCIe Max read req size     : 512 bytes
        Coprocessor Model          : 0x01
        Coprocessor Model Ext      : 0x00
        Coprocessor Type           : 0x00
        Coprocessor Family         : 0x0b
        Coprocessor Family Ext     : 0x00
        Coprocessor Stepping       : B1
        Board SKU                  : B1PRQ-5110P/5120D
        ECC Mode                   : Enabled
        SMC HW Revision            : Product 225W Passive CS

    Cores
        Total No of Active Cores   : 60
        Voltage                    : 1032000 uV
        Frequency                  : 1052631 kHz

    Thermal
        Fan Speed Control          : N/A
        Fan RPM                    : N/A
        Fan PWM                    : N/A
        Die Temp                   : 45 C

    GDDR
        GDDR Vendor                : Elpida
        GDDR Version               : 0x1
        GDDR Density               : 2048 Mb
        GDDR Size                  : 7936 MB
        GDDR Technology            : GDDR5
        GDDR Speed                 : 5.000000 GT/s
        GDDR Frequency             : 2500000 kHz
        GDDR Voltage               : 1501000 uV
Any advice would be greatly appreciated!
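
For anyone comparing crash logs, here is a rough Python sketch that pulls the named symbols out of a backtrace like the one in this post. The frame format (path, optional "(symbol+offset)", then "[address]") is assumed from this log only, and the helper names parse_frames and named_frames are made up for illustration. In this trace the only named frame inside mic_server is copy_program_to_device, which may hint that the fault happens while the program is being loaded onto the device rather than inside kernel code, though the trace alone does not prove that.

```python
import re

# Frame format assumed from the log above: /path(symbol+offset)[0xaddress],
# where the "(...)" part is optional. This is not an official format spec.
FRAME_RE = re.compile(
    r"(?P<module>/\S+?)(?:\((?P<symbol>[^)]*)\))?\[(?P<addr>0x[0-9a-f]+)\]"
)

def parse_frames(trace):
    """Return (module, symbol_name, address) tuples; symbol_name may be ''."""
    frames = []
    for m in FRAME_RE.finditer(trace):
        symbol = m.group("symbol") or ""
        # "+0x31ef0" is a bare offset with no name, so the name part is empty.
        name = symbol.split("+")[0]
        frames.append((m.group("module"), name, m.group("addr")))
    return frames

def named_frames(frames):
    """Keep only the frames that carry a symbol name."""
    return [f for f in frames if f[1]]

# A few frames copied from the backtrace in this post.
sample = (
    "/tmp/coi_procs/1/4991/mic_server[0x41672d] "
    "/tmp/coi_procs/1/4991/mic_server(copy_program_to_device+0x21)[0x4165f1] "
    "/usr/lib64/libcoi_device.so.0(+0x31ef0)[0x7f588bd2bef0] "
    "/lib64/libc.so.6(clone+0x6d)[0x7f588a89d1cd]"
)
frames = parse_frames(sample)
print([name for _, name, _ in named_frames(frames)])
# prints ['copy_program_to_device', 'clone']
```

Most frames in mic_server are unnamed, so this only narrows things down; it cannot say which instruction faulted.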
This error message basically says that your application has crashed. There are many possible causes, and it's hard to suggest anything without looking at the code.
Can you share the source code?
As I said, it's the MonteCarlo Example from the SDK.
But it also happens with every other OCL application I tried. All examples work fine on the CPU.
The release notes for the OpenCL Runtime and the OpenCL SDK have CONFLICTING version requirements for the MPSS, as Michael H. empirically discovered.
In the SDK release notes:
"NOTE: For Intel Xeon Phi coprocessor device support, you must install the 3.2.1 version of Intel MPSS"
In the Runtime release notes:
"NOTE: Using OpenCL Runtime 14.1 with MPSS 3.2.1 is not recommended, as this combination introduces stability issues."
This needs to be resolved for people to use Intel's OpenCL on the Phi with any hope of success. I don't know what to ask my sysadmin to do in this case.
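
Until that is sorted out, a small check like the following might help a sysadmin spot the flagged combination before users hit it. This is only a sketch: the FLAGGED set holds just the one pairing named in the Runtime release note quoted above, the function names are hypothetical, and the OpenCL Runtime version has to be supplied by hand because this sketch does not query it from the system. The "MPSS Version : ..." field is taken from the micinfo output earlier in the thread.

```python
import re

# The single pairing named in the Runtime release note quoted above.
# This set is an illustrative assumption, not a list published by Intel.
FLAGGED = {("14.1", "3.2.1")}

def mpss_version(micinfo_text):
    """Extract the MPSS version from micinfo output, or None if absent."""
    m = re.search(r"MPSS Version\s*:\s*(\S+)", micinfo_text)
    return m.group(1) if m else None

def is_flagged(runtime_version, mpss):
    """True if this (OpenCL Runtime, MPSS) pairing is in the flagged set."""
    return (runtime_version, mpss) in FLAGGED

# Fragment of the micinfo output posted earlier in this thread.
micinfo_sample = (
    "Driver Version : 3.2.1-1 MPSS Version : 3.2.1 "
    "Host Physical Memory : 264519 MB"
)
mpss = mpss_version(micinfo_sample)
print(mpss, is_flagged("14.1", mpss))
# prints: 3.2.1 True
```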
I'm experiencing the same problem using OpenCL Runtime 14.1 and MPSS 3.2.1.
Does the above release note mean that, with the currently available Intel API, it's NOT possible to run OpenCL code on Xeon Phi?
We’ve found a critical issue in the latest release package of the OpenCL runtime for Xeon Phi devices.
We’re currently working to provide a fixed package which will be released soon.
We’re truly sorry for the inconvenience and will do our best to upload the fixed package as soon as possible.
Thanks, everyone, for the great and important feedback.