Hello,
The compiler tells me to contact Intel ;-) I'm new to MIC programming and am having a hard time getting the offload mechanism to accept C++ complex numbers.
[l_stadler_h@merlinx01 complex-test]$ icpc --version
icpc (ICC) 15.0.2 20150121
Copyright (C) 1985-2015 Intel Corporation. All rights reserved.
[l_stadler_h@merlinx01 complex-test]$ icpc -std=c++11 complex-test.cc
": internal error: ** The compiler has encountered an unexpected problem.
** Segmentation violation signal raised. **
Access violation or stack overflow. Please contact Intel Support for assistance.
compilation aborted for complex-test.cc (code 4)
[l_stadler_h@merlinx01 complex-test]$ cat complex-test.cc
#include <iostream>
#pragma offload_attribute(push, _Cilk_shared)
#include <complex>
#pragma offload_attribute(pop)
namespace {
using std::complex;
_Cilk_shared complex<float> r;
_Cilk_shared void product (_Cilk_shared complex<float> &result, _Cilk_shared complex<float> *a, _Cilk_shared complex<float> *b, unsigned int sz)
{
using std::conj;
complex<float> res = complex<float>(.0f, .0f);
_Cilk_for (unsigned int i=0; i<sz; i++) {
complex<float> c(b[i].real(), -b[i].imag()), d(a[i].real(), a[i].imag());
res += d * c;
}
result = res;
}
}
int main (int argc, char *argv[])
{
using std::cout;
constexpr unsigned int sz = 10000;
_Cilk_shared complex<float> *a = (_Cilk_shared complex<float> *)_Offload_shared_aligned_malloc(sz * sizeof(*a), 64);
_Cilk_for (unsigned int i=0; i<sz; i++)
a[i] = complex<float>(1.f/float(i), 1.f/float(i));
_Cilk_offload product(r, a, a, sz);
_Offload_shared_aligned_free(a);
float real = r.real();
float imag = r.imag();
complex<float> res(real, imag);
cout << "Result = " << res << '\n';
return 0;
}
-----------------------------
micinfo:
MicInfo Utility Log
Created Thu Feb 19 16:41:41 2015
System Info
HOST OS : Linux
OS Version : 3.10.0-123.20.1.el7.x86_64
Driver Version : 3.4.2-1
MPSS Version : 3.4.2
Host Physical Memory : 131753 MB
Device No: 0, Device Name: mic0
Version
Flash Version : 2.1.02.0390
SMC Firmware Version : 1.16.5078
SMC Boot Loader Version : 1.8.4326
uOS Version : 2.6.38.8+mpss3.4.2
Device Serial Number : ADKC43600092
Board
Vendor ID : 0x8086
Device ID : 0x225c
Subsystem ID : 0x7d95
Coprocessor Stepping ID : 2
PCIe Width : Insufficient Privileges
PCIe Speed : Insufficient Privileges
PCIe Max payload size : Insufficient Privileges
PCIe Max read req size : Insufficient Privileges
Coprocessor Model : 0x01
Coprocessor Model Ext : 0x00
Coprocessor Type : 0x00
Coprocessor Family : 0x0b
Coprocessor Family Ext : 0x00
Coprocessor Stepping : C0
Board SKU : C0PRQ-7120 P/A/X/D
ECC Mode : Enabled
SMC HW Revision : Product 300W Passive CS
Cores
Total No of Active Cores : 61
Voltage : 950000 uV
Frequency : 1238095 kHz
Thermal
Fan Speed Control : N/A
Fan RPM : N/A
Fan PWM : N/A
Die Temp : 41 C
GDDR
GDDR Vendor : Samsung
GDDR Version : 0x6
GDDR Density : 4096 Mb
GDDR Size : 15872 MB
GDDR Technology : GDDR5
GDDR Speed : 5.500000 GT/s
GDDR Frequency : 2750000 kHz
GDDR Voltage : 1501000 uV
Device No: 1, Device Name: mic1
Version
Flash Version : 2.1.02.0390
SMC Firmware Version : 1.16.5078
SMC Boot Loader Version : 1.8.4326
uOS Version : 2.6.38.8+mpss3.4.2
Device Serial Number : ADKC43600046
Board
Vendor ID : 0x8086
Device ID : 0x225c
Subsystem ID : 0x7d95
Coprocessor Stepping ID : 2
PCIe Width : Insufficient Privileges
PCIe Speed : Insufficient Privileges
PCIe Max payload size : Insufficient Privileges
PCIe Max read req size : Insufficient Privileges
Coprocessor Model : 0x01
Coprocessor Model Ext : 0x00
Coprocessor Type : 0x00
Coprocessor Family : 0x0b
Coprocessor Family Ext : 0x00
Coprocessor Stepping : C0
Board SKU : C0PRQ-7120 P/A/X/D
ECC Mode : Enabled
SMC HW Revision : Product 300W Passive CS
Cores
Total No of Active Cores : 61
Voltage : 1001000 uV
Frequency : 1238095 kHz
Thermal
Fan Speed Control : N/A
Fan RPM : N/A
Fan PWM : N/A
Die Temp : 43 C
GDDR
GDDR Vendor : Samsung
GDDR Version : 0x6
GDDR Density : 4096 Mb
GDDR Size : 15872 MB
GDDR Technology : GDDR5
GDDR Speed : 5.500000 GT/s
GDDR Frequency : 2750000 kHz
GDDR Voltage : 1501000 uV
[l_stadler_h@merlinx01 complex-test]$ lsb_release -d
Description: CentOS Linux release 7.0.1406 (Core) (Maipo)
So, any hints on how C++ complex numbers, and arrays of them, can be transferred easily from the host to the MIC and back?
I reproduced the internal error and will forward the details to Development (see the internal tracking id below). I will keep this post updated on progress toward a fix and on any workaround.
Regarding hints on offloading complex data, was there a particular interest in the Virtual Shared model you tried using here?
(Internal tracking id: DPD200366703)
Hello Kevin,
No special interest. I am just trying to find a way to transfer arrays of complex numbers to the MIC, do the calculation there, and transfer the resulting arrays of complex numbers back using SOME offload mechanism.
The only method that has worked for me so far is the simple #pragma offload, combined with telling the compiler to ignore errors about data that is not bitwise copyable (option -wd2568), which does not seem like a great solution.
I would prefer OpenMP 4.0, since I'm used to it, but so far I have not been able to use it successfully, because the compiler reports strange linking errors on the attached example:
[l_stadler_h@merlinx01 matrix-test]$ icpc -DUSE_OFFLOAD -std=c++11 -mkl -finline-functions -fno-exceptions -fno-alias -qopenmp -Ofast -debug all mat-test.cpp -o matrix-test-off
/tmp/icpcurmzUU.o: In function `L__ZN12_GLOBAL__N_18mat_testIdEEijj_76__par_loop1_2_0':
/nfs/home/l_stadler_h/matrix-test/mat-test.cc:79: undefined reference to `double std::norm<double>(std::complex<double> const&)'
/nfs/home/l_stadler_h/matrix-test/mat-test.cc:79: undefined reference to `std::complex<double> std::operator*<double>(std::complex<double> const&, double const&)'
/tmp/icpcurmzUU.o: In function `L__ZN12_GLOBAL__N_18mat_testIdEEijj_67__par_region0_2_1':
/nfs/home/l_stadler_h/matrix-test/mat-test.cc:91: undefined reference to `char* std::copy<__gnu_cxx::__normal_iterator<char*, std::string>, char*>(__gnu_cxx::__normal_iterator<char*, std::string>, __gnu_cxx::__normal_iterator<char*, std::string>, char*)'
/tmp/icpcurmzUU.o: In function `L__ZN12_GLOBAL__N_18mat_testIfEEijj_76__par_loop1_2_6':
/nfs/home/l_stadler_h/matrix-test/mat-test.cc:79: undefined reference to `float std::norm<float>(std::complex<float> const&)'
/nfs/home/l_stadler_h/matrix-test/mat-test.cc:79: undefined reference to `std::sqrt(float)'
/nfs/home/l_stadler_h/matrix-test/mat-test.cc:79: undefined reference to `std::complex<float> std::operator*<float>(std::complex<float> const&, float const&)'
/tmp/icpcurmzUU.o: In function `L__ZN12_GLOBAL__N_18mat_testIfEEijj_67__par_region0_2_7':
/nfs/home/l_stadler_h/matrix-test/mat-test.cc:91: undefined reference to `char* std::copy<__gnu_cxx::__normal_iterator<char*, std::string>, char*>(__gnu_cxx::__normal_iterator<char*, std::string>, __gnu_cxx::__normal_iterator<char*, std::string>, char*)'
And my attempt at using Cilk_offload failed even more miserably.
So the situation looks quite miserable at the moment, especially when taking into account that all the simple benchmarks I did so far with complex math compiled natively for mic perform worse than on the CPUs.
Maybe intrinsics will help, I hope.
Sorry about all this misery. I too tried the data marshaling model with your earlier test case and met with a non-bitwise copyable error for variable "r". I will look at your mat-test.cpp and consult w/Development as necessary about any possible OpenMP 4.0 solution. Stay tuned.