Hi,
I've seen a few posts on this kind of subject in the past, but nothing recently that seems to be relevant (unless I missed it). I tried to run what I thought was a basic TBB / offloading test based on one of the examples (see below). It works on a machine with attached and available phis, but not otherwise.
Teaser: offload pragma:
#pragma offload target(mic) in(size) in(data:length(size)), out(result)
Compiled (clean) with:
. /opt/intel/composer_xe_2013_sp1.2.144/bin/iccvars.sh intel64
icpc -std=c++11 -Wall -Wextra -Werror -tbb -o tbb-offload_t tbb-offload_t.cc
on a RHEL6-like system (SLF6.4), with:
[greenc@mic] ~ $ icpc -dumpversion
14.0.2
[greenc@mic] ~ $ which gcc
/usr/bin/gcc
[greenc@mic] ~ $ gcc -dumpversion
4.4.7
and the Intel TBB as shipped with the compiler.
On a system with Xeon Phi cards, I get an answer, as expected:
[greenc@phi1] ~ $ . /opt/intel/composer_xe_2013_sp1.2.144/bin/iccvars.sh intel64
[greenc@phi1] ~ $ ./tbb-offload_t
Result: 1.25e+13
According to everything I've read, I would have expected the same on a system with no mic cards also (and no recompilation), but that appears not to be the case:
[greenc@mic] ~ $ ./tbb-offload_t
offload error: cannot offload to MIC - device is not available
If anyone can point me in the direction of my mistake(s), I'd be grateful.
Code follows:
#pragma offload_attribute(push,target(mic))
#include "tbb/task_scheduler_init.h"
#include "tbb/blocked_range.h"
#include "tbb/parallel_reduce.h"
#include "tbb/task.h"
#pragma offload_attribute(pop)

using namespace tbb;

// Reduction body for parallel_reduce: sums a float array.
class __attribute__((target(mic))) ReduceTBB {
public:
  ReduceTBB(float data[]) : data_(data), sum_(0.0f) { }
  // Splitting constructor: each split starts with its own zeroed partial sum.
  ReduceTBB(ReduceTBB & x, split) : data_(x.data_), sum_(0.0f) { }
  float sum() const { return sum_; }
  void operator() (const blocked_range<size_t>& r) {
    for (size_t i = r.begin(), e = r.end(); i != e; ++i) {
      sum_ += data_[i];
    }
  }
  void join(const ReduceTBB & y) { sum_ += y.sum_; }
private:
  float *data_;
  float sum_;
};

// Runs on the coprocessor when offloaded.
__attribute__((target(mic)))
float MICReductionTBB_(float *data, int size) {
  ReduceTBB redc(data);
  task_scheduler_init init;
  parallel_reduce(blocked_range<size_t>(0, size), redc);
  return redc.sum();
}

// Host-side wrapper that offloads the reduction.
float MICReductionTBB(float *data, int size) {
  float result(0.f);
#pragma offload target(mic) in(size) in(data:length(size)), out(result)
  result = MICReductionTBB_(data, size);
  return result;
}

#include <iostream>
#include <numeric>
#include <vector>

int main() {
  std::vector<float> data(5000000);
  std::iota(data.begin(), data.end(), 0.0f);
  std::cout << "Result: " << MICReductionTBB(data.data(), data.size()) << std::endl;
}
The default was changed from optional to mandatory in 14.0, so just add the optional clause to your #pragma offload and the code will also run on a system without any coprocessors.
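For anyone landing here later, a minimal sketch of what that change could look like. It is simplified to a plain loop instead of TBB so it stands alone, and the names SumOnDevice, SumWithFallback and OFFLOAD_TARGET are made up for illustration. It also assumes the __INTEL_OFFLOAD macro is defined when the Intel compiler builds with offload enabled, so that other compilers simply see an ordinary host call.

```cpp
#include <cassert>

// Hypothetical helper macro: only mark functions for the coprocessor when
// the compiler is building with offload support (assumed to be signalled
// by __INTEL_OFFLOAD).
#ifdef __INTEL_OFFLOAD
#define OFFLOAD_TARGET __attribute__((target(mic)))
#else
#define OFFLOAD_TARGET
#endif

// Plain serial sum standing in for the TBB reduction in the original post.
OFFLOAD_TARGET float SumOnDevice(const float *data, int size) {
  assert(size >= 0);
  float sum = 0.0f;
  for (int i = 0; i < size; ++i) {
    sum += data[i];
  }
  return sum;
}

float SumWithFallback(float *data, int size) {
  float result = 0.0f;
  // "optional" asks the runtime to run the statement on the host when no
  // coprocessor is available, instead of aborting with an offload error.
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) optional in(size) in(data:length(size)) out(result)
#endif
  result = SumOnDevice(data, size);
  return result;
}
```

In the original program the only change needed should be the extra optional keyword on the existing #pragma offload line; nothing else has to move.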
Ack! Missed that little gem in the release notes, thank you. All better now! Sorry for the line noise.
No problem. Glad you asked as that can always help others.
