Hi,
I've seen a few posts on this kind of subject in the past, but nothing recent that seems relevant (unless I missed it). I tried to run what I thought was a basic TBB / offloading test based on one of the examples (see below). It works on a machine with attached and available Xeon Phi coprocessors, but not otherwise.
Teaser: offload pragma:
#pragma offload target(mic) in(size) in(data:length(size)), out(result)
Compiled (clean) with:
. /opt/intel/composer_xe_2013_sp1.2.144/bin/iccvars.sh intel64
icpc -std=c++11 -Wall -Wextra -Werror -tbb -o tbb-offload_t tbb-offload_t.cc
on a RHEL6-like system (SLF6.4), with:
[greenc@mic] ~ $ icpc -dumpversion
14.0.2
[greenc@mic] ~ $ which gcc
/usr/bin/gcc
[greenc@mic] ~ $ gcc -dumpversion
4.4.7
and the Intel TBB as shipped with the compiler.
On a system with Xeon Phi cards, I get an answer, as expected:
[greenc@phi1] ~ $ . /opt/intel/composer_xe_2013_sp1.2.144/bin/iccvars.sh intel64
[greenc@phi1] ~ $ ./tbb-offload_t
Result: 1.25e+13
According to everything I've read, I would have expected the same result on a system with no MIC cards (and with no recompilation), but that appears not to be the case:
[greenc@mic] ~ $ ./tbb-offload_t
offload error: cannot offload to MIC - device is not available
If anyone can point me in the direction of my mistake(s), I'd be grateful.
Code follows:
#pragma offload_attribute(push,target(mic))
#include "tbb/task_scheduler_init.h"
#include "tbb/blocked_range.h"
#include "tbb/parallel_reduce.h"
#include "tbb/task.h"
#pragma offload_attribute(pop)

using namespace tbb;

class
__attribute__((target(mic)))
ReduceTBB
{
public:
  ReduceTBB(float data[]) : data_(data), sum_(0.0f) { }
  ReduceTBB(ReduceTBB & x, split) : data_(x.data_), sum_(0.0f) { }
  float sum() const { return sum_; }
  void operator() (const blocked_range<size_t>& r)
  {
    for (size_t i = r.begin(), e = r.end(); i != e; ++i) {
      sum_ += data_[i];
    }
  }
  void join(const ReduceTBB & y) { sum_ += y.sum_; }
private:
  float *data_;
  float sum_;
};

__attribute__((target(mic)))
float
MICReductionTBB_(float *data, int size)
{
  ReduceTBB redc(data);
  task_scheduler_init init;
  parallel_reduce(blocked_range<size_t>(0, size), redc);
  return redc.sum();
}

float MICReductionTBB(float *data, int size)
{
  float result(0.f);
#pragma offload target(mic) in(size) in(data:length(size)), out(result)
  result = MICReductionTBB_(data, size);
  return result;
}

#include <iostream>
#include <numeric>
#include <vector>

int main() {
  std::vector<float> data(5000000);
  std::iota(data.begin(), data.end(), 0.0f);
  std::cout << "Result: "
            << MICReductionTBB(data.data(), data.size())
            << std::endl;
}
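As a sanity check on the reported `Result: 1.25e+13`: `std::iota` fills the vector with 0, 1, ..., N-1, so the reduction should come out close to N(N-1)/2. The sketch below evaluates that closed form in double precision and formats it the way a default-configured stream would. The helper names `closed_form_sum` and `format_default` are made up for illustration and are not part of the original post.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Closed-form sum of the integers 0, 1, ..., n-1, i.e. the values that
// std::iota(data.begin(), data.end(), 0.0f) writes into the vector.
// (closed_form_sum is a hypothetical helper name.)
double closed_form_sum(std::int64_t n)
{
  return static_cast<double>(n * (n - 1) / 2);
}

// Format a value with the default stream settings (6 significant digits),
// which is what the program in the post uses when printing the result.
// (format_default is a hypothetical helper name.)
std::string format_default(double value)
{
  std::ostringstream out;
  out << value;
  return out.str();
}
```

For N = 5000000 the closed form is 12499997500000, which prints as `1.25e+13`. A single-precision reduction will generally not reproduce that value exactly, but it should agree at the six significant digits printed.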
The default offload behavior changed from optional to mandatory in 14.0, so just add the optional clause to your #pragma offload and the code will also run on a system without any coprocessors.
Ack! Missed that little gem in the release notes, thank you. All better now! Sorry for the line noise.
No problem. Glad you asked as that can always help others.