14.0.2 can't do fallback for offload with no mics?

Chris_G_
Beginner

Hi,

I've seen a few posts on this subject in the past, but nothing recent that seems relevant (unless I missed it). I tried to run what I thought was a basic TBB / offloading test based on one of the examples (see below). It works on a machine with attached and available Xeon Phi cards, but not otherwise.

Teaser: offload pragma:

#pragma offload target(mic) in(size) in(data:length(size)), out(result)

Compiled (clean) with:

. /opt/intel/composer_xe_2013_sp1.2.144/bin/iccvars.sh intel64

icpc -std=c++11 -Wall -Wextra -Werror -tbb -o tbb-offload_t tbb-offload_t.cc

on a RHEL6-like system (SLF6.4), with:

[greenc@mic] ~ $ icpc -dumpversion
14.0.2
[greenc@mic] ~ $ which gcc
/usr/bin/gcc
[greenc@mic] ~ $ gcc -dumpversion
4.4.7

and the Intel TBB as shipped with the compiler.

On a system with Xeon Phi cards, I get an answer, as expected:

[greenc@phi1] ~ $ . /opt/intel/composer_xe_2013_sp1.2.144/bin/iccvars.sh intel64
[greenc@phi1] ~ $ ./tbb-offload_t
Result: 1.25e+13

According to everything I've read, I would have expected the same on a system with no mic cards also (and no recompilation), but that appears not to be the case:

[greenc@mic] ~ $ ./tbb-offload_t
offload error: cannot offload to MIC - device is not available

If anyone can point me in the direction of my mistake(s), I'd be grateful.

Code follows:

#pragma offload_attribute(push,target(mic))
#include "tbb/task_scheduler_init.h"
#include "tbb/blocked_range.h"
#include "tbb/parallel_reduce.h"
#include "tbb/task.h"
#pragma offload_attribute(pop)

using namespace tbb;

class
__attribute__((target(mic)))
ReduceTBB
{
public:
  ReduceTBB(float data[]) : data_(data), sum_(0.0f) { }
  ReduceTBB(ReduceTBB & x, split) : data_(x.data_), sum_(0.0f) { }

  float sum() const { return sum_; }
  void operator() (const blocked_range<size_t>& r)
  {
    for (size_t i = r.begin(), e = r.end(); i != e; ++i) {
      sum_ += data_[i];  // index was lost in the forum rendering; data_ is a float*
    }
  }

  void join(const ReduceTBB & y) { sum_ += y.sum_; }

private:
  float *data_;
  float sum_;
};

__attribute__((target(mic)))
float
MICReductionTBB_(float *data, int size)
{
  ReduceTBB redc(data);
  task_scheduler_init init;
  parallel_reduce(blocked_range<size_t>(0, size), redc);
  return redc.sum();
}

float MICReductionTBB(float *data, int size)
{
  float result(0.f);
#pragma offload target(mic) in(size) in(data:length(size)), out(result)
  result = MICReductionTBB_(data, size);
  return result;
}

#include <iostream>
#include <numeric>
#include <vector>

int main() {
  std::vector<float> data(5000000);
  std::iota(data.begin(), data.end(), 0.0f);
  std::cout << "Result: "
            << MICReductionTBB(data.data(), data.size())
            << std::endl;
}

 

 
1 Solution
Kevin_D_Intel
Employee

The default offload behavior was changed from optional to mandatory in 14.0, so just add the optional clause to your #pragma offload and the code will also run on a system without any coprocessors.
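
For readers landing here later, the change described above can be sketched roughly as follows. This is a minimal illustration rather than the original poster's final code: it assumes the optional clause is written directly after target(mic), it replaces the TBB reduction with a plain loop so it builds without TBB, and the names OFFLOAD_TARGET, SumOnTarget and SumWithFallback are hypothetical. The __INTEL_OFFLOAD guard is only there so the same file also compiles as host-only code with other compilers.

```cpp
#include <cstddef>

// __INTEL_OFFLOAD is defined by the Intel compiler when offload is enabled.
// Other compilers skip the offload-specific syntax and build host-only code.
#ifdef __INTEL_OFFLOAD
#define OFFLOAD_TARGET __attribute__((target(mic)))  // hypothetical helper macro
#else
#define OFFLOAD_TARGET
#endif

// Plain serial sum standing in for the TBB parallel_reduce in the question.
OFFLOAD_TARGET
float SumOnTarget(const float *data, int size)
{
  float sum = 0.0f;
  for (int i = 0; i < size; ++i) {
    sum += data[i];
  }
  return sum;
}

// Hypothetical wrapper mirroring MICReductionTBB from the question.
float SumWithFallback(float *data, int size)
{
  float result = 0.0f;
#ifdef __INTEL_OFFLOAD
  // "optional" asks the runtime to fall back to the host when no
  // coprocessor is available, instead of aborting with an offload error.
#pragma offload target(mic) optional in(size) in(data:length(size)) out(result)
#endif
  result = SumOnTarget(data, size);
  return result;
}
```

Calling SumWithFallback(data.data(), data.size()) from main() in place of MICReductionTBB should then give a result on machines with or without a coprocessor, assuming the compiler accepts the clause in that position.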

Chris_G_
Beginner

Ack! Missed that little gem in the release notes, thank you. All better now! Sorry for the line noise.

Kevin_D_Intel
Employee

No problem. Glad you asked as that can always help others.
