launch a kernel inside a template function

Hi,

When I define and launch a kernel inside a plain function it works fine, but when I try to do the same inside a template function I get a compilation error:

/src/expvec.cpp:130:28: error: reference to non-static member function must be called
        auto acc_a = buf_a.get_access<access::mode::read>(h);// Create accessors

 

template <class T>
double computeOnDevice(const std::vector<T> & a, 
                     const std::vector<T> & bref,
                     std::vector<T> & b){                      
  cout << "vector size: " << a.size() << " \n\n";
  // host_selector device_selector; // cpu/gpu/host/default_selector
  DeviceSelector device_selector;
   // Create a device queue using DPC++ class queue
  queue q(device_selector, dpc_common::exception_handler);
  cout << "Computing on device ..\n";
  PrintTargetInfo(q);// Display info about device
  // Start timer
  dpc_common::TimeInterval t_offload;// Start timer
  const size_t vsize=a.size();
  {  // Begin buffer scope
    buffer buf_a(&a[0], range(vsize));// Create buffers using DPC++ class buffer
    buffer buf_b(&b[0], range(vsize));
    q.submit([&](auto &h) {// Submit command group for execution
        auto acc_a = buf_a.get_access<access::mode::read>(h);// Create accessors
        auto acc_b = buf_b.get_access<access::mode::write>(h);
        
        auto global_range = range<1>(vsize);// Define local and global range

        h.parallel_for(global_range,[=](id<1> i){
          for (size_t iter=0 ; iter<NITER ; iter++) acc_b[i]+=cl::sycl::acos(cl::sycl::cos(acc_a[i]));
        });
    });  // end for
  }  // buffer scope (should wait for b to be accessible)

 

Changing it to a plain function works:

 

double computeOnDevice(const std::vector<Scalar> & a, 
                     const std::vector<Scalar> & bref,
                     std::vector<Scalar> & b){                  
  cout << "vector size: " << a.size() << " \n\n";
  // host_selector device_selector; // cpu/gpu/host/default_selector
  DeviceSelector device_selector;
  ...
}

I attach the corresponding complete code.

Any hints?

 

 


Accepted Solutions
Moderator

Hi,

 

To attach your source files, zip them first and attach the archive, since .cpp is not a supported file format for attachments.

 

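Your code needs one small correction.

Inside a function template, the type of the buffer depends on the template parameter T, so the call to its member template get_access must be written with the template keyword: buf.template get_access<>(). This applies to the two lines that create the accessors (lines 68 and 69). Without the keyword, the compiler parses the < after get_access as a less-than operator, which leads to the error you are seeing.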
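
To see the rule in isolation, here is a minimal sketch in standard C++ that needs no SYCL installation. MockBuffer, MockAccessor, Mode, first_plain and first_generic are hypothetical stand-ins invented for this illustration; they only imitate the shape of a buffer with a get_access member template and are not the SYCL API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an access mode tag (not the SYCL enum).
enum class Mode { read, write };

// Hypothetical accessor: a bounds-checked view over existing storage.
template <class T>
struct MockAccessor {
  T* data;
  std::size_t size;
  T& operator[](std::size_t i) const {
    assert(i < size);
    return data[i];
  }
};

// Hypothetical buffer with a member template, shaped like get_access<mode>().
template <class T>
struct MockBuffer {
  T* data;
  std::size_t size;

  template <Mode M>
  MockAccessor<T> get_access() const {
    return MockAccessor<T>{data, size};
  }
};

// Plain function: MockBuffer<float> is a concrete type, so the compiler
// already knows get_access is a template and no keyword is needed.
inline float first_plain(std::vector<float>& v) {
  MockBuffer<float> buf{v.data(), v.size()};
  auto acc = buf.get_access<Mode::read>();
  return acc[0];
}

// Function template: the type of buf depends on T, so get_access is a
// dependent name and must be introduced with the template keyword.
// Writing buf.get_access<Mode::read>() here is rejected by the compiler.
template <class T>
T first_generic(std::vector<T>& v) {
  MockBuffer<T> buf{v.data(), v.size()};
  auto acc = buf.template get_access<Mode::read>();
  return acc[0];
}
```

Calling first_generic on a std::vector<float> or a std::vector<int> behaves like first_plain does for float. The keyword is also accepted in the non-template version, so adding it everywhere is harmless.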

 

Here is the modified code for your reference.

 

#include <fstream>
#include <iostream>
#include <CL/sycl.hpp>
#include <chrono>
#include <cmath>
#include <cstring>
#include <stdio.h>
#include <string>
#include <vector>
// #include "dpc_common.hpp"

using namespace cl::sycl;
using namespace std;

constexpr int NITER=100;
using Scalar=float;
using DeviceSelector=gpu_selector;

static auto exception_handler = [](cl::sycl::exception_list eList) {
  for (std::exception_ptr const &e : eList) {
    try {
      std::rethrow_exception(e);
    } catch (std::exception const &e) {
#if _DEBUG
      std::cout << "Failure" << std::endl;
#endif
      std::terminate();
    }
  }
};

/*
 * Host-Code
 * Utility function to print device info
 */
void PrintTargetInfo(queue& q) {
  auto device = q.get_device();
  auto max_block_size =
      device.get_info<info::device::max_work_group_size>();

  auto max_EU_count =
      device.get_info<info::device::max_compute_units>();

  cout<< " Running on " << device.get_info<info::device::name>()<<"\n";
  std::string dn=device.get_info<info::device::name>();
  cout<< " The Device Max Work Group Size is : "<< max_block_size<<"\n";
  cout<< " The Device Max EUCount is : " << max_EU_count<<"\n";
}


// Template version of the function. Because buf_a and buf_b now have types
// that depend on T, get_access must be called as buf.template get_access<...>();
// without the keyword the compiler reports
// "reference to non-static member function must be called".
template <class T>
double computeOnDevice(const std::vector<T> & a,
                       std::vector<T> & b){
  DeviceSelector device_selector;
  queue q(device_selector, exception_handler);
  cout << "Computing on device ..\n";
  PrintTargetInfo(q);// Display info about device
  auto start=std::chrono::high_resolution_clock::now();
  const size_t vsize=a.size();
  {  // Begin buffer scope
    buffer buf_a(&a[0], range(vsize));// Create buffers using DPC++ class buffer
    buffer buf_b(&b[0], range(vsize));
    q.submit([&](auto &h) {// Submit command group for execution
        auto acc_a = buf_a.template get_access<access::mode::read>(h);// Create accessors
        auto acc_b = buf_b.template get_access<access::mode::write>(h);
        
        auto global_range = range<1>(vsize);// Define local and global range

        h.parallel_for(global_range,[=](id<1> i){
          for (size_t iter=0 ; iter<NITER ; iter++) acc_b[i]+=cl::sycl::acos(cl::sycl::cos(acc_a[i]));
        });
    });  // end of command group
  }  // end of buffer scope: the buffer destructors wait for the kernel and copy the results back to b
  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> elapsed_seconds = end-start;
  double time = elapsed_seconds.count();
  cout << "Offload time: " << time << " s\n\n";
  return time;
}



int main(int argc, char* argv[]) {

  const size_t s=1e6;
  std::vector<Scalar> a(s,Scalar(0));
  std::vector<Scalar> b(s,Scalar(1));

  computeOnDevice(a,b);

  return 0;
}

 

 

Regards,

Rahul

6 Replies
Moderator

Hi,


I don't see anything in the attachment. I need to take a look at your host code too, to understand why this is happening. Could you zip your source file and attach it again?


Thanks,

Rahul



Well, I can't attach the file


So here is the complete source file

#include <fstream>
#include <iostream>
#include <CL/sycl.hpp>
#include <chrono>
#include <cmath>
#include <cstring>
#include <stdio.h>
#include <iostream>
// #include "dpc_common.hpp"

using namespace cl::sycl;
using namespace std;

constexpr int NITER=100;
using Scalar=float;
using DeviceSelector=gpu_selector;

static auto exception_handler = [](cl::sycl::exception_list eList) {
  for (std::exception_ptr const &e : eList) {
    try {
      std::rethrow_exception(e);
    } catch (std::exception const &e) {
#if _DEBUG
      std::cout << "Failure" << std::endl;
#endif
      std::terminate();
    }
  }
};

/*
 * Host-Code
 * Utility function to print device info
 */
void PrintTargetInfo(queue& q) {
  auto device = q.get_device();
  auto max_block_size =
      device.get_info<info::device::max_work_group_size>();

  auto max_EU_count =
      device.get_info<info::device::max_compute_units>();

  cout<< " Running on " << device.get_info<info::device::name>()<<"\n";
  std::string dn=device.get_info<info::device::name>();
  cout<< " The Device Max Work Group Size is : "<< max_block_size<<"\n";
  cout<< " The Device Max EUCount is : " << max_EU_count<<"\n";
}


// PROBLEM : try to use the following template signature
// in place of the actual plain function signature
// -> I got error: reference to non-static member function must be called
//
// template <class T>
// double computeOnDevice(const std::vector<T> & a, 
//                      std::vector<T> & b){  
double computeOnDevice(const std::vector<Scalar> & a, 
                       std::vector<Scalar> & b){                                                      
  DeviceSelector device_selector;
  queue q(device_selector, exception_handler);
  cout << "Computing on device ..\n";
  PrintTargetInfo(q);// Display info about device
  auto start=std::chrono::high_resolution_clock::now();
  const size_t vsize=a.size();
  {  // Begin buffer scope
    buffer buf_a(&a[0], range(vsize));// Create buffers using DPC++ class buffer
    buffer buf_b(&b[0], range(vsize));
    q.submit([&](auto &h) {// Submit command group for execution
        auto acc_a = buf_a.get_access<access::mode::read>(h);// Create accessors
        auto acc_b = buf_b.get_access<access::mode::write>(h);
        
        auto global_range = range<1>(vsize);// Define local and global range

        h.parallel_for(global_range,[=](id<1> i){
          for (size_t iter=0 ; iter<NITER ; iter++) acc_b[i]+=cl::sycl::acos(cl::sycl::cos(acc_a[i]));
        });
    });  // end for
  }  // buffer scope (should wait for b to be accessible)
  auto end = std::chrono::high_resolution_clock::now();
  std::chrono::duration<double> elapsed_seconds = end-start;
  double time = elapsed_seconds.count();
  cout << "Offload time: " << time << " s\n\n";
  return time;
}



int main(int argc, char* argv[]) {

  const size_t s=1e6;
  std::vector<Scalar> a(s,Scalar(0));
  std::vector<Scalar> b(s,Scalar(1));

  computeOnDevice(a,b);

  return 0;
}

I should have thought about this.

Thank you very much Rahul !

Moderator

Good to know that it worked.


Intel will no longer monitor this thread. However, this thread will remain open for community discussion.

