Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel ICX Compiler, Intel® DPC++ Compatibility Tool, and GDB*

Vector and accessor operations with each other inside kernel

Suraj
Beginner

Hello

In the attached code, operations (like addition/multiplication) between a vector defined inside the kernel and an accessor throw the following error during compilation:

InvalidFunctionCall: Unexpected llvm intrinsic: llvm.intel.std.container.ptr.p4f32 [Src: /netbatch/donb47193_00/runDir/55/20200715_000000/llvm/llvm-spirv/lib/SPIRV/SPIRVWriter.cpp:2125 ]
llvm-foreach:
dpcpp: error: llvm-spirv command failed with exit code 1 (use -v to see invocation)

The same operations compile fine between arrays and accessors. Can you please advise on the nature of this error/problem?
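A stripped-down sketch of the failing pattern (illustrative only, not the full attachment):

#include <CL/sycl.hpp>
#include <vector>

using namespace cl::sycl;

int main() {
  queue q{gpu_selector{}};
  std::vector<float> host(5, 2.0f);
  {
    buffer<float> buf(host.data(), range<1>(host.size()));
    q.submit([&](handler &h) {
      auto acc = buf.get_access<access::mode::read_write>(h);
      h.parallel_for(range<1>(5), [=](id<1> i) {
        std::vector<float> vec(5, 2.0f); // vector declared inside the kernel
        acc[i] += vec[i];                // this is where compilation fails
      });
    });
  }
  return 0;
}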

System info: oneAPI beta-08, Ubuntu 18.04.4 LTS
Intel® Core™ i7-7700HQ CPU @ 2.80GHz × 8, Intel® HD Graphics 630 (Kaby Lake GT2)

RahulV_intel
Moderator

Hi Suraj,

 

std::vector declarations/operations are not allowed inside the kernel. Only a few functions from the std namespace are currently supported inside the kernel.

 

Please refer to the link below for the std functions supported inside the kernel:

https://software.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-library-guide/top/tested-standard-c-apis.html
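For example, a simple header-only utility such as std::min can be used in device code (assuming it is on the tested list above), while a container such as std::vector cannot. A minimal sketch:

#include <CL/sycl.hpp>
#include <algorithm>
#include <vector>
#include <iostream>

using namespace cl::sycl;

int main() {
  queue q{default_selector{}};
  std::vector<float> v{5.f, 150.f, 42.f, 300.f};
  {
    buffer<float> buf(v.data(), range<1>(v.size()));
    q.submit([&](handler &h) {
      auto acc = buf.get_access<access::mode::read_write>(h);
      h.parallel_for(range<1>(v.size()), [=](id<1> i) {
        acc[i] = std::min(acc[i], 100.0f); // simple std utility: OK in device code
        // std::vector<float> tmp(5);      // container: not supported
      });
    });
  }
  for (float x : v) std::cout << x << " "; // prints: 5 100 42 100
  std::cout << "\n";
  return 0;
}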

 

Please note that when you create data inside the parallel_for kernel, every work-item creates its own copy of that data in its private memory. Since the data is created directly on the device, you will not be able to access it on the host. If this is what you intended (as per your sample code), refer to the modified code below, where I have used a float array instead of a float vector.

 

#include <iostream>
#include <vector>
#include <CL/sycl.hpp>

using namespace std;
using namespace cl::sycl;

constexpr auto dp_r  = cl::sycl::access::mode::read;
constexpr auto dp_rw = cl::sycl::access::mode::read_write;
constexpr auto dp_w  = cl::sycl::access::mode::write;

int main() {

  gpu_selector device_selector;
  //cpu_selector device_selector;
  //host_selector device_selector;

  queue q(device_selector);
  std::cout << "Device: " << q.get_device().get_info<info::device::name>() << std::endl;

  std::vector<float> cellVec(5);
  for (int i = 0; i < 5; i++)
    cellVec[i] = 2.;

  const auto cellVecsize = cellVec.size();
  auto R = range<1>(cellVecsize);

  { // SYCL scope begins
    buffer<float> B1(cellVec.data(), R);

    q.submit([&](handler &h) {
      auto acc1 = B1.get_access<dp_rw>(h);
      h.parallel_for(R, [=](id<1> ind) {
        size_t i = ind.get(0);
        // Private per-work-item array (replaces the std::vector that
        // does not compile for device code).
        float mul[5] = {10.0, 20.0, 30.0, 40.0, 50.0};
        //std::vector<float> vec1(5, 2);  // not supported inside the kernel
        //std::vector<float> vec2(5);
        //acc1[i] += vec1[i];
        for (int j = 0; j < 5; j++)
          acc1[i] += mul[j];
        //vec2[i] = acc1[i] * vec1[i];
        //vec2[i] = vec1[i] * vec1[i];
      });
    }).wait_and_throw();
  } // SYCL scope ends; data is copied back to cellVec here

  for (auto i : cellVec)
    cout << i << " ";
  cout << "\n";

  return 0;
}

 

In the above code sample, I'm creating a float array inside the parallel_for kernel and running a for loop for every work-item. As mentioned, every work-item creates its own float array of size 5, and the for loop adds the value at every index of that array to the accessor element. This computation repeats for every work-item. Note that I've enclosed the buffer creation and the kernel computation in a new SYCL scope. When this scope ends, the buffer is destroyed, which ensures the data is implicitly copied back to the host once the kernel finishes executing on the device.

 

Also note that you don't have to explicitly typecast cellVec.data() to (float*), since you have already specified buffer<float> in the declaration. For its lifetime, the buffer takes over the memory starting at cellVec.data() (the vector's initial address) and extending through the extent given by the range argument.
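That is, these two declarations (in the context of the code above) create the same buffer over cellVec's five floats; the cast adds nothing:

buffer<float> B1(cellVec.data(), R);          // element type already fixed by buffer<float>
buffer<float> B2((float*)cellVec.data(), R);  // redundant cast, identical behavior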

 

Hope this helps.

 

Regards,

Rahul

Suraj
Beginner

Hello Rahul

Thank you for the clarification. Will vector declarations/operations inside the kernel be possible in upcoming releases?

RahulV_intel
Moderator

Hi,


Since it's a feature request, I've passed it along to the team concerned. Thanks for your feedback.


Regards,

Rahul


Suraj
Beginner

Hello Rahul

Thank you. On an unrelated note, can questions about using the CUDA backend be asked here, or do they have to go through the Intel LLVM repo only?

Subarnarek_G_Intel

Before I investigate this case further, I would like to know what you want to achieve by using a vector inside the kernel.


Suraj
Beginner

Hello

The first case I encountered was calling a function from inside the kernel that did some calculation and had a vector defined inside it; I changed it to an array, since a vector is not allowed yet. Another use case is that I need temporary vectors (actually vectors of arrays, currently) that are defined and operated on only inside the kernel. I understand they would be created for every work-item. Right now I'm creating empty buffers and accessors and using them inside the kernel. Vectors would let me completely encapsulate the temporary data/operations inside the kernel, so I wouldn't need to create many buffers/accessors (of different range dimensions), and the code would be a bit cleaner. That is my use case and thought process.
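For fixed sizes, a plain private array per work-item already covers this; a rough sketch (the sizes here are illustrative, not from my actual code):

#include <CL/sycl.hpp>
#include <vector>
#include <iostream>

using namespace cl::sycl;

int main() {
  queue q{default_selector{}};
  std::vector<float> host(8, 2.0f);
  {
    buffer<float> buf(host.data(), range<1>(host.size()));
    q.submit([&](handler &h) {
      auto acc = buf.get_access<access::mode::read_write>(h);
      h.parallel_for(range<1>(8), [=](id<1> ind) {
        size_t i = ind.get(0);
        // Private per-work-item scratch: a fixed-size "vector of arrays"
        // substitute that never touches a buffer or accessor.
        float temp[2][4];
        for (int r = 0; r < 2; r++)
          for (int c = 0; c < 4; c++)
            temp[r][c] = acc[i] * (r * 4 + c);
        float sum = 0.0f;
        for (int r = 0; r < 2; r++)
          for (int c = 0; c < 4; c++)
            sum += temp[r][c];
        acc[i] = sum;
      });
    });
  }
  for (float v : host) std::cout << v << " ";
  std::cout << "\n";
  return 0;
}

The limitation is that the sizes must be known at compile time, which is why I fall back to scratch buffers/accessors for the runtime-sized cases.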

Subarnarek_G_Intel

std::vector is supported by oneAPI today. I'm attaching an example for your reference.
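For reference, one way this works (assuming a DPC++ release with Unified Shared Memory support; the attached example may differ) is to back the vector with a USM shared allocator, so its storage is visible to both host and device:

#include <CL/sycl.hpp>
#include <vector>
#include <iostream>

using namespace cl::sycl;

int main() {
  queue q{default_selector{}};

  // A vector whose storage lives in USM shared memory.
  usm_allocator<float, usm::alloc::shared> alloc(q);
  std::vector<float, decltype(alloc)> vec(5, 2.0f, alloc);

  // Capture the raw pointer; the vector object itself stays on the host.
  float *data = vec.data();
  q.submit([&](handler &h) {
    h.parallel_for(range<1>(vec.size()), [=](id<1> i) {
      data[i] *= 10.0f;
    });
  }).wait();

  for (float v : vec) std::cout << v << " "; // prints: 20 20 20 20 20
  std::cout << "\n";
  return 0;
}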

 

Subarnarek_G_Intel

This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

