Intel® oneAPI DPC++/C++ Compiler
Talk to fellow users of Intel® oneAPI DPC++/C++ Compiler and companion tools like Intel® oneAPI DPC++ Library, Intel® DPC++ Compatibility Tool, and Intel® Distribution for GDB*

DPC++ crash inside LLVM code at runtime if the kernel doesn't use the output buffer

sirgienko
Employee

The problem is mostly described in the title, but here are more details.

In a lambda submitted to a queue, we often create an accessor for an output buffer, like this:

auto addend_1_accessor = addend_1_buf.template get_access<sycl_read>(cgh);
auto addend_2_accessor = addend_2_buf.template get_access<sycl_read>(cgh);

// This is the output buffer (because we request write access)
auto sum_accessor = sum_buf.template get_access<sycl_write>(cgh);

And in the kernel code, for example inside parallel_for, we write to it:

cgh.parallel_for<class VectorAdd>(num_items, [=](id<1> wiID) {
    sum_accessor[wiID] = addend_1_accessor[wiID] + addend_2_accessor[wiID];
});

The problem is that if you remove the write to the output buffer, like this:

cgh.parallel_for<class VectorAdd>(num_items, [=](id<1> wiID) {
    int sum = addend_1_accessor[wiID] + addend_2_accessor[wiID];
});

then the code crashes at runtime with an error inside LLVM code from the DPC++ libOclCpuBackEnd library.

A minimal reproducer is attached. A file with build instructions, the full backtrace, and a system description is attached too (inside the archive).

RahulV_intel
Moderator

Hi,

 

It looks like the .so shared object has something to do with this error. The error is reproducible on the CPU device (cpu_selector). However, it works fine on the GPU device (gpu_selector); I tried it on a Gen9 Intel iGPU.

 

Without the dynamic library, the program (even without writing to the output buffer) compiles and runs fine with the CPU selector as well. Refer to the code below.

 

#include <CL/sycl.hpp>
#include <iostream>
#include <vector>

#define size 1024

void vecAdd(std::vector<int> &A, std::vector<int> &B, std::vector<int> &C) {
    //cl::sycl::queue q(cl::sycl::gpu_selector{});
    cl::sycl::queue q(cl::sycl::cpu_selector{});
    cl::sycl::range<1> R(size);
    std::cout<< "Running on: "<<q.get_device().get_info<cl::sycl::info::device::name>()<<"\n";
    {
    cl::sycl::buffer<int,1> buffA(A.data(),R);
    cl::sycl::buffer<int,1> buffB(B.data(),R);
    cl::sycl::buffer<int,1> buffC(C.data(),R);
    q.submit([&](cl::sycl::handler &cgh) {
            auto acc_buffA = buffA.get_access<cl::sycl::access::mode::read>(cgh);
            auto acc_buffB = buffB.get_access<cl::sycl::access::mode::read>(cgh);
            auto acc_buffC = buffC.get_access<cl::sycl::access::mode::write>(cgh);
            //cl::sycl::stream out(10240, 25, cgh);
            cgh.parallel_for(R,[=](cl::sycl::id<1> it) {
                auto id = it[0];
                //auto id = it.get_global_linear_id();
                //out << it.get_global_linear_id()<< cl::sycl::endl;
                //acc_buffC[id] = acc_buffB[id] + acc_buffA[id];
                int sum = acc_buffB[id] + acc_buffA[id];
            }
            );
        }
    ).wait();
    }
}

int main() {
    std::vector<int> A(size),B(size),C(size);
    for(int i=0;i<size;i++) {
        A[i] = i;
        B[i] = size - i;
    }
    vecAdd(A,B,C);
    std::vector<int> C_chk(size,size);
    (C==C_chk) ? std::cout << "Success\n" : std::cout<<"Failure\n";
    return 0;
}
//To compile and run
dpcpp vecadd_sample.cpp && ./a.out
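
A side note on the check in main: it compares C against a vector filled with size, because A[i] + B[i] = i + (size - i). With the write to acc_buffC commented out, C presumably keeps its initial zeros, so the sample would be expected to print "Failure" even when nothing crashes. A host-only sketch of that reference computation, with no SYCL involved, could look like this (host_vec_add and all_equal_to are hypothetical helper names, not part of the sample above):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise sum computed on the host only.
// Assumes a and b have the same length.
std::vector<int> host_vec_add(const std::vector<int>& a,
                              const std::vector<int>& b) {
    std::vector<int> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}

// Hypothetical helper: true if every element of v equals value.
bool all_equal_to(const std::vector<int>& v, int value) {
    for (int x : v) {
        if (x != value) {
            return false;
        }
    }
    return true;
}
```

Comparing host_vec_add(A, B) against C after vecAdd returns is one way to tell "the kernel ran but wrote nothing" apart from "the kernel produced wrong values".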

 

 

I need to investigate this issue (with the dynamic library) a bit more and will get back to you as soon as possible.

 

Thanks,

Rahul

 

RahulV_intel
Moderator

Hi,


I have escalated this issue as a bug to the relevant team. Thanks for reporting this.


--Rahul

