Intel® oneAPI Data Parallel C++
Support for Intel® oneAPI DPC++ Compiler, Intel® oneAPI DPC++ Library, Intel ICX Compiler, Intel® DPC++ Compatibility Tool, and GDB*

DPC++ crashes at runtime inside LLVM code if the kernel does not use the output buffer

sirgienko
Employee

The problem is mostly described in the title, but here are more details.

In a lambda submitted to a queue, we often create accessors, including one for an output buffer, like this:

auto addend_1_accessor = addend_1_buf.template get_access<sycl_read>(cgh);
auto addend_2_accessor = addend_2_buf.template get_access<sycl_read>(cgh);

// This is the output buffer's accessor (write access)
auto sum_accessor = sum_buf.template get_access<sycl_write>(cgh);

And in the kernel code, for example inside a parallel_for, we write to it:

cgh.parallel_for<class VectorAdd>(num_items, [=](id<1> wiID) {
    sum_accessor[wiID] = addend_1_accessor[wiID] + addend_2_accessor[wiID];
});

The problem is that if you remove the write to the output buffer, like this:

cgh.parallel_for<class VectorAdd>(num_items, [=](id<1> wiID) {
    int sum = addend_1_accessor[wiID] + addend_2_accessor[wiID];
});

then your code will crash at runtime with an error inside LLVM code from the DPC++ libOclCpuBackEnd library.

A minimal reproducer is attached. A file with build instructions, the full backtrace, and a system description is attached too (inside the archive).
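
For readers without the attachment, here is a minimal sketch of how the two fragments above fit together. The queue, the cpu_selector, and the surrounding buffer declarations are assumptions rather than the attached code; sycl_read and sycl_write are presumably aliases for access::mode::read and access::mode::write.

// A sketch only: q, cpu_selector, and the buffer set-up are assumed, not taken from the attachment
queue q{cpu_selector{}};
q.submit([&](handler &cgh) {
    auto addend_1_accessor = addend_1_buf.template get_access<sycl_read>(cgh);
    auto addend_2_accessor = addend_2_buf.template get_access<sycl_read>(cgh);
    auto sum_accessor = sum_buf.template get_access<sycl_write>(cgh);

    cgh.parallel_for<class VectorAdd>(num_items, [=](id<1> wiID) {
        // Reported crash scenario: sum_accessor is never written in this kernel
        int sum = addend_1_accessor[wiID] + addend_2_accessor[wiID];
    });
}).wait();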

RahulV_intel
Moderator

Hi,

 

It looks like the .so shared object file has something to do with this error. The error is reproducible with the CPU device (cpu_selector), but it works fine with the GPU device (gpu_selector); I tried with a Gen9 Intel iGPU.

 

Without the dynamic library, the program (even without using the output buffer) compiles and runs fine with the CPU selector as well. Refer to the code below.

 

#include <CL/sycl.hpp>
#include <iostream>
#include <vector>

#define size 1024

void vecAdd(std::vector<int> &A, std::vector<int> &B, std::vector<int> &C) {
    //cl::sycl::queue q(cl::sycl::gpu_selector{});
    cl::sycl::queue q(cl::sycl::cpu_selector{});
    cl::sycl::range<1> R(size);
    std::cout<< "Running on: "<<q.get_device().get_info<cl::sycl::info::device::name>()<<"\n";
    {
    cl::sycl::buffer<int,1> buffA(A.data(),R);
    cl::sycl::buffer<int,1> buffB(B.data(),R);
    cl::sycl::buffer<int,1> buffC(C.data(),R);
    q.submit([&](cl::sycl::handler &cgh) {
            auto acc_buffA = buffA.get_access<cl::sycl::access::mode::read>(cgh);
            auto acc_buffB = buffB.get_access<cl::sycl::access::mode::read>(cgh);
            auto acc_buffC = buffC.get_access<cl::sycl::access::mode::write>(cgh);
            //cl::sycl::stream out(10240, 25, cgh);
            cgh.parallel_for(R,[=](cl::sycl::id<1> it) {
                auto id = it[0];
                //auto id = it.get_global_linear_id();
                //out << it.get_global_linear_id()<< cl::sycl::endl;
                //acc_buffC[id] = acc_buffB[id] + acc_buffA[id];
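                // Write to acc_buffC intentionally left commented out to mirror the reported
                // no-output-write case; without a shared library this still runs fine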
                int sum = acc_buffB[id] + acc_buffA[id];
            }
            );
        }
    ).wait();
    }
}

int main() {
    std::vector<int> A(size),B(size),C(size);
    for(int i=0;i<size;i++) {
        A[i] = i;
        B[i] = size - i;
    }
    vecAdd(A,B,C);
    std::vector<int> C_chk(size,size);
    (C==C_chk) ? std::cout << "Success\n" : std::cout<<"Failure\n";
    return 0;
}
// To compile and run:
//   dpcpp vecadd_sample.cpp && ./a.out
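
To reproduce the dynamic-library scenario instead, one way would be to move vecAdd() into a separately compiled shared object and link it from main(). This is only a sketch under assumptions: the file names and the -fPIC/-shared/-L/-l/-Wl,-rpath flags are the usual clang-style options that dpcpp accepts, not the exact build script from the attached reproducer.

// vecadd_lib.cpp : the vecAdd() definition from above
// main.cpp       : main() from above, plus a declaration of vecAdd()
//
// Assumed build and run commands:
//   dpcpp -fPIC -shared vecadd_lib.cpp -o libvecadd.so
//   dpcpp main.cpp -L. -lvecadd -Wl,-rpath,. -o vecadd_sample
//   ./vecadd_sample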

 

 

I need to investigate this issue (with the dynamic library) a bit more and will get back to you as soon as possible.

 

Thanks,

Rahul

 

RahulV_intel
Moderator

Hi,


I have escalated this issue as a bug to the concerned team. Thanks for reporting this.


--Rahul

