- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a minimal program:
#include <CL/sycl/queue.hpp>
#include <CL/sycl/device.hpp>
#include <CL/sycl/context.hpp>
#include <CL/sycl.hpp>
#include <iostream>
namespace
{
auto is_sign_same(sycl::short3 idx1, sycl::short3 idx2)
{
return (idx1 < 0) == (idx2 < 0);
}
} // namespace
int main()
{
sycl::device device = sycl::device{sycl::gpu_selector{}};
std::cout
<< "\n\nRunning occupancy grid profile. The profile will have the following "
"properties:\n\n Device:\t"
<< device.get_info<sycl::info::device::name>() << "\n\n";
sycl::context context{device};
sycl::property_list properties{sycl::property::queue::enable_profiling()};
sycl::queue queue{device, properties};
auto event = queue.submit(
[](sycl::handler& cgh)
{
// 1. This must be captured or it does not crash. If i put this in the
// kernel, then it does not fail.
sycl::id<3> robot_index{0, 0, 0};
sycl::stream out(1024, 256, cgh);
cgh.parallel_for(
sycl::range<3>{4, 4, 4},
[out, robot_index](sycl::id<3> id)
{
sycl::short3 new_signed_idx{short(0)};
// 2. I cannot remove the subtract between the 2 sycl::short3 here.
// It will not fail.
sycl::short3 old_signed_idx =
sycl::short3{
(short)id.get(0), (short)id.get(1), (short)id.get(2)} -
sycl::short3{
(short)robot_index.get(0),
(short)robot_index.get(1),
(short)robot_index.get(2)};
// 3. I cannot replace this function call with the operation that
// the function performs inline here. It does not fail.
auto s_same = is_sign_same(new_signed_idx, old_signed_idx);
out << s_same;
}
);
}
);
return 0;
}
When compiled using:
/opt/intel/oneapi/compiler/2022.1.0/linux/bin/dpcpp -fclang-abi-compat=7 -fsycl --gcc-toolchain=/usr -sycl-std=2020 -fp-model=precise -Wall -Werror -fsycl -O2 -g -DNDEBUG -std=gnu++17 sgfaulting_file.cpp
will fail at runtime. The failure is a segfault. It is caused by something do do with building the kernel. If we run the output in GDB we get the following stack trace when it dies:
(gdb) where
#0 0x00007f49e3683b8c in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#1 0x00007f49e36b440c in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#2 0x00007f49e36b0dda in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#3 0x00007f49e36b430f in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#4 0x00007f49e36bac6a in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#5 0x00007f49e36b0bed in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#6 0x00007f49e36b430f in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#7 0x00007f49e36bac6a in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#8 0x00007f49e36bf027 in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#9 0x00007f49e36bf908 in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#10 0x00007f49e35ab7bc in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#11 0x00007f49e35abfba in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#12 0x00007f49e35ae90d in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#13 0x00007f49e36ec3d4 in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#14 0x00007f49e35b21fb in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#15 0x00007f49e36ced9a in ?? () from /usr/lib/x86_64-linux-gnu/libigc.so.1
#16 0x00007f49f487f1bb in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#17 0x00007f49f43ef178 in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#18 0x00007f49f4397b33 in ?? () from /usr/lib/x86_64-linux-gnu/intel-opencl/libigdrcl.so
#19 0x00007f49f9327aa4 in cl::sycl::detail::ProgramManager::build(std::unique_ptr<_pi_program, _pi_result (*)(_pi_program*)>, std::shared_ptr<cl::sycl::detail::context_impl>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, _pi_device* const&, std::map<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*>, _pi_program*, std::less<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*> >, std::allocator<std::pair<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*> const, _pi_program*> > >&, unsigned int) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#20 0x00007f49f9321336 in cl::sycl::detail::ProgramManager::getBuiltPIProgram(long, std::shared_ptr<cl::sycl::detail::context_impl> const&, std::shared_ptr<cl::sycl::detail::device_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cl::sycl::detail::program_impl const*, bool) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#21 0x00007f49f932243c in cl::sycl::detail::ProgramManager::getOrCreateKernel(long, std::shared_ptr<cl::sycl::detail::context_impl> const&, std::shared_ptr<cl::sycl::detail::device_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cl::sycl::detail::program_impl const*) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#22 0x00007f49f93630f1 in cl::sycl::detail::enqueueImpKernel(std::shared_ptr<cl::sycl::detail::queue_impl> const&, cl::sycl::detail::NDRDescT&, std::vector<cl::sycl::detail::ArgDesc, std::allocator<cl::sycl::detail::ArgDesc> >&, std::shared_ptr<cl::sycl::detail::kernel_bundle_impl> const&, std::shared_ptr<cl::sycl::detail::kernel_impl> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long const&, std::vector<_pi_event*, std::allocator<_pi_event*> >&, _pi_event**, std::function<void* (cl::sycl::detail::AccessorImplHost*)> const&) ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#23 0x00007f49f9369f3b in cl::sycl::detail::ExecCGCommand::enqueueImp() ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#24 0x00007f49f93566c5 in cl::sycl::detail::Command::enqueue(cl::sycl::detail::EnqueueResultT&, cl::sycl::detail::BlockingT, std::vector<cl::sycl::detail::Command*, std::allocator<cl::sycl::detail::Command*> >&) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#25 0x00007f49f9373b7b in cl::sycl::detail::Scheduler::addCG(std::unique_ptr<cl::sycl::detail::CG, std::default_delete<cl::sycl::detail::CG> >, std::shared_ptr<cl::sycl::detail::queue_impl>) ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#26 0x00007f49f93aef30 in cl::sycl::handler::finalize() ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#27 0x00007f49f93dc3ea in cl::sycl::detail::queue_impl::finalizeHandler(cl::sycl::handler&, cl::sycl::detail::CG::CGTYPE const&, cl::sycl::event&) ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#28 0x00007f49f93dc13b in cl::sycl::detail::queue_impl::submit_impl(std::function<void (cl::sycl::handler&)> const&, std::shared_ptr<cl::sycl::detail::queue_impl> const&, std::shared_ptr<cl::sycl::detail::queue_impl> const&, std::shared_ptr<cl::sycl::detail::queue_impl> const&, cl::sycl::detail::code_location const&, std::function<void (bool, bool, cl::sycl::event&)> const*) ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#29 0x00007f49f93db744 in cl::sycl::detail::queue_impl::submit(std::function<void (cl::sycl::handler&)> const&, std::shared_ptr<cl::sycl::detail::queue_impl> const&, cl::sycl::detail::code_location const&, std:--Type <RET> for more, q to quit, c to continue without paging--
:function<void (bool, bool, cl::sycl::event&)> const*) ()
from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#30 0x00007f49f93db715 in cl::sycl::queue::submit_impl(std::function<void (cl::sycl::handler&)>, cl::sycl::detail::code_location const&) () from /opt/intel/oneapi/compiler/2022.1.0/linux/lib/libsycl.so.5
#31 0x00000000004026d8 in cl::sycl::queue::submit<main::{lambda(cl::sycl::handler&)#1}>(main::{lambda(cl::sycl::handler&)#1}, cl::sycl::detail::code_location const&) (this=0x7ffc5da1b200, CodeLoc=..., CGF=...)
at /opt/intel/oneapi/compiler/2022.1.0/linux/bin-llvm/../include/sycl/CL/sycl/queue.hpp:275
#32 main () at segfault_program.cpp:31
The important part being stack position #19:
`cl::sycl::detail::ProgramManager::build`
The runtime compilation is occurring on the device (from `sycl-ls`):
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) UHD Graphics [0x9bc4] 3.0 [22.28.23726.1]
if we run the same program but use a host or cpu selector, we do not fail to build and can run successfully. It also seems that if we change minimal details about the program, it also no longer segfaults. These small changes are detailed in the comments in the program. Any ideas?
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It seems to be this operation:
return (idx1 < 0) == (idx2 < 0);
Changing it to an equivelent sign check makes the program compile. I think the compiler has a bug.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Thanks for reaching out to us.
We can see that you are directly declaring the below variable in submit block
sycl::id<3> robot_index{0, 0, 0};
Could you please try creating buffers and accessors in submit block in order to access in the kernel? The data on the "host" lives on the CPU normally and needs to be transferred to the "device" (GPU usually) in order to be accessed for use in an SYCL kernel.
Thanks & Regards,
Noorjahan.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
So no. This should be fully legal. Based on the standard:
https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:kernel.parameter.passing
Kernel parameter passing, 4.12.4:
Regardless of how the parameter is passed, the following rules define the allowable types for a kernel parameter:
-
Any device copyable type is a legal parameter type.
- ...
- id
- ...
-
vec<T, NumElements>.
I can move the construction of the vector out of the submit block, like this:
sycl::id<3> robot_index{0, 0, 0};
auto event = queue.submit(
[&robot_index](sycl::handler& cgh)
{
But it still fails.
Regardless, this does not fail at the run of the kernel, if you check the backtrace you will see it fails during kernel compilation. It fails because of the operation on the line:
return (idx1 < 0) == (idx2 < 0);
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Furthermore, i tried your suggestions:
#include <CL/sycl.hpp>
#include <CL/sycl/context.hpp>
#include <CL/sycl/device.hpp>
#include <CL/sycl/queue.hpp>
#include <iostream>
namespace {
auto is_sign_same(sycl::short3 idx1, sycl::short3 idx2) {
return (idx1 < 0) == (idx2 < 0);
}
} // namespace
int main() {
sycl::device device = sycl::device{sycl::gpu_selector{}};
std::cout << "\n\nRunning occupancy grid profile. The profile will have the "
"following "
"properties:\n\n Device:\t"
<< device.get_info<sycl::info::device::name>() << "\n\n";
sycl::context context{device};
sycl::property_list properties{sycl::property::queue::enable_profiling()};
sycl::queue queue{device, properties};
sycl::short3 val;
sycl::buffer<sycl::short3> buf{&val, 1};
auto event = queue.submit([&buf](sycl::handler &cgh) {
sycl::stream out(1024, 256, cgh);
auto writer = buf.get_access<sycl::access_mode::read>(cgh);
cgh.parallel_for(sycl::range<3>{4, 4, 4}, [out, writer](sycl::id<3> id) {
sycl::short3 new_signed_idx{short(0)};
sycl::short3 old_signed_idx =
sycl::short3{(short)id.get(0), (short)id.get(1), (short)id.get(2)} -
sycl::short3{(short)writer[0].x(), (short)writer[0].z(),
(short)writer[0].y()};
auto s_same = is_sign_same(new_signed_idx, old_signed_idx);
out << s_same;
});
});
return 0;
}
And it still crashes. The backtrace is still:
#19 0x00007f0896a4faa4 in cl::sycl::detail::ProgramManager::build(std::unique_ptr<_pi_program, _pi_result (*)(_pi_program*)>, std::shared_ptr<cl::sycl::detail::context_impl>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, _pi_device* const&, std::map<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*>, _pi_program*, std::less<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*> >, std::allocator<std::pair<std::pair<cl::sycl::detail::DeviceLibExt, _pi_device*> const, _pi_program*> > >&, unsigned int) ()
So this is absolutely a compiler bug.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Could you please try replacing (idx1 < 0) == (idx2 < 0); with (idx1[0] < 0) == (idx2[0] < 0) ? As idx1 and idx2 are sycl::short3 type and they are not supposed to be compared to a single number 0 directly.
Thanks & Regards,
Noorjahan.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi there,
You appear to have misunderstood how sycl works. The statement you have made:
As idx1 and idx2 are sycl::short3 type and they are not supposed to be compared to a single number 0 directly.
This is entirely incorrect. There is a specific operator for this:
|
Construct a new instance of the SYCL vec class template with the DataT parameter of RET with each element of the new SYCL vec instance the result of an element-wise OP relational operation between each element of lhs vec and the rhs scalar. Each element of the SYCL vec that is returned must be -1 if the operation results in true and 0 if the operation results in false. The ==, <, >, <= and >= operations result in false if either the lhs element or the rhs is a NaN. The != operation results in true if either the lhs element or the rhs is a NaN. The DataT template parameter of the constructed SYCL vec, RET, varies depending on the DataT template parameter of this SYCL vec. For a SYCL vec with DataT of type int8_t or uint8_t RET must be int8_t. For a SYCL vec with DataT of type int16_t, uint16_t or half RET must be int16_t. For a SYCL vec with DataT of type int32_t, uint32_t or float RET must be int32_t. For a SYCL vec with DataT of type int64_t, uint64_t or double RET must be uint64_t. |
From table 144 in the sycl vec interface: https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#_vec_interface
From this text:
> The DataT template parameter of the constructed SYCL vec, RET, varies depending on the DataT template parameter of this SYCL vec.
You could argue that the type given to `sycl::short3 < x` should be a short. But if we replace it with:
return (idx1 < short(0)) == (idx2 < short(0));
It still crashes. Even if you replace it with the direct `sycl::short3` comparitor:
return (idx1 < sycl::short3{0}) == (idx2 < sycl::short3{0});
It still crashes.
I guarantee you that this line is fully correct. The DPCPP compiler has a bug.
Maybe i can ask you a question. If this line was hypothetically incorrect, what would you expect?
a. The compiler fails during initial compilation with an error?
b. The runtime compilation fails and throws an exception?
c. The runtime compiler crashes with a segfault?
I would expect a, but am experiencing c.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Thanks for providing the details.
We are working on your issue. We will get back to you soon.
Thanks & Regards,
Noorjahan.

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page