A simple program with a SYCL kernel produces an incorrect result when run on the CPU (it works fine on the GPU):
#include <sycl/sycl.hpp>
#include <stdio.h>

sycl::queue q(sycl::cpu_selector_v);

struct st {
    int64_t a;
    int64_t b[64];
};

void OffendingFunction(st *base)
{
    q.parallel_for(1, [=](auto i) {
        int n = 64-base[i].a;
        for(int p=0; p<n; ++p) base[i].b[p]=0;
    }).wait();
}

int main(void)
{
    uint64_t *arr = sycl::malloc_device<uint64_t>(5000, q);
    q.parallel_for(5000, [=](auto i) { arr[i] = 1; }).wait(); // 4000 works, 5000 does not
    uint64_t tmp;
    q.memcpy(&tmp, arr, sizeof(uint64_t)).wait();
    printf("should equal 1: %llu\n", tmp);
    return 0;
}

Compile and run:
D:\SYCL-testing>icx -fsycl fill-test.cpp
Intel(R) oneAPI DPC++/C++ Compiler for applications running on Intel(R) 64, Version 2025.3.0 Build 20251010
Copyright (C) 1985-2025 Intel Corporation. All rights reserved.
D:\SYCL-testing>fill-test
should equal 1: 0

What fixes the issue (any one of these works):
- Commenting out OffendingFunction() (which is not even called)
- Reducing parallel_for in main to 4000
- Adding a cast to (int) before assigning to n: int n = 64-(int)base[i].a;
- Or changing the assignment to n to a simpler one: int n = base[i].a;
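
To make the cast workaround concrete, here is a minimal host-only sketch in plain C++ (no SYCL) of the two ways n can be computed. The function names trip_count_wide and trip_count_narrow are hypothetical and exist only for this illustration. In the original form, 64 is converted to int64_t, the subtraction is done in 64 bits, and the result is narrowed to int on assignment. In the workaround form, a is narrowed to int first and the subtraction is done in int. For any value of a that fits in int without overflowing the subtraction, the two forms should give the same n, so the cast is not expected to change what the kernel computes.

```cpp
#include <cstdint>

// Form used in OffendingFunction: 64 is converted to int64_t, the
// subtraction happens in 64 bits, then the result is narrowed to int.
int trip_count_wide(std::int64_t a) {
    int n = 64 - a;
    return n;
}

// Workaround form from the list above: a is narrowed to int first,
// so the subtraction happens entirely in int.
int trip_count_narrow(std::int64_t a) {
    int n = 64 - (int)a;
    return n;
}
```

Since both forms agree on ordinary inputs, the difference in runtime behavior looks more like a code generation or runtime problem than a problem with the source itself, though that is only an inference from the symptoms listed here.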
Apparently, OffendingFunction causes no trouble at compile time, but somehow wreaks havoc at runtime.