Newbie problem. I have a oneAPI program that is intended to run on the graphics unit of an Intel TGL. The base code is OpenCL based and has run successfully under Python. I have also been able to run a simple multiply-add with OpenCL, but I am attempting to port these more complicated kernels to run under OpenCL. When I run my program on the CPU core, I get no runtime errors. However, when running on a TGL system, I get the following errors, which I don't know how to resolve:
Running on Intel(R) Gen12LP HD Graphics NEO
Native API failed. Native API returns: -50 (CL_INVALID_ARG_VALUE) -50 (CL_INVALID_ARG_VALUE)
I found this post:
So I sourced the setvars.sh
source /opt/intel/oneapi/setvars.sh
...
:: oneAPI environment initialized ::
But I get the same error. I suspect the problem is in the calls, but these calls are the same as the ones I am using for my simple multiply-add.
Here are the calls into the parallel program:
int ComputeExecuteXtalk_init(Compute *vector) {
    try {
        std::cout << "Running on " << vector->q->get_device().get_info<sycl::info::device::name>() << "\n";
        const float A(0.00);
        vector->q->submit([&](sycl::handler& h) {
            auto baseAddress = vector->d_W->get_access<sycl::access::mode::read_write>(h);
            sycl::stream out(1000000, 256, h);
            int debug = vector->debug;
            // code copied from molten and adapted to oneAPI calls
            // as #define executeWritePhaseXTalk(iteration, baseAddress, subKernelInfo, errorHandlerInfo)
            // in file xTalk.h
            h.parallel_for<class xtalk_init>(sycl::range<1>{vector->threads}, [=](sycl::id<1> it) {
                const int threadId = it[0];
                int debugcount = 0;
                if (threadId == 0) {
                    ....
What statements produce CL_INVALID_ARG_VALUE and what should I be looking for?
Hi,
Thanks for reporting this issue. Could you please attach your minimal reproducible code (compilable), so that I can try it out at my end?
Regards,
Rahul
It turns out that a fellow employee took my code and isolated the problem to statements within my kernel that were dereferencing pointers into the processor core's memory, which is not accessible from my device. (I am learning a bit more about parallel programming on graphics devices; the program does work on a core-only system.)
One example statement that was causing a problem was:
unsigned long randomNumber = vector->seed;
The vector pointer was being captured when my kernel was launched, but the memory it points to exists only in the core (host) memory space. A rewrite where the seed is passed in as a plain variable fixed this problem. I would suggest that the compiler report a warning if a variable used in the kernel may not be accessible to the EUs; I would think that possibility is known at compile time. It isn't wrong if the execution device can reach the variable.
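To make the difference concrete, here is a minimal sketch in plain C++ (not SYCL, so it only illustrates the capture semantics, not the device failure itself). It assumes a hypothetical stand-in Compute struct with just a seed field, and the two helper functions are made-up names for illustration. A [=] capture copies the pointer, not what it points to, so the first lambda still reads host memory when it runs; the second carries the value itself.

```cpp
#include <cassert>

// Hypothetical stand-in for the Compute struct in this thread; only `seed` is modelled.
struct Compute {
    unsigned long seed;
};

// Problem pattern: [=] copies the pointer `vector`, so the body still
// dereferences the original host address whenever the lambda runs.
inline auto make_kernel_capturing_pointer(Compute* vector) {
    return [=]() -> unsigned long { return vector->seed; };
}

// Fixed pattern: copy the scalar into a local first, so the lambda object
// holds the value itself and never touches `vector` again.
inline auto make_kernel_capturing_value(Compute* vector) {
    const unsigned long seed = vector->seed;
    return [=]() -> unsigned long { return seed; };
}

// Shows which lambda depends on host memory: change the host struct after
// both lambdas exist and see which one notices.
inline bool capture_demo() {
    Compute c{42UL};
    auto by_pointer = make_kernel_capturing_pointer(&c);
    auto by_value = make_kernel_capturing_value(&c);
    c.seed = 7UL;                // mutate host state afterwards
    assert(by_value() == 42UL);  // self-contained copy, unaffected
    return by_pointer() == 7UL;  // still reading through the host pointer
}
```

On a CPU device the host address happens to be readable, which is presumably why the pointer version ran there without errors; a GPU device generally cannot read it.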
I would still like to understand the statement that I thought was the problem:
h.parallel_for<class xtalk_init>
Can the class xtalk_init be pretty much any name I want, without having to match an actual class? I understand I cannot use the same name for all loops because of link errors, but I am not sure what it is doing. Is there a good book that explains this with examples (as opposed to the specs I am finding on the oneAPI website, which don't explain what happens)?
Hi,
That's right. Since the vector is created on the host, it is not accessible on the device. Regarding the compiler warning, I can check with the team on this and get back to you.
Regarding the kernel naming convention: when using the DPC++ compiler, the kernel name is optional and no additional compilation flags are needed. When using Clang or any other compiler, you may pass the compilation flag -fsycl-unnamed-lambda to avoid passing kernel names to SYCL lambdas. (As per the SYCL 1.2.1 spec, the kernel name is mandatory, but this flag lets you omit it.)
However, as per the latest SYCL 2020 specs, it looks like the kernel name has been made optional.
SYCL kernels are extracted from C++ source files and stored in an implementation-defined format. When the SYCL runtime needs to enqueue a SYCL kernel, it is necessary for the SYCL runtime to load the kernel and pass it to a SYCL backend API. This requires the kernel to have a name that is unique at enclosing namespace scope, to enable an association between the kernel invocation and the kernel itself. The association is achieved using a kernel name, which is a C++ type name.
For a lambda function, the user may optionally provide a name for debugging or other reasons. In SYCL, this optional name is provided as a template parameter to the kernel invocation, e.g. parallel_for, and this name may optionally be forward declared at namespace scope (but must always avoid conflict with another name at enclosing namespace scope).
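As a rough illustration of "a kernel name is just a C++ type name", here is a sketch in plain C++ using a mock launcher. mock_parallel_for, fill_kernel and scale_kernel are hypothetical names invented for this example; the mock runs serially on the host and is not the SYCL runtime. Writing class fill_kernel inside the template argument declares the type on the spot, and it never needs a definition because it is only used as a tag.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Mock launcher (NOT SYCL): KernelName is only a compile-time tag, so an
// incomplete type is enough. The loop runs serially on the host.
template <typename KernelName, typename Func>
void mock_parallel_for(std::size_t n, Func f) {
    for (std::size_t i = 0; i < n; ++i) {
        f(i);
    }
}

inline std::vector<int> run_two_named_kernels() {
    std::vector<int> data(4, 0);
    int* p = data.data();

    // `class fill_kernel` is declared here simply by naming it; no class
    // body exists anywhere in the program.
    mock_parallel_for<class fill_kernel>(
        data.size(), [=](std::size_t i) { p[i] = static_cast<int>(i); });

    // A second launch gets its own, different tag type.
    mock_parallel_for<class scale_kernel>(
        data.size(), [=](std::size_t i) { p[i] *= 10; });

    // The two names are distinct types, which is what lets an
    // implementation tell the kernels apart.
    static_assert(!std::is_same<fill_kernel, scale_kernel>::value,
                  "kernel name tags must be distinct types");
    return data;
}
```

Unlike a real SYCL implementation, this mock would not complain if the same name were reused for two different lambdas; in SYCL the names have to be unique.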
SYCL provisional specs (2020):
https://www.khronos.org/registry/SYCL/specs/sycl-2020-provisional.pdf (3.10.2)
SYCL specs 2020:
https://www.khronos.org/registry/SYCL/specs/sycl-2020/pdf/sycl-2020.pdf (5.2)
DPC++ book:
https://www.apress.com/us/book/9781484255735
Thanks,
Rahul
Hi,
Could you please try running your original code (the version that fails with CL_INVALID_ARG_VALUE) on the Level Zero backend and let me know if the error message is more meaningful?
To set the backend to Level Zero:
export SYCL_BE=PI_LEVEL0
If there is no change in the runtime error message, I request you to share your minimal reproducible code.
Thanks,
Rahul
Hi @Mark7,
Just a quick reminder to try running your original code on the Level Zero backend and let us know if there is any change in the error message.
Thanks,
Rahul