What calls produce CL_INVALID_ARG_VALUE and are there examples of how to fix?

Mark7 · 02-11-2021

Newbie problem. I have a oneAPI program that is intended to run on the graphics unit of an Intel TGL system. The base code is OpenCL-based and has run successfully under Python. I have also been able to run a simple multiply-add with OpenCL, but I am now attempting to port these more complicated kernels over to run under OpenCL. When I run my program on the CPU, I get no runtime errors. However, when running on a TGL system, I get the following error, which I don't know how to resolve:

Running on Intel(R) Gen12LP HD Graphics NEO
Native API failed. Native API returns: -50 (CL_INVALID_ARG_VALUE) -50 (CL_INVALID_ARG_VALUE)

I found this post:

https://community.intel.com/t5/Intel-oneAPI-Base-Toolkit/dpct-s-result-of-dot-does-not-run-successfully/m-p/1211549

So I sourced setvars.sh:

source /opt/intel/oneapi/setvars.sh

...

:: oneAPI environment initialized ::

But I get the same error. I suspect the problem is in the calls, but these calls are the same as the ones I am using for my simple multiply-add.

Here are the calls into the parallel program:

int ComputeExecuteXtalk_init(Compute *vector) {
  try {

    std::cout << "Running on " << vector->q->get_device().get_info<sycl::info::device::name>() << "\n";

    const float A(0.00);

    vector->q->submit([&](sycl::handler& h) {

      auto baseAddress = vector->d_W->get_access<sycl::access::mode::read_write>(h);
      sycl::stream out(1000000, 256, h);
      int debug = vector->debug;

      // code copied from molten and adapted to oneAPI calls
      // as #define executeWritePhaseXTalk(iteration, baseAddress, subKernelInfo, errorHandlerInfo)
      // in file xTalk.h
      h.parallel_for<class xtalk_init>(sycl::range<1>{vector->threads}, [=](sycl::id<1> it) {
        const int threadId = it[0];
        int debugcount = 0;
        if (threadId == 0) {

....

What statements produce CL_INVALID_ARG_VALUE and what should I be looking for?

RahulV_intel · 02-15-2021

Hi,

Thanks for reporting this issue. Could you please attach your minimal reproducible code (compilable), so that I can try it out at my end?

Regards,

Rahul

Mark7 · 02-16-2021

It turns out that a fellow employee took my code and isolated the problem to statements within my kernel that were dereferencing pointers into host (processor core) memory, which is not accessible from my device. (I am learning a bit more about parallel programming on graphics devices; the program does work on a core-only system.)

One example statement that was causing a problem was:

unsigned long randomNumber = vector->seed;

The pointer vector was being captured when launching my kernel, but the memory it points to is only known in the host memory space. A rewrite where the seed is passed in as a plain variable fixed this problem. I would suggest that the compiler report a warning if a variable used in the kernel may not be available to the EUs. I would think that possibility would be known at compile time. It isn't wrong if the execution device can find the variable.
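To illustrate the fix described above, here is a minimal sketch in plain C++ (deliberately not SYCL, so it runs anywhere). It assumes a hypothetical Compute struct that models only the seed field, and the two helper function names are made up for illustration. The point it tries to show: a [=] lambda whose body says vector->seed captures the pointer, not the seed, so the body still reads host memory when it runs. Copying the field into a local first makes the lambda capture the value itself, which is the shape of rewrite that resolved the error in this thread.

```cpp
// Hypothetical stand-in for the poster's Compute struct; only `seed` is modeled.
struct Compute {
    unsigned long seed;
};

// Problematic pattern: [=] copies the pointer `vector`, not the struct behind it.
// The lambda body still dereferences host memory at the time it executes.
inline auto make_kernel_capturing_pointer(Compute* vector) {
    return [=]() -> unsigned long { return vector->seed; };
}

// Fixed pattern: read the field on the host first, then capture the value.
// The lambda no longer needs the host pointer at all.
inline auto make_kernel_capturing_value(Compute* vector) {
    const unsigned long seed = vector->seed;
    return [=]() -> unsigned long { return seed; };
}
```

On the host both lambdas work, which matches the observation that the program ran fine on the CPU. The difference only becomes visible when the lambda body executes somewhere that cannot follow the captured host pointer.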

I would still like to understand the statement that I thought was the problem:

h.parallel_for<class xtalk_init>

Can the class name xtalk_init be pretty much any name I want, without having to match an actual class? I understand I cannot use the same name for all loops because of link errors, but I am not sure what this is doing. Is there a good book that explains this with examples (as opposed to the specs I am finding on the oneAPI website, which don't explain what happens)?

RahulV_intel · 02-19-2021

Hi,

That's right. Since the vector is created on the host, it is not accessible on the device. Regarding the compiler warning, I can check with the team on this and get back to you.

Regarding the kernel naming convention: with the DPC++ compiler, the kernel name is optional and no additional compilation flags are needed. With Clang or any other compiler, you may pass the compilation flag -fsycl-unnamed-lambda to avoid having to name SYCL lambdas. (Under the SYCL 1.2.1 specification the kernel name is mandatory, but passing -fsycl-unnamed-lambda lifts that requirement.)

However, as per the latest SYCL 2020 specs, it looks like they have made the kernel name optional:

SYCL kernels are extracted from C++ source files and stored in an implementation-defined format. When the SYCL runtime needs to enqueue a SYCL kernel, it is necessary for the SYCL runtime to load the kernel and pass it to a SYCL backend API. This requires the kernel to have a name that is unique at enclosing namespace scope, to enable an association between the kernel invocation and the kernel itself. The association is achieved using a kernel name, which is a C++ type name.

For a lambda function, the user may optionally provide a name for debugging or other reasons. In SYCL, this optional name is provided as a template parameter to the kernel invocation, e.g. parallel_for, and this name may optionally be forward declared at namespace scope (but must always avoid conflict with another name at enclosing namespace scope).
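As a rough illustration of the "kernel name is a C++ type name" idea, here is a host-only sketch in plain C++. The launch function is a hypothetical stand-in for a parallel_for-style entry point and is not part of SYCL; fill_kernel and scale_kernel are made-up names. It shows that writing class some_name inside the template argument list simply forward-declares an incomplete type that is used as a tag, so the name does not have to match any class defined elsewhere, and that different kernels get different tags.

```cpp
#include <array>
#include <cstddef>

// Hypothetical host-only stand-in for a parallel_for-style entry point.
// KernelName is only a tag type: it is never defined or instantiated.
template <typename KernelName, typename Func>
void launch(std::size_t n, Func f) {
    for (std::size_t i = 0; i < n; ++i) {
        f(i);
    }
}

inline int run_two_kernels() {
    std::array<int, 4> data{};

    // `class fill_kernel` forward-declares an incomplete type right here.
    // (Host-only sketch, so capturing by reference is fine in this stand-in.)
    launch<class fill_kernel>(data.size(), [&](std::size_t i) {
        data[i] = static_cast<int>(i);
    });

    // A second kernel gets a different tag so the two names do not collide.
    launch<class scale_kernel>(data.size(), [&](std::size_t i) {
        data[i] *= 2;
    });

    int sum = 0;
    for (int v : data) {
        sum += v;
    }
    return sum;  // 0 + 2 + 4 + 6
}
```

In a real SYCL program the runtime uses the tag to associate the kernel invocation with the compiled kernel, as the quoted specification text describes; this sketch only demonstrates the C++ mechanics of the name.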

SYCL provisional specs (2020):

https://www.khronos.org/registry/SYCL/specs/sycl-2020-provisional.pdf (3.10.2)

SYCL specs 2020:

https://www.khronos.org/registry/SYCL/specs/sycl-2020/pdf/sycl-2020.pdf (5.2)

DPC++ book:

https://www.apress.com/us/book/9781484255735

Thanks,

Rahul

RahulV_intel · 02-25-2021

Hi,

Could you please try to run your original code (the version that fails with CL_INVALID_ARG_VALUE) on the Level Zero backend and let me know if the error message is more meaningful?

To set the backend to Level Zero:

export SYCL_BE=PI_LEVEL0

If there is no change in the runtime error message, please share your minimal reproducible code.

Thanks,

Rahul

RahulV_intel · 03-03-2021

Hi @Mark7,

Just a quick reminder to try running your original code on the Level Zero backend and let us know if there is any change in the error message.

Thanks,

Rahul

RahulV_intel · 03-09-2021

Hi,

I have not heard back from you, so I will go ahead and close this thread from my end. Feel free to post a new query if you require further assistance from Intel.

Thanks,

Rahul