Mark7
Employee

What calls produce CL_INVALID_ARG_VALUE and are there examples of how to fix?

Newbie problem. I have a oneAPI program that is intended to run on the graphics unit of an Intel TGL (Tiger Lake) system. The base code is OpenCL-based and has run successfully under Python. I have also been able to run a simple multiply-add with OpenCL, but I am attempting to port these more complicated kernels over to run under OpenCL. When I run my program on the CPU core, I get no runtime errors. However, when running on a TGL system, I get the following errors, which I don't know how to resolve:

 Running on Intel(R) Gen12LP HD Graphics NEO
Native API failed. Native API returns: -50 (CL_INVALID_ARG_VALUE) -50 (CL_INVALID_ARG_VALUE)

I found this post:

https://community.intel.com/t5/Intel-oneAPI-Base-Toolkit/dpct-s-result-of-dot-does-not-run-successfu...

 

So I sourced the setvars.sh

   source /opt/intel/oneapi/setvars.sh

   ...

  :: oneAPI environment initialized ::

But I get the same error.  I am thinking it is the calls, but these calls are the same as the ones I am using for my simple multiply add.

Here are the calls into the parallel program:

int ComputeExecuteXtalk_init(Compute *vector) {
    try {

        std::cout << "Running on " << vector->q->get_device().get_info<sycl::info::device::name>() << "\n";

        const float A(0.00);

        vector->q->submit([&](sycl::handler& h) {

            auto baseAddress = vector->d_W->get_access<sycl::access::mode::read_write>(h);
            sycl::stream out(1000000, 256, h);
            int debug = vector->debug;

            // code copied from molten and adapted to oneAPI calls
            // as #define executeWritePhaseXTalk(iteration, baseAddress, subKernelInfo, errorHandlerInfo)
            // in file xTalk.h
            h.parallel_for<class xtalk_init>(sycl::range<1>{vector->threads}, [=](sycl::id<1> it) {
                const int threadId = it[0];
                int debugcount = 0;
                if (threadId == 0) {

                    ....

What statements produce CL_INVALID_ARG_VALUE and what should I be looking for?

5 Replies
RahulV_intel
Moderator

Hi,


Thanks for reporting this issue. Could you please attach your minimal reproducible code (compilable), so that I can try it out at my end?


Regards,

Rahul


Mark7
Employee

It turns out that a fellow employee took my code and isolated the problem to statements within my kernel that were dereferencing pointers into processor-core (host) memory that is not accessible from my device. (I am learning a bit more about parallel programming on graphics devices; the program does work on a core-only system.)

One example statement that was causing a problem was: 

unsigned long randomNumber = vector->seed;

vector is passed into the launch of my kernel, but it points to memory that is only valid in the core (host) memory space. Rewriting the code so that the seed is passed in as a plain variable fixed the problem. I would suggest that the compiler report a warning if a variable used in the kernel may not be available to the EUs; I would think that possibility is known at compile time. It isn't wrong if the execution device can reach the variable.
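To make the difference concrete, here is a minimal sketch in plain C++ (no SYCL headers, so it only mimics what the kernel lambda captures). The Compute struct below is a hypothetical stand-in with just two fields, and the two make_kernel_* helpers are names invented for this illustration. In a real SYCL program the kernel lambda object is copied to the device, so a captured host pointer generally arrives as an address the device cannot dereference, whereas a captured scalar travels with the lambda.

```cpp
#include <cassert>

// Hypothetical stand-in for the host-side struct from the post;
// only the fields needed for this illustration are included.
struct Compute {
    unsigned long seed;
    int debug;
};

// Problematic pattern: [=] copies the POINTER `vector` into the lambda.
// The lambda body still reaches back into host memory through it.
inline auto make_kernel_capturing_pointer(const Compute* vector) {
    return [=](int threadId) -> unsigned long {
        return vector->seed + static_cast<unsigned long>(threadId);
    };
}

// Fixed pattern: copy the scalar into a local first, so [=] captures
// the VALUE and the lambda no longer depends on host memory.
inline auto make_kernel_capturing_value(const Compute* vector) {
    const unsigned long seed = vector->seed;  // read on the host, before launch
    return [=](int threadId) -> unsigned long {
        return seed + static_cast<unsigned long>(threadId);
    };
}
```

Changing the struct after the lambdas are built shows the dependency: the value-capturing lambda keeps using the seed it copied, while the pointer-capturing one follows the pointer back to the host object. The line int debug = vector->debug; in the original submit block already follows the fixed pattern.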

I would still like to understand the statement (that I thought was the problem)

h.parallel_for<class xtalk_init>

Is the class xtalk_init pretty much any name I want, or does it have to match an actual class? I understand I cannot use the same name for all loops because of link errors, but I am not sure what this is doing. Is there a good book that explains this with examples (as opposed to the specs I am finding on the oneAPI website, which don't explain what happens)?

RahulV_intel
Moderator

Hi,


That's right. Since vector points to an object created on the host, it is not accessible on the device. Regarding the compiler warning, I can check with the team and get back to you.


Regarding the kernel naming convention: with the DPC++ compiler, the kernel name is optional and no additional compilation flags are needed. With Clang or any other compiler, you may pass the compilation flag -fsycl-unnamed-lambda to avoid passing kernel names to SYCL lambdas. (Per the SYCL 1.2.1 specification the kernel name is mandatory; the flag lifts that requirement.)


However, as per the latest SYCL 2020 specs, it looks like they have made the kernel name optional.


SYCL kernels are extracted from C++ source files and stored in an implementation-defined format. When the SYCL runtime needs to enqueue a SYCL kernel, it is necessary for the SYCL runtime to load the kernel and pass it to a SYCL backend API. This requires the kernel to have a name that is unique at enclosing namespace scope, to enable an association between the kernel invocation and the kernel itself. The association is achieved using a kernel name, which is a C++ type name.


For a lambda function, the user may optionally provide a name for debugging or other reasons. In SYCL, this optional name is provided as a template parameter to the kernel invocation, e.g. parallel_for, and this name may optionally be forward declared at namespace scope (but must always avoid conflict with another name at enclosing namespace scope). 
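As a rough illustration of why the name can be "pretty much any name", here is a sketch in plain C++ (no SYCL). The function parallel_for_sketch, the counter launch_count, and the second name xtalk_update are all hypothetical and exist only for this example. The point it shows is that the name is just a type used as a template argument: it is forward declared, never defined, and each distinct name produces a distinct instantiation that can be told apart.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Kernel names: forward declarations only. The classes are never defined.
class xtalk_init;
class xtalk_update;  // hypothetical second kernel name

// One counter per name; each KernelName yields a separate instantiation.
template <typename KernelName>
int& launch_count() {
    static int count = 0;
    return count;
}

// Hypothetical stand-in for handler::parallel_for: the name is only a tag,
// the work is whatever the lambda does.
template <typename KernelName, typename Func>
void parallel_for_sketch(std::size_t n, Func f) {
    ++launch_count<KernelName>();
    for (std::size_t i = 0; i < n; ++i) {
        f(i);
    }
}

inline std::vector<int> run_both(std::size_t n) {
    std::vector<int> data(n, 0);
    // `class xtalk_init` mirrors the spelling used in the thread.
    parallel_for_sketch<class xtalk_init>(n, [&](std::size_t i) { data[i] = 1; });
    parallel_for_sketch<xtalk_update>(n, [&](std::size_t i) {
        data[i] += static_cast<int>(i);
    });
    return data;
}
```

A real SYCL implementation uses the name to associate the kernel invocation with the compiled device kernel rather than to count launches, but the C++ mechanics of the name are the same: an otherwise unused type that has to be unique per kernel.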


SYCL provisional specs (2020):

https://www.khronos.org/registry/SYCL/specs/sycl-2020-provisional.pdf (3.10.2)


SYCL specs 2020:

https://www.khronos.org/registry/SYCL/specs/sycl-2020/pdf/sycl-2020.pdf (5.2)


DPC++ book:

https://www.apress.com/us/book/9781484255735


Thanks,

Rahul


RahulV_intel
Moderator

Hi,


Could you please try running your original code (the version that produces CL_INVALID_ARG_VALUE) on the Level Zero backend and let me know whether the error message is more meaningful?


To set the backend to Level Zero:

export SYCL_BE=PI_LEVEL0


If there is no change in the runtime error message, please share your minimal reproducible code.


Thanks,

Rahul


RahulV_intel
Moderator

Hi @Mark7,

 

Just a quick reminder to try running your original code on the Level Zero backend and let us know if there is any change in the error message.

 

Thanks,

Rahul