OpenCL program stops after executing 2 to the power of 32 times

Wangwenwen
Beginner
7,024 Views

I tried to run a simple OpenCL program on a UHD 630 GPU, and it stopped after a certain number of runs. The log looks like this:

---------------------------------------------

[2022-04-25 00:14:23.874] [info] test_kernel_func count: 33294318 errcode 0
[2022-04-25 00:14:23.874] [info] clEnqueueWriteBuffer()
[2022-04-25 00:14:23.874] [info] clSetKernelArg()
[2022-04-25 00:14:23.874] [info] clEnqueueNDRangeKernel()
[2022-04-25 00:14:23.874] [info] clFlush()
[2022-04-25 00:14:23.874] [info] test_kernel_func count: 33294319 errcode 0
[2022-04-25 00:14:23.875] [info] clEnqueueWriteBuffer()
[2022-04-25 00:14:23.875] [info] clSetKernelArg()
[2022-04-25 00:14:23.875] [info] clEnqueueNDRangeKernel()
[2022-04-25 00:14:23.875] [info] clFlush()
[2022-04-25 00:14:23.875] [info] test_kernel_func count: 33294320 errcode 0

---------------------------------------------

The iteration count at the point where it stopped, multiplied by the number of kernel operations per iteration, is approximately 2 to the power of 32.

The kernel operations are the clSetKernelArg and clEnqueueNDRangeKernel calls.

It looks like an overflow is happening somewhere.

Is there a maximum limit for command queues?
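
For readers following along, the arithmetic behind that observation can be checked with a short C++ sketch. It assumes the figures from the log above (a last reported count of 33294320) and from the reproducer posted later in this thread (128 clSetKernelArg calls plus one clEnqueueNDRangeKernel call per iteration). The 32-bit counter is purely hypothetical: nothing in the thread confirms where, or whether, such a counter exists in the driver.

```cpp
#include <cstdint>

// Figures taken from the log and the reproducer in this thread.
constexpr std::uint64_t kIterationsAtStop = 33294320;  // last "count" in the log
constexpr std::uint64_t kSetArgCallsPerIteration = 128; // MAX_ARG_N in host.cpp
constexpr std::uint64_t kLaunchesPerIteration = 1;      // one clEnqueueNDRangeKernel

// Total clSetKernelArg + clEnqueueNDRangeKernel calls issued up to the stall.
constexpr std::uint64_t total_calls() {
    return kIterationsAtStop * (kSetArgCallsPerIteration + kLaunchesPerIteration);
}

// Distance from that total to 2^32, the first value a 32-bit unsigned
// counter cannot represent.
constexpr std::uint64_t calls_until_wrap() {
    return (std::uint64_t{1} << 32) - total_calls();
}

// Hypothetical 32-bit counter advanced by n calls (an assumption made for
// illustration only). Unsigned arithmetic wraps modulo 2^32.
constexpr std::uint32_t advance(std::uint32_t counter, std::uint32_t n) {
    return static_cast<std::uint32_t>(counter + n);
}
```

Here `total_calls()` evaluates to 4,294,967,280, which is 16 short of 2^32 = 4,294,967,296, so a 32-bit counter of such calls would wrap to zero within the next iteration.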

 

Detailed software and hardware information is as follows:

SDK: intel_sdk_for_opencl_applications_2020.0.245

OpenCL Platform: Intel(R) OpenCL, version:OpenCL 2.1

GPU: Intel(R) UHD Graphics 630

NoorjahanSk_Intel
Moderator
6,186 Views

Hi,


Thanks for reaching out to us.


Could you please provide us with a sample reproducer and the steps you have followed so that we can try it at our end?


Also, please provide the OS details and the compiler being used.


Thanks & Regards,

Noorjahan.


Wangwenwen
Beginner
6,102 Views

I have shared the C++ code below. I found that once clEnqueueWriteBuffer has been called 4,294,967,296 times (2 to the power of 32), OpenCL stops. I thought this problem might be related to the number of times the cl_command_queue is accessed, so I tried alternating between two cl_command_queue objects, but it did not help.

 

The OS is Windows 10, and I used Visual Studio 2020 Community to compile it.

 

---------------------------------host.cpp------------------------------------------

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <windows.h>
#include <process.h>

#include <CL/cl.h>

#define KERNEL(...) #__VA_ARGS__

#define MAX_ARG_N 128

LONGLONG count = 0;
static UINT32 param_buffer[MAX_ARG_N];
cl_context context = NULL;
cl_command_queue command_queue[2] = { NULL }, cque;
cl_program program = NULL;
cl_kernel kernel = NULL;
int state = 0;

// Monitor thread: once the main loop reaches the iteration count at which the
// stall was observed, print the OpenCL reference counts. Repeats five times.
void monitor_func(LPVOID p)
{
    for (int i = 0; i < 5; i++) {
        // Sleep(1000000) waits 1,000,000 ms (about 17 minutes) between checks.
        while (true) {
            Sleep(1000000);
            if (count >= 33294318)
                break;
        }
        cl_uint ref_count = 0;
        cl_int ret;

        printf("refcnt ");
        printf("count %lld ", count);
        printf("state %d ", state);

        ret = clGetKernelInfo(
            kernel,
            CL_KERNEL_REFERENCE_COUNT,
            sizeof(ref_count),
            &ref_count,
            NULL);
        printf("kernel %u ", ref_count); // cl_uint is unsigned, so use %u

        ret |= clGetContextInfo(
            context,
            CL_CONTEXT_REFERENCE_COUNT,
            sizeof(ref_count),
            &ref_count,
            NULL);
        printf("context %u ", ref_count);

        ret |= clGetCommandQueueInfo(
            command_queue[0],
            CL_QUEUE_REFERENCE_COUNT,
            sizeof(ref_count),
            &ref_count,
            NULL);
        printf("cmd_q %u ", ref_count);

        /*
        ret |= clGetMemObjectInfo(
            cl_src_buffer,
            //cl_src_width,
            //cl_src_height,
            CL_MEM_REFERENCE_COUNT,
            sizeof(ref_count),
            &ref_count,
            NULL);
        printf("memobj %d ", ref_count);
        */

        /*
        ret |= clGetProgramInfo(
            program,
            CL_PROGRAM_REFERENCE_COUNT,
            sizeof(ref_count),
            &ref_count,
            NULL);
        printf("program %d ", ref_count);
        */

        printf(" err %d ", ret);

        printf("\n");
    }

    return;
}

int main(void)
{
    cl_int ret;

    cl_platform_id platform_id = NULL;
    cl_device_id device_id = NULL;

    // Zero-initialized so the cleanup code never sees indeterminate handles.
    cl_mem memObj[MAX_ARG_N] = { NULL };

    // The kernel source is a string literal, so it must be pointer-to-const.
    const char* kernelSource = NULL;

    LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
    LARGE_INTEGER Frequency;

    int i;

    // Take the first platform and its first GPU device.
    clGetPlatformIDs(1, &platform_id, NULL);
    if (platform_id == NULL)
    {
        puts("Get OpenCL platform failed!");
        goto FINISH;
    }

    clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);
    if (device_id == NULL)
    {
        puts("No GPU available as a compute device!");
        goto FINISH;
    }

    context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &ret);
    if (context == NULL)
    {
        puts("Context not established!");
        goto FINISH;
    }

    // Two in-order queues; the main loop alternates between them.
    command_queue[0] = clCreateCommandQueue(context, device_id, 0, &ret);
    if (command_queue[0] == NULL)
    {
        puts("Command queue cannot be created!");
        goto FINISH;
    }
    command_queue[1] = clCreateCommandQueue(context, device_id, 0, &ret);
    if (command_queue[1] == NULL)
    {
        puts("Command queue cannot be created!");
        goto FINISH;
    }
    cque = command_queue[0];
kernelSource = KERNEL(
__kernel void test(
__global int* arg00, __global int* arg01, __global int* arg02, __global int* arg03,
__global int* arg04, __global int* arg05, __global int* arg06, __global int* arg07,
__global int* arg08, __global int* arg09, __global int* arg0a, __global int* arg0b,
__global int* arg0c, __global int* arg0d, __global int* arg0e, __global int* arg0f,
__global int* arg10, __global int* arg11, __global int* arg12, __global int* arg13,
__global int* arg14, __global int* arg15, __global int* arg16, __global int* arg17,
__global int* arg18, __global int* arg19, __global int* arg1a, __global int* arg1b,
__global int* arg1c, __global int* arg1d, __global int* arg1e, __global int* arg1f,
__global int* arg20, __global int* arg21, __global int* arg22, __global int* arg23,
__global int* arg24, __global int* arg25, __global int* arg26, __global int* arg27,
__global int* arg28, __global int* arg29, __global int* arg2a, __global int* arg2b,
__global int* arg2c, __global int* arg2d, __global int* arg2e, __global int* arg2f,
__global int* arg30, __global int* arg31, __global int* arg32, __global int* arg33,
__global int* arg34, __global int* arg35, __global int* arg36, __global int* arg37,
__global int* arg38, __global int* arg39, __global int* arg3a, __global int* arg3b,
__global int* arg3c, __global int* arg3d, __global int* arg3e, __global int* arg3f,
__global int* arg40, __global int* arg41, __global int* arg42, __global int* arg43,
__global int* arg44, __global int* arg45, __global int* arg46, __global int* arg47,
__global int* arg48, __global int* arg49, __global int* arg4a, __global int* arg4b,
__global int* arg4c, __global int* arg4d, __global int* arg4e, __global int* arg4f,
__global int* arg50, __global int* arg51, __global int* arg52, __global int* arg53,
__global int* arg54, __global int* arg55, __global int* arg56, __global int* arg57,
__global int* arg58, __global int* arg59, __global int* arg5a, __global int* arg5b,
__global int* arg5c, __global int* arg5d, __global int* arg5e, __global int* arg5f,
__global int* arg60, __global int* arg61, __global int* arg62, __global int* arg63,
__global int* arg64, __global int* arg65, __global int* arg66, __global int* arg67,
__global int* arg68, __global int* arg69, __global int* arg6a, __global int* arg6b,
__global int* arg6c, __global int* arg6d, __global int* arg6e, __global int* arg6f,
__global int* arg70, __global int* arg71, __global int* arg72, __global int* arg73,
__global int* arg74, __global int* arg75, __global int* arg76, __global int* arg77,
__global int* arg78, __global int* arg79, __global int* arg7a, __global int* arg7b,
__global int* arg7c, __global int* arg7d, __global int* arg7e, __global int* arg7f
)
{
int index = get_global_id(0);
}

);
    // Declared without an initializer so the earlier "goto FINISH" jumps
    // do not skip an initialization.
    size_t kernelLength;
    kernelLength = strlen(kernelSource);

    program = clCreateProgramWithSource(context, 1, (const char**)&kernelSource, (const size_t*)&kernelLength, &ret);
    ret = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);
    if (ret != CL_SUCCESS)
    {
        size_t len;
        char buffer[8 * 2048] = "";

        printf("Error: Failed to build program executable!\n");
        clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, sizeof(buffer), buffer, &len);
        printf("%s\n", buffer);
        goto FINISH;
    }

    kernel = clCreateKernel(program, "test", &ret);
    if (kernel == NULL)
    {
        puts("Kernel failed to create!");
        goto FINISH;
    }

    // One 4-byte device buffer per kernel argument. Check every creation,
    // not just the last one.
    for (i = 0; i < MAX_ARG_N; i++)
    {
        param_buffer[i] = i;
        memObj[i] = clCreateBuffer(context, CL_MEM_READ_WRITE, 4, NULL, &ret);
        if (ret != CL_SUCCESS)
        {
            puts("Buffer creation failed!");
            goto FINISH;
        }
    }

    HANDLE hThread;
    hThread = (HANDLE)_beginthread(monitor_func, 0, NULL);

    QueryPerformanceFrequency(&Frequency);
    QueryPerformanceCounter(&StartingTime);

    // Main loop: runs until any OpenCL call reports an error. The OR-ed value
    // only indicates that something failed, not which call or which code.
    ret = CL_SUCCESS;
    while (ret == CL_SUCCESS)
    {
        state = 1; // blocking upload of the 128 four-byte buffers
        for (i = 0; i < MAX_ARG_N; i++)
        {
            ret |= clEnqueueWriteBuffer(cque, memObj[i], CL_TRUE, 0, 4, &param_buffer[i], 0, NULL, NULL);
        }

        state = 2; // bind the 128 buffers as kernel arguments
        for (i = 0; i < MAX_ARG_N; i++) {
            ret |= clSetKernelArg(kernel, i, sizeof(cl_mem), (void*)&memObj[i]);
        }

        state = 3; // launch the kernel with 256 work-items
        size_t WorkSize[1] = { 256 };
        ret |= clEnqueueNDRangeKernel(cque, kernel, 1, NULL, WorkSize, NULL, 0, NULL, NULL);

        state = 4;
        ret |= clFlush(cque);

        state = 5; // wait for the kernel to complete
        ret |= clFinish(cque);

        // Progress report every 10,000 iterations.
        if (count % 10000 == 0)
        {
            QueryPerformanceCounter(&EndingTime);
            ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
            ElapsedMicroseconds.QuadPart *= 1000000;
            ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
            printf("count: %10lld %10lld [us]\n", count, ElapsedMicroseconds.QuadPart);
        }

        // Alternate between the two command queues every 100,000 iterations.
        if (count % 100000 == 0)
        {
            int index = (count / 100000) % 2;
            cque = command_queue[index];
            printf("count: %10lld %10lld [us] switch command queue %d\n", count, ElapsedMicroseconds.QuadPart, index);
        }
#if 0
        // Disabled experiment: recreate the command queue every 10,000,000 iterations.
        if (count % 10000000 == 0)
        {
            ret |= clReleaseCommandQueue(cque);
            QueryPerformanceCounter(&EndingTime);
            ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
            ElapsedMicroseconds.QuadPart *= 1000000;
            ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
            printf("clReleaseCommandQueue: %10lld %10lld [us]\n", count, ElapsedMicroseconds.QuadPart);

            cque = clCreateCommandQueue(context, device_id, 0, &ret);
            QueryPerformanceCounter(&EndingTime);
            ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
            ElapsedMicroseconds.QuadPart *= 1000000;
            ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
            printf("clCreateCommandQueue: %10lld %10lld [us]\n", count, ElapsedMicroseconds.QuadPart);
        }
#endif

        count++;
    }

    // A handle returned by _beginthread is closed by the runtime when the
    // thread exits, so CloseHandle must not be called on it.
    WaitForSingleObject(hThread, INFINITE);

FINISH:
    for (i = 0; i < MAX_ARG_N; i++) {
        if (memObj[i] != NULL)
            clReleaseMemObject(memObj[i]);
    }

    if (kernel != NULL)
        clReleaseKernel(kernel);

    if (program != NULL)
        clReleaseProgram(program);

    if (command_queue[0] != NULL)
        clReleaseCommandQueue(command_queue[0]);
    if (command_queue[1] != NULL)
        clReleaseCommandQueue(command_queue[1]);

    if (context != NULL)
        clReleaseContext(context);

    printf("End: Program run End!\n");
    system("pause");
    return 0;
}
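
A side note on the reproducer above: it accumulates status codes with `ret |= ...`, which is enough to detect that something failed but can hide which error occurred. The sketch below redeclares a few error values from CL/cl.h as plain constants (an assumption made so that it builds without the OpenCL SDK) and shows that OR-ing two negative codes can produce a third, unrelated code. `first_error` is a hypothetical helper for illustration, not an OpenCL API.

```cpp
#include <cstdint>

// Numeric values as defined in CL/cl.h, redeclared here so the sketch
// compiles without the OpenCL headers.
constexpr std::int32_t kClSuccess = 0;
constexpr std::int32_t kClDeviceNotFound = -1;
constexpr std::int32_t kClOutOfResources = -5;
constexpr std::int32_t kClInvalidCommandQueue = -36;

// What `ret |= call()` does to the accumulated status.
constexpr std::int32_t or_combine(std::int32_t acc, std::int32_t err) {
    return acc | err;
}

// Hypothetical alternative: remember the first non-success code unchanged.
constexpr std::int32_t first_error(std::int32_t acc, std::int32_t err) {
    return acc != kClSuccess ? acc : err;
}
```

With these definitions, `or_combine(kClOutOfResources, kClInvalidCommandQueue)` yields -1, which reads as CL_DEVICE_NOT_FOUND, whereas `first_error` keeps -5 (CL_OUT_OF_RESOURCES).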

NoorjahanSk_Intel
Moderator
6,078 Views

Hi,


Thanks for providing the reproducer.


Could you please provide us with a Visual Studio project, as we are not able to build your program at our end?


Thanks & Regards,

Noorjahan.


Wangwenwen
Beginner
6,030 Views

Hi,

Can you reproduce this issue?

We ran the same sample application on a CPU and it did not hang.

As our project is blocked by this issue, please let me know when there are any updates.

Thanks & Regards,

Wangwenwen
Beginner
6,064 Views

Hi.

 

Thanks for your reply.

I have attached the Visual Studio project.

I have also captured the test result and the GPU driver information of the test environment.

Thanks & Regards,

NoorjahanSk_Intel
Moderator
6,021 Views

Hi,


Could you please confirm whether your application uses more memory than the device memory?


Thanks & Regards,

Noorjahan.


Wangwenwen
Beginner
5,989 Views

Hi,

 

Thanks for your reply.

I checked the memory used by my test application in Task Manager, and it does not exceed the device memory.
My test application can run for a long time (over 1 week) on Intel CPU and NVIDIA GPU platforms, but only for about 3 hours on the Intel GPU platform (UHD 630).

Based on the above, I think my test program complies with the OpenCL specification. Can you help by trying it at your end and confirming whether it complies with the specification of the Intel GPU platform?

Thanks & Regards,

NoorjahanSk_Intel
Moderator
5,979 Views

Hi,


Thanks for providing the information.


We are working on your issue. We will get back to you soon.


Thanks & Regards,

Noorjahan.


Wangwenwen
Beginner
5,917 Views

Hi,

 

I want to share some new information with you.

This issue still happens even though we released and reinitialized OpenCL every 1 million runs.

Is there any way to reset OpenCL on an Intel GPU?

 

Thanks & Regards,

Wangwenwen
Beginner
5,879 Views

Hi,

 

Can you tell me whether you reproduced this issue on your side?

We have reproduced this issue on UHD 620, UHD 630, and Iris Xe.

If you have any questions in the process of reproduction, please let me know.

 

Thanks & Regards,

 

Wangwenwen
Beginner
5,837 Views

Hi,

 

  • Could you tell me whether you have reproduced this issue on your side?
  • Is there any way to reset OpenCL on an Intel GPU?
    We ran many experiments to try to reset OpenCL.
    It seems that OpenCL cannot be reset unless the process calling OpenCL is terminated.
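
If that observation holds, one conceivable stopgap (not a confirmed fix, and not something suggested by Intel in this thread) is to run the OpenCL loop in a short-lived worker process and restart it before it reaches the failing iteration count. The C++ sketch below shows only the batching logic. `LaunchWorker` and `run_in_batches` are hypothetical names, and the callback that actually starts a process (for example via CreateProcess) is left to the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical callback: starts one worker process, lets it run `iterations`
// iterations of the OpenCL loop, waits for it to exit, and returns true if
// the worker finished cleanly.
using LaunchWorker = std::function<bool(std::uint64_t iterations)>;

// Splits total_iterations into batches of at most budget_per_process and
// launches a fresh worker for each batch. Returns the number of workers
// launched; stops early if a worker reports failure.
std::uint64_t run_in_batches(std::uint64_t total_iterations,
                             std::uint64_t budget_per_process,
                             const LaunchWorker& launch) {
    assert(budget_per_process > 0);
    std::uint64_t launched = 0;
    std::uint64_t remaining = total_iterations;
    while (remaining > 0) {
        const std::uint64_t batch =
            remaining < budget_per_process ? remaining : budget_per_process;
        ++launched;
        if (!launch(batch)) {
            break; // do not keep restarting after a failed worker
        }
        remaining -= batch;
    }
    return launched;
}
```

For the reproducer in this thread, a budget well below the roughly 33 million iterations at which the stall was observed would be the natural choice.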

Thanks & Regards,

Wangwenwen
Beginner
5,787 Views

Hi,

 

It has been more than 2 weeks since the last reply.

Is there any progress? Has this issue been reproduced?

If you need any information, please let me know.

 

Thanks & Regards,

Wangwenwen
Beginner
5,733 Views

Hi,

 

Is there any progress? Has this issue been reproduced?

Many people have reproduced it, so it seems to be easy to reproduce.

If you need any information, please let me know.

 

Thanks & Regards,

Wangwenwen
Beginner
5,717 Views

Hi,

 

Is there any progress?

Has this issue been reproduced?

I'm waiting for your response.

 

Thanks & Regards,

Ben_A_Intel
Employee
5,699 Views

Hi, I've been able to reproduce the issue and we're trying to figure out what's happening.  Thanks!

Wangwenwen
Beginner
5,683 Views

Hi,

 

Thanks for your reply.

Is there any way to reset OpenCL on an Intel GPU?

As our project has been blocked by this issue for a long time, please let me know if there is any progress.

 

Thanks & Regards,

Wangwenwen
Beginner
5,617 Views

Hi,

 

Is there any workaround to solve it?

 

Thanks & Regards,

Wangwenwen
Beginner
5,660 Views

Hi,

 

Sorry to disturb you.

Is there any progress?

 

Thanks & Regards,

Wangwenwen
Beginner
5,567 Views

Hi,

 

It has been more than a month since the last update. Is there any progress?

 

Thanks & Regards,

Anita_Intel
Employee
5,519 Views

Dear User,


I have been running your application for the past three days on an Intel(R) Iris(R) Xe Graphics card on Windows, after increasing my stack size from 16 KB to 32 KB.


You may increase the stack size in Visual Studio by adding "/analyze:stacksize 32768" to the Configuration Properties > C/C++ > Command Line property page.


Please try this and let us know your results.


Thanks,

Anita

