I tried to run a simple OpenCL program on a UHD630 GPU, and it stopped after a certain number of runs. The log looks like this:
---------------------------------------------
[2022-04-25 00:14:23.874] [info] test_kernel_func count: 33294318 errcode 0
[2022-04-25 00:14:23.874] [info] clEnqueueWriteBuffer()
[2022-04-25 00:14:23.874] [info] clSetKernelArg()
[2022-04-25 00:14:23.874] [info] clEnqueueNDRangeKernel()
[2022-04-25 00:14:23.874] [info] clFlush()
[2022-04-25 00:14:23.874] [info] test_kernel_func count: 33294319 errcode 0
[2022-04-25 00:14:23.875] [info] clEnqueueWriteBuffer()
[2022-04-25 00:14:23.875] [info] clSetKernelArg()
[2022-04-25 00:14:23.875] [info] clEnqueueNDRangeKernel()
[2022-04-25 00:14:23.875] [info] clFlush()
[2022-04-25 00:14:23.875] [info] test_kernel_func count: 33294320 errcode 0
---------------------------------------------
The number of iterations at which it stops, multiplied by the number of kernel operations per iteration, is almost exactly 2 to the power of 32.
The kernel operations are clSetKernelArg and clEnqueueNDRangeKernel.
It looks like an overflow is happening somewhere.
Is there a maximum limit for command queues?
Detailed software and hardware information is as follows:
SDK: intel_sdk_for_opencl_applications_2020.0.245
OpenCL Platform: Intel(R) OpenCL, version:OpenCL 2.1
GPU: Intel(R) UHD Graphics 630
Hi,
Thanks for reaching out to us.
Could you please provide us with a sample reproducer and the steps you have followed so that we can try it at our end?
Also, please provide the OS details and the compiler being used.
Thanks & Regards,
Noorjahan.
I have shared the C++ code below. I found that when clEnqueueWriteBuffer has been called 4,294,967,296 times (2 to the power of 32), OpenCL stops. I thought this problem might be related to the number of times a cl_command_queue is accessed, so I tried alternating between two cl_command_queue objects, but it did not help.
The OS is Windows 10, and I used Visual Studio 2020 Community to compile it.
---------------------------------host.cpp------------------------------------------
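#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <windows.h>
#include <process.h>
// clCreateCommandQueue is deprecated since OpenCL 2.0; defining this before
// including the header avoids deprecation warnings (which /sdl turns into errors).
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS
#include <CL/cl.h>

#define KERNEL(...) #__VA_ARGS__
#define MAX_ARG_N 128

// Shared with the monitor thread; volatile so every read sees the latest value.
volatile LONGLONG count = 0;
static UINT32 param_buffer[MAX_ARG_N];
cl_context context = NULL;
cl_command_queue command_queue[2] = { NULL }, cque;
cl_program program = NULL;
cl_kernel kernel = NULL;
volatile int state = 0;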
// Monitor thread: sleeps until the main loop is close to the iteration at
// which it stops, then prints the reference counts of the OpenCL objects.
void monitor_func(LPVOID p)
{
    (void)p;  // unused
    for (int i = 0; i < 5; i++) {
        while (1) {
            Sleep(1000000);  // 1,000,000 ms, about 16.7 minutes
            if (count >= 33294318)
                break;
        }
        cl_uint ref_count;
        cl_int ret;
        printf("refcnt ");
        printf("count %lld ", count);
        printf("state %d ", state);
        ret = clGetKernelInfo(kernel, CL_KERNEL_REFERENCE_COUNT,
                              sizeof(ref_count), &ref_count, NULL);
        printf("kernel %u ", ref_count);
        ret |= clGetContextInfo(context, CL_CONTEXT_REFERENCE_COUNT,
                                sizeof(ref_count), &ref_count, NULL);
        printf("context %u ", ref_count);
        ret |= clGetCommandQueueInfo(command_queue[0], CL_QUEUE_REFERENCE_COUNT,
                                     sizeof(ref_count), &ref_count, NULL);
        printf("cmd_q %u ", ref_count);
        /*
        ret |= clGetMemObjectInfo(
            cl_src_buffer,
            //cl_src_width,
            //cl_src_height,
            CL_MEM_REFERENCE_COUNT,
            sizeof(ref_count),
            &ref_count,
            NULL);
        printf("memobj %d ", ref_count);
        */
        /*
        ret |= clGetProgramInfo(
            program,
            CL_PROGRAM_REFERENCE_COUNT,
            sizeof(ref_count),
            &ref_count,
            NULL);
        printf("program %d ", ref_count);
        */
        printf(" err %d ", ret);
        printf("\n");
    }
    return;
}
int main(void)
{
    cl_int ret;
    cl_platform_id platform_id = NULL;
    cl_device_id device_id = NULL;
    cl_mem memObj[MAX_ARG_N] = { NULL };  // all NULL, so the cleanup at FINISH is safe after an early exit
    const char* kernelSource = NULL;      // points at a string literal, so it must be const in C++
    LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
    LARGE_INTEGER Frequency;
    int i;

    clGetPlatformIDs(1, &platform_id, NULL);
    if (platform_id == NULL)
    {
        puts("Get OpenCL platform failed!");
        goto FINISH;
    }
    clGetDeviceIDs(platform_id, CL_DEVICE_TYPE_GPU, 1, &device_id, NULL);
    if (device_id == NULL)
    {
        puts("No GPU available as a compute device!");
        goto FINISH;
    }
    context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &ret);
    if (context == NULL)
    {
        puts("Context not established!");
        goto FINISH;
    }
    // Two in-order queues on the same device; the main loop alternates between them.
    command_queue[0] = clCreateCommandQueue(context, device_id, 0, &ret);
    if (command_queue[0] == NULL)
    {
        puts("Command queue cannot be created!");
        goto FINISH;
    }
    command_queue[1] = clCreateCommandQueue(context, device_id, 0, &ret);
    if (command_queue[1] == NULL)
    {
        puts("Command queue cannot be created!");
        goto FINISH;
    }
    cque = command_queue[0];
    // The kernel takes 128 buffer arguments and does no work with them.
    kernelSource = KERNEL(
        __kernel void test(
            __global int* arg00, __global int* arg01, __global int* arg02, __global int* arg03,
            __global int* arg04, __global int* arg05, __global int* arg06, __global int* arg07,
            __global int* arg08, __global int* arg09, __global int* arg0a, __global int* arg0b,
            __global int* arg0c, __global int* arg0d, __global int* arg0e, __global int* arg0f,
            __global int* arg10, __global int* arg11, __global int* arg12, __global int* arg13,
            __global int* arg14, __global int* arg15, __global int* arg16, __global int* arg17,
            __global int* arg18, __global int* arg19, __global int* arg1a, __global int* arg1b,
            __global int* arg1c, __global int* arg1d, __global int* arg1e, __global int* arg1f,
            __global int* arg20, __global int* arg21, __global int* arg22, __global int* arg23,
            __global int* arg24, __global int* arg25, __global int* arg26, __global int* arg27,
            __global int* arg28, __global int* arg29, __global int* arg2a, __global int* arg2b,
            __global int* arg2c, __global int* arg2d, __global int* arg2e, __global int* arg2f,
            __global int* arg30, __global int* arg31, __global int* arg32, __global int* arg33,
            __global int* arg34, __global int* arg35, __global int* arg36, __global int* arg37,
            __global int* arg38, __global int* arg39, __global int* arg3a, __global int* arg3b,
            __global int* arg3c, __global int* arg3d, __global int* arg3e, __global int* arg3f,
            __global int* arg40, __global int* arg41, __global int* arg42, __global int* arg43,
            __global int* arg44, __global int* arg45, __global int* arg46, __global int* arg47,
            __global int* arg48, __global int* arg49, __global int* arg4a, __global int* arg4b,
            __global int* arg4c, __global int* arg4d, __global int* arg4e, __global int* arg4f,
            __global int* arg50, __global int* arg51, __global int* arg52, __global int* arg53,
            __global int* arg54, __global int* arg55, __global int* arg56, __global int* arg57,
            __global int* arg58, __global int* arg59, __global int* arg5a, __global int* arg5b,
            __global int* arg5c, __global int* arg5d, __global int* arg5e, __global int* arg5f,
            __global int* arg60, __global int* arg61, __global int* arg62, __global int* arg63,
            __global int* arg64, __global int* arg65, __global int* arg66, __global int* arg67,
            __global int* arg68, __global int* arg69, __global int* arg6a, __global int* arg6b,
            __global int* arg6c, __global int* arg6d, __global int* arg6e, __global int* arg6f,
            __global int* arg70, __global int* arg71, __global int* arg72, __global int* arg73,
            __global int* arg74, __global int* arg75, __global int* arg76, __global int* arg77,
            __global int* arg78, __global int* arg79, __global int* arg7a, __global int* arg7b,
            __global int* arg7c, __global int* arg7d, __global int* arg7e, __global int* arg7f
        )
        {
            int index = get_global_id(0);
        }
    );
    // Declared without an initializer: the earlier "goto FINISH" statements
    // jump past this point, and skipping an initialization is ill-formed in C++.
    size_t kernelLength;
    kernelLength = strlen(kernelSource);
    program = clCreateProgramWithSource(context, 1, (const char**)&kernelSource, (const size_t*)&kernelLength, &ret);
    ret = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);
    if (ret != CL_SUCCESS)
    {
        size_t len;
        char buffer[8 * 2048];
        printf("Error: Failed to build program executable!\n");
        clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, sizeof(buffer), buffer, &len);
        printf("%s\n", buffer);
        goto FINISH;
    }
    kernel = clCreateKernel(program, "test", &ret);
    if (kernel == NULL)
    {
        puts("Kernel failed to create!");
        goto FINISH;
    }
    // One 4-byte buffer per kernel argument.
    for (i = 0; i < MAX_ARG_N; i++)
    {
        param_buffer[i] = i;
        memObj[i] = clCreateBuffer(context, CL_MEM_READ_WRITE, 4, NULL, &ret);
        if (ret != CL_SUCCESS)
            break;  // stop at the first failure instead of checking only the last buffer
    }
    if (ret != CL_SUCCESS)
    {
        puts("Buffer creation error!");
        goto FINISH;
    }
    // Start the monitor thread and the timer.
    HANDLE hThread;
    hThread = (HANDLE)_beginthread(monitor_func, 0, NULL);
    QueryPerformanceFrequency(&Frequency);
    QueryPerformanceCounter(&StartingTime);
    // Error codes are OR-ed together, so any failing call makes ret nonzero and ends the loop.
    ret = CL_SUCCESS;
    while (ret == CL_SUCCESS)
    {
        // 128 blocking writes, one 4-byte value per buffer.
        state = 1;
        for (i = 0; i < MAX_ARG_N; i++)
        {
            ret |= clEnqueueWriteBuffer(cque, memObj[i], CL_TRUE, 0, 4, &param_buffer[i], 0, NULL, NULL);
        }
        // 128 kernel arguments, one buffer each.
        state = 2;
        for (i = 0; i < MAX_ARG_N; i++) {
            ret |= clSetKernelArg(kernel, i, sizeof(cl_mem), (void*)&memObj[i]);
        }
        // One kernel launch, then flush and wait for it to finish.
        state = 3;
        size_t WorkSize[1] = { 256 };
        ret |= clEnqueueNDRangeKernel(cque, kernel, 1, NULL, WorkSize, NULL, 0, NULL, NULL);
        state = 4;
        ret |= clFlush(cque);
        state = 5;
        ret |= clFinish(cque);
        // Print the elapsed time every 10,000 iterations.
        if (count % 10000 == 0)
        {
            QueryPerformanceCounter(&EndingTime);
            ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
            ElapsedMicroseconds.QuadPart *= 1000000;
            ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
            printf("count: %10lld %10lld [us]\n", count, ElapsedMicroseconds.QuadPart);
        }
        // Switch to the other command queue every 100,000 iterations.
        if (count % 100000 == 0)
        {
            int index = (int)((count / 100000) % 2);
            cque = command_queue[index];
            printf("count: %10lld %10lld [us] switch command queue %d\n", count, ElapsedMicroseconds.QuadPart, index);
        }
#if 0
        // Disabled experiment: release and re-create the current queue every 10,000,000 iterations.
        if (count % 10000000 == 0)
        {
            ret |= clReleaseCommandQueue(cque);
            QueryPerformanceCounter(&EndingTime);
            ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
            ElapsedMicroseconds.QuadPart *= 1000000;
            ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
            printf("clReleaseCommandQueue: %10lld %10lld [us]\n", count, ElapsedMicroseconds.QuadPart);
            cque = clCreateCommandQueue(context, device_id, 0, &ret);
            QueryPerformanceCounter(&EndingTime);
            ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
            ElapsedMicroseconds.QuadPart *= 1000000;
            ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
            printf("clCreateCommandQueue: %10lld %10lld [us]\n", count, ElapsedMicroseconds.QuadPart);
        }
#endif
        count++;
    }
    WaitForSingleObject(hThread, INFINITE);
    // The handle returned by _beginthread is closed automatically when the
    // thread exits, so CloseHandle is not called on it.
FINISH:
    for (i = 0; i < MAX_ARG_N; i++) {
        if (memObj[i] != NULL)
            clReleaseMemObject(memObj[i]);
    }
    if (kernel != NULL)
        clReleaseKernel(kernel);
    if (program != NULL)
        clReleaseProgram(program);
    if (command_queue[0] != NULL)
        clReleaseCommandQueue(command_queue[0]);
    if (command_queue[1] != NULL)
        clReleaseCommandQueue(command_queue[1]);
    if (context != NULL)
        clReleaseContext(context);
    printf("End: Program run End!\n");
    system("pause");
    return 0;
}
Hi,
Thanks for providing the reproducer.
Could you please provide a Visual Studio project? We are not able to build your program at our end.
Thanks & Regards,
Noorjahan.
Hi,
Were you able to reproduce this issue?
The same sample application runs on the CPU without blocking.
As our project is blocked by this issue, please let me know when there are any updates.
Thanks & Regards,
Hi,
Could you please confirm whether your application uses more memory than the device has?
Thanks & Regards,
Noorjahan.
Hi,
Thanks for your reply.
I checked the memory used by my test application in Task Manager, and it does not exceed the device memory.
My test application can run for a long time (over 1 week) on Intel CPU and NVIDIA GPU platforms, but only about 3 hours on the Intel GPU platform (UHD630).
Based on the above, I think my test program complies with the OpenCL specification. Could you try it at your end and confirm whether it complies with the specification of the Intel GPU platform?
Thanks & Regards,
Hi,
Thanks for providing the information.
We are working on your issue and will get back to you soon.
Thanks & Regards,
Noorjahan.
Hi,
I want to share some new information with you.
The issue still happens even though we release and re-initialize OpenCL every 1 million runs.
Is there any way to reset OpenCL on an Intel GPU?
Thanks & Regards,
Hi,
Could you tell me whether you have reproduced this issue on your side?
We have reproduced it on UHD620, UHD630, and Iris Xe.
If you have any questions while reproducing it, please let me know.
Thanks & Regards,
Hi,
- Could you tell me whether you have reproduced this issue on your side?
- Is there any way to reset OpenCL on an Intel GPU?
We ran many experiments trying to reset OpenCL.
It seems that OpenCL cannot be reset unless the process calling OpenCL is terminated.
Thanks & Regards,
Hi,
It has been more than 2 weeks since the last reply.
Is there any progress? Has this issue been reproduced?
If you need any information, please let me know.
Thanks & Regards,
Hi,
Is there any progress? Has this issue been reproduced?
Many people have reproduced it, so it seems to be easy to reproduce.
If you need any information, please let me know.
Thanks & Regards,
Hi,
Is there any progress?
Has this issue been reproduced?
I'm waiting for your opinion.
Thanks & Regards,
Hi, I've been able to reproduce the issue and we're trying to figure out what's happening. Thanks!
Hi,
Thanks for your reply.
Is there any way to reset OpenCL on intel GPU?
As our project has been blocked by this issue for a long time, please let me know if there is any progress.
Thanks & Regards,
Hi,
Is there any workaround to solve it?
Thanks & Regards,
Hi,
Sorry to bother you.
Is there any progress?
Thanks & Regards,
Hi,
It has been more than a month since the last update. Is there any progress?
Thanks & Regards,
Dear User,
I have been running your application for the past three days on an Intel(R) Iris(R) Xe Graphics card on Windows, after increasing my stack size from 16KB to 32KB.
You can increase the stack size in Visual Studio by adding "/analyze:stacksize 32768" on the Configuration Properties > C/C++ > Command Line property page.
Please try this and let us know your results.
Thanks,
Anita