Intel® FPGA University Program

Intel FPGA SDK for OpenCL: Issue while launching the same OpenCL kernel multiple times

chandrasekhar92
Beginner

Hi,

I am using the Intel FPGA SDK for OpenCL to perform matrix multiplication on a DE1-SoC board. I need to perform this multiplication multiple times, so I iterate over a loop to enqueue the kernel. The first kernel completes successfully, but the second kernel gets stuck in the CL_RUNNING state indefinitely. To narrow down the problem, I simplified my code and removed all computations from the kernel, as below:

__kernel void multiplication()
{

//Empty kernel
}

Instead of a loop, I am enqueueing my kernel two times, as below:

cl_int err;



size_t global_work_size[] = {static_cast<size_t>(1)};
size_t local_work_size[] = {static_cast<size_t>(1)};

cl_event kernel_event1;

// Enqueue the kernel for execution
std::cout << "started enqueue" << std::endl;
err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global_work_size, local_work_size, 0, NULL, &kernel_event1);

if (err != CL_SUCCESS) {
    std::cout << "Failed to enqueue" << std::endl;
} else {
    std::cout << "Done enqueue" << std::endl;
}

err = clWaitForEvents(1, &kernel_event1);
if (err != CL_SUCCESS) {
    std::cerr << "Error waiting for kernel event." << std::endl;
}else{
   std::cout << "done executing the kernel" << std::endl;
   clReleaseEvent(kernel_event1);
}

 

//Second kernel execution

cl_event kernel_event2;

// Enqueue the kernel for execution
std::cout << "started enqueue" << std::endl;
err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, global_work_size, local_work_size, 0, NULL, &kernel_event2);

if (err != CL_SUCCESS) {
    std::cout << "Failed to enqueue" << std::endl;
} else {
    std::cout << "Done enqueue" << std::endl;
}

 

cl_int event_status;
clGetEventInfo(kernel_event2, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(event_status), &event_status, NULL);
if(event_status == CL_QUEUED){
    printf("Kernel is queued.\n");
}else if(event_status == CL_SUBMITTED){
    printf("Kernel is submitted.\n");
}else if(event_status == CL_RUNNING){
    printf("Kernel is running.\n");
}else if(event_status == CL_COMPLETE){
   printf("Kernel has completed.\n");
}else{
   printf("Unknown status.\n");
}

err = clWaitForEvents(1, &kernel_event2);
if (err != CL_SUCCESS) {
    std::cerr << "Error waiting for kernel event." << std::endl;
}else{
   std::cout << "done executing the kernel" << std::endl;
   clReleaseEvent(kernel_event2);
}

 

In the simplified code above, my kernel performs no computation; I am just trying to launch the same kernel a second time after the first one completes successfully. During execution, the debug statements show that the first execution completes and the second execution is successfully enqueued, but the status is printed as CL_RUNNING and the program then waits indefinitely at clWaitForEvents without even returning an error for the wait.

I would highly appreciate it if someone could help me understand this issue.

Thank you.

BoonBengT_Intel
Moderator

Hi @chandrasekhar92,


Thank you for posting in the Intel community forum. I hope all is well, and apologies for the delayed response.

As Intel OpenCL has been deprecated, we will do our best to support you on the issue you mention.

Is there a reference design that you are following? Also, can you share the compilation/build commands you used so that we can check further?

Hope to hear from you soon.


Best Wishes

BB


BoonBengT_Intel
Moderator

Hi @chandrasekhar92,


Good day. I am just following up on the previous clarification.

Did you by any chance manage to look into it?

Hope to hear from you soon.


Best Wishes

BB



chandrasekhar92
Beginner

Hi,

Thank you for your time.

I modified the BSP qsys file to add PIO components that detect interrupts from button presses and slide switches. When I went back and tried the original BSP, kernel enqueuing and completion were successful. With the modified qsys file, however, I face the issue described above. I am trying to figure out whether the base addresses of these registers overlap with the OpenCL kernel.

 

 

Thank you.

BoonBengT_Intel
Moderator

Hi @chandrasekhar92,


I note that there seems to be a custom BSP involved. Is it correct to say that you are not following any reference design and that the sample code is self-written?

In our previous experience, using clWaitForEvents to synchronize kernels in separate queues works. I would recommend trying clFinish(), which serves a similar purpose, to see if that works.

Also, for the situation you mention, it seems that the desired execution flow is out of order, so the link below may help explain it:

- https://www.intel.cn/content/www/cn/zh/developer/articles/technical/opencl-out-of-order-queue-on-intel-processor-graphics.html

Note: the linked example uses a GPU as the hardware; however, the execution concept is the same.


Best Wishes

BB


chandrasekhar92
Beginner

Hi,

Thank you for your time.

I was able to solve the issue after making changes to the hardware. The BSP that I modified is the DE1-SoC BSP provided with the OpenCL SDK. I am using GPIO pins to connect to a D5M camera, which was using the same pin as the FPGA kernel clk. That may be what was interfering with OpenCL kernel queuing. Once I separated them and used clFinish() as you suggested, I was able to enqueue and run the kernel multiple times.

 

Appreciate your help.

BoonBengT_Intel
Moderator

Hi @chandrasekhar92,


Great! Good to know that it is working now, and thanks for sharing your steps here with others. If no further clarification is needed, this thread will be transitioned to community support. Please log in to ‘https://supporttickets.intel.com’, view the details of the desired request, and post a response within the next 15 days to allow me to continue to support you. After 15 days, this thread will be transitioned to community support.

Thank you for the questions; as always, it is a pleasure having you here.


Best Wishes

BB


P.S. If any answer from the community or Intel Support is helpful, please feel free to mark it as the best answer or rate it 4/5 in the survey.


