
device_handler_thread_main: Assertion `args && args->func_name' failed

Altera_Forum

I use multiple threads in my OpenCL host code. In emulator mode the code runs successfully, but when I run it on the FPGA, it aborts on reaching the clWaitForEvents() call and prints:

"cnn: acl_hal_mmd.c:218: device_handler_thread_main: Assertion `args && args->func_name' failed".

Does anybody know why? Thank you!
Altera_Forum

Does OpenCL on the FPGA support using multiple threads and multiple queues simultaneously? That is how my code is structured.

When I create one thread in the host, it runs successfully, but after I create the second thread, it shows the same error as before:

"cnn: acl_hal_mmd.c:218: device_handler_thread_main: Assertion `args && args->func_name' failed"
Altera_Forum

I don't think multiple threads can access the same FPGA simultaneously. Altera's implementation seems to use a locking mechanism: when one thread/process accesses a device, the device is locked to prevent other threads/processes from accessing it. You can use multiple queues with one thread/process to run multiple kernels on the same FPGA simultaneously (as long as all kernels are in the same .cl file); you don't need multiple threads in the host code to achieve this.

Altera_Forum

 


I already use multiple queues in my code. But when I profile my kernels, I find that the FPGA sits idle for about 4 ms between kernel executions, while the kernel execution time is only about 0.5 ms. So I wonder whether using multiple threads in the host could solve this problem.

I followed Altera's "multithread_vector_operation" sample code to add multiple threads. But when I run the code on my FPGA, it shows the same error.

 

https://alteraforum.com/forum/attachment.php?attachmentid=13634&stc=1
Altera_Forum

I had never seen this multithreaded example before; it must be new. Based on this example, it should be possible to access the same device from multiple threads, as long as all of them share the same OpenCL context and use the same binary, and the FPGA reconfiguration happens before the threads are created. Still, Altera's own example can be implemented perfectly well using one thread with two queues running in parallel, so there is no need to implement it with multiple threads.

Have you followed the same flow as Altera's example in your own code? The clWaitForEvents() call is not waiting for an event from the other thread, is it?

Regarding your profiling output, what is happening when the write_data kernel is not running (the gaps in the time chart)? Have you made sure that the flow of waiting for events in your code is correct? Maybe the write_data kernel is waiting for some event that it shouldn't wait for, and that is why those gaps appear.
Altera_Forum

 


I ran the multithread sample code on my FPGA; it also failed and showed "cnn: acl_hal_mmd.c:218: device_handler_thread_main: Assertion `args && args->func_name' failed." Do you know why?

I have removed the event waits when enqueueing kernels, as in "status = clEnqueueTask(queues[K_CONV], kernels[K_CONV], 0, NULL, NULL);". After this change, the gap shrank to about 2 ms. How can I eliminate the gap entirely?

 

https://www.alteraforum.com/forum/attachment.php?attachmentid=13636  

 

Thank you very much!
Altera_Forum

If Altera's example also fails on your device, the problem is somewhere else. The design readme says that it only works with Quartus/AOC v16.1 and a compatible BSP. From what I remember, you are using older versions and that is probably why it doesn't work for you. 

 

It is hard to tell what is happening during that time gap without seeing the code. Keep in mind that if your kernel is very short, the overhead of returning to the host after kernel execution finishes and then relaunching the kernel might be higher than the kernel execution time itself.
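
To put rough numbers on that overhead, here is a minimal sketch using only the figures reported earlier in this thread (about 0.5 ms of kernel time, with gaps of about 4 ms and later about 2 ms). The helper names device_utilization() and print_report() are hypothetical and are not part of any OpenCL or Altera API; the model simply assumes every launch costs one kernel run plus one idle gap.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: fraction of wall-clock time the device is busy when
 * each launch runs for kernel_ms and is followed by gap_ms of idle time. */
double device_utilization(double kernel_ms, double gap_ms)
{
    assert(kernel_ms > 0.0 && gap_ms >= 0.0);
    return kernel_ms / (kernel_ms + gap_ms);
}

/* Prints the utilization for the two gap sizes mentioned in the thread. */
void print_report(void)
{
    const double kernel_ms = 0.5;          /* kernel time reported in the thread */
    const double gaps_ms[] = { 4.0, 2.0 }; /* gap before and after removing event waits */

    for (int i = 0; i < 2; ++i) {
        printf("gap %.1f ms -> device busy %.1f%% of the time\n",
               gaps_ms[i],
               100.0 * device_utilization(kernel_ms, gaps_ms[i]));
    }
}
```

Calling print_report() from main() shows the device busy about 11.1% of the time with a 4 ms gap and 20.0% with a 2 ms gap, so under this simple model the gap, not the kernel, dominates the runtime.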
Altera_Forum

 


Thanks for your reply!

After I removed --profile and recompiled my kernel, the gaps seem to have disappeared.

Yes, I also found the cause of the multithreading problem: my Quartus version is not the one the readme requires. Do you know of any multithreaded sample that I can use?
Altera_Forum

No, I don't. Maybe older versions of Quartus do not support multi-threaded host code at all.

Altera_Forum

 


Maybe, thank you!