JButt5
Novice

Getting a lot of "HAL Kern Error: Read failed from addr x, read y expected z" errors – what to do?


Today I compiled, for a Cyclone V SoC, an application that works flawlessly on a GPU-based system. It contains three OpenCL kernels: two single-work-item kernels and one NDRange kernel. Two command queues are used to launch them, sometimes in parallel. The kernels compile successfully after about 1.5 hours; however, my application doesn't produce usable results, and I see many messages like these:

Kernel launch requested when kernel not idle on accelerator 0 kernel physical id = 0
HAL Kern Error: Read failed from addr 1200, read -1233682192 expected 4
Kernel launch requested when kernel not idle on accelerator 0 kernel physical id = 0
HAL Kern Error: Read failed from addr 1200, read -1233682192 expected 4
CL_INVALID_WORK_GROUP_SIZE
HAL Kern Error: Write failed to addr 1200 with value 0, wrote -1233682192 expected 4
HAL Kern Error: Write failed to addr 1200 with value 0, wrote -1233682192 expected 4
HAL Kern Error: Write failed to addr 1200 with value 0, wrote -1233682192 expected 4
HAL Kern Error: Read failed from addr 1200, read -1233682192 expected 4
HAL Kern Error: Read failed from addr 1200, read -1233682192 expected 4
HAL Kern Error: Read failed from addr 1200, read -1233682192 expected 4
HAL Kern Error: Read failed from addr 1200, read -1233682192 expected 4
HAL Kern Error: Write failed to addr 1200 with value 0, wrote -1233682192 expected 4
HAL Kern Error: Read failed from addr 1200, read -1233682192 expected 4
HAL Kern Error: Read failed from addr 1200, read -1233682192 expected 4
HAL Kern Error: Write failed to addr 1200 with value 0, wrote -1233682192 expected 4
HAL Kern Error: Read failed from addr 1200, read -1233682192 expected 4
HAL Kern Error: Read failed from addr 1200, read -1233682192 expected 4
Kernel launch requested when kernel not idle on accelerator 0 kernel physical id = 0

Note the CL_INVALID_WORK_GROUP_SIZE error in between. It is a bit surprising, because I launched a completely unoptimized version that does not specify any required work-group size for the NDRange kernel. I'm recompiling with reqd_work_group_size right now, but as mentioned above, that takes a while to complete.
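Independent of the BSP, one common cause of CL_INVALID_WORK_GROUP_SIZE in OpenCL 1.x host code is passing an explicit local work size to clEnqueueNDRangeKernel that does not evenly divide the global work size. The thread does not show the enqueue call, so whether this applies here is an assumption. The sketch below shows the usual remedy of padding the global size up to a multiple of the local size; `round_up_global_size` is a hypothetical helper, not part of the Intel FPGA SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the Intel FPGA SDK): round the global
 * work size up to the next multiple of the local work size. OpenCL 1.x
 * rejects an explicit local size that does not evenly divide the global
 * size with CL_INVALID_WORK_GROUP_SIZE. */
size_t round_up_global_size(size_t global, size_t local)
{
    assert(local > 0); /* a zero local size is a caller bug */
    size_t remainder = global % local;
    return remainder == 0 ? global : global + (local - remainder);
}
```

With n = 1000 elements and a local size of 64, the padded global size becomes 1024 and would be passed as `&global` next to `&local` in `clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, &local, 0, NULL, NULL)`. The kernel must then skip work-items whose `get_global_id(0)` is at or beyond the real problem size.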

 

When I quit the application, I also get a segmentation fault. Remote GDB reports:

Thread 6 "NameOfMyApplication" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1021.1034]
0xb66d2648 in acl_kernel_if_read () from target:/root/opencl_arm32_rte/host/arm32/lib/libalteracl.so

Stacktrace:
acl_kernel_if_read 0x00000000b66d2648
talk_to_hal 0x00000000b66d4b3c
device_handler_thread_main 0x00000000b66d4d2a
start_thread 0x00000000b5d1e3b4
<unknown> 0x00000000b5c39e18

I have no idea what is happening. Because the SDK is closed source, it is difficult to understand what goes on under the hood, and these error messages don't really help me. Can you shed some light on what might be happening here?

 

I'm on the 18.1 SDK.


HRZ
Valued Contributor II

Are you using a custom BSP, as mentioned in your other threads in the forum? If so, look for the problem in your custom BSP. The errors you are getting are HAL (Hardware Abstraction Layer) errors; they are extremely unlikely to come from an issue in the OpenCL compiler and are very likely caused by an issue in the BSP or in the OpenCL driver/runtime.

JButt5
Novice

Yes, I'm using my custom BSP. The error may well be located there; however, more knowledge of how the runtime works under the hood, and of which circumstances trigger messages like these, would help me narrow down the possible sources. What I miss most in the OpenCL SDK are powerful debugging tools and detailed, in-depth information on how everything works, and on how to create and debug proper custom BSPs for real-world projects that go beyond the simple examples shipped with the SDK.

MEIYAN_L_Intel
Employee

Hi,

 

It seems the problem comes from the custom BSP.

You may need to refer to the custom platform user guide at the link below:

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/opencl-sdk/ug_aocl_custo...

Thanks

 


JButt5
Novice

Thank you, this looks like a great resource with in-depth information I had not found before. My BSP knowledge so far was based mostly on studying existing platform implementations and a few papers that roughly outline the workflow. Still, I'd be grateful if someone here, perhaps one of the runtime developers, could explain what kind of error leads to messages like these 😊

MEIYAN_L_Intel
Employee

Hi,

I found another custom platform example that may be useful to you, at the link below:

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an780.pdf

Thanks

TSchw11
Beginner

Hi, I'm continuing JButt5's project and was able to reproduce the errors even without the custom BSP. However, the errors stopped occurring when I temporarily disabled one of the threads, so the problem seems to be threading-related.
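Since OpenCL 1.1 and later require every API call except clSetKernelArg (on a shared kernel object) to be thread-safe, errors that disappear with a single thread suggest a problem below the API. One diagnostic step, not a fix confirmed in this thread, is to serialize every runtime call behind a single process-wide mutex and check whether the HAL errors go away. The sketch below shows that pattern with POSIX threads; `launch_kernel_locked`, `fake_enqueue` and `run_two_threads` are hypothetical names, and `fake_enqueue` merely stands in for the real enqueue and wait calls.

```c
#include <assert.h>
#include <pthread.h>

/* One process-wide lock that guards every call into the OpenCL runtime. */
static pthread_mutex_t runtime_lock = PTHREAD_MUTEX_INITIALIZER;

/* Demo instrumentation: reset before the workers start, then touched only
 * while runtime_lock is held. Tracks threads currently inside the locked
 * region, the maximum ever observed, and the total number of launches. */
static int inside = 0;
static int max_inside = 0;
static int total_calls = 0;

/* Hypothetical stand-in for the real clEnqueueNDRangeKernel()/clFinish()
 * pair; it only updates the counters. */
static void fake_enqueue(void)
{
    inside++;
    if (inside > max_inside) {
        max_inside = inside;
    }
    total_calls++;
    inside--;
}

/* Every host thread calls this wrapper instead of the runtime directly,
 * so runtime calls from different threads never overlap. */
void launch_kernel_locked(void)
{
    int rc = pthread_mutex_lock(&runtime_lock);
    assert(rc == 0);
    fake_enqueue();
    rc = pthread_mutex_unlock(&runtime_lock);
    assert(rc == 0);
    (void)rc;
}

static void *worker(void *arg)
{
    int iterations = *(const int *)arg;
    for (int i = 0; i < iterations; i++) {
        launch_kernel_locked();
    }
    return NULL;
}

/* Starts two worker threads (mirroring the two host threads in the
 * application) and returns the total number of launches performed. */
int run_two_threads(int iterations_per_thread)
{
    pthread_t t1, t2;
    total_calls = 0;
    max_inside = 0;
    int rc = pthread_create(&t1, NULL, worker, &iterations_per_thread);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, &iterations_per_thread);
    assert(rc == 0);
    (void)rc;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return total_calls;
}
```

In real host code each thread would replace `fake_enqueue()` with its own enqueue and wait calls. If the HAL errors vanish with the lock in place, that points at concurrency handling below the OpenCL API, for example in the BSP's MMD library, rather than at the kernels.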

 

I created a clean, reproducible example and asked a new question here: https://forums.intel.com/s/question/0D50P00004cY66wSAC/getting-hal-kern-error-readwrite-failed-from-...

MEIYAN_L_Intel
Employee

Hi,

From your question, I see you are using the "multithread vector" design example, am I right?

Can other examples, such as hello_world and vector_add, be compiled successfully?

Did you build the BSP with another version and port it to 18.1?

Could you look into the version of the MMD?

Thanks

TSchw11
Beginner

Hi,

thanks for reopening the question.

 

>From your question, I see you are using the "multithread vector" design example, am I right?

Yes, I can reproduce the error using the "multithread vector" example.

 

>Can other examples, such as hello_world and vector_add, be compiled successfully?

Yes, the hello_world example runs successfully, and this bigger project also runs successfully if I only enable one thread.

I'm currently at home, but I can test the vector_add example tomorrow.

 

>Did you build the BSP with another version and port it to 18.1?

I'm currently using Intel FPGA SDK for OpenCL standard edition Version 19.1.

I'm reproducing the error using the BSP provided by the development board vendor, Terasic:

Download page: https://www.terasic.com.tw/cgi-bin/page/archive.pl?Language=English&CategoryNo=167&No=1081&PartNo=4#...

Direct link: http://download.terasic.com/downloads/cd-rom/de10-standard/DE10-Standard_OpenCL_18.0_BSP.tar.gz

That BSP is meant for version 18.0. Is this a problem?

 

>Could you look into the version of the MMD?

How do I check this?

If I remember correctly, executing `aocl version` on the board shows version 18.0, but I'll check again tomorrow.

 

Thanks for your help!

 

jackgreen
Novice

Hi, I am experiencing the same bug on the DE10-Nano as well. Could you let me know whether you have resolved it? Thanks!

MEIYAN_L_Intel
Employee

Hi,

 

From "executing `aocl version` on the board shows version 18.0, but I'll check again tomorrow", it seems you are using the v18.0 driver. Could you check this?

 

Thanks

Best Regards,

Mei Yan

TSchw11
Beginner

Hi,

 

the vector_add example also runs just fine.

 

Output of aocl version:

root@socfpga:~# . ./init_opencl.sh
root@socfpga:~# aocl version
aocl 18.0.0.614 (Intel(R) FPGA Runtime Environment for OpenCL(TM), Version 18.0)
root@socfpga:~#

Thanks and best regards!

 

 

TSchw11
Beginner

To make sure the error is not caused by a version incompatibility, I compiled both the host application and the .aocx file (of the opencl_multithread_vector_operation example) with version 18.0 of the Intel SDK, which matches the version of the supplied BSP.

 

The error still occurs, so it doesn't seem to be a version incompatibility.

MEIYAN_L_Intel
Employee

Hi,

 

Could you check the MMD version info by calling the "aocl_mmd_get_info" function and send me the results?

 

Thanks

TSchw11
Beginner

Hi,

 

it seems very likely that this is the problem.

 

This call:

aocl_mmd_get_offline_info( AOCL_MMD_VERSION, sizeof(mmd_version), mmd_version, &param_size_ret );

returned version 14.1, so the Terasic BSP apparently ships with a very old MMD library.
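A mismatch like this could be caught at start-up instead of surfacing later as HAL errors. The sketch below compares the major version in the string returned for AOCL_MMD_VERSION with the runtime version that `aocl version` reports (18.0 here). It assumes both strings begin with a `major.minor` number; the helper names and the rule of requiring equal major versions are hypothetical, not an Intel-documented compatibility policy.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse the leading "major.minor" of a version string
 * such as "14.1" or "18.0.0.614". Returns 0 on success, -1 if the string
 * does not start with a number. A missing minor part is reported as 0. */
int parse_major_minor(const char *s, long *major, long *minor)
{
    assert(s != NULL && major != NULL && minor != NULL);
    char *end;
    *major = strtol(s, &end, 10);
    if (end == s) {
        return -1; /* no digits at the start */
    }
    *minor = 0;
    if (*end == '.') {
        const char *rest = end + 1;
        *minor = strtol(rest, &end, 10);
        if (end == rest) {
            *minor = 0;
        }
    }
    return 0;
}

/* Hypothetical policy: treat the MMD as compatible only when its major
 * version equals the runtime's (e.g. an 18.x MMD with an 18.0 runtime).
 * Returns 1 if compatible, 0 otherwise or if either string is unparsable. */
int mmd_matches_runtime(const char *mmd_version, const char *runtime_version)
{
    long mmd_major, mmd_minor, rt_major, rt_minor;
    if (parse_major_minor(mmd_version, &mmd_major, &mmd_minor) != 0 ||
        parse_major_minor(runtime_version, &rt_major, &rt_minor) != 0) {
        return 0;
    }
    return mmd_major == rt_major;
}
```

In the host program, the string filled in by the aocl_mmd_get_offline_info(AOCL_MMD_VERSION, ...) call above could be passed as `mmd_version` together with "18.0", printing a warning when the function returns 0. With the values from this thread, "14.1" against "18.0.0.614" is flagged as a mismatch.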

 

Can I compile a newer MMD library from sources included in the Intel FPGA SDK for OpenCL, or can I simply copy a ".so" file?

Edit: I just saw that section 1.4.3 of ug_aocl_custom_platform_toolkit.pdf describes how to create the MMD library; I'll try that next.
