We are porting the OpenCL platform to a custom Cyclone V board. We have successfully compiled the OpenCL framework into an FPGA binary (RBF), and it is currently loaded on the system. The CMA modules are built into the Linux kernel, and the OpenCL host driver module has been recompiled and loaded. DTS changes from the FPGA design are in the process of being merged into the Linux build.

‘aocl diagnose’ returns successfully, and simple OpenCL programs that don’t involve memory transactions complete, but we are experiencing issues with programs that copy memory buffers. Simple OpenCL examples such as vector addition, which copy a memory buffer, never reach clFinish; they hang. The same programs run on the C5SOC platform, so we are looking into our design. Right now we are comparing the C5SOC FPGA design with our custom implementation and reading through the Linux host driver code.

Is there a particular place we should look regarding memory buffer transfer issues in OpenCL? Any insight you can give is greatly appreciated.

Thanks,
Chad Hewitt

root@avid-cyclone5:~# aocl diagnose
aocl diagnose: Running diagnostic from /home/root/opencl_arm32_rte/board/avid_alpha/arm32/bin
Verified that the kernel mode driver is installed on the host machine.
Using platform: Altera SDK for OpenCL
Board vendor name: Altera Corporation
Board name: avid_alpha : Cyclone V SoC Development Kit
Buffer read/write test passed.
diagnostic_passed
root@avid-cyclone5:~#
root@avid-cyclone5:~# ./hello_world
Compiled by Randy - 7/22/2015 10:00 AM
Querying platform for info:
==========================
CL_PLATFORM_NAME = Altera SDK for OpenCL
CL_PLATFORM_VENDOR = Altera Corporation
CL_PLATFORM_VERSION = OpenCL 1.0 Altera SDK for OpenCL, Version 15.0
Querying device for info:
========================
CL_DEVICE_NAME = avid_alpha : Cyclone V SoC Development Kit
CL_DEVICE_VENDOR = Altera Corporation
CL_DEVICE_VENDOR_ID = 4466
CL_DEVICE_VERSION = OpenCL 1.0 Altera SDK for OpenCL, Version 15.0
CL_DRIVER_VERSION = 15.0
CL_DEVICE_ADDRESS_BITS = 64
CL_DEVICE_AVAILABLE = true
CL_DEVICE_ENDIAN_LITTLE = true
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 32768
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 0
CL_DEVICE_GLOBAL_MEM_SIZE = 536870912
CL_DEVICE_IMAGE_SUPPORT = false
CL_DEVICE_LOCAL_MEM_SIZE = 16384
CL_DEVICE_MAX_CLOCK_FREQUENCY = 1000
CL_DEVICE_MAX_COMPUTE_UNITS = 1
CL_DEVICE_MAX_CONSTANT_ARGS = 8
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE = 134217728
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 8192
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE = 1024
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR = 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT = 2
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE = 0
Command queue out of order? = false
Command queue profiling enabled? = true
Using AOCX: hello_world.aocx
Reprogramming device with handle 1
Kernel initialization is complete.
Launching the kernel...
Thread# 0: Hello from Altera's OpenCL Compiler!
Compiled by Randy - 7/22/2015 10:00 AM
kernel execution is complete.
root@avid-cyclone5:~# ./vector_add
Compiled by Randy - 7/30/2015 10:00 AM
Initializing OpenCL
Platform: Altera SDK for OpenCL
Using 1 device(s)
avid_alpha : Cyclone V SoC Development Kit
Using AOCX: vector_add.aocx
Reprogramming device with handle 1
Launching for device 0 (1000000 elements)
waiting for the queue.
(never completes)
For me, the fix was to take the top.rbf produced by the OpenCL compile and load that one in U-Boot. If we loaded a different top.rbf file, the mismatch took down the fpga2sdram bridge. That was the root cause of the failure. Hope that helps.
--- Quote Start ---
What kernel version are you using? Did you try downgrading to make sure the problem has nothing to do with the kernel version?
--- Quote End ---

Linux kernel version? 3.13.0. Quartus version? 14.1. I'm not sure what you're asking here.

We are testing our board using the diagnostic tool provided with the reference board in Quartus 14.1. Unmodified, this tool writes to all available memory and then checks the result. To facilitate debugging, I've modified it so that it only writes the first 2048 bytes and then reads the result back. What I'm seeing in SignalTap is that all of the data is written into memory, but the DMA engine never raises the IRQ to signal that the Host->FPGA transfer completed. This hangs the application. If we scope the provided design on a DE5Net board, the IRQ is raised.

So the real question for us is: why isn't the DMA engine raising the IRQ? What sort of issues would cause this behavior?
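A quick host-side check that complements the SignalTap capture is to see whether Linux has a handler registered for the driver's interrupt at all, and whether its count moves during a transfer. Below is a minimal sketch in C that sums the per-CPU counters on the matching line of /proc/interrupts. The helper names are hypothetical, and the match string you pass (for example "aclsoc") is an assumption; use whatever name your driver build actually registers.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sum the per-CPU counters on one /proc/interrupts line, e.g.
 *   " 72:     5     7     GIC  72  aclsoc"
 * Returns -1 if the line has no "<irq>:" prefix. */
long irq_line_total(const char *line)
{
    const char *p = strchr(line, ':');
    long total = 0;

    if (p == NULL)
        return -1;
    p++;
    for (;;) {
        char *end;

        while (*p == ' ' || *p == '\t')
            p++;
        /* Counters end where the interrupt controller name begins. */
        if (!isdigit((unsigned char)*p))
            break;
        total += strtol(p, &end, 10);
        p = end;
    }
    return total;
}

/* Scan an open /proc/interrupts-style stream for the first line that
 * mentions `name` (assumed to be the string the driver registered).
 * Returns the summed count, or -1 if no such line exists. */
long irq_total_for(FILE *f, const char *name)
{
    char line[512];

    while (fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, name) != NULL)
            return irq_line_total(line);
    }
    return -1;
}
```

Calling irq_total_for() on /proc/interrupts before and after the 2048-byte write separates two cases: a result of -1 suggests the handler was never registered, while an unchanged non-negative count suggests the interrupt is registered but not reaching the CPU.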
Hi, I have the same issue on Linux kernel 4.1.22 or newer. On newer kernels, the request_irq(PIO_IRQ, aclsoc_irq, irq_type, DRIVER_NAME, (void*)aclsoc) call in the aoclsoc driver can't get hardware IRQ No. 72. One solution is to port the aoclsoc driver to a platform driver and get the hardware IRQ number from the device tree. Here is the new Cyclone SoC OpenCL RTE for Linux 4.9.78.
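For anyone doing the same port by hand: the number written in the device tree is not 72 itself. Assuming "hardware irq No.72" refers to a GIC interrupt ID, the standard ARM GIC binding expects the shared peripheral interrupt (SPI) number, which is the ID minus 32, followed by a trigger flag. The sketch below only shows that arithmetic; the function names are hypothetical, and the flag value 4 (level, active high) is an assumption that should match your hardware.

```c
#include <stdio.h>

/* GIC interrupt IDs 0-15 are SGIs, 16-31 are PPIs, SPIs start at 32. */
#define GIC_SPI_BASE 32

/* Convert a GIC interrupt ID to the SPI number used in the device tree.
 * Returns -1 if the ID is below the SPI range. */
int gic_id_to_spi(int gic_id)
{
    return gic_id >= GIC_SPI_BASE ? gic_id - GIC_SPI_BASE : -1;
}

/* Format a three-cell GIC interrupt specifier: <type number flags>.
 * The first cell is 0 for an SPI. `flags` is the trigger type
 * (assumption: 4 = level high, 1 = rising edge, per the GIC binding).
 * Returns snprintf's result, or -1 if gic_id is not an SPI. */
int format_gic_interrupts(char *buf, size_t len, int gic_id, int flags)
{
    int spi = gic_id_to_spi(gic_id);

    if (spi < 0)
        return -1;
    return snprintf(buf, len, "interrupts = <0 %d %d>;", spi, flags);
}
```

With ID 72 and flag 4 this yields "interrupts = <0 40 4>;". A platform driver would then typically obtain the Linux IRQ number with platform_get_irq() in its probe function and pass that to request_irq() instead of a hard-coded constant.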