OpenCL memory hang on custom platform

Altera_Forum
Honored Contributor II
1,034 Views

We are porting the OpenCL platform to a custom Cyclone V board. We have successfully compiled the OpenCL framework into an FPGA binary (RBF), which is currently loaded on the system. The CMA modules are built into the Linux kernel, and the OpenCL host driver module has been recompiled and loaded into the system. The DTS changes from the FPGA design are in the process of being merged into the Linux build.

 

‘aocl diagnose’ reports success, and simple OpenCL programs that don’t involve memory transactions complete, but we are having trouble with programs that copy memory buffers. Simple OpenCL examples that copy a memory buffer, such as vector addition, never reach clFinish; they hang. The same programs run on the C5SOC platform, so we are looking into our design.

 

Right now we are comparing the C5SOC FPGA design with our custom implementation and looking through the Linux host driver code. Is there a particular place we should look regarding memory buffer transfer issues in OpenCL? Any insight you can give is greatly appreciated.

 

Thanks, 

 

Chad Hewitt 

 

 

root@avid-cyclone5:~# aocl diagnose
aocl diagnose: Running diagnostic from /home/root/opencl_arm32_rte/board/avid_alpha/arm32/bin

Verified that the kernel mode driver is installed on the host machine.

Using platform: Altera SDK for OpenCL
Board vendor name: Altera Corporation
Board name: avid_alpha : Cyclone V SoC Development Kit

Buffer read/write test passed.

diagnostic_passed
root@avid-cyclone5:~#

 

root@avid-cyclone5:~# ./hello_world

Compiled by Randy - 7/22/2015 10:00 AM
Querying platform for info:
==========================
CL_PLATFORM_NAME = Altera SDK for OpenCL
CL_PLATFORM_VENDOR = Altera Corporation
CL_PLATFORM_VERSION = OpenCL 1.0 Altera SDK for OpenCL, Version 15.0

Querying device for info:
========================
CL_DEVICE_NAME = avid_alpha : Cyclone V SoC Development Kit
CL_DEVICE_VENDOR = Altera Corporation
CL_DEVICE_VENDOR_ID = 4466
CL_DEVICE_VERSION = OpenCL 1.0 Altera SDK for OpenCL, Version 15.0
CL_DRIVER_VERSION = 15.0
CL_DEVICE_ADDRESS_BITS = 64
CL_DEVICE_AVAILABLE = true
CL_DEVICE_ENDIAN_LITTLE = true
CL_DEVICE_GLOBAL_MEM_CACHE_SIZE = 32768
CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 0
CL_DEVICE_GLOBAL_MEM_SIZE = 536870912
CL_DEVICE_IMAGE_SUPPORT = false
CL_DEVICE_LOCAL_MEM_SIZE = 16384
CL_DEVICE_MAX_CLOCK_FREQUENCY = 1000
CL_DEVICE_MAX_COMPUTE_UNITS = 1
CL_DEVICE_MAX_CONSTANT_ARGS = 8
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE = 134217728
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 3
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS = 8192
CL_DEVICE_MIN_DATA_TYPE_ALIGN_SIZE = 1024
CL_DEVICE_PREFERRED_VECTOR_WIDTH_CHAR = 4
CL_DEVICE_PREFERRED_VECTOR_WIDTH_SHORT = 2
CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_LONG = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_FLOAT = 1
CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE = 0
Command queue out of order? = false
Command queue profiling enabled? = true
Using AOCX: hello_world.aocx
Reprogramming device with handle 1

Kernel initialization is complete.
Launching the kernel...

Thread# 0: Hello from Altera's OpenCL Compiler!
Compiled by Randy - 7/22/2015 10:00 AM

kernel execution is complete.

 

root@avid-cyclone5:~# ./vector_add

Compiled by Randy - 7/30/2015 10:00 AM
Initializing OpenCL
Platform: Altera SDK for OpenCL
Using 1 device(s)
avid_alpha : Cyclone V SoC Development Kit
Using AOCX: vector_add.aocx
Reprogramming device with handle 1
Launching for device 0 (1000000 elements)
waiting for the queue. (never completes)

Altera_Forum
Honored Contributor II
179 Views

Hi Chad, did you ever figure out your problem? I'm seeing similar behavior on one of our custom boards.

Altera_Forum
Honored Contributor II
179 Views

For me it was a matter of taking the top.rbf from the OpenCL compile and loading that in U-Boot. If we loaded a different top.rbf file, the mismatch took down the fpga2sdram bridge. That was the root cause and the source of the failure. Hope that helps.

Altera_Forum
Honored Contributor II
179 Views

What kernel version are you using? Did you try downgrading to make sure the problem has nothing to do with the kernel version?

Altera_Forum
Honored Contributor II
179 Views

 

--- Quote Start ---

What kernel version are you using? Did you try downgrading to make sure the problem has nothing to do with the kernel version?

--- Quote End ---

Linux kernel version? 3.13.0. Quartus version? 14.1.

I'm not sure what you're asking here.

 

We are testing our board using the diagnostic tool provided with the reference board in Quartus 14.1. Unmodified, this tool writes to all available memory and then checks the result. To make debugging easier, I've modified it so that it writes only the first 2048 bytes and then reads them back.
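For readers wondering what a reduced write/readback check of this kind might look like, here is a minimal sketch. It is not the modified diagnostic from this post: `check_first_bytes`, `xfer_fn`, `fake_write` and `fake_read` are hypothetical names invented for illustration. On a real board the two callbacks would presumably wrap blocking `clEnqueueWriteBuffer` and `clEnqueueReadBuffer` calls; here they are backed by plain memory so the comparison logic can be tried off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TEST_BYTES 2048u  /* transfer size quoted in the post */

/* Hypothetical transfer callback; returns 0 on success. */
typedef int (*xfer_fn)(void *ctx, size_t offset, void *buf, size_t len);

/* Write a pattern to the first TEST_BYTES bytes, read it back, and compare.
 * Returns 0 on a match, -1 on a transfer error, otherwise the 1-based
 * index of the first mismatching byte. */
long check_first_bytes(xfer_fn write_dev, xfer_fn read_dev, void *ctx)
{
    static uint8_t out[TEST_BYTES], in[TEST_BYTES];

    assert(write_dev != NULL && read_dev != NULL);

    for (size_t i = 0; i < TEST_BYTES; i++)
        out[i] = (uint8_t)(i * 7u + 1u);  /* simple non-constant pattern */
    memset(in, 0, sizeof in);

    if (write_dev(ctx, 0, out, TEST_BYTES) != 0)
        return -1;
    if (read_dev(ctx, 0, in, TEST_BYTES) != 0)
        return -1;

    for (size_t i = 0; i < TEST_BYTES; i++)
        if (in[i] != out[i])
            return (long)i + 1;
    return 0;
}

/* Memory-backed stand-ins (assumption: ctx points at a byte array of at
 * least TEST_BYTES) so the logic can be dry-run without hardware. */
int fake_write(void *ctx, size_t offset, void *buf, size_t len)
{
    memcpy((uint8_t *)ctx + offset, buf, len);
    return 0;
}

int fake_read(void *ctx, size_t offset, void *buf, size_t len)
{
    memcpy(buf, (uint8_t *)ctx + offset, len);
    return 0;
}
```

Calling `check_first_bytes(fake_write, fake_read, mem)` with a 2048-byte array `mem` exercises the logic on a workstation. A small fixed size like this keeps the SignalTap capture short enough to inspect by hand.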

 

What I'm seeing in SignalTap is that all of the data is written into memory, but the DMA engine never raises the IRQ that signals completion of the Host->FPGA transfer. This hangs the application. If we scope the provided design on a DE5-Net board, the IRQ is raised.

 

So the real question for us is: Why isn't the DMA engine raising the IRQ? What sort of issues would cause this behavior?
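
A host-side cross-check that may help with a question like this is to see whether Linux ever counts the interrupt, by comparing the driver's line in /proc/interrupts before and after a transfer. The sketch below sums the per-CPU counters on one such line. The helper name `irq_count_from_line` is hypothetical, and the name the driver registers under (whatever DRIVER_NAME expands to) is an assumption you would need to check on your own system.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: sum the per-CPU counters on one /proc/interrupts line.
 * Returns -1 if the line does not mention `name` or has no "<irq>:" prefix. */
long irq_count_from_line(const char *line, const char *name)
{
    assert(line != NULL && name != NULL);

    if (strstr(line, name) == NULL)
        return -1;

    const char *p = strchr(line, ':');
    if (p == NULL)
        return -1;
    p++;  /* skip the colon after the IRQ number */

    long total = 0;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;
        if (!isdigit((unsigned char)*p))
            break;  /* first non-numeric column ends the counters */
        char *end;
        total += strtol(p, &end, 10);
        p = end;
    }
    return total;
}
```

Read /proc/interrupts line by line (for example with `fgets`), pass each line to the helper, and compare the total before and after the hanging transfer. If the count does not change while SignalTap shows the interrupt line asserted, the problem is more likely in the interrupt routing or driver registration than in the DMA engine itself.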
Altera_Forum
Honored Contributor II
179 Views

Hi,

I am very interested in understanding how you reduced the size of the buffer to 2048.

Thanks, 

Rnivartx
WKnat
Beginner
179 Views

Hi, I have the same issue with Linux kernel 4.1.22 or newer. In newer Linux kernels, the request_irq(PIO_IRQ, aclsoc_irq, irq_type, DRIVER_NAME, (void*)aclsoc) call in the aoclsoc driver can't get hardware IRQ number 72. One solution is to port the aoclsoc driver to a platform driver and get the hardware IRQ number from the device tree. Here is the new Cyclone SoC OpenCL RTE for Linux 4.9.78.
