Re: c5soc memory allocation and use

Altera_Forum · ‎04-15-2015

As I understand it, in the Cyclone V SoC, the FPGA and HPS share a memory. This should make it possible for kernels to directly access variables used in the host program without the overhead of copying back and forth between host and device memory spaces. I assumed this would be accomplished with OpenCL's buffer allocation flags such as CL_MEM_USE_HOST_PTR and/or CL_MEM_ALLOC_HOST_PTR, but in my experiments so far these flags have not given any performance improvement (and in one case made performance worse). Also, I see that the example designs do not use these flags. What is the most efficient way people have found to get data to and from the kernels on this platform?

Altera_Forum · ‎05-05-2015

Hi matt.weber,

HPS and FPGA do have shared memory. The best way to access it is to allocate an OpenCL buffer using clCreateBuffer() with the CL_MEM_ALLOC_HOST_PTR flag. To access this buffer in the kernel, simply pass the returned cl_mem object as a kernel argument. To access it from the host, first map it to a user-space pointer with clEnqueueMapBuffer(). The (virtual) pointer returned by clEnqueueMapBuffer() will point to the same physical location as the cl_mem object returned by clCreateBuffer(). See the "Allocating Shared Memory for OpenCL Kernels Targeting SoCs" section in the Altera SDK for OpenCL Programming Guide.

Note that Cyclone V SoC (and Arria V SoC) have shared *physical* memory but not shared *virtual* memory. This means that the FPGA core has access to the same DDR controller as the HPS, but the FPGA cannot access the HPS's page tables that map virtual pointers to physical addresses. clCreateBuffer() with CL_MEM_ALLOC_HOST_PTR allocates physically contiguous memory and stores its physical address inside the opaque cl_mem object. clEnqueueMapBuffer() on such a cl_mem object simply converts the physical address into a virtual pointer using some Linux driver magic.

Altera_Forum · ‎07-26-2017

I wanted to follow up on this (hopefully someone might see it...)

We have an SGDMA engine driving data (from a camera) into DDR, which we'd then like to feed to an OpenCL kernel. We are currently programming the SGDMA engine in user space, but even if we wrote a driver we wouldn't have the correct mapping to hand it. Right now, because the OpenCL API does not provide a physical address pointer, we have to copy the SGDMA'd data from one buffer into a buffer allocated by OpenCL. This seems extremely wasteful and defeats the purpose of using OpenCL. How do you recommend we accomplish this?

Altera_Forum · ‎07-26-2017

The SoC driver has a call that returns the physical address for a virtual one (but only for virtual addresses that are the start of shared OpenCL buffers). The function you're looking for is aclsoc_exec_cmd, with the ACLSOC_CMD_GET_PHYS_FROM_VIRT command. You invoke it by doing a read in your user-space program with special parameters. See shared_mem_alloc in acl_mmd_device.cpp, which came with the board, for how the read is issued. I hope you will be able to figure out how to make the same call from your program.
