Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)
Release resources

Altera_Forum
Honored Contributor II
922 Views

Hi, 

 

I'd like to understand release and acquisition of resources by the Altera OpenCL platform more clearly. I'm trying to be very careful to release all acquired resources (command queues, kernels, programs, and contexts) whenever I don't need them anymore. Nonetheless I get CL_OUT_OF_RESOURCES and CL_OUT_OF_HOST_MEMORY when I use more than about five or six kernels in my program. These kernels are not used concurrently. The number of kernels in use at any moment in time is much smaller and they work in isolation from one another. 

 

So my question is: does the runtime release resources as soon as their ref count reaches zero, or is cleanup done lazily at a later time? Is there a way to force the runtime to clean up all resources that have been released (i.e. that have a ref count of 0)? Or does the behavior I'm seeing indicate that I have failed to call clReleaseXXX on some resources? 

 

Thanks in advance for any comments on this. 

Cheers, 

Domini
5 Replies
Altera_Forum
Honored Contributor II
125 Views

I'm not sure whether resources are freed immediately or lazily. One thing to keep in mind is that the runtime allocates a shadow buffer on the host, so you might be running out of host memory. This shadow buffer is separate from the memory you allocate on the host yourself; it reserves host space to store and restore buffers from the FPGA in case the hardware needs to be swapped out. So if you create a 512MB block of data for the kernel to operate on, you take up 512MB on the target, 512MB allocated by the host program, and 512MB on the host reserved for the shadow buffer. 
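
To make the arithmetic above concrete, here is a minimal sketch in C. The type and function names (footprint_t, buffer_footprint) are made up for illustration and are not part of the SDK. The doubling on the host side simply encodes the accounting described in this reply (application copy plus shadow buffer) and has not been checked against any particular SDK version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: memory consumed by ONE buffer under the
 * accounting described in the post (not an SDK structure). */
typedef struct {
    size_t device_bytes; /* copy resident on the FPGA board */
    size_t host_bytes;   /* application copy + runtime shadow buffer */
} footprint_t;

/* Estimate the footprint of a buffer of `bytes` bytes. */
static footprint_t buffer_footprint(size_t bytes)
{
    footprint_t f;

    assert(bytes <= SIZE_MAX / 2); /* guard against size_t overflow */
    f.device_bytes = bytes;
    f.host_bytes = 2 * bytes;      /* user allocation + shadow buffer */
    return f;
}
```

For the 512MB example, this gives 512MB on the device and 1GB on the host for each buffer that is alive at the same time.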

 

It's also possible that some path through the host code misses the release calls, so it might be worth setting some breakpoints to make sure every resource you allocate is eventually freed. Since the SDK adheres to the OpenCL 1.0 standard, the error codes are sometimes not very descriptive of the actual problem. Which calls return the out-of-resources/out-of-host-memory error codes?
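
As an alternative to breakpoints, one could count create/release pairs in the host code. The following is only a sketch: track_acquire, track_release, and report_leaks are hypothetical helpers, not OpenCL or SDK functions. The idea is to call track_acquire after each successful clCreate* call and track_release after each clRelease* call; the counters only reflect what the application itself did, not the runtime's internal state.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical resource kinds: one counter per kind of OpenCL object. */
enum res_kind {
    RES_CONTEXT, RES_QUEUE, RES_PROGRAM, RES_KERNEL, RES_MEM,
    RES_KIND_COUNT
};

static const char *const res_names[RES_KIND_COUNT] = {
    "context", "command queue", "program", "kernel", "mem object"
};

static long live[RES_KIND_COUNT]; /* objects created but not yet released */

/* Call right after a successful clCreate* call. */
static void track_acquire(enum res_kind k) { live[k]++; }

/* Call right after the matching clRelease* call. */
static void track_release(enum res_kind k)
{
    assert(live[k] > 0); /* a release without a matching create is a bug */
    live[k]--;
}

/* Print every kind that still has live objects; return the total. */
static long report_leaks(FILE *out)
{
    long total = 0;
    for (int k = 0; k < RES_KIND_COUNT; k++) {
        if (live[k] != 0)
            fprintf(out, "leak: %ld %s(s) not released\n",
                    live[k], res_names[k]);
        total += live[k];
    }
    return total;
}
```

Calling report_leaks(stderr) at the end of each loop iteration, or just before the call that fails, should show whether the count of any object kind keeps growing.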
Altera_Forum
Honored Contributor II
125 Views

I believe I'm facing a similar problem. 

 

The difference is that I have only two kernels. And they are called inside a for loop. 

 

I'm freeing each vector after each use in the loop. However, there seems to be an upper limit on how much memory I can allocate in total. 

 

for(int j = 0; j < REPEAT ; j++) {
    ... ... ...
    void *rand_input = NULL;
    posix_memalign(&rand_input, AOCL_ALIGNMENT, sizeof(float)*SIZE);
    memcpy(rand_input, output, sizeof(float)*SIZE);
    cl_mem input_buffer = clCreateBuffer(my_context, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR, sizeof(float) * SIZE, rand_input, &status);
    free(rand_input);
    free(output);
    .... ....
}

 

This is the error message I get when launching the kernel. 

 

Context callback: Could not allocate a buffer of the specified size due to fragmentation or exhaustion 

Context callback: Could not map host buffers to device
Altera_Forum
Honored Contributor II
125 Views

How host pointers are treated is implementation-dependent; usually the buffer is just fully copied to the device memory at runtime. You probably also need to release the "input_buffer" every time.

Altera_Forum
Honored Contributor II
125 Views

 

--- Quote Start ---  

How host pointers are treated is implementation-dependent; usually the buffer is just fully copied to the device memory at runtime. You probably also need to release the "input_buffer" every time. 

--- Quote End ---  

 

 

I included a free(input_buffer) at the end of the loop. 

 

It seems it has worked with a big crush test.
Altera_Forum
Honored Contributor II
125 Views

 

--- Quote Start ---  

I included a free(input_buffer) at the end of the loop. 

 

It seems it has worked with a big crush test. 

--- Quote End ---  

 

 

Great. It is probably best to release OpenCL buffers using the built-in clReleaseMemObject() function, though.
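
Some background on why clReleaseMemObject() is the right call: a cl_mem is an opaque handle owned by the runtime, not a pointer obtained from malloc(), so free() is not defined for it. The OpenCL specification says a memory object is deleted once its reference count reaches zero and the enqueued commands that use it have finished. The toy model below illustrates only that rule. All names in it (toy_obj and the toy_* functions) are invented for this sketch; it is not how the Altera runtime is implemented, and it says nothing about when that runtime actually returns memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an OpenCL object; NOT the runtime's real structure. */
typedef struct {
    unsigned ref_count;    /* 1 after creation, as with clCreateBuffer */
    unsigned pending_cmds; /* enqueued commands still using the object */
    bool deleted;          /* set once the object would be destroyed */
} toy_obj;

/* Deletion needs BOTH a zero ref count and no pending commands. */
static void toy_try_delete(toy_obj *o)
{
    if (o->ref_count == 0 && o->pending_cmds == 0)
        o->deleted = true;
}

static toy_obj toy_create(void)
{
    toy_obj o = {1, 0, false}; /* creation implies one reference */
    return o;
}

/* Models clRetainMemObject: one more owner. */
static void toy_retain(toy_obj *o)
{
    assert(!o->deleted);
    o->ref_count++;
}

/* Models clReleaseMemObject: one owner fewer, delete if possible. */
static void toy_release(toy_obj *o)
{
    assert(o->ref_count > 0);
    o->ref_count--;
    toy_try_delete(o);
}

/* Models enqueueing a command that uses the object. */
static void toy_enqueue(toy_obj *o)
{
    assert(!o->deleted);
    o->pending_cmds++;
}

/* Models completion of such a command. */
static void toy_cmd_finished(toy_obj *o)
{
    assert(o->pending_cmds > 0);
    o->pending_cmds--;
    toy_try_delete(o);
}
```

In this model, releasing an object while a command is still pending does not delete it immediately, which is one way a release can appear to be delayed even when every clReleaseXXX call is in place.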