Altera_Forum
Honored Contributor I
1,678 Views

Utilizing M10K memory blocks

Hi, 

 

I have compiled my kernel, and it only uses 16% of the M10K memory. Is there a way I can use more of it?  

 

Also, when I view my report, it says "loop sacrificed fmax to achieve II to 1." But the optimization report looks fine: it says "Pipelined well. Successive iterations are launched every cycle." 

Is there a way I can avoid that? I want to maximize performance (increase fmax). 

 

Is 121 MHz good enough for Cyclone V? There is not much data for me to benchmark against.
9 Replies
Altera_Forum
Honored Contributor I
89 Views

 

--- Quote Start ---  

I have compiled my kernel, and it only uses 16% of the M10K memory. Is there a way I can use more of it?  

--- Quote End ---  

 

The compiler uses only as many resources as the design requires; trying to use more could actually reduce performance, since it complicates routing and lowers the operating frequency. 

M10K blocks are generally used to implement large local memory buffers and FIFOs. The more local memory you use, the higher the M10K utilization becomes. 

 

 

--- Quote Start ---  

Also, when I view my report, it says "loop sacrificed fmax to achieve II to 1." But the optimization report looks fine: it says "Pipelined well. Successive iterations are launched every cycle." 

Is there a way I can avoid that? I want to maximize performance (increase fmax). 

--- Quote End ---  

 

You probably have a loop-carried dependency (feedback) somewhere in your code, which forces the compiler to create a long critical path to achieve an II of 1, at the cost of a lower operating frequency. 

 

 

--- Quote Start ---  

Is 121 MHz good enough for Cyclone V? There is not much data for me to benchmark against. 

--- Quote End ---  

 

For Cyclone V, 120 MHz is not too low, but it is also far from high. It is generally not hard to achieve 170-180 MHz on this device. You can try compiling Altera's reference OpenCL examples to see what operating frequency they achieve.
Altera_Forum
Honored Contributor I
89 Views

Thanks @HRZ, I thought using more M10K memory blocks could reduce the ALUT and FF usage.

Altera_Forum
Honored Contributor I
89 Views

I am afraid that is not usually the case.

Altera_Forum
Honored Contributor I
89 Views

@HRZ Sorry, but where can I view the FPGA SDRAM usage? There is 64 MB of SDRAM on the DE1-SoC.

Altera_Forum
Honored Contributor I
89 Views

 

--- Quote Start ---  

@HRZ Sorry, but where can I view the FPGA SDRAM usage? There is 64 MB of SDRAM on the DE1-SoC. 

--- Quote End ---  

 

 

The BSP for that board does not support the SDRAM, so that memory cannot be used with OpenCL.
Altera_Forum
Honored Contributor I
89 Views

Alright, I guess I'll try the shared memory method to reduce the required data bandwidth. Thanks, HRZ.

Altera_Forum
Honored Contributor I
89 Views

Hi HRZ, the programming guide says: "You cannot use the library function malloc or the operator new to allocate physically shared memory." So, can I realloc the buffer? I want to resize the output buffer. 

 

data_buffer = clCreateBuffer(context, CL_MEM_ALLOC_HOST_PTR, buffer_size, NULL, &status); // initiate
// clEnqueueTask
// some process
x = clEnqueueMapBuffer(queue, data_buffer, CL_TRUE, CL_MAP_WRITE | CL_MAP_READ, 0, size * sizeof(float), 0, NULL, NULL, &status); // read output
if (something) {
    x = realloc(x, newsize * sizeof(float)); // realloc new size to the output buffer
} else {
    // take back old size
}
Altera_Forum
Honored Contributor I
89 Views

Allocate the buffer with clCreateBuffer() once, at the start, using the maximum size over all possible variants. 

Host-side buffers allocated with malloc() can be resized with realloc() without any problem.
Altera_Forum
Honored Contributor I
89 Views

I have no idea why, but my kernel hangs indefinitely during execution. 

 

The input and output are declared as shared buffers between the host and the FPGA.  

 

input -> kernel1 -> channel -> kernel2 -> output

 

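data_input = clCreateBuffer(context, CL_MEM_ALLOC_HOST_PTR, buffer_size, NULL, &status);
data_output = clCreateBuffer(context, CL_MEM_ALLOC_HOST_PTR, buffer_size, NULL, &status);

// map the input for writing (the host fills it), then unmap before kernel1 uses it
b = (float *) clEnqueueMapBuffer(queue_kernel1, data_input, CL_TRUE, CL_MAP_WRITE, 0, size * sizeof(float), 0, NULL, NULL, &status);
// put data in b
clEnqueueUnmapMemObject(queue_kernel1, data_input, b, 0, NULL, NULL);

clEnqueueTask(queue_kernel1, kernel1, 0, NULL, NULL);
clEnqueueTask(queue_kernel2, kernel2, 0, NULL, NULL);

// take data out (the blocking map waits for kernel2 to finish)
out = (float *) clEnqueueMapBuffer(queue_kernel2, data_output, CL_TRUE, CL_MAP_READ, 0, size * sizeof(float), 0, NULL, NULL, &status);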

 

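#pragma OPENCL EXTENSION cl_intel_channels : enable
channel float something_ch;

__kernel void kernel1(__global const float *restrict input) {
    // load input, store in buffer
    // send to channel
    write_channel_intel(something_ch, buffer);
}

__kernel void kernel2(__global float *restrict output) {
    float out = read_channel_intel(something_ch); // load data from channel
    // some add/mul
    *output = out;
}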