Hi all,
I read the paper "Best-Effort FPGA Programming: A Few Steps Can Go a Long Way". The authors use HLS on Xilinx devices as their example. Besides the usual optimizations, I found the double-buffering scheme interesting (it actually rotates three buffers so that load, compute, and store can overlap):
void aes(...) { ... }
void load(...) { ... }
void store(...) { ... }
void compute(...) { ... }
void kernel(char *data, int size) {
  char buf_data[3][PE_NUM][PE_BATCH];
#pragma HLS array_partition variable=buf_data complete dim=1
#pragma HLS array_partition variable=buf_data cyclic factor=PE_NUM dim=2
  // Rotate the three buffers so that the load of batch i, the compute of
  // batch i-1, and the store of batch i-2 overlap in every iteration
  // (global-memory offsets and the prologue/epilogue are simplified here).
  for (int i = 0; i < size / BATCH_SIZE; i++) {
    switch (i % 3) {
      case 0:
        load(buf_data[0], data + i * BATCH_SIZE);
        compute(buf_data[2]);
        store(data + i * BATCH_SIZE, buf_data[1]);
        break;
      case 1:
        load(buf_data[1], data + i * BATCH_SIZE);
        compute(buf_data[0]);
        store(data + i * BATCH_SIZE, buf_data[2]);
        break;
      case 2:
        load(buf_data[2], data + i * BATCH_SIZE);
        compute(buf_data[1]);
        store(data + i * BATCH_SIZE, buf_data[0]);
        break;
    }
  }
}
Can the Intel compiler successfully infer this pipeline? I tried it with compiler version 16, but the throughput improvement seems very limited.
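For reference, this is roughly the single-work-item OpenCL version I have in mind for the Intel side (just a sketch: aes_kernel, the PE_NUM/PE_BATCH sizes, and the aes() body are placeholders I made up, and the prologue/epilogue guards are kept minimal):

#define PE_NUM     4                    // made-up sizes for this sketch
#define PE_BATCH   64
#define BATCH_SIZE (PE_NUM * PE_BATCH)

// Stand-in for the real AES rounds
void aes(char block[PE_BATCH]) {
  #pragma unroll
  for (int j = 0; j < PE_BATCH; j++)
    block[j] ^= 0x5A;
}

__kernel void aes_kernel(__global char *restrict data, int size) {
  // Three on-chip buffers rotated between load, compute, and store
  char buf_data[3][PE_NUM][PE_BATCH];
  int n = size / BATCH_SIZE;

  // Two extra iterations drain the last two batches
  for (int i = 0; i < n + 2; i++) {
    int ld = i % 3;        // buffer being filled with batch i
    int cp = (i + 2) % 3;  // buffer holding batch i-1, computed now
    int st = (i + 1) % 3;  // buffer holding the result of batch i-2

    if (i < n)                           // load batch i
      for (int p = 0; p < PE_NUM; p++)
        for (int j = 0; j < PE_BATCH; j++)
          buf_data[ld][p][j] = data[i * BATCH_SIZE + p * PE_BATCH + j];

    if (i >= 1 && i <= n) {              // compute batch i-1 on PE_NUM engines
      #pragma unroll
      for (int p = 0; p < PE_NUM; p++)
        aes(buf_data[cp][p]);
    }

    if (i >= 2)                          // store batch i-2
      for (int p = 0; p < PE_NUM; p++)
        for (int j = 0; j < PE_BATCH; j++)
          data[(i - 2) * BATCH_SIZE + p * PE_BATCH + j] = buf_data[st][p][j];
  }
}

The intent is the same as in the paper: the batch-i load, the batch-(i-1) compute, and the batch-(i-2) store touch different buffers, so in principle they could all overlap within one pipelined loop iteration when the kernel is run as a single work item.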
- Tags:
- Pragma