Work-group scheduling in FPGA pipeline

GRodr25 · ‎09-28-2019

Hello, I have a question about work-group scheduling across multiple CUs on an FPGA. Work-groups are assigned to available CUs, but when is a CU considered available? Is it when the last work-item of the previous work-group has left the pipeline, or when that work-item has reached the second stage (so every stage of the pipeline except the first is still occupied by the previous work-group)?

MEIYAN_L_Intel · ‎10-01-2019

Hi,

According to the document https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/opencl-sdk/aocl-best-practices-guide.pdf, a compute unit is available for work-group assignments as long as it has not reached its full capacity. If there are enough compute units, each work-group maps to its own compute unit. If there are not enough compute units, the work-groups are dispatched to the compute units one at a time, in a serial fashion.
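To make the dispatch order concrete, here is a toy Python model. It is only a sketch and not the actual hardware scheduler: the function name `assign_work_groups` and its parameters are made up for illustration, and it assumes each CU holds one work-group at a time and that every work-group takes the same number of cycles.

```python
import heapq


def assign_work_groups(num_groups, num_cus, cycles_per_group):
    """Toy model: hand each work-group to the CU that frees up first.

    Assumptions (illustration only): a CU holds one work-group at a time
    and every work-group occupies its CU for cycles_per_group cycles.
    """
    # Min-heap of (cycle at which the CU becomes free, CU index).
    free_at = [(0, cu) for cu in range(num_cus)]
    heapq.heapify(free_at)

    schedule = []
    for group in range(num_groups):
        start, cu = heapq.heappop(free_at)  # earliest-free CU
        schedule.append((group, cu, start))
        heapq.heappush(free_at, (start + cycles_per_group, cu))
    return schedule


if __name__ == "__main__":
    for group, cu, start in assign_work_groups(4, 2, 10):
        print(f"work-group {group} -> CU {cu} at cycle {start}")
```

With at least as many CUs as work-groups, every work-group starts at cycle 0 on its own CU. With fewer CUs, the remaining work-groups wait in order until a CU frees up.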

Thanks

GRodr25 · ‎10-01-2019

So I understand from your answer that the first work-item of the next work-group enters the pipeline as soon as its first stage is free, even though the remaining stages are still occupied by work-items from another work-group. Is that right?

Thank you.

MEIYAN_L_Intel · ‎10-02-2019

Hi,

Yes, when the first stage in the pipeline is free, the first work-item of the next scheduled work-group will enter, even though the rest of the pipeline stages are occupied by work-items from a different work-group.

For your information, in the sentence from the document mentioned above, "the compute unit is available for work-group assignments as long as it has not reached its full capacity", the term “capacity” refers to the maximum number of threads or work-items that can execute in the CU at once. The OpenCL HTML reports provide capacity numbers for the basic blocks (e.g. loops) of ND-range kernels to give a better picture of how work-groups might be scheduled. Essentially, as soon as a basic block has enough capacity, the next available work-group and its work-items will be scheduled for that basic block. So it is somewhat like the second scenario you described. If the CU is simple enough, though, and there isn’t enough capacity for multiple work-groups to run simultaneously, then it will be like the first situation you mentioned.
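The difference between the two scenarios can be shown with a small Python sketch. This is a simplified model under stated assumptions, not a description of the real hardware: one work-item enters the pipeline per cycle, each work-item advances one stage per cycle, and nothing stalls. The function `group_start_cycles` and its parameters are hypothetical names chosen for this example.

```python
def group_start_cycles(num_groups, group_size, pipeline_depth, overlap):
    """Cycle at which the first work-item of each work-group enters stage 0.

    Toy model: one work-item enters per cycle and moves one stage per
    cycle, with no stalls (real pipelines can stall, e.g. on memory).
    overlap=True  -> next work-group enters as soon as stage 0 is free.
    overlap=False -> next work-group waits until the pipeline has drained.
    """
    starts = []
    next_free = 0  # earliest cycle at which the next work-group may start
    for _ in range(num_groups):
        starts.append(next_free)
        last_entry = next_free + group_size - 1  # last work-item enters
        if overlap:
            # Stage 0 is free one cycle after the last work-item entered.
            next_free = last_entry + 1
        else:
            # The last work-item occupies the final stage at cycle
            # last_entry + pipeline_depth - 1, so the pipeline is empty
            # one cycle later.
            next_free = last_entry + pipeline_depth
    return starts


if __name__ == "__main__":
    print("overlap:", group_start_cycles(2, 4, 3, overlap=True))
    print("drain:  ", group_start_cycles(2, 4, 3, overlap=False))
```

In this model, with 4 work-items per work-group and a 3-stage pipeline, the second work-group starts at cycle 4 when overlap is allowed and at cycle 6 when the pipeline must drain first. The gap grows with pipeline depth, which is why capacity matters for throughput.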

Thanks