Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, FPGA AI Suite, Software Stack, and Reference Designs
504 Discussions

Eltwise_mult with broadcasting on FPGA

ThanhN
Beginner
2,481 Views

Hi, 

 

I am using FPGA AI Suite 2025.1 and OpenVINO 2024.6.0 with the DE10-Agilex dev kit. In the provided ".arch" files, such as AGX7_Performance.arch, I see "enable_eltwise_mult : true", but it does not seem to support broadcasting.

 

What I want is to perform an element-wise multiplication between two tensors of shapes [1, 1, H, W] and [1, C, H, W], resulting in an output of shape [1, C, H, W], and have this operation executed on the FPGA.
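To make the shapes concrete, here is the broadcast I have in mind, sketched in NumPy (which follows the same broadcasting rules as PyTorch's torch.mul). H, W, and C below are placeholder sizes, not values from my actual model:

```python
import numpy as np

# Placeholder sizes for illustration only
H, W, C = 4, 4, 3

mask = np.random.rand(1, 1, H, W)   # mask tensor, shape [1, 1, H, W]
feat = np.random.rand(1, C, H, W)   # feature tensor, shape [1, C, H, W]

# The singleton channel dimension of `mask` is broadcast across
# all C channels of `feat`, yielding shape [1, C, H, W].
out = mask * feat
print(out.shape)
```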

 

I'm wondering if there's a way to do this, or if I'm missing something. I'd appreciate any help.

 

Best regards.

1 Solution
JohnT_Intel
Employee
2,126 Views

Hi,


May I know if you have any other queries?


View solution in original post

7 Replies
JohnT_Intel
Employee
2,389 Views

Hi,


May I know what you mean by "enable_eltwise_mult : true" is not working? May I know if you have the FPGA AI Suite license required to build a customized bitstream?


ThanhN
Beginner
2,363 Views

Hi @JohnT_Intel 

 

Some AGX7_**.arch files are included with the FPGA AI Suite 2025.1. In some of these .arch files, the option "enable_eltwise_mult: true" is defined, while in others it is not.

When "enable_eltwise_mult" is set to "true", the FPGA supports element-wise multiplication for tensors with matching shapes, such as [1, 1, H, W] × [1, 1, H, W]. However, it does not support broadcasting; for example, multiplying tensors of shapes [1, 1, H, W] × [1, channel, H, W] is not supported on the FPGA. This is what I meant by saying "enable_eltwise_mult : true" is not working.

As for the FPGA AI Suite license, I believe this refers to the Quartus Pro license required for building custom bitstreams (.sof files). I do not have a Quartus Pro 2025 license, but I do have a Quartus Pro 2021 license, which I believe was provided by Terasic with the DE10-Agilex development kit.

 

Thanks.

JohnT_Intel
Employee
2,342 Views

Hi,


The enable_eltwise_mult option is used for the MobileNetV3 network architecture. May I know what type of broadcast network architecture you are using? If the implementation is different, it will not support broadcasting.


ThanhN
Beginner
2,336 Views

Hi,

 

I'm working with a customized model, and one of the steps involves applying a mask of shape [1, 1, H, W] across the 96 channels of a feature-level tensor of shape [1, 96, H, W], which is the output of a convolution layer. In PyTorch, this can be done with torch.mul(tensor1, tensor2), where broadcasting handles the channel mismatch and performs the element-wise multiplication.

 

Is there a way to make this operation work directly on the FPGA? I want the entire model computation to be executed on the FPGA without involving the CPU. It seems to me that "enable_eltwise_mult: true" does not perform the broadcasting needed to handle the channel mismatch, but I may be wrong.
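One workaround I am considering (unverified; sketched in NumPy, where np.broadcast_to plays the role of torch.Tensor.expand) is to materialize the mask across all 96 channels before export, so that the multiply becomes a plain same-shape eltwise_mult:

```python
import numpy as np

# Placeholder spatial sizes for illustration only
H, W = 4, 4

mask = np.random.rand(1, 1, H, W)    # [1, 1, H, W] mask
feat = np.random.rand(1, 96, H, W)   # [1, 96, H, W] feature map

# Materialize the broadcast up front so both operands have
# identical shapes before the element-wise multiply.
mask96 = np.broadcast_to(mask, (1, 96, H, W))

out = mask96 * feat                  # same-shape element-wise multiply
print(out.shape)
```

I do not know whether the graph compiler would actually map the explicit expand/tile step onto the FPGA, so this is only a sketch of the idea.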

 

Thanks!

JohnT_Intel
Employee
2,292 Views

Hi,


You will need to customize the FPGA AI Suite design in order for it to support this feature.


You will need to contact your local sales team (https://www.altera.com/contact.html#4257225834-4085749461).


Thanks


JohnT_Intel
Employee
2,127 Views

Hi,


May I know if you have any other queries?


ThanhN
Beginner
2,101 Views
Thanks, John.

That should be all.