Application Acceleration With FPGAs
Programmable Acceleration Cards (PACs), DCP, FPGA AI Suite, Software Stack, and Reference Designs
504 Discussions

Eltwise_mult with broadcasting on FPGA

ThanhN
Beginner
2,481 Views

Hi, 

 

I am using FPGA AI Suite 2025.1 and OpenVINO 2024.6.0 with the DE10-Agilex dev kit. In the provided ".arch" files, such as AGX7_Performance.arch, I see "enable_eltwise_mult : true", but it does not seem to support broadcasting.

 

What I want is to perform an element-wise multiplication between two tensors of shapes [1, 1, H, W] and [1, C, H, W], resulting in an output of shape [1, C, H, W], and have this operation executed on the FPGA.
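To make the shapes concrete, here is the broadcast I have in mind, sketched in NumPy (which follows the same broadcasting rules as PyTorch's torch.mul). H, W, and C below are placeholder sizes, not values from my actual model:

```python
import numpy as np

# Placeholder sizes for illustration only
H, W, C = 4, 4, 3

mask = np.random.rand(1, 1, H, W)   # mask tensor, shape [1, 1, H, W]
feat = np.random.rand(1, C, H, W)   # feature tensor, shape [1, C, H, W]

# The singleton channel dimension of `mask` is broadcast across
# all C channels of `feat`, yielding shape [1, C, H, W].
out = mask * feat
print(out.shape)
```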

 

I'm wondering if there's a way to do this, or if I'm missing something. I'd appreciate any help.

 

Best regards.

1 Solution
JohnT_Intel
Employee
2,126 Views

Hi,


May I know if you have any other queries?


View solution in original post

7 Replies
JohnT_Intel
Employee
2,389 Views

Hi,


May I know what you mean by "enable_eltwise_mult : true" is not working? May I know if you have the FPGA AI Suite license required to build a customized bitstream?


ThanhN
Beginner
2,363 Views

Hi @JohnT_Intel 

 

Some AGX7_**.arch files are included with the FPGA AI Suite 2025.1. In some of these .arch files, the option "enable_eltwise_mult: true" is defined, while in others it is not.

When "enable_eltwise_mult" is set to "true", the FPGA supports element-wise multiplication for tensors with matching shapes, such as [1, 1, H, W] × [1, 1, H, W]. However, it does not support broadcasting; for example, multiplying tensors of shapes [1, 1, H, W] × [1, channel, H, W] is not supported on the FPGA. This is what I meant by saying "enable_eltwise_mult : true" is not working.

As for the FPGA AI Suite license, I believe this refers to the Quartus Pro license required for building custom bitstreams (.sof files). I do not have a Quartus Pro 2025 license, but I do have a Quartus Pro 2021 license, which I believe was provided by Terasic with the DE10-Agilex development kit.

 

Thanks.

JohnT_Intel
Employee
2,342 Views

Hi,


The enable_eltwise_mult option is used for the MobileNetV3 network architecture. May I know what type of broadcast network architecture you are using? If the implementation is different, it will not support broadcasting.


ThanhN
Beginner
2,336 Views

Hi,

 

I'm working with a customized model, and one of the steps involves applying a mask of shape [1, 1, H, W] across the 96 channels of a feature-level tensor of shape [1, 96, H, W], which is the output of a convolution layer. In PyTorch, this can be done with torch.mul(tensor1, tensor2), where broadcasting handles the channel mismatch and performs the element-wise multiplication.

 

Is there a way to make this operation work directly on the FPGA? I want the entire model computation to be executed on the FPGA without involving the CPU. It seems to me that "enable_eltwise_mult: true" does not perform the broadcasting needed to handle the channel mismatch, but I may be wrong.
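One workaround I am considering (unverified; sketched in NumPy, where np.broadcast_to plays the role of torch.Tensor.expand) is to materialize the mask across all 96 channels before export, so that the multiply becomes a plain same-shape eltwise_mult:

```python
import numpy as np

# Placeholder spatial sizes for illustration only
H, W = 4, 4

mask = np.random.rand(1, 1, H, W)    # [1, 1, H, W] mask
feat = np.random.rand(1, 96, H, W)   # [1, 96, H, W] feature map

# Materialize the broadcast up front so both operands have
# identical shapes before the element-wise multiply.
mask96 = np.broadcast_to(mask, (1, 96, H, W))

out = mask96 * feat                  # same-shape element-wise multiply
print(out.shape)
```

I do not know whether the graph compiler would actually map the explicit expand/tile step onto the FPGA, so this is only a sketch of the idea.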

 

Thanks!

JohnT_Intel
Employee
2,292 Views

Hi,


You will need to customize the FPGA AI Suite design in order for it to support this feature.


You will need to contact your local sales team (https://www.altera.com/contact.html#4257225834-4085749461).


Thanks


JohnT_Intel
Employee
2,127 Views

Hi,


May I know if you have any other queries?


ThanhN
Beginner
2,101 Views
Thanks, John.

That should be all.