Intel® Integrated Performance Primitives
Discussions of problems developing high-performance vision, signal, security, and storage applications.

Multiply and accumulate support

Intel_C_Intel
Employee
Hey, I was wondering what support the IPP has for multiply-and-accumulate operations. I know that Wireless MMX and MMX have the WMADD and PMADDWD instructions, which multiply four pairs of 16-bit numbers and add the products together in pairs. But I don't see any corresponding functions in the IPP. I end up using an ippsMul followed by an ippsSum, which seems wasteful since there is hardware support for multiply-accumulate.

Is there support for this that I'm just not seeing, or are ippsMul and ippsSum the best I can hope for? I'm assuming the ippsMul and ippsSum functions use MMX instructions when they can, so I'm not sure why an accumulator function isn't available. I'm using both the desktop and PCA versions of the IPP, so I'm limited to functions supported in both (but both platforms should have hardware MAC support).
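To make the "wasteful" point concrete, here is a minimal plain-C sketch contrasting the two approaches. The function names `mul_then_sum` and `mac` are hypothetical; `mul_then_sum` only imitates the shape of an ippsMul followed by an ippsSum (products written to a temporary buffer, then read back) and does not model IPP's scaling or saturation behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Two-pass approach: multiply element-wise into a caller-supplied
   temporary buffer, then sum that buffer in a second pass.
   Plain-C stand-in for ippsMul + ippsSum, not IPP's semantics. */
int64_t mul_then_sum(const int16_t *a, const int16_t *b,
                     int32_t *tmp, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        tmp[i] = (int32_t)a[i] * b[i];   /* pass 1: write all products */

    int64_t sum = 0;
    for (size_t i = 0; i < len; ++i)
        sum += tmp[i];                    /* pass 2: read them back */
    return sum;
}

/* Fused multiply-accumulate: a single pass with no temporary buffer.
   Each 16x16-bit product fits in 32 bits; the running sum is 64-bit. */
int64_t mac(const int16_t *a, const int16_t *b, size_t len)
{
    int64_t acc = 0;
    for (size_t i = 0; i < len; ++i)
        acc += (int32_t)a[i] * b[i];
    return acc;
}
```

Both return the same value; the fused version simply avoids writing and re-reading the intermediate products, which is the memory traffic a hardware MAC instruction saves.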

http://www.intel.com/cd/ids/developer/asmo-na/eng/dc/pca/knowledgebase/168863.htm
This article has some hints on how you could code it up yourself, but I would prefer to stick with the IPP functions so I don't have to maintain and test multiple versions of the same code myself.

Cheers
-Jonathan
Vladimir_Dudnik
Employee

Hello Jonathan,

IPP uses multiply-add instructions internally wherever they provide additional optimization.

Regards,
Vladimir

Vladimir_Dudnik
Employee

By the way, have you seen the ippsDotProd and ippsAddProductC functions? Combined, they can provide the functionality you are looking for.
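As a rough illustration of what each building block computes, here is a plain-C sketch. The helpers `dot_prod_ref` and `add_product_c_ref` are hypothetical reference loops based on my understanding of the element-wise semantics (a dot product reduces two vectors to one sum; "add product with constant" does `acc[i] += src[i] * c`). They are not IPP's actual signatures, data types, or scaling rules; check the IPP reference manual for those.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference: dot product, sum of a[i] * b[i],
   accumulated in 64 bits so the sum cannot overflow for short inputs. */
int64_t dot_prod_ref(const int16_t *a, const int16_t *b, size_t len)
{
    int64_t sum = 0;
    for (size_t i = 0; i < len; ++i)
        sum += (int32_t)a[i] * b[i];
    return sum;
}

/* Hypothetical reference: "add product with constant",
   acc[i] += src[i] * c, i.e. a vector multiply-accumulate.
   The 32-bit accumulator can still overflow after many calls. */
void add_product_c_ref(const int16_t *src, int16_t c,
                       int32_t *acc, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        acc[i] += (int32_t)src[i] * c;
}
```

One plausible way to combine them: repeated `add_product_c_ref` calls with different constants build up a weighted sum of vectors in `acc` (as in the inner loop of a filter), while `dot_prod_ref` collapses a single vector pair to one multiply-accumulated value.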

Regards,
Vladimir

Intel_C_Intel
Employee
Hey Vladimir,

I found the ippsDotProd functions (I didn't see the ippsAddProductC functions, though).

I'm limited to functions available in both the PCA and IA-32 versions of the IPP, and unfortunately the PCA version only has ippsDotProd_16s. The problem with this function is that it has an Ipp16s accumulator, which is too small for my application.
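A quick bound shows why a 16-bit result is too small. For signed 16-bit inputs $a_i, b_i \in [-2^{15}, 2^{15}-1]$:

$$
|a_i b_i| \le 2^{15} \cdot 2^{15} = 2^{30},
\qquad
\Big|\sum_{i=0}^{N-1} a_i b_i\Big| \le N \cdot 2^{30}.
$$

A single full-precision product can already need 31 bits, so a result held in Ipp16s has to be scaled down or saturated, and even a 32-bit sum can overflow once $N \ge 2$ in the worst case. A signed 64-bit accumulator (maximum $2^{63}-1$) cannot overflow for $N < 2^{33}$.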

It looks like the only way to do a multiply-accumulate/dot product would be to write a for loop with the appropriate Wireless MMX/MMX intrinsics (which should give me a 64-bit accumulator). This isn't too hard to implement, but I was hoping there was some IPP function for this that I was just overlooking. It might also turn out that ippsMul and ippsSum are fast enough.
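A minimal sketch of such a loop is below. As an assumption for portability, it uses the SSE2 intrinsic `_mm_madd_epi16` (the 128-bit form of PMADDWD) rather than the 64-bit MMX or Wireless MMX intrinsics, and falls back to plain C when SSE2 is not available. The function name `dot16_acc64` is hypothetical. Each 32-bit lane produced by the multiply-add is widened into an `int64_t` accumulator.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Dot product of two int16 arrays with a 64-bit accumulator. */
int64_t dot16_acc64(const int16_t *a, const int16_t *b, size_t len)
{
    int64_t acc = 0;
    size_t i = 0;
#ifdef __SSE2__
    for (; i + 8 <= len; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* PMADDWD: 8 products, adjacent pairs summed into 4 int32 lanes.
           Caveat: a lane wraps if both of its pairs are -32768 * -32768. */
        __m128i prod = _mm_madd_epi16(va, vb);
        int32_t lanes[4];
        _mm_storeu_si128((__m128i *)lanes, prod);
        /* Widen each lane to 64 bits before accumulating. */
        acc += (int64_t)lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }
#endif
    /* Scalar tail (or the whole array when SSE2 is unavailable). */
    for (; i < len; ++i)
        acc += (int32_t)a[i] * b[i];
    return acc;
}
```

Storing the lanes and adding them in scalar code keeps the sketch simple; a tuned version would keep partial sums in registers and widen less often, at the cost of tracking the overflow bound per lane.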

Thanks
-Jonathan
Vladimir_Dudnik
Employee

IPP for PCA provides only a subset of the functionality of IPP for IA, so some functions may be missing.

Vladimir
