ippiScaleC_32s32f_C1R is slower than simple loop

andreypir · ‎07-24-2020

Attached is a simple console project showing that, when scaling a matrix, ippiScaleC_32s32f_C1R is slower than an equivalent plain C++ loop. In this example, one column of the source matrix is scaled into an output vector. On my PC with an i7-7700K, the C++ loop is about 20% faster.

Is there any way to improve the ippiScaleC performance?
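For readers without the attachment, the kind of loop being compared can be sketched roughly as follows. This is a minimal sketch, not the code from the zip: the helper name ScaleColumnWithLoop and its parameters are hypothetical, and it assumes a row-major matrix and the usual ScaleC semantics of dst = src * factor + shift.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the attached project): scales one column of a
// row-major int32 matrix into a contiguous float vector, computing
//   dst[r] = src[r * nColumns + column] * factor + shift
std::vector<float> ScaleColumnWithLoop(const std::vector<std::int32_t>& src,
                                       std::size_t nRows, std::size_t nColumns,
                                       std::size_t column,
                                       double factor, double shift)
{
    assert(column < nColumns);
    assert(src.size() >= nRows * nColumns);

    std::vector<float> dst(nRows);
    for (std::size_t r = 0; r < nRows; ++r) {
        // Strided read: consecutive iterations are nColumns elements apart.
        dst[r] = static_cast<float>(src[r * nColumns + column] * factor + shift);
    }
    return dst;
}
```

Note that the reads are strided by nColumns while the writes are contiguous, which matters for how well the work maps onto SIMD loads.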

Gennady_F_Intel · ‎07-24-2020

Andrey, how could we check this case? There is no reproducer attached to this thread.

andreypir · ‎07-24-2020

Sorry I thought I attached the zip. Here it is.

Gennady_F_Intel · ‎07-24-2020

ok, what version of IPP did you compare with?

andreypir · ‎07-24-2020

2020.2.254

andreypir · ‎07-24-2020

ippi.dll shows File version 2020.0.2.1083

Gennady_F_Intel · ‎07-26-2020

Yes, it seems there is a problem on the IPP side and this function needs further optimization. We will escalate the case.

Andrey_B_Intel · ‎07-27-2020

Hi Andreypir!

IPP works better with rectangular ROIs, where it can load a whole SIMD register at a time.

But could you please replace in your code this

for (int n = 0; n < NTESTS_SCALE; n++)
{
  ScaleWithIPP(Source, nColumns, Dest, nRows, Factor, Shift);
}

with the following code? I see some speedup on my 64-bit Skylake system.

for (int n = 0; n < NTESTS_SCALE; n++)
{
  int dLen = 0;
  int phase = 0;
  ippsSampleDown_32f((Ipp32f*)Source, nColumns*nRows, Dest, &dLen, nColumns, &phase);
  IppiSize roiSize = { nRows, 1 };
  ippiScaleC_32s32f_C1R((Ipp32s*)Dest, nRows * sizeof(__int32), Factor, Shift, Dest, sizeof(float), roiSize, ippAlgHintFast);
}

Thanks.
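To make the idea behind the suggestion explicit, here is a plain C++ sketch of the same two-step structure, without IPP. It is only an illustration under stated assumptions: GatherStrided and ScaleContiguous are hypothetical names, the first stands in for the downsampling call (assuming phase 0 keeps the first element of every block of nColumns), and the second stands in for scaling a single contiguous row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Step 1 (hypothetical stand-in for the downsampling call): copy every
// `stride`-th element, starting at index `phase`, into a contiguous buffer.
std::vector<std::int32_t> GatherStrided(const std::vector<std::int32_t>& src,
                                        std::size_t stride, std::size_t phase)
{
    assert(stride > 0 && phase < stride);

    std::vector<std::int32_t> packed;
    packed.reserve(src.size() / stride + 1);
    for (std::size_t i = phase; i < src.size(); i += stride) {
        packed.push_back(src[i]);
    }
    return packed;
}

// Step 2 (hypothetical stand-in for scaling a one-row ROI): the input is now
// contiguous, so a vectorized implementation can load full SIMD registers.
std::vector<float> ScaleContiguous(const std::vector<std::int32_t>& packed,
                                   double factor, double shift)
{
    std::vector<float> dst(packed.size());
    for (std::size_t i = 0; i < packed.size(); ++i) {
        dst[i] = static_cast<float>(packed[i] * factor + shift);
    }
    return dst;
}
```

The extra copy costs one pass over the column, but it turns the strided access into a contiguous one before the arithmetic starts.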

andreypir · ‎07-27-2020

Hi Andrey,

Thank you. Yes, downsampling then scaling is substantially faster than just scaling, and somewhat faster than a loop:

ScaleWithLoop: 828
ScaleWithIPP: 625

on my computer. I had hoped for a bigger gain, but this will work.
