Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

ippiScaleC_32s32f_C1R is slower than simple loop

andreypir
Beginner

Attached is a simple console project showing that, when scaling a matrix, ippiScaleC_32s32f_C1R is slower than an equivalent simple C++ loop. In this example, one column of the source matrix is scaled into an output vector. On my PC (i7-7700K), the C++ loop is about 20% faster.
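The attached project is not reproduced in the thread, so the following is only a sketch of what the loop baseline presumably looks like. The function name ScaleColumnWithLoop, its parameter list, and the use of double for the factor and shift are assumptions; the formula dst = src * factor + shift is meant to match the multiply-then-add that ippiScaleC's mVal/aVal arguments describe.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of the plain-loop baseline (not the attached code):
// scale column `col` of a row-major nRows x nColumns int32 matrix into a
// contiguous float vector, dst[r] = src[r][col] * factor + shift.
void ScaleColumnWithLoop(const std::int32_t* src, int nColumns, int col,
                         float* dst, int nRows, double factor, double shift)
{
    for (int r = 0; r < nRows; ++r) {
        // Strided read: consecutive outputs come from elements nColumns apart.
        const std::int32_t v = src[static_cast<std::size_t>(r) * nColumns + col];
        dst[r] = static_cast<float>(v * factor + shift);
    }
}
```

Because every read skips a whole matrix row, the compiler has little to vectorize here either; the loop mainly wins by having no per-row call overhead.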

Is there any way to improve the ippiScaleC performance?

Gennady_F_Intel
Moderator

Andrey, how could we check this case? There is no reproducer attached to this thread.

andreypir
Beginner

Sorry, I thought I had attached the zip. Here it is.

Gennady_F_Intel
Moderator

OK, which version of IPP did you compare against?

andreypir
Beginner

ippi.dll shows file version 2020.0.2.1083.

Gennady_F_Intel
Moderator

Yes, it seems there is a problem on the IPP side: this function needs further optimization. We will escalate the case.

Andrey_B_Intel
Employee

 

Hi Andreypir!

IPP works best on rectangular ROIs whose rows are wide enough to fill whole SIMD registers; a single-column ROI gives it only one element per row.
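To make the ROI point concrete, here is a portable reference with the same argument layout as ippiScaleC_32s32f_C1R (byte steps, dst = src * mVal + aVal). It is a sketch, not IPP code: ScaleC_Reference and RoiSize are hypothetical names, and RoiSize only mimics IppiSize (width = elements per row, height = rows). It shows why a column ROI of width 1 leaves nothing for SIMD to work on.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for IppiSize: width = elements per row, height = rows.
struct RoiSize { int width; int height; };

// Hypothetical portable reference (NOT IPP code) with the same argument layout
// as ippiScaleC_32s32f_C1R: steps are in bytes, dst = src * mVal + aVal.
// It only illustrates how the ROI shape determines the loop structure.
void ScaleC_Reference(const std::int32_t* pSrc, int srcStep, double mVal, double aVal,
                      float* pDst, int dstStep, RoiSize roi)
{
    const unsigned char* srcBytes = reinterpret_cast<const unsigned char*>(pSrc);
    unsigned char* dstBytes = reinterpret_cast<unsigned char*>(pDst);
    for (int y = 0; y < roi.height; ++y) {
        // Advance by the byte step to reach the start of row y.
        const std::int32_t* s = reinterpret_cast<const std::int32_t*>(
            srcBytes + static_cast<std::ptrdiff_t>(y) * srcStep);
        float* d = reinterpret_cast<float*>(
            dstBytes + static_cast<std::ptrdiff_t>(y) * dstStep);
        // The inner loop over contiguous elements is what SIMD code can vectorize.
        // With roi.width == 1 (a single column) it runs once per row, so there
        // is never a full register's worth of data to load.
        for (int x = 0; x < roi.width; ++x)
            d[x] = static_cast<float>(s[x] * mVal + aVal);
    }
}
```

Scaling one column of an nRows x nColumns matrix this way means roi = {1, nRows} with srcStep = nColumns * sizeof(int32), i.e. nRows rows of one element each.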

But could you please replace this part of your code

for (int n = 0; n < NTESTS_SCALE; n++)
{
  ScaleWithIPP(Source, nColumns, Dest, nRows, Factor, Shift);
}

with the following code? I see some speedup on my 64-bit Skylake system.

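for (int n = 0; n < NTESTS_SCALE; n++)
{
  int dLen = 0;
  int phase = 0;
  // Step 1: copy every nColumns-th element (column 0, since phase = 0) into Dest,
  // so the column becomes a contiguous vector. The Ipp32f cast just lets the
  // 32-bit integers be copied with the float routine.
  ippsSampleDown_32f((Ipp32f*)Source, nColumns*nRows, Dest, &dLen, nColumns, &phase);
  // Step 2: scale that vector in place as a single row of nRows elements.
  IppiSize roiSize = { nRows, 1 };
  ippiScaleC_32s32f_C1R((Ipp32s*)Dest, nRows * sizeof(Ipp32s), Factor, Shift, Dest, nRows * sizeof(Ipp32f), roiSize, ippAlgHintFast);
}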

 Thanks.
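The two-step approach can be mirrored in portable C++ to show why it helps: the strided access is confined to a cheap copy, and the arithmetic then runs over contiguous memory. This is a sketch, not IPP: GatherThenScale is a hypothetical name, the gather loop stands in for ippsSampleDown_32f with factor nColumns and phase 0 (which keeps elements 0, nColumns, 2*nColumns, ..., i.e. column 0), and a separate int32 buffer replaces the in-place reuse of Dest to avoid type punning.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical portable sketch of "downsample, then scale" (plain C++, not IPP).
std::vector<float> GatherThenScale(const std::vector<std::int32_t>& source, int nColumns,
                                   int nRows, double factor, double shift)
{
    // Step 1: strided gather of column 0 into a packed int32 buffer,
    // playing the role of ippsSampleDown_32f(factor = nColumns, phase = 0).
    std::vector<std::int32_t> packed(static_cast<std::size_t>(nRows));
    for (int r = 0; r < nRows; ++r)
        packed[static_cast<std::size_t>(r)] =
            source[static_cast<std::size_t>(r) * nColumns];

    // Step 2: one unit-stride pass, dst[i] = packed[i] * factor + shift.
    // This is the access pattern a SIMD scale routine handles well.
    std::vector<float> dest(packed.size());
    for (std::size_t i = 0; i < packed.size(); ++i)
        dest[i] = static_cast<float>(packed[i] * factor + shift);
    return dest;
}
```

The extra copy costs one pass over nRows elements, which is why the combined version can still beat a single strided scale.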

andreypir
Beginner

Hi Andrey,

Thank you. Yes, downsampling and then scaling is substantially faster than scaling alone, and somewhat faster than the loop:

ScaleWithLoop: 828
ScaleWithIPP: 625

on my computer. I had hoped for a bigger gain, but this will work.
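Assuming the two figures are run times in the same (unstated) unit, so that lower is better, the gain works out roughly as follows:

```latex
\frac{828}{625} \approx 1.32, \qquad 1 - \frac{625}{828} \approx 0.245
```

That is about 1.3x the throughput of the loop, or roughly 25% less time per run.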

 
