Attached is a simple console project showing that, when scaling a matrix, ippiScaleC_32s32f_C1R is slower than an equivalent simple C++ loop. In this example, a column of the source matrix is scaled into an output vector. On my PC (i7-7700K), the C++ loop is about 20% faster.
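The attached project isn't reproduced in this thread, so the following is only a hypothetical sketch of what the "simple equivalent C++ loop" might look like. It assumes a row-major matrix of 32-bit integers and the ippiScaleC formula dst = src * mVal + aVal, with Factor as the multiplier and Shift as the addend. The function name `scaleColumnLoop` and its parameters are illustrative, not taken from the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference loop: scale column `col` of a row-major
// nRows x nColumns int32 matrix into a float vector, assuming the same
// formula as ippiScaleC: dst = src * factor + shift.
std::vector<float> scaleColumnLoop(const std::vector<std::int32_t>& source,
                                   int nRows, int nColumns, int col,
                                   double factor, double shift)
{
    assert(col >= 0 && col < nColumns);
    assert(source.size() == static_cast<std::size_t>(nRows) * nColumns);
    std::vector<float> dest(nRows);
    for (int r = 0; r < nRows; ++r)
    {
        // Strided read: consecutive outputs come from elements that are
        // nColumns apart in memory.
        const std::size_t idx = static_cast<std::size_t>(r) * nColumns + col;
        dest[r] = static_cast<float>(source[idx] * factor + shift);
    }
    return dest;
}
```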

Is there any way to improve the ippiScaleC performance?


Andrey, how could we check this case? There is no reproducer attached to this thread.

Sorry, I thought I had attached the zip. Here it is.

Yes, it seems there is a problem on the IPP side, and this function needs further optimization. We will escalate the case.

Hi Andreypir!

IPP works better with rectangular ROIs, where it can load whole SIMD registers.
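To make the ROI point concrete, here is an illustrative model (an assumption, not IPP's actual implementation) of how an ROI-based routine traverses `height` rows of `width` contiguous elements, with rows separated by a step in bytes as in the ippi convention. Scaling a single column presumably means a ROI of width 1, so the contiguous inner loop has length 1 and no SIMD register is ever filled. `scaleROI` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative ROI traversal: `height` rows of `width` contiguous elements,
// consecutive rows separated by a step given in bytes. Only the inner loop
// over `width` walks contiguous memory, so only it could be vectorized.
void scaleROI(const std::int32_t* src, int srcStepBytes,
              float* dst, int dstStepBytes,
              int width, int height, double mVal, double aVal)
{
    assert(width > 0 && height > 0);
    const auto* srcBytes = reinterpret_cast<const unsigned char*>(src);
    auto* dstBytes = reinterpret_cast<unsigned char*>(dst);
    for (int y = 0; y < height; ++y)
    {
        const auto* s = reinterpret_cast<const std::int32_t*>(
            srcBytes + static_cast<std::ptrdiff_t>(y) * srcStepBytes);
        auto* d = reinterpret_cast<float*>(
            dstBytes + static_cast<std::ptrdiff_t>(y) * dstStepBytes);
        // With width == 1 (a single column) this loop runs once per row,
        // leaving no contiguous run of elements to process with SIMD.
        for (int x = 0; x < width; ++x)
            d[x] = static_cast<float>(s[x] * mVal + aVal);
    }
}
```

For example, a column of a row-major `nRows x nColumns` matrix would be `width = 1, height = nRows, srcStepBytes = nColumns * sizeof(int32_t)`, while the same values laid out as one row would be `width = nRows, height = 1`.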

That said, could you please replace this in your code

```cpp
for (int n = 0; n < NTESTS_SCALE; n++)
{
    ScaleWithIPP(Source, nColumns, Dest, nRows, Factor, Shift);
}
```

with the following code? I see some speedup on my 64-bit Skylake system.

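```cpp
for (int n = 0; n < NTESTS_SCALE; n++)
{
    int dLen = 0;
    int phase = 0;
    // Gather one column (every nColumns-th element, starting at index 'phase') into the contiguous Dest buffer
    ippsSampleDown_32f((Ipp32f*)Source, nColumns*nRows, Dest, &dLen, nColumns, &phase);
    // Scale the gathered values in place as a single row of nRows elements (IppiSize is {width, height})
    IppiSize roiSize = { nRows, 1 };
    ippiScaleC_32s32f_C1R((Ipp32s*)Dest, nRows * sizeof(__int32), Factor, Shift, Dest, sizeof(float), roiSize, ippAlgHintFast);
}
```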
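The suggested replacement splits the work into a strided gather followed by a contiguous scale. A plain C++ model of the two steps might look like this, assuming a row-major matrix and that `ippsSampleDown_32f` writes `pSrc[phase + k * factor]` to `pDst[k]`. The names `sampleDown` and `scaleContiguous` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Step 1, modeled on the down-sampling call: keep every `factor`-th element
// starting at `phase`. With factor = nColumns and phase = column index, this
// gathers one column of a row-major matrix into a contiguous buffer.
std::vector<std::int32_t> sampleDown(const std::vector<std::int32_t>& src,
                                     int factor, int phase)
{
    assert(factor > 0 && phase >= 0 && phase < factor);
    std::vector<std::int32_t> dst;
    for (std::size_t i = static_cast<std::size_t>(phase); i < src.size();
         i += static_cast<std::size_t>(factor))
        dst.push_back(src[i]);
    return dst;
}

// Step 2: scale the now-contiguous buffer (the role of the single-row
// ippiScaleC call). This loop reads unit-stride memory, which is what
// lets a SIMD implementation fill whole registers.
std::vector<float> scaleContiguous(const std::vector<std::int32_t>& src,
                                   double mVal, double aVal)
{
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = static_cast<float>(src[i] * mVal + aVal);
    return dst;
}
```

The posted IPP code gathers directly into `Dest` through an `Ipp32f*` cast and then scales in place; that presumably works because down-sampling only copies 32-bit elements without doing arithmetic on them. The sketch keeps the data as integers in a separate buffer to avoid the cast.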

Thanks.

Hi Andrey,

Thank you. Yes, downsampling and then scaling is substantially faster than scaling alone, and somewhat faster than the loop on my computer:

ScaleWithLoop: 828

ScaleWithIPP: 625

I had hoped for a bigger gain, but this will work.
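Assuming the two figures are elapsed times for the same number of iterations (the units aren't stated), the relative gain works out as:

$$
\frac{t_{\text{IPP}}}{t_{\text{loop}}} = \frac{625}{828} \approx 0.755,
\qquad
\frac{t_{\text{loop}}}{t_{\text{IPP}}} = \frac{828}{625} \approx 1.32
$$

That is, the down-sample-then-scale path takes roughly 25% less time than the loop, or runs about 1.32 times as fast.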

