Attached is a simple console project showing that, when scaling a matrix, ippiScaleC_32s32f_C1R is slower than an equivalent plain C++ loop. In this example, one column of the source matrix is scaled into an output vector. On my PC with an i7-7700K, the C++ loop is about 20% faster.
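For readers without the attachment, the loop variant might look roughly like the sketch below. It assumes a row-major matrix of 32-bit integers and the `dst = src * factor + shift` convention of ippiScaleC; the function name `ScaleColumnWithLoop` and the `col` parameter are hypothetical and are not taken from the attached project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the loop version discussed in this thread.
// Scales one column of a row-major nRows x nColumns int32 matrix into a
// float vector: dest[r] = source[r][col] * factor + shift.
std::vector<float> ScaleColumnWithLoop(const std::vector<std::int32_t>& source,
                                       int nRows, int nColumns, int col,
                                       double factor, double shift)
{
    assert(nRows >= 0 && nColumns > 0 && col >= 0 && col < nColumns);
    assert(source.size() >= static_cast<std::size_t>(nRows) * nColumns);

    std::vector<float> dest(static_cast<std::size_t>(nRows));
    for (int r = 0; r < nRows; ++r)
    {
        // Consecutive rows of the same column are nColumns elements apart.
        const std::size_t idx = static_cast<std::size_t>(r) * nColumns + col;
        dest[r] = static_cast<float>(source[idx] * factor + shift);
    }
    return dest;
}
```

The access pattern is strided: each iteration jumps `nColumns` elements ahead in the source, which is the same pattern the IPP call sees when it is handed a one-pixel-wide ROI.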

Is there any way to improve the ippiScaleC performance?

Andrey, how could we check this case? There is no reproducer attached to this thread.

Sorry, I thought I had attached the zip. Here it is.

Yes, it seems there is a problem on the IPP side and this function needs further optimization. We will escalate the case.

Hi Andreypir!

IPP works better with rectangular ROIs where it can load whole SIMD registers.

Could you please replace this loop in your code

```
for (int n = 0; n < NTESTS_SCALE; n++)
{
    ScaleWithIPP(Source, nColumns, Dest, nRows, Factor, Shift);
}
```

with the following code? I see some speedup on my 64-bit Skylake system.

```
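for (int n = 0; n < NTESTS_SCALE; n++)
{
    int dLen = 0;
    int phase = 0;
    // Gather the column into Dest: copy every nColumns-th 32-bit element, starting at phase 0.
    ippsSampleDown_32f((Ipp32f*)Source, nColumns*nRows, Dest, &dLen, nColumns, &phase);
    // Scale the now-contiguous values in place as a one-row image (Ipp32s in, Ipp32f out).
    IppiSize roiSize = { nRows, 1 };
    ippiScaleC_32s32f_C1R((Ipp32s*)Dest, nRows * sizeof(__int32), Factor, Shift, Dest, sizeof(float), roiSize, ippAlgHintFast);
}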
```

Thanks.
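To make the two-step idea concrete without the IPP headers, here is a plain C++ analogue: first gather the strided column into a contiguous buffer, then scale that buffer in place. It is only a sketch of the data flow, not of how IPP implements either call. The names `GatherColumnBits` and `ScaleBitsInPlace` and the `col` parameter are hypothetical, and `memcpy` is used so that viewing the integer bits through a float buffer stays well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t), "sketch assumes 32-bit float");

// Step 1: copy the raw 32-bit patterns of one column (stride nColumns)
// into a contiguous buffer, without converting them.
void GatherColumnBits(const std::int32_t* src, int nRows, int nColumns,
                      int col, float* dst)
{
    assert(nColumns > 0 && col >= 0 && col < nColumns);
    for (int r = 0; r < nRows; ++r)
        std::memcpy(&dst[r], &src[r * nColumns + col], sizeof(float));
}

// Step 2: read each slot back as an int32, then overwrite it with the
// scaled float value: buf[i] = value * factor + shift.
void ScaleBitsInPlace(float* buf, int n, double factor, double shift)
{
    for (int i = 0; i < n; ++i)
    {
        std::int32_t v;
        std::memcpy(&v, &buf[i], sizeof v);
        buf[i] = static_cast<float>(v * factor + shift);
    }
}
```

The point of the split is that the second step walks contiguous memory, so a vectorized implementation can fill whole SIMD registers; the strided access is confined to the cheap gather step.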

Hi Andrey,

Thank you. Yes, downsampling then scaling is substantially faster than just scaling, and somewhat faster than a loop:

ScaleWithLoop: 828

ScaleWithIPP: 625

on my computer. I had hoped for a bigger gain, but this will work.
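For anyone repeating the comparison, a minimal timing helper along these lines may be enough. It is a sketch only: the name `TimeRunsMs` is hypothetical, and milliseconds are my choice of unit here, since the units of the figures above are not stated.

```cpp
#include <chrono>

// Hypothetical helper: runs fn() nTests times and returns the elapsed
// wall-clock time in whole milliseconds, measured with a monotonic clock.
template <typename Fn>
long long TimeRunsMs(int nTests, Fn&& fn)
{
    const auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < nTests; ++n)
        fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Passing a lambda that wraps each variant keeps the measurement loop identical for both, so only the scaled code differs between runs. Make sure the output buffer is actually used afterwards, otherwise an optimizing compiler may remove the plain loop entirely.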
