Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

IPP not faster than standard implementation

Manuel_P_
Beginner
1,745 Views
Hi,

I did some performance comparisons using the Intel C++ Compiler 9.1 for Windows.

ippsCopy_32s vs. memcpy: exactly the same speed.
ippsConvert_16s32f is slower than the standard C type cast (float) in a loop.
The ipps_zlib is slower than a standard zlib library.

Do you have an explanation for this?
What am I doing wrong? (I used Dynamic Linkage.)
What performance gains could be expected usually?

Thanks,
Martin

0 Kudos
11 Replies
Venkatrajk_K_Intel
1,745 Views

Hi, could you tell us how you actually compared the performance? Was it with the Windows GetTickCount function or some other way?

0 Kudos
Manuel_P_
Beginner
1,745 Views
Here is an example:

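Ipp64u t1, t2;
Ipp32s *pSrc[Nbuffers], *pDst[Nbuffers];

// ...

// Time Nbuffers calls to ippsCopy_32s, one per buffer pair.
t1 = ippGetCpuClocks();
for (i=0; i < Nbuffers; ++i)
    ippsCopy_32s(pSrc[i], pDst[i], N);
t2 = ippGetCpuClocks();
printf("Time for ippsCopy: %s ", hyperToString((t2-t1)/100000).c_str());

// Time the same copies with memcpy (note the dst, src argument order).
t1 = ippGetCpuClocks();
for (i=0; i < Nbuffers; ++i)
    memcpy(pDst[i], pSrc[i], N*sizeof(pSrc[0][0]));
t2 = ippGetCpuClocks();
printf("Time for memcpy: %s ", hyperToString((t2-t1)/100000).c_str());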

0 Kudos
Vladimir_Dudnik
Employee
1,745 Views

If your buffers are small, then memcpy (which is inlined by the compiler) will be faster than a call into a DLL, won't it?

Vladimir

0 Kudos
Manuel_P_
Beginner
1,745 Views
Intel's new book "Optimizing Applications for Multi-Core Processors" says on page 77 (Figure 5.2) that ippsCopy is always faster than memcpy, regardless of the array length.
Unfortunately, I cannot reproduce this.
The buffer sizes I used are:
N=1000; (this is the array length)
Nbuffers=30000; (so the test was repeated in a loop 30000 times to provide reliable timings).
I tried various other values for N and Nbuffers, but memcpy and ippsCopy are always equally fast.

Cheers, Martin
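
For anyone trying to reproduce this comparison across several array lengths, here is a minimal timing sketch. It is not the code used in this thread. The names candidateCopy, baselineCopy and bestNanosPerCall are made up for illustration, and candidateCopy uses std::copy only as a placeholder so the sketch builds without IPP; in a real test its body would call the library routine being measured. It runs one warm-up call and reports the best of several trials, which tends to reduce noise from cold caches and scheduling.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the routine under test. In a real comparison this
// body would call the library copy instead of std::copy.
inline void candidateCopy(const std::int32_t* src, std::int32_t* dst, std::size_t n) {
    std::copy(src, src + n, dst);
}

// Baseline: plain memcpy of n 32-bit elements.
inline void baselineCopy(const std::int32_t* src, std::int32_t* dst, std::size_t n) {
    std::memcpy(dst, src, n * sizeof(std::int32_t));
}

// Calls `copy` on an n-element buffer `reps` times per trial and returns the
// smallest average time per call, in nanoseconds, seen over `trials` trials.
template <typename CopyFn>
double bestNanosPerCall(CopyFn copy, std::size_t n, int reps, int trials) {
    assert(n > 0 && reps > 0 && trials > 0);
    std::vector<std::int32_t> src(n), dst(n, 0);
    for (std::size_t i = 0; i < n; ++i) src[i] = static_cast<std::int32_t>(i);

    copy(src.data(), dst.data(), n);  // warm-up call, not timed

    volatile std::int32_t sink = 0;   // read the result so the copy is not optimized away
    double best = -1.0;
    for (int t = 0; t < trials; ++t) {
        const auto start = std::chrono::steady_clock::now();
        for (int r = 0; r < reps; ++r) {
            copy(src.data(), dst.data(), n);
            sink = dst[n - 1];
        }
        const auto stop = std::chrono::steady_clock::now();
        const double ns =
            std::chrono::duration<double, std::nano>(stop - start).count() / reps;
        if (best < 0.0 || ns < best) best = ns;
    }
    return best;
}
```

Calling bestNanosPerCall(baselineCopy, n, 30000, 5) and bestNanosPerCall(candidateCopy, n, 30000, 5) for a range of n (for example 16, 1000 and 1000000) shows whether call overhead or memory bandwidth dominates at each size.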

0 Kudos
Vladimir_Dudnik
Employee
1,745 Views

If you compile your code with the Intel C compiler, then for memcpy vs. ippsCopy you are comparing mostly the same code, because the Intel compiler uses the same optimized kernel as ippsCopy.

Regarding your note on zlib: could you please provide more details? What exactly are you comparing, under what conditions, with which IPP version, and what data are you working with?

Our testing shows performance benefits for the IPP-optimized zlib over standard zlib. That is why it was stated in the IPP book. You may have hit some specific conditions where this is not the case, and we would like to understand what is actually happening in your test.

Regards,
Vladimir

0 Kudos
murali_523
Beginner
1,745 Views
I found that with the release build (IPP 5.2), ippAffine is slower than the standard implementation. But the debug build is faster than the standard implementation when run in debug mode. What might be the difference?
Can you please tell me what I have to do to make the release version faster?

0 Kudos
murali_523
Beginner
1,743 Views
Initially I had written my own function to apply an affine transformation from a given source image to a destination image. The source image is scaled from 512*512 to 1024*1024.
I found that with the release build (IPP 5.2), ippAffine is slower than the standard implementation.
But the debug build is faster than the standard implementation when run in debug mode.
What might be the problem?
Can you please tell me what I have to do to make the release version faster?

I am using dynamic linking. My processor is a 3 GHz P4, and ippi loads the Ippit7 DLL when I run my application.
0 Kudos
murali_523
Beginner
1,743 Views
The function I used is ippiWarpAffine_16u_C1R. I allocated both the source and destination image buffers in virtual memory, not on the heap.
0 Kudos
murali_523
Beginner
1,743 Views
I am trying to interpolate the image using bicubic interpolation. Is there any difference in performance between release and debug? I found that the release build is much slower than the standard implementation running in release mode, but in debug mode it is faster than the standard implementation.
0 Kudos
Vladimir_Dudnik
Employee
1,743 Views

Hello,

You do not have a debug version of IPP (it was never released). If you see a performance difference between the debug and release builds of your application, you need to check what is wrong in your application, because in both cases you are using the release version of the IPP binaries.

Regards,
Vladimir

0 Kudos
blakepgm
Beginner
1,743 Views

I found this post because I searched for "ippsConvert_16s32f is slower".

I have found the same using ippIP AVX (e9) 2020.0.1

What is interesting is that I found ippsConvert_16s32f to be 3-4 times faster if m7_ippsConvert_16s32f is used instead of e9_ippsConvert_16s32f.

I set this by calling ippSetCpuFeatures(PX_FM). Other IPP procedures perform faster when set to E9 as expected, but it seems that the M7 short->float conversion has some advantage.

The CPU I am testing with is:

AMD A10-7800 Radeon R7
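
For reference, the "standard C type cast in a loop" that ippsConvert_16s32f is being compared against in this thread can be sketched as below. This is an illustrative baseline, not code from any poster, and the name convert16sTo32f is made up. With optimization enabled, a compiler may auto-vectorize a loop like this, which could be one reason the gap to a library routine is small or even reversed on some CPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain scalar baseline: widen each signed 16-bit sample to a 32-bit float.
// Every int16 value is exactly representable as a float, so no rounding occurs.
inline void convert16sTo32f(const std::int16_t* src, float* dst, std::size_t len) {
    assert(src != nullptr && dst != nullptr);
    for (std::size_t i = 0; i < len; ++i) {
        dst[i] = static_cast<float>(src[i]);
    }
}
```

Before timing anything, it is worth running both the baseline and the library routine on the same input and checking that the outputs match element by element.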

0 Kudos