Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Problem with ippGetCpuClocks

coolsandyforyou
Beginner
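#include <stdio.h>
#include <ipp.h>

#define SIZE 256

int main(void)
{
    Ipp8u pSrc[SIZE], pDst[SIZE];
    Ipp64u begin, end;
    int i;

    for (i = 0; i < SIZE; i++) pSrc[i] = (Ipp8u)i;

    /* time the plain C copy loop */
    begin = ippGetCpuClocks();
    for (i = 0; i < SIZE; i++) pDst[i] = pSrc[i];
    end = ippGetCpuClocks();
    printf("time taken in c=%llu\n", (unsigned long long)(end - begin));

    /* time the IPP copy */
    begin = ippGetCpuClocks();
    ippsCopy_8u(pSrc, pDst, SIZE);
    end = ippGetCpuClocks();
    printf("time taken in ipp=%llu\n", (unsigned long long)(end - begin));

    return 0;
}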

I am surprised to see that the time taken by IPP is six times larger than the time taken by the C loop. Is there anything wrong with the code?
Sergey_K_Intel
Employee
If you compile with optimization enabled, the compiler may optimize away the entire C loop, so the timed region contains only two successive calls to ippGetCpuClocks. Meanwhile, ippsCopy_8u really does copy all the data from src to dst.
Try your sample with the "-Od" compiler option, i.e. with optimization disabled.

P.S. This behaviour is normal for optimizing compilers: if a variable is never used later in the code, the compiler may not compute it at all.
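Instead of turning optimization off, another common approach is to make the result of the timed loop observable afterwards, for example by summing the destination buffer and printing the sum. The sketch below is plain C with no IPP dependency; `copy_loop` and `checksum` are hypothetical helper names, and whether a particular compiler still transforms the loop depends on the compiler and its flags, so this discourages rather than strictly guarantees that the copy survives.

```c
#include <stddef.h>

/* Hypothetical helper: the plain C copy loop being timed. */
static void copy_loop(const unsigned char *src, unsigned char *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical helper: reads every byte of the destination buffer.
   Using the returned sum after the timed region (e.g. printing it)
   makes the copied data observable, so the copy is no longer dead code. */
static unsigned long checksum(const unsigned char *buf, size_t n)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}
```

In the original program, calling `checksum(pDst, SIZE)` after the second `ippGetCpuClocks()` call and printing the result would serve the same purpose, since `Ipp8u` is an unsigned 8-bit type.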

Regards,
Sergey
coolsandyforyou
Beginner
I disabled optimization and still got the same result, but when I tried other functions such as SAD(), it worked fine...
Sergey_K_Intel
Employee
Quoting - coolsandyforyou
I disabled optimization and still got the same result, but when I tried other functions such as SAD(), it worked fine...

Hi,
Indeed, the performance numbers are as you describe. The problem appears to be a "cold" instruction cache. Our team says that for infrequent IPP function calls on short data, plain C/C++ loops can be faster than IPP function calls. However, try the following modification of your test (the lines marked with comments were added) and you will see different performance numbers:

#include <stdio.h>
#include <ipp.h>

#define SIZE 256

int main(void)
{
    Ipp8u pSrc[SIZE], pDst[SIZE];
    Ipp64u begin, end;
    Ipp8u pSrc1[SIZE] = {0}, pDst1[SIZE]; // dummy arrays, used only for warming
    int i;

    for (i = 0; i < SIZE; i++) pSrc[i] = (Ipp8u)i;

    ippsCopy_8u(pSrc1, pDst1, SIZE); // instruction cache warming

    begin = ippGetCpuClocks();
    for (i = 0; i < SIZE; i++) pDst[i] = pSrc[i];
    end = ippGetCpuClocks();
    printf("time taken in c=%llu\n", (unsigned long long)(end - begin));

    begin = ippGetCpuClocks();
    ippsCopy_8u(pSrc, pDst, SIZE);
    end = ippGetCpuClocks();
    printf("time taken in ipp=%llu\n", (unsigned long long)(end - begin));

    return 0;
}
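The warm-up idea can be extended by timing the same call several times: the first sample carries the cold-cache cost, and the smallest of the later samples approximates the warm cost. The sketch below is a minimal illustration under that assumption; `timing_summary` and `summarize_timings` are hypothetical names, and the function only post-processes tick deltas that you collect yourself (for example, with `ippGetCpuClocks()` around `ippsCopy_8u`).

```c
#include <stdint.h>

/* Summary of repeated timings of the same operation. */
typedef struct {
    uint64_t cold;     /* first sample: includes cold-cache effects */
    uint64_t warm_min; /* smallest of the remaining (warm) samples */
} timing_summary;

/* Hypothetical helper. deltas[k] holds (end - begin) for the k-th call.
   Precondition: n >= 2, so at least one warm sample exists. */
static timing_summary summarize_timings(const uint64_t *deltas, int n)
{
    timing_summary s;
    s.cold = deltas[0];
    s.warm_min = deltas[1];
    for (int k = 2; k < n; k++)
        if (deltas[k] < s.warm_min)
            s.warm_min = deltas[k];
    return s;
}
```

To fill `deltas`, put `begin = ippGetCpuClocks(); ippsCopy_8u(pSrc, pDst, SIZE); end = ippGetCpuClocks();` in a loop and store `end - begin` on each iteration. A large gap between `cold` and `warm_min` would be consistent with the cold-cache explanation; taking the minimum also filters out occasional interrupts.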

Regards,
Sergey