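#include <stdio.h>
#include <ipp.h>   /* assumed: IPP umbrella header */
int main()
{
    const int SIZE=256;
    Ipp8u pSrc[SIZE],pDst[SIZE];
    Ipp64u begin,end;
    int i;
    /* fill the source buffer */
    for(i=0;i<SIZE;i++) pSrc[i]=(Ipp8u)i;
    /* time the plain C copy loop */
    begin=ippGetCpuClocks();
    for(i=0;i<SIZE;i++) pDst[i]=pSrc[i];
    end=ippGetCpuClocks();
    printf("time taken in c=%llu\n",(unsigned long long)(end-begin));
    /* time the IPP copy */
    begin=ippGetCpuClocks();
    ippsCopy_8u(pSrc,pDst,SIZE);
    end=ippGetCpuClocks();
    printf("time taken in ipp=%llu\n",(unsigned long long)(end-begin));
    return 0;
}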
I am surprised to see that the time taken by the IPP call is 6 times larger than the time taken by the C loop. Is there anything wrong with the code?
3 Replies
If you compile with optimization enabled, the compiler may optimize away the whole C loop, so the timed region contains only two successive calls to ippGetCpuClocks. Meanwhile, ippsCopy really does copy all the data from src to dst.
Try your sample with the "-Od" compiler option, i.e. without optimization.
P.S. This behaviour is usual for optimizing compilers. If they see that a variable is never used later in the code, they may not generate any code for it at all.
Regards,
Sergey
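To illustrate the point about dead-code elimination, here is a minimal sketch in standard C with no IPP dependency. The names copy_bytes, checksum, g_sink and copy_and_consume are hypothetical helpers, not part of the original post or of IPP. The idea is to read the destination buffer after the copy and store the result in a volatile variable, so the copied data is actually used. This makes the result observable; it is not a guarantee that an aggressive optimizer leaves the loop untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE 256

/* Hypothetical stand-in for the plain C copy loop from the question. */
static void copy_bytes(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Reads every byte of the buffer so that its contents are actually used. */
static unsigned checksum(const uint8_t *buf, size_t n)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* A store to a volatile object is an observable side effect,
   so the value written here has to be produced. */
static volatile unsigned g_sink;

/* Fills a source buffer, copies it, and consumes the destination.
   In a benchmark, the timer calls would go around copy_bytes(). */
unsigned copy_and_consume(void)
{
    uint8_t src[SIZE], dst[SIZE];
    for (size_t i = 0; i < SIZE; i++)
        src[i] = (uint8_t)i;

    copy_bytes(dst, src, SIZE);

    unsigned sum = checksum(dst, SIZE);
    g_sink = sum;
    return sum;
}
```

In the timing test from the question, the same effect could be had by printing a value derived from pDst after the second timer call.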
Quoting - Sergey Khlystov (Intel)
If you compile with optimization enabled, the compiler may optimize away the whole C loop, so the timed region contains only two successive calls to ippGetCpuClocks. Meanwhile, ippsCopy really does copy all the data from src to dst.
Try your sample with the "-Od" compiler option, i.e. without optimization.
P.S. This behaviour is usual for optimizing compilers. If they see that a variable is never used later in the code, they may not generate any code for it at all.
Regards,
Sergey
Quoting - coolsandyforyou
I disabled optimization and still got the same result, but when I tried other functions like SAD() it worked fine...
Hi,
Indeed, the performance numbers are as you described. It looks like the problem is a "cold" instruction cache. Our engineers say that when IPP functions are called rarely and on short data, pure C/C++ loops are faster than the IPP function calls. But try the following modification of your test (the added lines are marked with comments) and you'll see different performance data:
#include <stdio.h>
#include <ipp.h>   /* assumed: IPP umbrella header */
int main()
{
    const int SIZE=256;
    Ipp8u pSrc[SIZE],pDst[SIZE];
    Ipp64u begin,end;
    Ipp8u pSrc1[SIZE], pDst1[SIZE]; // dummy arrays
    int i;
    for(i=0;i<SIZE;i++) pSrc[i]=(Ipp8u)i;
    ippsCopy_8u(pSrc1,pDst1,SIZE); // instruction cache warming
    begin=ippGetCpuClocks();
    for(i=0;i<SIZE;i++) pDst[i]=pSrc[i];
    end=ippGetCpuClocks();
    printf("time taken in c=%llu\n",(unsigned long long)(end-begin));
    begin=ippGetCpuClocks();
    ippsCopy_8u(pSrc,pDst,SIZE);
    end=ippGetCpuClocks();
    printf("time taken in ipp=%llu\n",(unsigned long long)(end-begin));
    return 0;
}
Regards,
Sergey
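The warm-up idea above can be generalized into a small timing helper. The following is a sketch in standard C under stated assumptions: memcpy stands in for ippsCopy_8u so that the example builds without IPP, clock() replaces ippGetCpuClocks (it is much coarser, so each trial repeats the call many times), and loop_copy, libc_copy and best_ticks are hypothetical names. The helper makes one untimed warm-up call, then reports the smallest time over several trials, which reduces the influence of one-off costs such as a cold cache.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(unsigned char *dst, const unsigned char *src, size_t n);

/* Plain C loop, as in the question. */
static void loop_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* Library copy; an assumed stand-in for ippsCopy_8u in this sketch. */
static void libc_copy(unsigned char *dst, const unsigned char *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Calls fn once untimed (warm-up), then runs `trials` timed trials of
   `calls` calls each and returns the smallest elapsed clock() tick count. */
static clock_t best_ticks(copy_fn fn, unsigned char *dst,
                          const unsigned char *src, size_t n,
                          int trials, int calls)
{
    assert(fn != NULL && trials > 0 && calls > 0);

    fn(dst, src, n); /* warm-up: brings code and data into the caches */

    clock_t best = 0;
    for (int t = 0; t < trials; t++) {
        clock_t start = clock();
        for (int c = 0; c < calls; c++)
            fn(dst, src, n);
        clock_t elapsed = clock() - start;
        if (t == 0 || elapsed < best)
            best = elapsed;
    }
    return best;
}
```

Calling best_ticks(loop_copy, dst, src, 256, 5, 100000) and the same with libc_copy gives two tick counts to compare. With optimization enabled, the compiler may still simplify the repeated calls, so the destination buffer should be read afterwards.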
