Rohit,
The inner loop looks like a dot product, and it is the hot spot of the code:
for (chip = 0; chip < sf; chip++){
    sum += (*pSrc++) * (*pW++);
}
If sf is large, the loop could be replaced with an ippsDotProd_ function.
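A minimal sketch of that replacement, assuming Ipp32f (float) data. The helper name dot_chips and the HAVE_IPP build macro are hypothetical, introduced only for illustration: with HAVE_IPP defined the helper calls ippsDotProd_32f, otherwise it falls back to the plain C loop, so the two paths can be timed against each other for a given sf.

```c
#ifdef HAVE_IPP                 /* hypothetical build flag, not part of IPP */
#include <ipps.h>
#endif

/* Dot product of sf chips against sf weights.
 * dot_chips is an illustrative name, not taken from the original code. */
static float dot_chips(const float *pSrc, const float *pW, int sf)
{
#ifdef HAVE_IPP
    Ipp32f sum = 0.0f;
    /* Return status is ignored in this sketch. */
    ippsDotProd_32f(pSrc, pW, sf, &sum);
    return sum;
#else
    float sum = 0.0f;
    int chip;
    /* Same arithmetic as the loop above, written with indexes. */
    for (chip = 0; chip < sf; chip++) {
        sum += pSrc[chip] * pW[chip];
    }
    return sum;
#endif
}
```

For small sf the call overhead may outweigh the gain, so it is worth measuring both paths with the real sf.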
Thanks,
Chao
1) already mentioned by Chao:
for (sym = 0; sym < N_corr; sym++){
    Ipp32f sum;
    ippsDotProd_32f( pSrc, pW, sf, &sum );
    *pDest++ = sum;
    pSrc += sf;
}
2) with a temporary buffer pBuf of size sf:
Ipp32f sum;
for( sym = 0; sym < N_corr; sym++ ){
    ippsMul_32f( pSrc, pW, pBuf, sf );
    pSrc += sf;
    ippsSum_32f( pBuf, sf, &sum, ippAlgHintFast );
    *pDest++ = sum;
}
3) based on the AddProduct function; I guess this is not as efficient.
I think the first one should be the most efficient for a reasonable sf. Otherwise there is no alternative to C code compiled with the Intel compiler, but for efficient code generation (vectorization) you should rewrite your code with arrays and indexes and use the "ivdep" pragma.
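A rough sketch of what the array-and-index rewrite might look like. The function name despread and its parameter names are hypothetical, and it assumes the same sf weights are reused for every symbol, as in variant 1. The pragma is guarded so that other compilers simply skip it; whether the loop actually vectorizes depends on the compiler and its floating-point options.

```c
/* Correlate N_corr symbols of sf chips each against the weights w.
 * despread is an illustrative name; src holds N_corr * sf samples and
 * dest receives one sum per symbol. */
void despread(const float *src, const float *w, float *dest,
              int N_corr, int sf)
{
    int sym, chip;
    for (sym = 0; sym < N_corr; sym++) {
        float sum = 0.0f;
        /* Tell the Intel compiler to ignore assumed loop-carried
         * dependencies; the guard keeps other compilers quiet. */
#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER)
#pragma ivdep
#endif
        for (chip = 0; chip < sf; chip++) {
            /* Plain indexes instead of post-incremented pointers. */
            sum += src[sym * sf + chip] * w[chip];
        }
        dest[sym] = sum;
    }
}
```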
Regards,
Igor
Thanks for the help. Compared to the C code, the performance of the dot product function is good for large sf sizes.
Regards
Rohit
