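int orig_code( unsigned short *data, unsigned short *gain)
{
    int pixel_value = 0;
    int idex = 0;
    int ImageSize = 2000*2000;
    /* Note: only gain is advanced; data always points at the first pixel,
       and pixel_value is overwritten on every pass and never stored. */
    for( idex = 0; idex < ImageSize; idex++, gain++ )
    {
        pixel_value = (int)*data;
        pixel_value *= (int)*gain;
    }
    return(0);
}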
Trying to convert this code to a loop using the vector data types (slow_code) yielded worse results. The orig_code takes 10 msec to process a 2K x 2K 16-bit grayscale image; the slow_code takes 18 msec. Any suggestions? I was expecting roughly a factor of 8 improvement in processing time.
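#include <dvec.h>   /* Intel C++ vector classes (Iu16vec8) */
#include <ipp.h>    /* Ipp16u */

int slow_code( Ipp16u *data, Ipp16u *dst, Ipp16u *gain)
{
    int idex = 0;
    int nLoop = 2000*2000/8;    /* 8 16-bit pixels per 128-bit Iu16vec8 */
    Iu16vec8 *vdata, *vgain, *vdst;
    /* These casts assume all three buffers are 16-byte aligned. */
    vdst = (Iu16vec8 *) dst;
    vgain = (Iu16vec8 *) gain;
    vdata = (Iu16vec8 *) data;
    for( idex = 0; idex < nLoop; idex++, vgain++, vdata++, vdst++ )
    {
        /* Lane-wise 16-bit multiply of 8 pixels, stored to dst */
        *vdst = (*vdata) * (*vgain);
    }
    return(0);
}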
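Since orig_code performs no stores and reads only the gain buffer, it is not doing the same work as slow_code, so the two timings are not directly comparable. A like-for-like scalar baseline might look like the sketch below. It assumes the product should be written to a separate dst buffer and truncated to its low 16 bits, which is assumed (not confirmed here) to match the lane-wise multiply in slow_code. The function name scalar_mul is hypothetical. With optimization enabled, a vectorizing compiler can often turn a simple unit-stride loop like this into SIMD code on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical like-for-like scalar baseline: reads both inputs, advances
   both pointers, and stores every product. The uint16_t cast keeps only the
   low 16 bits of each product (wrap-around). */
void scalar_mul(const uint16_t *restrict data,
                const uint16_t *restrict gain,
                uint16_t *restrict dst,
                size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Multiply in 32-bit unsigned arithmetic to avoid the signed overflow
           that plain int promotion could cause, then truncate to 16 bits. */
        dst[i] = (uint16_t)((uint32_t)data[i] * (uint32_t)gain[i]);
    }
}
```

Timing this against slow_code on the same 2000 x 2000 buffers would give a fairer picture of what the vector classes actually buy.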
Hi,
it seems to me you are trying to multiply corresponding pixels of two images. Please look at the ippiMul functions in the IPP manual.
Regards,
Vladimir
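One semantic difference worth checking when swapping in a library call: as far as I understand the IPP manual, the integer ippiMul variants saturate results that exceed the output range (with an optional scale factor), whereas a plain lane-wise 16-bit multiply keeps only the low 16 bits. The minimal plain-C reference below shows saturating per-pixel multiplication; it ignores the scale factor and any rounding rules, and mul_sat_u16 is a hypothetical name, not an IPP function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference (not an IPP function): per-pixel product of two
   16-bit unsigned images, clamped to 65535 instead of wrapping.
   No scale factor is applied. */
void mul_sat_u16(const uint16_t *src1, const uint16_t *src2,
                 uint16_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Exact product: 65535 * 65535 still fits in 32 unsigned bits. */
        uint32_t p = (uint32_t)src1[i] * (uint32_t)src2[i];
        dst[i] = (p > UINT16_MAX) ? UINT16_MAX : (uint16_t)p;
    }
}
```

If the gain values can push pixels past 65535, the two approaches will produce different images, so it is worth deciding which behavior is intended before comparing speed.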
Vladimir,
You are correct. The code as written could be handled by the image processing functions. I left out some other, more complicated parts of the code that would generally prevent me from using those functions, so I was trying to understand why this simple example would not give an 8x speed improvement. Any suggestions besides using the image processing library functions?
Thanks.
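A rough count of memory traffic, using only the figures from the first post, suggests why an 8x speedup did not appear. It assumes each array is streamed from main memory once and ignores extra traffic such as cache lines of dst being read before they are written.

$$
\begin{aligned}
B_{\text{image}} &= 2000 \times 2000 \times 2\ \text{bytes} = 8 \times 10^{6}\ \text{bytes} \approx 8\ \text{MB} \\
B_{\text{orig}} &\approx B_{\text{image}} = 8\ \text{MB} \quad (\text{reads of gain only, no stores}) \\
B_{\text{slow}} &\approx 3\,B_{\text{image}} = 24\ \text{MB} \quad (\text{reads of data and gain, writes of dst}) \\
\frac{B_{\text{orig}}}{10\ \text{ms}} &= 0.8\ \text{GB/s}, \qquad \frac{B_{\text{slow}}}{18\ \text{ms}} \approx 1.33\ \text{GB/s}
\end{aligned}
$$

By this estimate slow_code moves about three times as much data in 1.8 times as long, which is consistent with the loop being limited by memory bandwidth rather than by the multiplies. Cutting the number of multiply instructions by 8 does not shorten the time spent waiting on memory. This is an estimate, not a measurement.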
Hi,
With such a large amount of data (in your case 2Kx2K 16-bit images), it is important to use the processor's cache efficiently. That is, organize the processing so that you work on a limited amount of data that fits in the processor's L1 cache for as long as you can, and only then move the processing window to another part of your data. For example, processing in a row-by-row fashion should generally improve performance. If you can't use the image processing functions for some reason, you should be able to take advantage of Intel compiler vectorization. If you are going to run this code on multi-core processors, it is also important to use the OpenMP parallelization supported by the Intel Compiler. Hopefully this helps improve performance for such tasks.
Regards,
Vladimir
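A minimal sketch of the row-by-row idea, assuming the "more complicated parts" mentioned earlier can be written as additional per-pixel steps: each row of data and gain is read once, all per-pixel steps are applied while that row is still in cache, and rows are split across cores with OpenMP. One 2000-pixel row is 4 KB per buffer, so a row of data, gain and dst together is about 12 KB. The function process_image and the offset/clamp step are hypothetical placeholders for the real work. Build with OpenMP enabled (for example -fopenmp with GCC); without it the pragma is ignored and the loop runs serially.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: fuse all per-pixel steps into one pass over each row
   so the row's data is reused while it is in cache, and split rows across
   threads. */
void process_image(const uint16_t *data, const uint16_t *gain,
                   uint16_t *dst, int width, int height, uint16_t offset)
{
    /* Each iteration writes a disjoint row of dst, so rows can run in parallel. */
    #pragma omp parallel for
    for (int y = 0; y < height; y++) {
        const uint16_t *d = data + (size_t)y * (size_t)width;
        const uint16_t *g = gain + (size_t)y * (size_t)width;
        uint16_t *out = dst + (size_t)y * (size_t)width;

        /* Simple unit-stride inner loop: a candidate for auto-vectorization. */
        for (int x = 0; x < width; x++) {
            /* Step 1: gain multiply, truncated to 16 bits (wrap-around). */
            uint32_t p = (uint16_t)((uint32_t)d[x] * (uint32_t)g[x]);
            /* Step 2 (hypothetical placeholder): subtract an offset, clamping at 0. */
            out[x] = (uint16_t)(p > offset ? p - offset : 0);
        }
    }
}
```

The design choice here is fusion: rather than running one full-image pass per operation, which streams all 8 MB buffers through memory each time, every operation is applied to a row before moving on.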