Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Optimization question...

bferster
Beginner
340 Views
I have a pixel loop that is called very often, and I want to optimize it so that it runs as fast as possible on Pentium III processors and higher.

What would be the best compiler options (or better yet, pragmas), or is there a better way to organize the following code clip?

unsigned short r,g,b,xdiv;
unsigned char *p,*ip;
unsigned char xin,xout,mxid;

for (x=0;x xin=xIn;
xout=xOut;
xdiv=xDiv;
ip=&inbuf[xPix];
r=(*ip++)*xin;
g=(*ip++)*xin;
b=(*ip++)*xin;
for (i=0;i;++i) {
r+=((*ip++)<<8);
g+=((*ip++)<<8);
b+=((*ip++)<<8);
}
*p++=(unsigned char)((r+((*ip++)*xout))/xdiv);
*p++=(unsigned char)((g+((*ip++)*xout))/xdiv);
*p++=(unsigned char)((b+((*ip)*xout))/xdiv);
*p++=255;
}

Thanks!

Bill
0 Kudos
2 Replies
Ganesh_R_Intel
Employee
340 Views
bferster,
>What would be the best compilers options
The best next step is to see if you can take advantage of SSE2.

The easiest approach is to see whether you can take advantage of auto-vectorization. Please see the user's guide, which is located under the appropriate product at http://www.intel.com/software/products/compilers/
(the Windows IA-32 C++ user's guide is at http://www.intel.com/software/products/compilers/techtopics/ccug.htm), and also http://www.intel.com/software/products/compilers/techtopics/Compiler_Optimization_7_02.htm.
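
As a rough illustration of the kind of loop an auto-vectorizer tends to handle well, here is a minimal sketch. It is not taken from the original post: the function name scale_samples and its parameters are hypothetical, and whether a given compiler actually vectorizes it depends on the compiler version and options, so check the vectorization report described in the user's guide. Note also that SSE2 is available from the Pentium 4 onward, while the Pentium III supports only the original SSE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the original post): multiplies n 8-bit
// samples by an 8-bit weight and stores the 16-bit products.
// The loop has a trip count known on entry, a unit stride, and plain
// indexed accesses, which is the shape vectorizers usually prefer.
void scale_samples(const std::uint8_t* in, std::uint16_t* out,
                   std::size_t n, std::uint8_t weight)
{
    assert(in != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        // 255 * 255 = 65025, so the product always fits in 16 bits.
        out[i] = static_cast<std::uint16_t>(in[i] * weight);
    }
}
```

The inner accumulation in the posted loop could be reorganized along the same lines, but that depends on details (loop bounds, buffer layout) that are not visible in the post.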

Aaron Coday had posted a couple of good links giving some examples:
"
You should also check this link that has a bunch of good guides on SSE code use:

SSE App notes http://www.intel.com/cd/ids/developer/asmo-na/eng/microprocessors/ia32/pentium4/resources/appnotes/sse/index.htm

Another really good one is:
Power Programming - SIGGRAPH 2001
http://www.optimizations.org/optimizations/Siggraph2001-Klimovitski2.ppt

Cheers,
Aaron Coday
"

Please see if this helps.

Thanks,
Ganesh

Message Edited by intel.software.network.support on 12-09-2005 02:03 PM

0 Kudos
Telnov__Alex
Beginner
340 Views
A few comments on the code above:

- Traversing an array by creating a pointer to its head and incrementing the pointer is not good for performance, because it can make the loop harder for the compiler to analyze and optimize. Use operator[] instead.

- Integer division is very slow unless the denominator is a power of 2 *and* known at compile time. You may gain a lot of speed if xdiv assumes only a handful of possible values and all of them are powers of 2: replace the division by xdiv with a right shift (>>) by the appropriate number of bits.
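
The two comments above can be sketched together as follows. This is only an illustration under stated assumptions: the helper names log2_if_power_of_two and divide_by_power_of_two are hypothetical, the sums are assumed to be unsigned 16-bit values as in the posted code, and the divisor is assumed to be a power of 2. For unsigned values, a right shift by k gives the same result as division by 2^k.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns k such that d == (1 << k),
// or -1 if d is not a power of two.
int log2_if_power_of_two(unsigned d)
{
    if (d == 0 || (d & (d - 1)) != 0) return -1;
    int shift = 0;
    while ((d >> shift) != 1u) ++shift;
    return shift;
}

// Hypothetical helper: divides each of n unsigned sums by (1 << shift)
// using a right shift instead of '/', and uses operator[] rather than
// incrementing pointers. The result is truncated to 8 bits, as the
// (unsigned char) cast in the posted code does.
void divide_by_power_of_two(const std::uint16_t* sums, std::uint8_t* out,
                            std::size_t n, int shift)
{
    assert(shift >= 0 && shift < 16);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<std::uint8_t>(sums[i] >> shift);
    }
}
```

The shift amount would be computed once, outside the pixel loop, for example with log2_if_power_of_two(xdiv). If it returns -1, the divisor is not a power of 2 and the ordinary division has to stay.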