I have a pixel loop that is called a lot, and I want to optimize it so it runs as fast as possible on Pentium III processors and higher.
What would be the best compiler options (or better yet, pragmas), or is there a better way to organize the following code clip:
unsigned short r,g,b,xdiv;
unsigned char *p,*ip;
unsigned char xin,xout,mxid;
for (x=0;x<...) {   /* rest of the loop header was lost in the archived post */
xin=xIn;
xout=xOut;
xdiv=xDiv;
ip=&inbuf[xPix];
r=(*ip++)*xin;
g=(*ip++)*xin;
b=(*ip++)*xin;
for (i=0;i<...;++i) {   /* loop bound was lost in the archived post */
r+=((*ip++)<<8);
g+=((*ip++)<<8);
b+=((*ip++)<<8);
}
*p++=(unsigned char)((r+((*ip++)*xout))/xdiv);
*p++=(unsigned char)((g+((*ip++)*xout))/xdiv);
*p++=(unsigned char)((b+((*ip)*xout))/xdiv);
*p++=255;
}
Thanks!
Bill
2 Replies
bferster,
>What would be the best compilers options
The best next step is to see if you can take advantage of SSE2.
The easiest approach would be to see if you can take advantage of autovectorization. Please see the user's guide,
which is located under the appropriate product at http://www.intel.com/software/products/compilers/
(the Windows IA-32 C++ user's guide is at http://www.intel.com/software/products/compilers/techtopics/ccug.htm), and http://www.intel.com/software/products/compilers/techtopics/Compiler_Optimization_7_02.htm
Aaron Coday had posted a couple of good links giving some examples.
"
You should also check this link that has a bunch of good guides on SSE code use:
SSE App notes http://www.intel.com/cd/ids/developer/asmo-na/eng/microprocessors/ia32/pentium4/resources/appnotes/sse/index.htm
Another really good one is:
Power Programming - SIGGRAPH 2001
http://www.optimizations.org/optimizations/Siggraph2001-Klimovitski2.ppt
Cheers,
Aaron Coday
"
Please see if this helps.
Thanks,
Ganesh
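As a rough illustration of the kind of loop an auto-vectorizer tends to handle well (this is not the original pixel loop): unit-stride indexed access, a countable trip count, no dependence between iterations, and `restrict` to promise that the buffers do not overlap. The function name `scale_channels` and its parameters are hypothetical, chosen only for this sketch. Whether a given compiler actually vectorizes it depends on the compiler version and the processor-targeting options, so check the vectorization report. Keep in mind that the Pentium III provides MMX and SSE but not SSE2, which arrived with the Pentium 4, so SSE2 code needs a separate path for newer processors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: multiply each of n bytes in src by an 8-bit weight
   and keep the high byte of the 16-bit product, i.e. (src[i] * weight) >> 8.
   The largest product is 255 * 255 = 65025, so the result always fits in
   a byte. */
void scale_channels(uint8_t *restrict dst, const uint8_t *restrict src,
                    size_t n, uint8_t weight)
{
    /* Precondition for this sketch: valid buffers unless n is zero. */
    assert(n == 0 || (dst != NULL && src != NULL));

    /* Indexed, unit-stride, independent iterations: easy for a vectorizer. */
    for (size_t i = 0; i < n; ++i) {
        dst[i] = (uint8_t)((src[i] * weight) >> 8);
    }
}
```

A call such as `scale_channels(out, in, width * 3, 128)` would roughly halve every channel of a row of RGB pixels.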
Message Edited by intel.software.network.support on 12-09-2005 02:03 PM
A few comments on the code above:
- Traversing an array by creating a pointer to its head and incrementing the pointer is not good for performance. Use operator[] instead.
- Integer division is very slow unless the denominator is a power of 2 *and* known at compile time. You may gain a lot of speed if xdiv takes only a handful of possible values and all of them are powers of 2: replace the division by xdiv with a right shift (>>) by the appropriate number of bits.
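Both suggestions can be sketched together as follows. This is only an illustration under stated assumptions, not a drop-in replacement for the loop in the question: it assumes the divisor is always a power of two and is passed as its base-2 logarithm (`shift`), and it assumes the per-channel sums have already been accumulated into an array. The names `normalize_pixels`, `sums`, and `shift` are hypothetical. For an unsigned value, dividing by 2^k is the same as shifting right by k bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turn accumulated RGB sums into RGBA bytes.
   sums holds 3 values (r, g, b) per pixel; out receives 4 bytes per pixel.
   Assumption: the divisor is 1 << shift, so the division becomes a right
   shift. The caller must make sure each shifted sum fits in 0..255. */
void normalize_pixels(uint8_t *restrict out, const uint16_t *restrict sums,
                      size_t npixels, unsigned shift)
{
    assert(shift < 16);  /* a 16-bit sum has no bits beyond this */

    /* Indexed access instead of incrementing pointers. */
    for (size_t i = 0; i < npixels; ++i) {
        out[4 * i + 0] = (uint8_t)(sums[3 * i + 0] >> shift);
        out[4 * i + 1] = (uint8_t)(sums[3 * i + 1] >> shift);
        out[4 * i + 2] = (uint8_t)(sums[3 * i + 2] >> shift);
        out[4 * i + 3] = 255;  /* opaque alpha, as in the original loop */
    }
}
```

If xdiv can take values that are not powers of two, this shortcut does not apply and the division (or some other technique) is still needed for those cases.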