cubitusradius
Beginner
23 Views

What performance improvement should I expect vs. the Visual Studio 2008 compiler?

Hi,

I'm a 3D software developer, and I just converted my projects to Intel Compiler 11. I tried all the compiler options suggested in this document, including PGO (profile-guided optimization): http://cache-www.intel.com/cd/00/00/22/23/222300_222300.pdf

In the end, I found that the best results came from these compiler options:
/O3 /Qip /QxSSE4.1 /Qprec

For now I have dropped PGO: it would require us to automate the collection of profile data somehow, and the performance results suggested the overhead may not be worth it.

On a purely CPU-bound thread (culling), the best relative performance improvement I measured was 8.7%, with an average of around 5%.

What performance improvement should I expect compared to the Visual Studio 2008 compiler?

For now it is a bit disappointing...

Thanks for your advice,

Cub


3 Replies
JenniferJ
Moderator

Quoting - cubitusradius
In the end, I found that the best results came from these compiler options:
/O3 /Qip /QxSSE4.1 /Qprec

On a purely CPU-bound thread (culling), the best relative performance improvement I measured was 8.7%, with an average of around 5%.

What performance improvement should I expect compared to the Visual Studio 2008 compiler?

The performance improvement depends on your program. Try /Qipo, which enables interprocedural optimization across multiple files and opens up more optimizations.

If you know which functions are the bottleneck, compile with /Qvec-report to see whether their loops are vectorized. If they are not, can the loops be rewritten so they can be vectorized? For those loops, "#pragma omp" (OpenMP, enabled with /Qopenmp) is another way to parallelize them.

So there are many options.

If you don't know which functions are the bottlenecks, you can try the Amplifier from the Parallel Studio beta at http://software.intel.com/en-us/intel-parallel-studio-home/.

Jennifer
cubitus
Beginner

Hi,

I finally tried /Qipo (at first my program linked against a library that prevented it), but now I run into the "out of memory" problem.

After some searching, I found that there are other compiler options to limit inlining if I run into that issue: /Qinline-max*

Is there a way to tell the compiler simply not to run out of memory? It's a 32-bit environment, so a process can't allocate much more than 2 GB.

Thanks,

Mathieu
Om_S_Intel
Employee


You can try the /Qipo[n] compiler option (for example, /Qipo2), where n is the number of object files the compiler creates during multi-file IPO. Start with n=2 and increase it if you still run out of memory.