Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

What performance improvement should I expect vs Visual Studio 2k8 compiler?

cubitusradius
Hi,

I'm a 3D software developer and I just converted my projects to Intel Compiler 11. I tried all the compiler options suggested in this document, http://cache-www.intel.com/cd/00/00/22/23/222300_222300.pdf, including PGO.

In the end, the best results I could get came from these compiler options:
/O3 /Qip /QxSSE4.1 /Qprec

For now I dropped PGO, since it would require us to automate the process of collecting profile data, and the performance results suggested that the overhead may not be worth it.

On a purely CPU-bound thread (culling), the best relative performance improvement topped out at 8.7%, but the average was around 5%.

What performance improvement should I expect compared to Visual Studio 2k8 compiler?

For now it is a bit disappointing...

Thanks for your advice,

Cub


JenniferJ
Quoting - cubitusradius
In the end, the best results I could get came from these compiler options:
/O3 /Qip /QxSSE4.1 /Qprec

On a purely CPU-bound thread (culling), the best relative performance improvement topped out at 8.7%, but the average was around 5%.

What performance improvement should I expect compared to Visual Studio 2k8 compiler?

The performance improvement depends on your program. Try /Qipo; it enables interprocedural optimization across source files, which opens up more optimizations.

If you know which functions are the bottleneck, check with /Qvec-report whether their loops are vectorized. If not, can the loops be rewritten so that they can be vectorized? For loops, "#pragma omp" is another option: it lets you parallelize them.
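As a rough sketch of what such a rewrite can look like: the `Spheres` layout and both function names below are made up for illustration and are not taken from the original poster's code. The first loop has unit-stride accesses and a branch-free body, which is the shape an auto-vectorizer can usually handle. The second uses an OpenMP reduction to split iterations across threads, assuming OpenMP is enabled at compile time (/Qopenmp with the Intel compiler on Windows).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure-of-arrays layout for bounding spheres.
// Keeping each field in its own array gives unit-stride access per field.
struct Spheres {
    std::vector<float> x, y, z, r;
};

// Signed distance of every sphere centre to the plane (nx, ny, nz, d).
// Unit stride and no branches in the body, so the loop is a good
// candidate for vectorization; /Qvec-report shows what the compiler did.
void plane_distances(const Spheres& s, float nx, float ny, float nz, float d,
                     std::vector<float>& out)
{
    const std::size_t n = s.x.size();
    out.resize(n);
    const float* x = s.x.data();
    const float* y = s.y.data();
    const float* z = s.z.data();
    float* o = out.data();
    for (std::size_t i = 0; i < n; ++i)
        o[i] = nx * x[i] + ny * y[i] + nz * z[i] + d;
}

// Counts spheres that are not entirely behind the plane.
// With OpenMP enabled, iterations are shared between threads and the
// per-thread counts are summed; otherwise the pragma is ignored and the
// loop simply runs serially.
int count_visible(const Spheres& s, const std::vector<float>& dist)
{
    const int n = static_cast<int>(dist.size());
    int visible = 0;
    #pragma omp parallel for reduction(+:visible)
    for (int i = 0; i < n; ++i)
        visible += (dist[i] >= -s.r[i]) ? 1 : 0;
    return visible;
}
```

Whether either change pays off depends on the data size and on how much of the frame time the loop really accounts for, so it is worth measuring before and after.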

So there are many ways.

If you do not know which functions are the bottlenecks, you can try the Amplifier from the Parallel Studio beta at http://software.intel.com/en-us/intel-parallel-studio-home/.

Jennifer
cubitus

Hi,

I finally tried /Qipo (at first my program was linking against a library that prevented it), but now I run into the "out of memory" problem.

I googled it a bit and found that there are other compiler options that limit inlining if I run into that issue: /Qinline-max*

Is there a way to tell the compiler simply not to run out of memory? It's a 32-bit environment, so it is known that you can't allocate much more than 2 GB.

Thanks,

Mathieu
Om_S_Intel

You can try the "/Qipo[n]" form of the compiler option (for example, /Qipo2), where n is the number of object files the compiler generates. Start with n=2 and increase it if you are still running out of memory.