dinwal
Beginner

Speedup with -g option but not otherwise

I am working on an image processing problem that resembles a stencil computation in many respects. When I compile with the "-g" option, my program is about 30% faster than the naive version. However, if I compile with any optimization option such as "O3", or even just without "-g", my code is considerably slower than the naive version (up to two times slower for large images). Can anyone suggest where I should look for the cause?
I am using icpc as the compiler. I have tried my code on many machines (Xeon, Opteron, Core i7, etc.) and see similar performance everywhere. The images are converted into single-precision arrays using the CImg library, and then I operate on the arrays.
My code should be faster because I use 1) data-level blocking and 2) in-place storage, as opposed to the out-of-place storage in the naive version.
3 Replies
Om_S_Intel
Employee

When you use the -g compiler option, the symbols are included in the object file. This bloats the object size. The larger application should run slower.

It would be nice if you can share the testcase.

Thanks,

Om
timintel
Beginner

-g without any -O option implies -O0.
-O2 and -O3 optimize for loop trip counts of at least 100. If your trip counts are small enough, it's possible the compiler makes the wrong assumptions when optimizing. -O1 is less likely to encounter such problems. You could try -unroll0; I've seen it help even for fairly large trip counts.
Profile guided optimization (-prof-gen .... -prof-use) is intended to help the compiler make better assumptions for optimization.
The "12.0" (XE 2011) compilers bring back
#pragma loop count(10)
as an alternative to PGO, to tell the compiler that the target loop should be optimized for a trip count of about 10.
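
For illustration, here is a minimal sketch of how such a trip-count hint might be placed. The function row_sum is hypothetical and not from the original poster's code. The spelling loop_count is the one I know from Intel's compiler documentation, so treat the exact form as an assumption for your compiler version. The pragma is guarded so that other compilers simply skip it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sums one short row of n pixels.
// The pragma only hints at the expected trip count; the loop
// still runs exactly n iterations whatever the hint says.
float row_sum(const float* row, std::size_t n) {
    assert(row != nullptr || n == 0);
    float sum = 0.0f;
#if defined(__INTEL_COMPILER)
#pragma loop_count(10)  // assumption: rows are typically about 10 elements long
#endif
    for (std::size_t i = 0; i < n; ++i) {
        sum += row[i];
    }
    return sum;
}
```

The hint does not change the result, only the assumptions the optimizer makes about how much unrolling or vectorization is worthwhile for that loop.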
dinwal
Beginner

Thank you all. The problem was that one function contained manual loop unrolling, which degraded performance because the compiler was applying its own unrolling as well.
I assume compiler unrolling is not applied when -g is used (since -g alone implies -O0), and hence my code was faster.
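
To make the situation concrete, here is a small sketch of the two styles. It is not the original code, which was never posted: the 3-point average and the names smooth_plain and smooth_unrolled are made up for illustration. Both functions compute the same values; the second is unrolled by hand, which is the kind of loop an optimizing compiler may then try to unroll again.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain 3-point average over interior elements.
// Unrolling and vectorization are left entirely to the compiler.
void smooth_plain(const std::vector<float>& in, std::vector<float>& out) {
    assert(in.size() == out.size());
    for (std::size_t i = 1; i + 1 < in.size(); ++i) {
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
    }
}

// Same computation, manually unrolled by 4 with a remainder loop.
void smooth_unrolled(const std::vector<float>& in, std::vector<float>& out) {
    assert(in.size() == out.size());
    const std::size_t n = in.size();
    std::size_t i = 1;
    // Each pass writes out[i..i+3] and reads up to in[i+4].
    for (; i + 4 < n; i += 4) {
        out[i]     = (in[i - 1] + in[i]     + in[i + 1]) / 3.0f;
        out[i + 1] = (in[i]     + in[i + 1] + in[i + 2]) / 3.0f;
        out[i + 2] = (in[i + 1] + in[i + 2] + in[i + 3]) / 3.0f;
        out[i + 3] = (in[i + 2] + in[i + 3] + in[i + 4]) / 3.0f;
    }
    // Remainder: the interior elements not covered above.
    for (; i + 1 < n; ++i) {
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
    }
}
```

If timings differ between the two at -O2 or -O3, trying the plain version first, or building the unrolled one with -unroll0 as suggested earlier in the thread, is a cheap way to check whether double unrolling is the cause.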