I am working on an image processing problem that is similar to stencil computation in many respects. When I compile it with the "-g" option, my program is about 30% faster than the naive version. However, if I compile with any optimization option such as "-O3", or even just without "-g", my code is considerably slower than the naive version (up to two times slower for large images). Can anyone suggest where I should look for the solution?
I am using icpc as the compiler, and I have tried my code on many machines (Xeon, Opteron, Core i7, etc.) with similar performance everywhere. The images are converted into single-precision arrays using the CImg library, and then I operate on those arrays.
My code should be faster because 1) I use data-level blocking in my version, and 2) I use in-place storage as opposed to the out-of-place storage of the naive version.
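For context, here is a minimal sketch of the kind of blocked traversal I mean. This is illustrative only: the block size, the 3x3 kernel, and the array names are placeholders rather than my actual code, and this sketch writes out of place for simplicity.

#include <vector>
#include <algorithm>
#include <cstddef>

// Cache-blocked 3x3 averaging stencil over a width x height single-precision image.
void blur_blocked(const std::vector<float>& src, std::vector<float>& dst,
                  std::size_t width, std::size_t height)
{
    if (width < 3 || height < 3) return;
    const std::size_t BLOCK = 64;   // tile edge chosen so a tile's rows stay in cache

    for (std::size_t by = 1; by < height - 1; by += BLOCK) {
        for (std::size_t bx = 1; bx < width - 1; bx += BLOCK) {
            const std::size_t ymax = std::min(by + BLOCK, height - 1);
            const std::size_t xmax = std::min(bx + BLOCK, width - 1);
            // Process one tile at a time to improve data reuse.
            for (std::size_t y = by; y < ymax; ++y) {
                for (std::size_t x = bx; x < xmax; ++x) {
                    float sum = 0.0f;
                    for (int dy = -1; dy <= 1; ++dy)
                        for (int dx = -1; dx <= 1; ++dx)
                            sum += src[(y + dy) * width + (x + dx)];
                    dst[y * width + x] = sum / 9.0f;
                }
            }
        }
    }
}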
3 Replies
When you use the -g compiler option, symbols are included in the object file. This bloats the object size, and a bigger application should run slower.
It would be nice if you could share the test case.
Thanks,
Om
-g without any -O option implies -O0.
-O2 and -O3 optimize for loop trip counts of at least 100. If your trip counts are small enough, it's possible the compiler makes the wrong assumptions when optimizing. -O1 is less likely to encounter such problems. You could try -unroll0; I've seen it help even for fairly large trip counts.
Profile-guided optimization (-prof-gen ... -prof-use) is intended to help the compiler make better assumptions for optimization.
The 12.0 (XE 2011) compilers bring back
#pragma loop count(10)
as an alternative to PGO for telling the compiler the target loop length (here, 10) to optimize for.
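A rough sketch of how these hints might be applied in practice. The kernel, names, and the trip count of 10 are placeholders, and the exact pragma spelling can vary between compiler versions.

// Hypothetical small-trip-count loop; n is expected to be around 10.
void smooth_row(float* row, int n)
{
    // Tell icpc the expected trip count so -O2/-O3 do not optimize for the
    // default assumption of large trip counts.
    // (Recent compilers may spell this #pragma loop_count(10).)
    #pragma loop count(10)
    for (int i = 1; i < n - 1; ++i)
        row[i] = 0.25f * row[i - 1] + 0.5f * row[i] + 0.25f * row[i + 1];
}

// Usual profile-guided optimization workflow with icpc:
//   icpc -prof-gen -O3 app.cpp -o app   // instrumented build
//   ./app training_image                // run on a representative input
//   icpc -prof-use -O3 app.cpp -o app   // rebuild using the collected profile
//
// To suppress the compiler's own unrolling instead, add -unroll0 on the
// command line, or place #pragma nounroll immediately before a specific loop.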
Thank you all. The problem was that in one function I had manual loop unrolling, which degraded performance because the compiler was applying its own unrolling on top of it.
I assume the compiler does not unroll when -g is used, which is why that build was faster.
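For anyone else hitting this, a hypothetical illustration of the fix (placeholder loop, not my real kernel): keep the manual unrolling but mark the loop so icpc does not unroll it a second time, either with -unroll0 on the command line or #pragma nounroll on the loop.

// Manually unrolled accumulation; n assumed to be a multiple of 4.
float sum4(const float* a, int n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    #pragma nounroll   // ask icpc not to unroll this loop again
    for (int i = 0; i < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    return s0 + s1 + s2 + s3;
}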
