I am encountering a performance issue: the same source produces code that is 15-20% slower with ICC 10.1.13 than with GCC 4.1.0.
My main loop is basically one switch statement that handles 64-bit data. It is an emulator of some old hardware.
The benchmarking expert helped me with emon. He said that the CPI (cycles per instruction) with icc was much lower, but icc needed a lot more instructions to do the same work. So the net result was a marked loss of performance.
ICC -O3 -ipo -xT -static -no-ansi-alias
GCC -O3 -fno-strict-aliasing
What else can I try? Am I missing something?
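For readers unfamiliar with this kind of emulator, a switch-dispatched main loop over 64-bit instruction words might look roughly like the sketch below. The actual code was not posted. The machine layout, the opcode encoding and every name here (`Machine`, `encode`, `run`, `OP_*`) are hypothetical, and a real emulator would have around 30 cases rather than four.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical state of the emulated machine. */
typedef struct {
    uint64_t reg[16];
    uint64_t pc;
    int halted;
} Machine;

/* Hypothetical opcodes; the real emulator has about 30 cases. */
enum { OP_HALT = 0, OP_ADD = 1, OP_SUB = 2, OP_LOADI = 3 };

/* Hypothetical encoding: opcode in bits 63..56, rd in 55..52,
   rs in 51..48, immediate in the low 48 bits. */
static uint64_t encode(unsigned op, unsigned rd, unsigned rs, uint64_t imm)
{
    assert(op < 256 && rd < 16 && rs < 16);
    return ((uint64_t)op << 56) | ((uint64_t)rd << 52) |
           ((uint64_t)rs << 48) | (imm & 0xFFFFFFFFFFFFull);
}

/* Main loop: fetch a 64-bit word, decode it, dispatch through one switch. */
static void run(Machine *m, const uint64_t *program, size_t len)
{
    while (!m->halted && m->pc < len) {
        uint64_t insn = program[m->pc++];
        unsigned op = (unsigned)(insn >> 56);
        unsigned rd = (unsigned)(insn >> 52) & 0xFu;
        unsigned rs = (unsigned)(insn >> 48) & 0xFu;
        uint64_t imm = insn & 0xFFFFFFFFFFFFull;

        switch (op) {
        case OP_HALT:  m->halted = 1;            break;
        case OP_ADD:   m->reg[rd] += m->reg[rs]; break;
        case OP_SUB:   m->reg[rd] -= m->reg[rs]; break;
        case OP_LOADI: m->reg[rd] = imm;         break;
        default:       m->halted = 1;            break; /* unknown opcode: stop */
        }
    }
}
```

The `switch` inside `run` is the construct whose code generation differs between the two compilers in the replies that follow.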
Your result is consistent with mine. gcc performs code-straightening optimizations on switch statements at normal optimization levels. I think the best chance with icc is to change the code to favor the most frequently taken branch (put it first, perhaps as an if...else ahead of the switch), or to try profile-guided optimization (-prof-gen/-prof-use).
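A minimal sketch of the "put the frequent branch first" idea: test the hottest opcode with an `if` ahead of the `switch`, so the common path is a single compare. It assumes, hypothetically, that `OP_ADD` is the hottest case; the opcode names and `step` function are illustrative, not from the poster's code. The `LIKELY` hint uses `__builtin_expect`, which GCC provides; it is enabled only when `__GNUC__` is defined and otherwise expands to a plain condition.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__)
#define LIKELY(x) __builtin_expect(!!(x), 1)
#else
#define LIKELY(x) (x)
#endif

/* Hypothetical opcodes. */
enum { OP_HALT = 0, OP_ADD = 1, OP_SUB = 2, OP_LOADI = 3 };

/* Execute one instruction on register file reg; returns 0 to stop. */
static int step(uint64_t *reg, unsigned op, unsigned rd, unsigned rs, uint64_t imm)
{
    /* Assumed hottest case, handled before the switch is reached. */
    if (LIKELY(op == OP_ADD)) {
        reg[rd] += reg[rs];
        return 1;
    }
    /* Remaining cases keep the ordinary switch dispatch. */
    switch (op) {
    case OP_SUB:   reg[rd] -= reg[rs]; return 1;
    case OP_LOADI: reg[rd] = imm;      return 1;
    case OP_HALT:
    default:       return 0;
    }
}
```

This only pays off when one case clearly dominates. The hint and the hoisted test cost a compare on every other path.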
As Tim suggested, profile-guided optimization (-prof-gen, -prof-use) could be quite helpful in a case like this. It helps ensure that the most frequent cases are handled most efficiently. If you would like to post a sample, I'd be happy to look at it and see if there's anything else to do.
Dale
I tried profile-guided optimization, and it narrows the gap but still results in slower code: about 10% slower instead of 15%.
I collected some logs on my main switch statement. There are about 30 cases, and they are fairly evenly distributed, each accounting for between 10% and 1% of executions. Would that make a difference?
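Per-case statistics like these can be gathered by bumping a counter just before the switch. The sketch below is one way to do it, not the poster's method. It assumes an 8-bit opcode field, and the names (`count_op`, `op_share`, `dump_op_histogram`) are hypothetical. The extra increment on every instruction perturbs timing, so this belongs in a diagnostic build only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_OPS 256  /* assumes an 8-bit opcode field */

static uint64_t op_count[NUM_OPS];

/* Call once per dispatched instruction, just before the switch. */
static void count_op(unsigned op)
{
    assert(op < NUM_OPS);
    op_count[op]++;
}

static uint64_t total_ops(void)
{
    uint64_t total = 0;
    for (unsigned i = 0; i < NUM_OPS; i++)
        total += op_count[i];
    return total;
}

/* Share of all dispatched instructions taken by one opcode, in percent. */
static double op_share(unsigned op)
{
    uint64_t total = total_ops();
    assert(op < NUM_OPS);
    return total ? 100.0 * (double)op_count[op] / (double)total : 0.0;
}

/* Print every opcode that executed at least once, with its share. */
static void dump_op_histogram(FILE *out)
{
    for (unsigned i = 0; i < NUM_OPS; i++)
        if (op_count[i] != 0)
            fprintf(out, "op %3u: %12llu (%5.2f%%)\n", i,
                    (unsigned long long)op_count[i], op_share(i));
}
```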
Yes. I suspect the even distribution explains why loop straightening comes out more effective than using PGO to sort the cases by priority.
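The intuition can be checked with a deliberately simplified cost model. It assumes the cases are tested one by one in an if/else chain ordered from most to least frequent, which is roughly what frequency-based ordering aims for. Real compilers may instead emit jump tables or binary decision trees, so this only illustrates why reordering gains little when no case dominates. `expected_tests_sorted_chain` is a hypothetical helper, not part of either compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator: sort doubles in descending order. */
static int by_desc(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);
}

/* Expected number of equality tests for an if/else chain whose cases are
   ordered from most to least frequent. freq[] holds relative frequencies
   (they need not sum to 1) and is sorted in place. */
static double expected_tests_sorted_chain(double *freq, size_t n)
{
    double total = 0.0, weighted = 0.0;
    qsort(freq, n, sizeof freq[0], by_desc);
    for (size_t i = 0; i < n; i++) {
        total += freq[i];
        weighted += freq[i] * (double)(i + 1);  /* case i costs i+1 tests */
    }
    return total > 0.0 ? weighted / total : 0.0;
}
```

With 30 equally frequent cases, even the best ordering averages 15.5 tests per dispatch. A skewed profile such as 90%/5%/5% averages 1.15. A spread of 10% down to 1% over 30 cases sits much nearer the flat end, so there is little for profile-driven ordering to exploit.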
