Intel® C++ Compiler

Compiler de-optimizes unrolled loops

jamesqf
Beginner
I've discovered a rather odd behavior of the Linux 8.0 compiler. I'm writing some test loops (using the vector classes in fvec.h) and unrolling them by hand to see how performance changes. With -O2 or -O3, if I unroll by a factor of 20 or so, the compiler converts the unrolled code into a function and calls it once for each instance. That is, if I unroll 20 times, there are 20 calls. Performance drops by a factor of 5-10 when this happens, I think in part because (from looking at the assembly dump) the code in the function is not well optimized.
Is this a known problem? If so, is there a way to stop it from happening, and still get the benefit of other optimization?
Thanks,
James
PS: I tried to attach the source, but apparently .cpp files aren't acceptable!
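Since the original source could not be attached, here is a minimal sketch of the kind of hand unrolling described above. It is not the poster's code: it uses plain float arrays rather than the fvec.h vector classes, unrolls by 4 rather than 20 to stay short, and the function name saxpy_unroll4 is hypothetical.

```cpp
#include <cstddef>

// Hypothetical kernel (not from the original post): y[i] += a * x[i],
// with the loop body written out four times by hand.
void saxpy_unroll4(float a, const float* x, float* y, std::size_t n) {
    std::size_t i = 0;

    // Main loop: four independent updates per iteration.
    for (; i + 4 <= n; i += 4) {
        y[i + 0] += a * x[i + 0];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }

    // Remainder loop: handles the last n % 4 elements.
    for (; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Unrolling by 20 would repeat the body 20 times in the main loop; the remainder loop is what keeps the result correct when the trip count is not a multiple of the unroll factor.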
2 Replies
TimP
Honored Contributor III

If you want the compiler to do simple unrolling, you can use the pragma (choose your number):

#pragma unroll(4)

The compiler's own automatic unrolling is more likely to be excessive than conservative. Loops which are suitable are usually unrolled by 8 already. A likely reason you might want to dictate the unrolling is to fit evenly into an expected loop count. Depending on which architecture you are using, various reasons may exist why excessive unrolling prevents optimization. On an SSE architecture, you want the unrolling to fit the built-in parallelism. The Xeon trace cache also has some automatic unrolling properties. I don't know whether the compiler foresees that.
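As a rough illustration of where the pragma goes, here is a minimal sketch with a hypothetical dot function. The pragma sits immediately before the loop it applies to. The preprocessor guard is my own addition so the sketch also builds with GCC, which spells the hint `#pragma GCC unroll 4`; whether any compiler honors the request is up to its optimizer.

```cpp
#include <cstddef>

// Hypothetical example function (not from the thread): dot product with an
// unroll request placed directly in front of the loop.
float dot(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;

#if defined(__INTEL_COMPILER) || defined(__INTEL_LLVM_COMPILER) || defined(__clang__)
#pragma unroll(4)          // Intel (and Clang) spelling, as in the reply above
#elif defined(__GNUC__)
#pragma GCC unroll 4       // GCC spelling of the same request (assumption: GCC 8 or later)
#endif
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

The pragma is only a hint: the source loop stays a plain loop, so the result does not depend on whether the unrolling actually happens.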

jamesqf
Beginner

Yes, I do know the compiler can do automatic, or pragma-specified, unrolling. What I'm doing right now is trying to get a handle on the underlying performance of the code when it's unrolled a specified amount. In other words, how much benefit do I get if I unroll by 2, 4, 8, or whatever? Since the application I'm working on can have loop counts in the millions, there's room for a lot :-) And I ran across this very strange behavior in the process...

Which brings up an interesting thought: suppose I tell it, via the pragma, to unroll 20 times. Will the compiler merrily convert the unrolling it does to function calls? Stay tuned...

James
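One way to run the "unroll by 2, 4, 8, or whatever" experiment without retyping the loop body each time is to make the unroll factor a template parameter. This is only a sketch under my own assumptions: the kernel is a simple float sum rather than the fvec.h loops from the thread, and the names sum_unrolled and time_kernel are hypothetical. Note that using several accumulators changes the floating-point summation order, so results can differ slightly from a plain loop on general data.

```cpp
#include <chrono>
#include <cstddef>

// Sum n floats using UNROLL independent accumulators (hypothetical kernel).
template <std::size_t UNROLL>
float sum_unrolled(const float* x, std::size_t n) {
    float acc[UNROLL] = {};
    std::size_t i = 0;

    // Main loop: the inner loop has a compile-time trip count of UNROLL,
    // so an optimizing compiler can flatten it into straight-line code.
    for (; i + UNROLL <= n; i += UNROLL) {
        for (std::size_t k = 0; k < UNROLL; ++k) {
            acc[k] += x[i + k];
        }
    }

    // Combine the partial sums, then handle the leftover elements.
    float total = 0.0f;
    for (std::size_t k = 0; k < UNROLL; ++k) {
        total += acc[k];
    }
    for (; i < n; ++i) {
        total += x[i];
    }
    return total;
}

// Wall-clock seconds for `reps` calls of one instantiation. The accumulated
// result is written to *sink so the calls cannot be optimized away entirely.
template <std::size_t UNROLL>
double time_kernel(const float* x, std::size_t n, int reps, float* sink) {
    auto t0 = std::chrono::steady_clock::now();
    float s = 0.0f;
    for (int r = 0; r < reps; ++r) {
        s += sum_unrolled<UNROLL>(x, n);
    }
    auto t1 = std::chrono::steady_clock::now();
    *sink = s;
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling time_kernel<2>, time_kernel<4>, time_kernel<8>, and so on with the same data and repetition count gives one timing per unroll factor. It is still worth checking the assembly dump for each instantiation, since the compiler remains free to unroll further or restructure the code.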
