There are cases where inlining won't be done. The compiler decides, based on various data, whether inlining can be beneficial or not. This involves thresholds that, once exceeded, tell the compiler not to inline certain portions of code; from that point on it ignores requests for additional inlining. In your case you might want to increase the thresholds.
Before I point you to the corresponding options, a warning: changing the inlining behavior, and especially forcing it, can have (negative) side effects on
- register pressure
- caching and
- code size
Hence, changing the thresholds affects every aspect mentioned above. In some cases you'll see improved performance, but in (most) other cases you'll see massive slowdowns.
All inlining thresholds are globally increased (or decreased) using this option:
Linux:
-inline-factor=n
Windows:
/Qinline-factor=n
Use n > 100 to increase all inline thresholds (e.g. -inline-factor=150). Start with this option and raise n until you see the desired effect.
Once you have seen the desired effect (as a proof), you can drop this option and fine-tune the individual thresholds using some of the following options (see the latest manual http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/cpp/win/copts/common_content/options_ref_bk_inlining.htm):
Linux:
-inline-max-per-compile=n
-inline-max-per-routine=n
-inline-max-size=n
-inline-max-total-size=n
(-inline-min-size=n)
Windows:
/Qinline-max-per-compile=n
/Qinline-max-per-routine=n
/Qinline-max-size=n
/Qinline-max-total-size=n
(/Qinline-min-size=n)
All mentioned options are only documented for Compiler XE.
I hope this works for you.
Best regards,
Georg Zitzlsberger
You are picking a stock answer for inlining issues without reading Eric's post.
Eric's template returns an array element; it is not a loop (which is what your stock answer addresses).
Jim Dempsey
Perhaps you could try:
Jim Dempsey
The options I listed above are not related to loops, nor do they mention loops at all.
Eric wants "T operator()(int i) { return m[i]; }" to be inlined, but it seems that the inlining threshold was exceeded. With the options listed above, the threshold(s) can be tweaked to force inlining, though.
Best regards,
Georg Zitzlsberger
//Eric wants "T operator()(int i) { return m[i]; }" \\
In most cases returning the reference (T&) is just as functional. Eric will have to be the judge of this.
When T is a fundamental type (char, short, int, long, float, double, void*, ...) then
"T operator()(int i) { return m[i]; }"
will likely get inlined.
However, when T is a struct or class then
"T operator()(int i) { return m[i]; }"
often creates code to execute a copy constructor (either default or user-supplied).
When T is a large struct/class, the overhead of the non-inlined function call is small relative to the cost of the copy itself.
When T is a small struct/class, the overhead of the non-inlined function call is large relative to the copy.
Additionally, in some/many situations the copy operation may be unnecessary but done anyway.
Whereas "T& operator()(int i) { return m[i]; }" returns a reference (in effect a pointer) to m[i], which is trivial to inline; the compiler can then use it directly (when no copy is required) or pass it as the argument to the copy constructor (when a copy is required).
Eric could comment on this.
Jim Dempsey
I am new to the Intel compiler, so I am still trying to get a grip on the optimizations.
But your point on the copy operation brings me to this.
Whether T operator()(int i) generates an implied call to a copy constructor or copy operator was typically a function of the compiler's optimization level. I would expect the compiler to generate a copy operation by default. However, I would also expect that, with a higher optimization level specified, the compiler would inline the method and determine that the copy was not needed.
__forceinline on that operator function definition should do what you want.
That being said, I'm not sure what is meant by saying that it inlines "when it's compiled in a single file". That class definition should be in an included header file, which is then part of every translation unit where the operator would be inlined. Perhaps I'm missing something in the description there.
Hi,
staying with the return-by-value version, a copy of the value is returned:
If "T" is a POD (Plain Old Data) type
If "T" is a class/struct type, the C++ standard requires calling its copy-ctor whenever a copy of it is returned. Depending on the complexity of the underlying structure (inheritance hierarchy, number of data members), the copy-ctor can be quite complex and can even call other copy-ctors as well. Inlining such copy-ctors might negatively affect performance, so the compiler may decide not to do it. If there's no negative effect, the compiler will inline it.
However, if you still want to force inlining and the compiler does not do what you want, you have to increase the thresholds. Pragmas, options and specifiers (e.g. __inline) may only work as long as the thresholds haven't been reached yet.
As Jim already pointed out, you can also change the semantics to return by reference or even by address to avoid possible copy-ctor overhead. The downside is that the remaining code needs to be aware of that: operating on a copy and operating on the original value stored in a container can behave quite differently. Bottom line: that's a design change, with all its side effects.
Best regards,
Georg Zitzlsberger