After switching from 12.1 to Composer XE 2013 (Update 1, Windows 64-bit), I am seeing a consistent 10-15% slowdown across the board (the code is built and benchmarked on a quad-core Xeon). The C++ code is compiled with /O3 and no auto-parallelization.
Is this a known issue to be fixed in an update?
I have run Amplifier XE 2013 and profiled the code. It appears that XE 2013 is NOT inlining a simple function that 12.1 inlined. The compiler option is /Ob2.
[cpp]
template <class T>
inline T Matrix<T>::operator()(int i, int j) const
{
    // Strided element access through the raw pointer returned by data()
    return data()[i*rowstep+j*colstep];
}
[/cpp]
Let me rephrase this. Compiler 13.0 is not inlining the function when operator()(int, int) is used many times in a complex expression, whereas 12.1 seemingly did inline it. In the following expression (this is auto-generated code), 13.0 seems to be generating function calls for operator()(int, int) rather than inline code, even at /O3 /Ob2.
[cpp]
result(0,0)=(-(Z(2-1,4-1)*Z(3-1,3-1)*Z(4-1,2-1)) + Z(2-1,3-1)*Z(3-1,4-1)*Z(4-1,2-1) + Z(2-1,4-1)*Z(3-1,2-1)*Z(4-1,3-1) - Z(2-1,2-1)*Z(3-1,4-1)*Z(4-1,3-1) - Z(2-1,3-1)*Z(3-1,2-1)*Z(4-1,4-1) +
Z(2-1,2-1)*Z(3-1,3-1)*Z(4-1,4-1))*tmp;
[/cpp]
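For anyone trying to reproduce this, a minimal sketch of a standalone test case might look like the following. It is only an approximation of the real code: `Matrix`, `rowstep` and `colstep` follow the names used in this thread, while the storage layout, the use of `std::complex<double>` in place of the real `DComplex` class, and the function name `minor00` are assumptions made for illustration. The constant index arithmetic (2-1, 4-1, ...) has been folded by hand.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the matrix class discussed in the thread.
// Only the names Matrix, data(), rowstep and colstep come from the posts.
template <class T>
class Matrix {
public:
    Matrix(int rows, int cols)
        : rowstep(cols), colstep(1),
          storage(static_cast<std::size_t>(rows) * cols) {}

    const T* data() const { return storage.data(); }
    T*       data()       { return storage.data(); }

    // Read access returns by value, as in the posted operator.
    T  operator()(int i, int j) const;
    // Write access, added here only so the sketch can be filled with data.
    T& operator()(int i, int j) { return data()[i * rowstep + j * colstep]; }

private:
    int rowstep, colstep;
    std::vector<T> storage;
};

template <class T>
inline T Matrix<T>::operator()(int i, int j) const
{
    return data()[i * rowstep + j * colstep];
}

// Assumption: the real DComplex is a user-defined class, not std::complex.
typedef std::complex<double> DComplex;

// Same expression as in the post: the determinant of the lower-right
// 3x3 block of a 4x4 matrix, scaled by tmp. It makes 18 calls to operator().
DComplex minor00(const Matrix<DComplex>& Z, const DComplex& tmp)
{
    return (-(Z(1,3)*Z(2,2)*Z(3,1)) + Z(1,2)*Z(2,3)*Z(3,1) + Z(1,3)*Z(2,1)*Z(3,2)
            - Z(1,1)*Z(2,3)*Z(3,2) - Z(1,2)*Z(2,1)*Z(3,3) + Z(1,1)*Z(2,2)*Z(3,3)) * tmp;
}
```

Compiling a file like this with both compiler versions at /O3 /Ob2 and comparing the inlining reports would show whether the difference appears outside the full code base. It may not, since the behavior seems to depend on the surrounding code.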
vasci_ wrote:
Let me rephrase this. Compiler 13.0 is not inlining the function when operator()(int, int) is used many times in a complex expression, whereas 12.1 seemingly did inline it. In the following expression (this is auto-generated code), 13.0 seems to be generating function calls for operator()(int, int) rather than inline code, even at /O3 /Ob2.
result(0,0)=(-(Z(2-1,4-1)*Z(3-1,3-1)*Z(4-1,2-1)) + Z(2-1,3-1)*Z(3-1,4-1)*Z(4-1,2-1) + Z(2-1,4-1)*Z(3-1,2-1)*Z(4-1,3-1) - Z(2-1,2-1)*Z(3-1,4-1)*Z(4-1,3-1) - Z(2-1,3-1)*Z(3-1,2-1)*Z(4-1,4-1) + Z(2-1,2-1)*Z(3-1,3-1)*Z(4-1,4-1))*tmp;
Is it possible to send a test case? It would be better to find out why.
Also, can you check this report: "/Qopt-report-phase:ipi /Qopt-report-routine:the_func_name"? Does it say why the function is not inlined?
Jennifer
I'm looking into producing a report of why it is not inlined. When using /Qopt-report-routine:the_func_name, how do you specify a C++ template operator() as "the_func_name"?
Using /Qopt-report, there is a significant difference between 12.1 and 13.0 when inlining this function.
Just to make sure there is no confusion: this seems to be a very specific issue that occurs under very specific circumstances. Once this routine was "fixed", the performance of our benchmarks with 13.0 was similar to 12.1, if not better.
What happens when you use
...
inline T Matrix<T>::operator()(const int i, const int j) const
...
Also, several months ago I had an issue where inline would not inline; however, replacing it with forceinline did work.
Later, inline worked again. I never figured out what triggered the behavior.
Jim Dempsey
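A rough sketch of the forceinline workaround mentioned above, for readers who want to try it. `__forceinline` is the keyword accepted by MSVC and by the Intel compiler on Windows, and `__attribute__((always_inline))` is the GCC-style equivalent. The macro name `FORCE_INLINE` and the `StridedView` class are hypothetical, chosen only to keep the example self-contained, and this is not necessarily how the original code was changed.

```cpp
// Hypothetical portability macro; FORCE_INLINE is an assumed name.
#if defined(_MSC_VER)        // MSVC and the Intel compiler on Windows
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__)      // GCC, Clang, and the Intel compiler on Linux
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

// Hypothetical read-only view with the same access pattern as the
// operator() discussed in this thread.
template <class T>
class StridedView {
public:
    StridedView(const T* p, int rs, int cs) : ptr(p), rowstep(rs), colstep(cs) {}

    FORCE_INLINE const T* data() const { return ptr; }

    FORCE_INLINE T operator()(int i, int j) const
    {
        return data()[i * rowstep + j * colstep];
    }

private:
    const T* ptr;
    int rowstep, colstep;
};
```

Forcing inlining overrides the compiler's own cost heuristics, so it is probably best limited to small accessors like these and checked against the inlining report.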
I am looking at the inlining of the expressions that use z(i,j). Class T is a "DComplex" (double-precision complex).
template <class T> inline T Matrix<T>::operator()(int i, int j) const { return data()[i*rowstep+j*colstep]; }
FYI, undecoration of functions....
Undecoration of :- "??R?$RWGenMat@VDComplex@@@@QEBA?AVDComplex@@HH@Z"
is :- "public: class DComplex __cdecl RWGenMat<class DComplex>::operator()(int,int)const __ptr64"
Undecoration of :- "?data@?$RWGenMat@VDComplex@@@@QEBAPEBVDComplex@@XZ"
is :- "public: class DComplex const * __ptr64 __cdecl RWGenMat<class DComplex>::data(void)const __ptr64"
Undecoration of :- "??0DComplex@@QEAA@AEBV0@@Z"
is :- "public: __cdecl DComplex::DComplex(class DComplex const & __ptr64) __ptr64"
12.1 appears to inline all three of these functions in a call to z(i,j):
-> INLINE (MANUAL): ??R?$RWGenMat@VDComplex@@@@QEBA?AVDComplex@@HH@Z(751) (isz = 12) (sz = 25 (5+20))
1> -> INLINE (MANUAL): ?data@?$RWGenMat@VDComplex@@@@QEBAPEBVDComplex@@XZ(753) (isz = 0) (sz = 6 (2+4))
1> -> INLINE (MANUAL): ??0DComplex@@QEAA@AEBV0@@Z(752) (isz = 3) (sz = 12 (4+8))
In particular, 12.1 reports 378 inlines of DComplex const * __ptr64 __cdecl RWGenMat<class DComplex>::data(void)const __ptr64.
13.0 does not report ANY inlines of this function. This seems to match what I am seeing (a performance drop due to the lack of inlining).
13.0 reports something a bit "odd" that is not seen in the 12.1 report. Is this a clue?
1> IPO DEAD STATIC FUNCTION ELIMINATION;?data@?$RWGenMat@VDComplex@@@@QEBAPEBVDComplex@@XZ;0>
1> DEAD STATIC FUNCTION ELIMINATION:
1> (?data@?$RWGenMat@VDComplex@@@@QEBAPEBVDComplex@@XZ)
1> Routine is dead extern
1>
- This is the release configuration.
- I am not sure what you are referring to when you say "I would use a pointer to a data set directly without calling the additional indexing C++ operator."
data() returns a "raw" pointer. The line of code data()[i*rowstep+j*colstep] is a "C" array operation, that is, simple pointer arithmetic. I am sure the compiler can deal with that.
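To spell out the pointer-arithmetic point: in C and C++ the built-in subscript `p[k]` is defined as `*(p + k)`, so the two helpers below compute the same thing. The function names are made up for this illustration and the element type is fixed to `double` for brevity.

```cpp
// Hypothetical helpers: both read the element at (i, j) of a strided array.

// Subscript form, as written in the operator() from this thread.
inline double at_subscript(const double* base, int i, int j,
                           int rowstep, int colstep)
{
    return base[i * rowstep + j * colstep];
}

// Explicit pointer-arithmetic form; equivalent by the definition of [].
inline double at_pointer(const double* base, int i, int j,
                         int rowstep, int colstep)
{
    return *(base + i * rowstep + j * colstep);
}
```

With rowstep equal to the number of columns and colstep equal to 1 this is row-major indexing. Swapping the two roles gives column-major indexing over the same buffer.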
Anyway, the bottom line for 12.1 vs 13.0 is:
- Identical code
- Identical compiler options
- Different inlining results for a complex expression, which negatively affect the performance of my code.
I will try to bundle this up in an acceptable way for Premier Support.
Great! Thanks for the effort to track this down!