I wonder what is preventing the compiler from vectorizing the innermost loop in the following function:
template<typename T> inline void MatrixVectorProduct(const matrix<T>& m, const std::vector<T>& rhs, std::vector<T>& lhs)
{
    size_t cols = m.cols();
    const T* restrict pcol = &(*rhs.begin());
    //outer loop (/Qvec-report:3): nonstandard loop is not a vectorization candidate (Fine!)
    //loop bound was lost in the forum formatting; m.rows() is assumed here
    _Cilk_for(size_t i=0; i<m.rows(); ++i)
    {
        const T* prow = &(*(m.begin() + i * cols));
        //inner loop (/Qvec-report:3): modifying order of operation not allowed under given switches (?)
        lhs[i] = __sec_reduce_add( prow[0:cols] * pcol[0:cols] );
    }
}
Under the switches: /O3 /Qstd=c99 /Qopenmp /Qfp-speculation:safe /Qrestrict /arch:SSE2, this function's performance approaches Intel MKL's 'cblas_dgemv()'.
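For readers without a Cilk Plus compiler, here is a minimal standard C++ sketch of what the inner reduction computes. The function name `matrix_vector_product` and the flat `std::vector` used in place of the poster's `matrix` class are assumptions for illustration, not the original code; the only property carried over is contiguous row-major storage.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the thread's matrix class:
// a flat vector holding rows * cols elements in row-major order.
template <typename T>
std::vector<T> matrix_vector_product(const std::vector<T>& m,
                                     std::size_t rows, std::size_t cols,
                                     const std::vector<T>& rhs)
{
    assert(m.size() == rows * cols);
    assert(rhs.size() == cols);

    std::vector<T> lhs(rows);
    for (std::size_t i = 0; i < rows; ++i) {
        const T* prow = m.data() + i * cols;  // start of row i
        // Same dot product as __sec_reduce_add(prow[0:cols] * pcol[0:cols]),
        // but accumulated strictly left to right.
        lhs[i] = std::inner_product(prow, prow + cols, rhs.data(), T(0));
    }
    return lhs;
}
```

Because `std::inner_product` fixes the summation order, a compiler under a value-safe floating-point model has the same freedom with this loop as with the array notation version.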
Cheers,
12 Replies
When I tried this, instantiated with both float and double and using both the Windows and Linux versions of the compiler, the vectorization report I get says that the inner loop is vectorized. What does your matrix class look like? Are you using the released version of the compiler, or a beta, or something else?
- Pablo
I think we may have figured out the trigger here. Assuming you're building out of the IDE, is /fp:precise specified by default? Try changing to /fp:fast if it is.
The question I'm following up on is whether this behavior of the vectorizer makes sense in the context of array notations.
Indeed, I intentionally specify /fp:precise.
After switching to /fp:fast, the loop is vectorized. However, it crashes at runtime, with the thread's call stack stalled right at the loop.
My matrix class uses contiguous storage and a row-major layout. I am using Intel Composer XE 2011 Update 1 (12.1.127).
Jorge,
If you turn on /W4, do you get any remarks like the following?
remark #18009: A temporary array is allocated to resolve data dependencies
If so, I think you might have a stack overflow caused by some of the array notation code. Let me know - I have an open problem report on this that I can link this thread to.
Brandon,
After turning on /W4, I found no such remarks. Under /fp:fast, the MatrixVectorProduct() function from the first post builds and runs.
On the other hand, the following function works under /fp:precise (without innermost-loop vectorization), whereas under /fp:fast the innermost loop is vectorized but the program crashes at runtime with an unhandled access violation.
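template<typename T> inline void MatrixProduct(const matrix<T>& m, const matrix<T>& rhs, matrix<T>& lhs)
{
    //assert(...) on all dimensions
    size_t mcols = m.cols();
    size_t ncols = rhs.cols();
    const T* pcol = &(*rhs.begin());//restrict pointer candidate
    //loop bounds and indices were lost in the forum formatting; the ones below are assumed
    _Cilk_for(size_t i=0; i<m.rows(); ++i)
    {
        const T* prow = &(*(m.begin() + i * mcols));
        for(size_t j=0; j<ncols; ++j)
        {
            //element (i,j) of lhs; column j of rhs is read with stride ncols
            *(lhs.begin() + i * ncols + j) = __sec_reduce_add(prow[0:mcols] * pcol[j:mcols:ncols]);//acc violation on vect
        }
    }
}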
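To make the strided access concrete, here is a rough standard C++ sketch of how a Cilk Plus array section of the form `b[start:length:stride]` maps onto a plain loop, applied to a row-major matrix product. The names `strided_dot` and `matrix_product`, and the flat `std::vector` storage, are hypothetical and only illustrate the indexing; they are not the poster's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain-loop reading of __sec_reduce_add(a[0:n] * b[start:n:stride]):
// n elements of b are visited, beginning at index start, stride apart.
template <typename T>
T strided_dot(const T* a, const T* b,
              std::size_t start, std::size_t n, std::size_t stride)
{
    T sum = T(0);
    for (std::size_t k = 0; k < n; ++k)
        sum += a[k] * b[start + k * stride];
    return sum;
}

// Row-major product of an (mrows x mcols) matrix with an (mcols x ncols) matrix.
template <typename T>
std::vector<T> matrix_product(const std::vector<T>& m, const std::vector<T>& rhs,
                              std::size_t mrows, std::size_t mcols, std::size_t ncols)
{
    assert(m.size() == mrows * mcols);
    assert(rhs.size() == mcols * ncols);

    std::vector<T> lhs(mrows * ncols);
    for (std::size_t i = 0; i < mrows; ++i) {
        const T* prow = m.data() + i * mcols;  // row i of m
        for (std::size_t j = 0; j < ncols; ++j) {
            // Column j of rhs: start j, length mcols, stride ncols.
            lhs[i * ncols + j] = strided_dot(prow, rhs.data(), j, mcols, ncols);
        }
    }
    return lhs;
}
```

The largest index read from `rhs` is `(ncols - 1) + (mcols - 1) * ncols`, which equals `mcols * ncols - 1`, so a correct scalar version of this loop stays inside the buffer.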
Compiler:
/c /O2 /Ob2 /Oi /Ot /Oy /Qipo /I "C:\Program Files (x86)\Intel\ComposerXE-2011\mkl\include\ia32" /D "WIN32" /D "NDEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /EHsc /MD /GS /Gy /arch:SSE2 /fp:fast /Fo"Release/" /Fd"Release/vc90.pdb" /W4 /nologo /Zi /Qopenmp /Quse-intel-optimized-headers /Qstd=c99 /Qrestrict /Qvec-report3
Linker:
mkl_intel_c.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib kernel32.lib user32.lib gdi32.lib winspool.lib comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib uuid.lib odbc32.lib odbccp32.lib /INCREMENTAL:NO /nologo /LIBPATH:"C:\Program Files (x86)\Intel\ComposerXE-2011\mkl\lib\ia32" /NODEFAULTLIB:"libcmt.lib" /TLBID:1 /SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /DYNAMICBASE /NXCOMPAT /MACHINE:X86
Cheers,
Hi Jorge,
This definitely looks like a compiler issue from what you've sent me. The vectorizer is doing something improperly, I think. I've created a problem report for our vectorizer team, and I'll update the thread as their investigation proceeds.
Brandon,
I think I've found an answer to our follow-up question in a related article at:
It seems that the behavior makes sense in this context, because the /fp:precise model allows only value-safe optimizations. Vectorizing the reduction in __sec_reduce_add() requires reassociating the sums, which is value-unsafe.
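As a small illustration of why reassociation is value-unsafe, the sketch below sums the same four floats in two orders. The function names `sum_in_order` and `sum_four_lanes` are made up for this example, and the four-accumulator version only approximates the order a 4-wide SSE reduction might use; it is not what the Intel vectorizer actually emits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strict left-to-right sum: the order a value-safe model must preserve.
float sum_in_order(const std::vector<float>& x)
{
    float s = 0.0f;
    for (float v : x)
        s += v;
    return s;
}

// Four independent partial sums combined at the end,
// roughly how a 4-wide SIMD reduction reorders the additions.
float sum_four_lanes(const std::vector<float>& x)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = 0;
    for (; i + 4 <= x.size(); i += 4)
        for (std::size_t k = 0; k < 4; ++k)
            lane[k] += x[i + k];

    float s = (lane[0] + lane[1]) + (lane[2] + lane[3]);
    for (; i < x.size(); ++i)  // leftover elements, if any
        s += x[i];
    return s;
}
```

For the input `{1e8f, 1.0f, -1e8f, 1.0f}` on a target that evaluates `float` arithmetic in single precision (SSE code, for instance), the in-order sum yields 1 and the four-lane sum yields 0, while the exact result is 2. Neither order is right, but they disagree, and that disagreement is what /fp:precise refuses to introduce.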
The question remains why it fails under /fp:fast, though.
Regards,
Hi Jorge,
Correct. Because /fp:precise is specified, the compiler can't safely vectorize the array notation reduction. However, the code crashing after vectorization is still an issue it seems to me.
Brandon,
I agree. A very important one indeed.
I look forward to hearing about it.
Cheers,
Hi Jorge,
We've put a fix for this issue into Update 3. Try Update 3, and let me know if you still have problems.
