Hi all,
I have spent some time debugging code that is parallelized with OpenMP and compiled with ifort 12.0.4. After trying a couple of debugging tools (Valgrind, Intel Inspector, ...) I could not find the problem.
I then tried another version of ifort (12.1.3) and the problem disappeared, which was quite surprising.
Looking at the vectorization report, I noticed a loop that was vectorized by the older 12.0.4 but not by the newer 12.1.3. I then added a NOVECTOR directive to that loop and recompiled with ifort 12.0.4, and the program ran correctly. So preventing the compiler from vectorizing this loop seems to fix my bug.
I also tried forcing the newer 12.1.3 to vectorize the loop with an IVDEP directive. The compiler reported the loop as vectorized, and that build also works correctly.
So my question is: do you see any relationship between the OpenMP parallelization and the vectorization that could lead to the bug I experienced?
Many thanks,
Tarik
Hello Tarik,
While it's hard to say without more information (and a reproducer), I'll give it a quick shot:
When you debug your application, is an aligned move instruction (e.g. movaps; the "a" stands for "aligned") executing when the SEGV occurs? And is that instruction using an unaligned address?
We've had similar reports recently, e.g.:
http://redfort-software.intel.com/en-us/forums/showthread.php?t=105284
http://redfort-software.intel.com/en-us/forums/showthread.php?t=106554
Engineering is working on that. I'd also recommend using the latest Intel® Composer XE 2011 Update 11, because we fixed a rarely occurring stack corruption problem there.
Best regards,
Georg Zitzlsberger
As you haven't shown an example of your source code, I'm guessing. If you wish to vectorize and parallelize the same loop, it's useful to arrange your source code so that the loop count divided by the number of threads comes out to a multiple of the vector width of your chosen architecture (e.g. 4 single-precision elements for SSE). Then, if the beginning of the array is aligned, it should be safe to use directives such as VECTOR ALIGNED.
If you don't use such directives, it is of course desirable that the compiler avoid optimizations that may fail or that depend on programmer forethought. That could mean very long loops are needed to see a benefit from combining vectorization and parallelization.