Code auto-vectorization produces segmentation fault (Signal 11)

Saidani__Tarik · 08-14-2012

Hi All,

I have spent some time debugging a code that is threaded with OpenMP and compiled with ifort version 12.0.4. After trying a couple of debugging tools (valgrind, Intel Inspector, ...), I still could not spot the problem.

I then tried another version of ifort (version 12.1.3) and the problem disappeared, which was quite odd.
After looking at the vectorization reports, I realized that one loop was vectorized by the old version 12.0.4 but not by the newer version 12.1.3. I then added a NOVECTOR directive ("pragma novec") to that loop and recompiled the source code with ifort 12.0.4. Preventing the compiler from vectorizing this loop seems to be the right fix for my bug.
I have also tried forcing the new version 12.1.3 to vectorize the loop with an IVDEP directive ("pragma ivdep"), and the compiler reported the loop as vectorized. This also works fine.
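For readers who have not used these directives, the following is a minimal sketch of where they go in Intel Fortran source. The arrays, the loop bodies and the program name are hypothetical, since the original loop was not posted. In Fortran the directives are written as !DIR$ comment lines directly before the loop, so other compilers simply ignore them.

```fortran
program directive_sketch
  implicit none
  integer, parameter :: n = 1000   ! hypothetical loop length
  real :: a(n), b(n)
  integer :: i

  b = 1.0

  ! Ask the Intel compiler not to vectorize the loop that follows.
  !DIR$ NOVECTOR
  do i = 1, n
     a(i) = 2.0 * b(i)
  end do

  ! Tell the Intel compiler to ignore assumed (unproven) dependences
  ! in the loop that follows, which usually allows vectorization.
  !DIR$ IVDEP
  do i = 1, n
     a(i) = a(i) + b(i)
  end do

  print *, a(1), a(n)
end program directive_sketch
```

Comparing the vectorization report with and without each directive shows whether the compiler honored it for a given loop.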

So my question is: do you see any relationship between the OpenMP parallelization and the vectorization that could lead to the bug I've experienced?

Many Thanks,

Tarik

Georg_Z_Intel · 08-14-2012

Hello Tarik,

While it's hard to say without more information (and a reproducer), I'll take a quick shot at it:
When you debug your application, is an aligned move instruction being executed (e.g. movaps; "a" for "aligned") when the SEGV occurs? And is this instruction using an unaligned address?

We've had similar reports recently, e.g.:
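As a rough illustration of what "unaligned address" means here, the sketch below prints the address of a few consecutive single-precision elements modulo 16, the alignment that movaps requires. The array x and the program name are hypothetical, and converting a C pointer to an integer with TRANSFER is a common idiom rather than something taken from this thread.

```fortran
program alignment_check
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t, c_float
  implicit none
  real(c_float), target :: x(16)   ! hypothetical array
  integer(c_intptr_t) :: addr
  integer :: i

  x = 0.0_c_float

  do i = 1, 5
     ! Reinterpret the element's address as an integer of pointer size.
     addr = transfer(c_loc(x(i)), addr)
     ! A remainder of 0 means the element sits on a 16-byte boundary.
     print '(a,i0,a,i0)', 'x(', i, '): address mod 16 = ', mod(addr, 16_c_intptr_t)
  end do
end program alignment_check
```

Consecutive 4-byte elements are 4 bytes apart, so at most one element in four lies on a 16-byte boundary. An aligned move that starts at any other element faults.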
http://redfort-software.intel.com/en-us/forums/showthread.php?t=105284
http://redfort-software.intel.com/en-us/forums/showthread.php?t=106554

Engineering is working on that. I'd also recommend using the latest Intel® Composer XE 2011 Update 11, because it fixes a stack corruption problem that occurred rarely.

Best regards,

Georg Zitzlsberger

TimP · 08-14-2012

As you haven't shown an example of your source code, I'm guessing. If you wish to vectorize and parallelize the same loop, it's useful to arrange your source code so that the loop count divided by the number of threads is a multiple of the vector width of your chosen architecture (e.g. 4 single-precision elements for SSE). Each thread's chunk then starts on a vector boundary, so if the beginning of the array is aligned, it should be safe to use directives such as VECTOR ALIGNED.
If you don't use such directives, of course it's desirable that the compiler avoids optimizations which may fail or depend on programmer forethought. That could require very long loops to see a benefit from combining vectorization and parallelization.
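The arithmetic behind this advice can be sketched as follows. The example assumes an even static split of the iterations across threads, a thread count of 4 and a vector width of 4 elements. These values and all names are illustrative assumptions, and an actual OpenMP runtime may divide the iterations differently.

```fortran
program chunk_alignment
  implicit none
  integer, parameter :: vec_width = 4             ! e.g. 4 single-precision reals per SSE register
  integer, parameter :: nthreads  = 4             ! hypothetical thread count
  integer, parameter :: counts(2) = [1000, 1024]  ! two hypothetical loop counts
  integer :: k, t, chunk, offset

  do k = 1, size(counts)
     ! Iterations per thread, assuming nthreads divides the loop count.
     chunk = counts(k) / nthreads
     print '(a,i0,a,i0)', 'n = ', counts(k), ', iterations per thread = ', chunk
     do t = 0, nthreads - 1
        ! Element offset (from the start of the array) of this thread's first iteration.
        offset = t * chunk
        print '(a,i0,a,i0,a,l1)', '  thread ', t, ' starts at offset ', offset, &
              ', multiple of vector width: ', mod(offset, vec_width) == 0
     end do
  end do
end program chunk_alignment
```

With a loop count of 1000, each thread gets 250 iterations, so threads 1 and 3 start at offsets 250 and 750, which are not multiples of 4. An aligned vector load at those offsets would be misaligned even if the array itself is aligned. With a loop count of 1024, each chunk is 256 iterations and every thread starts on a vector boundary.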