SIMD run time failure using directives, ifort v14.0.0

Matthew_C_6 · ‎11-04-2013

I have some code from a model in which many of the loops look like the following, doing stride-one accesses through dynamically allocated arrays. Although the arrays sit at the end of pointer lists, I know that they do not overlap in memory. Using IVDEP or VECTOR directives will not convince the compiler to vectorize this code (no surprises there). Prior to v14, the compiler would also not vectorize this code despite !DIR$ SIMD or !$OMP SIMD directives. The v14 compiler, however, does, as evidenced by both the vec-report messages and the associated assembly code.

[fortran]

!$OMP PARALLEL PRIVATE(block)
   block => domain % blocklist
   do while (associated(block))
      !$OMP DO SCHEDULE(RUNTIME) PRIVATE(k)
      do j = 1, block % mesh % nEdges
         !$OMP SIMD
         do i = 1, block % mesh % nVertLevels
            block % state % time_levs(2) % state % a % array(i,j) = &
               block % mesh % edgeMask % array(i,j) * ( &
               block % state % time_levs(2) % state % b % array(i,j) + &
               block % state % time_levs(1) % state % c % array(i,j) )
         end do
      end do
      !$OMP END DO
      block => block % next
   end do ! block
!$OMP END PARALLEL

[/fortran]

While the latest compiler that we now have does indeed vectorize the code with the !DIR$/!$OMP SIMD directive, the code fails at run time, either through a seg fault or silently when using OpenMP. Indeed, for the above loop, I've observed the following behavior:

With OpenMP:

With > 1 thread: fails silently at run time with either !$OMP SIMD or !DIR$ SIMD

With 1 thread: seg faults

Without OpenMP: seg faults using !DIR$ SIMD

Would gladly attach the short test code and the assembler output if this forum let me do that.

jimdempseyatthecove · ‎11-04-2013

As TimP states: private(i)

That said, is the member ...%array(:,:) an allocatable or a pointer? If a pointer, could any of the ...%array(:,:) elements overlap among threads?

Jim Dempsey

Matthew_C_6 · ‎11-04-2013

False. OpenMP do loop counters are private by default, all other variables are shared. Besides, running with a single thread also fails.

Matthew_C_6 · ‎11-04-2013

All arrays are Fortran allocatable and don't overlap. Source code follows.

Ron_Green · ‎11-05-2013

It does look like bad code generation for the simd loop. I've entered a bug report. The complex data structures and pointer-based arrays are probably tripping it up. I simplified the testcase, removing the OMP red herrings and reducing it to a simple 80x80 testcase w/o user input. My testcase will be attached for reference.

I will keep you posted on progress for this bug report.

ron

Matthew_C_6 · ‎11-05-2013

Thanks Ron

Matthew_C_6 · ‎11-06-2013

One other thing to check: when I have arrays at the end of pointers as above and try to do an array assignment that should be vectorizable, e.g.

a(:) = b(:)

I run into the problem that these aren't vectorized either. Now, I think I can put !DIR$ SIMD in front of this, but right now they are surrounded by OpenMP workshare directives. I don't think you can put another directive inside the WORKSHARE construct, and the WORKSHARE directive does not accept the SIMD directive.
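
A minimal sketch of one possible way around the WORKSHARE limitation described above, assuming the compiler supports the OpenMP 4.0 combined DO SIMD construct: write the array assignment as an explicit loop. The program name, array names, and size are made up for illustration and are not from the model code.

[fortran]
program workshare_alternative
   implicit none
   integer, parameter :: n = 1000      ! hypothetical array size
   real, allocatable :: a(:), b(:)
   integer :: i

   allocate(a(n), b(n))
   b = [(real(i), i = 1, n)]

   !$omp parallel
   ! Explicit loop in place of "a(:) = b(:)" inside WORKSHARE:
   ! iterations are divided among threads, and each thread's
   ! chunk is marked for SIMD execution.
   !$omp do simd
   do i = 1, n
      a(i) = b(i)
   end do
   !$omp end do simd
   !$omp end parallel

   ! Serial check that the copy is complete
   if (any(a /= b)) stop 1
   print *, 'copy matches'

   deallocate(a, b)
end program workshare_alternative
[/fortran]

The trade-off is that the loop bounds must be spelled out by hand, which the array syntax otherwise takes care of.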

jimdempseyatthecove · ‎11-06-2013

>> False. OpenMP do loop counters are private by default
[fortran]
        !$OMP DO SCHEDULE(RUNTIME) PRIVATE(k)
        do j = 1, block % mesh % nEdges
             !$OMP SIMD
             do i = 1, block % mesh % nVertLevels
[/fortran]

In the above code, j defaults to private as it is the loop control variable of the immediately preceding !$OMP DO, whereas i defaults to shared as it is not the loop control variable of an !$OMP DO loop.

Jim Dempsey

IanH · ‎11-06-2013

There's a general "the loop iteration variable of a sequential loop in a parallel or task construct is private in the inner-most construct that encloses the loop" clause in the data sharing rules (in the OpenMP 4.0 spec, see 2.14.1 on p147, line 28).

Which then raises the question why the iteration variable for a do construct is called out separately to be private.
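
A minimal sketch of the rule quoted above, assuming a compiler that follows the OpenMP data-sharing rules for Fortran. The program name, array, and sizes are made up for illustration. The inner loop variable i appears in no data-sharing clause; under the quoted rule it would be private to each thread, so no explicit private(i) should be required.

[fortran]
program loop_var_sharing
   implicit none
   integer, parameter :: n = 64, m = 64   ! hypothetical sizes
   integer :: i, j
   real :: a(n, m)

   a = 0.0

   !$omp parallel do
   do j = 1, m          ! j: loop variable of the associated DO
      do i = 1, n       ! i: sequential loop inside the construct, no clause
         a(i, j) = real(i + j)
      end do
   end do
   !$omp end parallel do

   ! Serial check: every element should hold i + j
   do j = 1, m
      do i = 1, n
         if (a(i, j) /= real(i + j)) stop 1
      end do
   end do
   print *, 'all elements correct'
end program loop_var_sharing
[/fortran]

A passing run does not by itself prove that i is private, since a race may simply not show up; the sketch only illustrates the code shape the rule applies to.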

TimP · ‎11-07-2013

I trip up myself over the differing rules for default privatizing of iteration variables (C vs. Fortran vs. Cilk), and whether any lastprivate effect could be obtained (consistent or not with non-OpenMP Fortran definition of value after loop termination). I think the private clause is needed when default(none) is set, but the compiler should tell you that.

jimdempseyatthecove · ‎11-07-2013

IanH, the spec could be less ambiguous had it said something along the lines of "all loop control variables contained within the parallel construct default to private unless specified otherwise", but that is not what it says, nor what I believe is implemented.

Steve, step in here, as this may lead to assumptions contrary to fact.

Jim Dempsey

Ron_Green · ‎11-07-2013

a(:) = Some expression with b(:)

is vectorizable MAYBE, if these are not pointer based. Pointer-based arrays could alias LHS and RHS. I don't know if it was your code or some other similar code with complex user-defined types with pointer-based arrays at the leaf ends of the structures. The LHS and RHS had totally different variables, different semantics and use, but at the end of these structures were pointers to 2D real arrays. Logically these would never alias each other (totally different types and usage), yet it is still POSSIBLE for them to alias the same memory through the leaf-end 2D real array pointers. Compilers cannot discern intent. The compiler will (well, should) ALWAYS CHOOSE TO CREATE SAFE CODE whenever there is a faint possibility of dependence.

Allocatable arrays tend to allow the compiler to better optimize. I understand sometimes there are very good reasons to use pointer-based arrays, and understand that for many years types could not have allocatable components. I get it, I've used pointer-based arrays in my applications over the years (not to mention some questionable use of EQUIVALENCE back in the 80s). But that is why SIMD directives were introduced. If you have possible aliasing LHS and RHS but you are certain this will never occur, throw the directive to tell the compiler to forget safety and optimize. Casual users will get safe code, tuners can take the extra effort to put in appropriate directives to guide the compiler's heuristics.

That said, there are certainly opportunities for any compiler to do a better job at optimization and vectorization. We do look at every case and have put enormous effort into vectorization over the past years.
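
A minimal sketch of the directive use described above, assuming pointer-based array components that the programmer knows never overlap. The type name field, the variable names, and the size are hypothetical and not taken from the original model. The !$OMP SIMD directive is the programmer's assertion that the loop iterations are independent; if the two pointers did in fact refer to overlapping memory with a real dependence, the result would be undefined.

[fortran]
program simd_pointer_arrays
   implicit none

   ! Hypothetical stand-in for a derived type whose leaf is a pointer array
   type field
      real, pointer :: array(:) => null()
   end type field

   type(field) :: lhs, rhs
   integer, parameter :: n = 1024   ! hypothetical size
   integer :: i

   ! Separate allocations, so the two targets cannot overlap
   allocate(lhs % array(n), rhs % array(n))
   rhs % array = 1.0

   ! Without a directive the compiler must assume lhs and rhs may alias;
   ! SIMD tells it to vectorize on the programmer's word.
   !$omp simd
   do i = 1, n
      lhs % array(i) = 2.0 * rhs % array(i)
   end do

   if (any(lhs % array /= 2.0)) stop 1
   print *, 'simd loop result correct'

   deallocate(lhs % array, rhs % array)
end program simd_pointer_arrays
[/fortran]

Where the design allows it, declaring the component allocatable instead of pointer gives the compiler the no-overlap guarantee without any directive.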

Steven_L_Intel1 · ‎11-07-2013

jimdempseyatthecove wrote:

Steve, step in here, as this may lead to assumptions contrary to fact.

I have nothing to add here - the others who have commented know OpenMP far better than I do.

Ron_Green · ‎02-14-2014

This bug is fixed in the latest Composer XE 2013 SP1 Update 2 compiler, posted on Intel Registration Center yesterday, 2/13/2014.

I will close this issue now. Thank you for reporting this bug.

ron