Intel® Fortran Compiler

SIMD run time failure using directives, ifort v14.0.0

Matthew_C_6
Beginner

I have some code from a model in which many of the loops look like the following, making stride-one accesses through dynamically allocated arrays. Although the arrays sit at the end of pointer lists, I know that they do not overlap in memory. Using the IVDEP or VECTOR directives will not convince the compiler to vectorize this code (no surprises there). Prior to v14, the compiler would also not vectorize this code despite the !DIR$ SIMD or !$OMP SIMD directives. The v14 compiler does, however, as evidenced by both the vec-report messages and the associated assembly code.

[fortran]

!$OMP PARALLEL PRIVATE(block)

    block => domain % blocklist
    do while (associated(block))
        !$OMP DO SCHEDULE(RUNTIME) PRIVATE(k)
        do j = 1, block % mesh % nEdges
            !$OMP SIMD
            do i = 1, block % mesh % nVertLevels
                block % state % time_levs(2) % state % a % array(i,j) = &
                    block % mesh % edgeMask % array(i,j) * ( &
                    block % state % time_levs(2) % state % b % array(i,j) + &
                    block % state % time_levs(1) % state % c % array(i,j) )
            end do
        end do
        !$OMP END DO
        block => block % next
    end do  ! block
!$OMP END PARALLEL

[/fortran]

While the latest compiler that we now have does indeed vectorize the code through the !DIR$ SIMD or !$OMP SIMD directive, it fails at run time, either with a seg fault or, when using OpenMP, silently. In the above loop, I've observed the following behavior:

With OpenMP:

With more than 1 thread, the code does not work at run time with either !$OMP SIMD or !DIR$ SIMD; it fails silently.

With 1 thread, it seg faults.

Without OpenMP: it seg faults using !DIR$ SIMD.

Would gladly attach the short test code and the assembler output if this forum let me do that.

13 Replies

jimdempseyatthecove
Honored Contributor III

As TimP states, use private(i).

That said, is the member ...%array(:,:) an allocatable or a pointer? If it is a pointer, could any of the ...%array(:,:) elements overlap among threads?

Jim Dempsey

Matthew_C_6
Beginner

False. OpenMP do loop counters are private by default; all other variables are shared. Besides, running with a single thread also fails.

Matthew_C_6
Beginner

All arrays are Fortran allocatable and don't overlap. Source code follows.

Ron_Green
Moderator

It does look like bad code generation for the SIMD loop. I've entered a bug report. The complex data structures and pointer-based arrays are probably tripping it up. I simplified the test case, removing the OMP red herrings and reducing it to a simple 80x80 case without user input. My test case will be attached for reference.
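
For readers without the attachment, a reduced case of this shape might look like the sketch below. It is not the attached test case: the type names (field2d, block_type), the variable blk, and the fill values are all hypothetical, chosen only to mirror the structure in the original post (allocatable arrays at the leaf ends of a pointer-based structure, 80x80, no OpenMP threading). It needs to be built with the compiler's OpenMP (or OpenMP SIMD) option for the directive to take effect.

[fortran]
! Hypothetical reduced test case; names are illustrative only.
program simd_repro
    implicit none
    integer, parameter :: n = 80

    type field2d
        real, allocatable :: array(:,:)
    end type field2d

    type block_type
        type(field2d) :: a, b, c, mask
        type(block_type), pointer :: next => null()
    end type block_type

    type(block_type), pointer :: blk
    integer :: i, j

    allocate(blk)
    allocate(blk % a % array(n,n), blk % b % array(n,n), &
             blk % c % array(n,n), blk % mask % array(n,n))
    blk % a % array = 0.0
    blk % b % array = 1.0
    blk % c % array = 2.0
    blk % mask % array = 1.0

    do j = 1, n
        !$OMP SIMD
        do i = 1, n
            blk % a % array(i,j) = blk % mask % array(i,j) * &
                ( blk % b % array(i,j) + blk % c % array(i,j) )
        end do
    end do

    ! Every element should equal 1.0 * (1.0 + 2.0) = 3.0.
    if (any(blk % a % array /= 3.0)) then
        print *, 'FAIL'
        stop 1
    end if
    print *, 'PASS'

    deallocate(blk)
end program simd_repro
[/fortran]

The self-check at the end only catches wrong results; a seg fault like the one reported would of course show up on its own.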

I will keep you posted on progress for this bug report.

ron

Matthew_C_6
Beginner

Thanks Ron

Matthew_C_6
Beginner

One other thing to check: when I have arrays at the end of pointers as above and try to do an array assignment that should be vectorizable, e.g.

a(:) = b(:)

I run into the problem that these aren't vectorized either. I think I can put !DIR$ SIMD in front of the assignment, but right now these assignments are surrounded by OpenMP WORKSHARE directives. I don't think you can put another directive inside the WORKSHARE construct, and the WORKSHARE directive does not accept the SIMD directive.
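
One possible workaround, sketched here under assumptions, is to replace the array assignment inside WORKSHARE with an explicit loop under the OpenMP 4.0 combined DO SIMD construct. The program name, array size, and plain allocatable arrays are placeholders for illustration; whether a particular compiler release accepts DO SIMD and actually vectorizes the loop would need to be checked against its vec report.

[fortran]
! Illustrative sketch: explicit loop in place of "a(:) = b(:)" in WORKSHARE.
program workshare_alt
    implicit none
    integer, parameter :: n = 1000
    real, allocatable :: a(:), b(:)
    integer :: i

    allocate(a(n), b(n))
    b = 2.0
    a = 0.0

    !$OMP PARALLEL
    ! Iterations are divided among threads, and each thread's chunk
    ! is marked as a SIMD loop.
    !$OMP DO SIMD
    do i = 1, n
        a(i) = b(i)
    end do
    !$OMP END DO SIMD
    !$OMP END PARALLEL

    if (any(a /= b)) then
        print *, 'FAIL'
        stop 1
    end if
    print *, 'PASS'
end program workshare_alt
[/fortran]

The trade-off is losing the compact array syntax in exchange for a loop that directives can be attached to.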

jimdempseyatthecove
Honored Contributor III

>> False. OpenMP do loop counters are private by default
[fortran]
        !$OMP DO SCHEDULE(RUNTIME) PRIVATE(k)
        do j = 1, block % mesh % nEdges
            !$OMP SIMD
            do i = 1, block % mesh % nVertLevels
[/fortran]

In the above code, j defaults to private as it is the loop control variable of the immediately preceding  !$OMP DO, whereas i defaults to shared as it is not the loop control variable of an !$OMP DO loop.

Jim Dempsey

IanH
Honored Contributor II

There's a general "the loop iteration variable of a sequential loop in a parallel or task construct is private in the inner-most construct that encloses the loop" clause in the data sharing rules (in the OpenMP 4.0 spec, see 2.14.1 on p. 147, line 28).

Which then raises the question why the iteration variable for a do construct is called out separately to be private.

TimP
Honored Contributor III

I trip up myself over the differing rules for default privatizing of iteration variables (C vs. Fortran vs. Cilk), and whether any lastprivate effect could be obtained (consistent or not with non-OpenMP Fortran definition of value after loop termination).  I think the private clause is needed when default(none) is set, but the compiler should tell you that.
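
One way to sidestep the question of which defaults apply is to state every attribute explicitly. The sketch below is an illustration only (the program name, array shapes, and values are made up): it uses DEFAULT(NONE) and lists both loop counters as PRIVATE, so the result does not depend on how the default rules are read. The inner loop is deliberately left as a plain sequential loop.

[fortran]
! Illustrative sketch: explicit data-sharing attributes for loop counters.
program private_counters
    implicit none
    integer, parameter :: n = 64, m = 32
    real :: a(n,m), b(n,m)
    integer :: i, j

    b = 1.0
    a = 0.0

    ! n and m are named constants, so they need no data-sharing clause.
    !$OMP PARALLEL DO DEFAULT(NONE) SHARED(a, b) PRIVATE(i, j)
    do j = 1, m
        do i = 1, n
            a(i,j) = 2.0 * b(i,j)
        end do
    end do
    !$OMP END PARALLEL DO

    if (any(a /= 2.0)) then
        print *, 'FAIL'
        stop 1
    end if
    print *, 'PASS'
end program private_counters
[/fortran]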

jimdempseyatthecove
Honored Contributor III

IanH, the spec could be less ambiguous had it said something along the lines of "all loop control variables contained within the parallel construct default to private unless specified otherwise", but that is not what it says, nor what I believe is implemented.

Steve, step in here, as this may lead to assumptions contrary to fact.

Jim Dempsey

Ron_Green
Moderator

a(:) = Some expression with b(:)

This is vectorizable MAYBE, if these are not pointer based. Pointer-based arrays could alias the LHS and RHS. I don't know if it was your code or some other similar code with complex user-defined types that had pointer-based arrays at the leaf ends of the structures. The LHS and RHS had totally different variables, with different semantics and use, but at the end of these structures were pointers to 2D real arrays. Logically these would never alias each other (totally different types and usage), yet it is still POSSIBLE for the leaf-end 2D real array pointers to alias the same memory. Compilers cannot discern intent. The compiler will (well, should) ALWAYS CHOOSE TO CREATE SAFE CODE whenever there is a faint possibility of dependence.

Allocatable arrays tend to allow the compiler to optimize better. I understand that sometimes there are very good reasons to use pointer-based arrays, and that for many years types could not have allocatable components. I get it; I've used pointer-based arrays in my applications over the years (not to mention some questionable use of EQUIVALENCE back in the 80s). But that is why the SIMD directives were introduced. If the LHS and RHS could possibly alias but you are certain this will never occur, throw in the directive to tell the compiler to forget safety and optimize. Casual users will get safe code; tuners can take the extra effort to put in appropriate directives to guide the compiler's heuristics.
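
As a small illustration of that trade-off (a sketch with made-up names and sizes, not code from this thread), the loop below works on two pointer arrays that the programmer knows were allocated separately. The !$OMP SIMD directive is the programmer's assertion that the iterations are independent; if lhs and rhs were ever made to overlap, that assertion would be false and the results could be wrong.

[fortran]
! Illustrative sketch: asserting independence for pointer-based arrays.
program pointer_simd
    implicit none
    integer, parameter :: n = 256
    real, pointer :: lhs(:), rhs(:)
    integer :: i

    ! Two separate allocations, so the targets cannot overlap here.
    allocate(lhs(n), rhs(n))
    rhs = 3.0
    lhs = 0.0

    ! The compiler alone cannot prove lhs and rhs are distinct;
    ! the directive tells it to vectorize anyway.
    !$OMP SIMD
    do i = 1, n
        lhs(i) = 2.0 * rhs(i)
    end do

    if (any(lhs /= 6.0)) then
        print *, 'FAIL'
        stop 1
    end if
    print *, 'PASS'

    deallocate(lhs, rhs)
end program pointer_simd
[/fortran]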

That said, there are certainly opportunities for any compiler to do a better job at optimization and vectorization. We do look at every case and have put enormous effort into vectorization over the past years.

Steven_L_Intel1
Employee

jimdempseyatthecove wrote:

Steve, step in here, as this may lead to assumptions contrary to fact.

I have nothing to add here - the others who have commented know OpenMP far better than I do.

Ron_Green
Moderator

This bug is fixed in the latest Composer XE 2013 SP1 Update 2 compiler, posted on Intel Registration Center yesterday, 2/13/2014.

I will close this issue now.  Thank you for reporting this bug.

ron
