Solved: BUG: ifort 12.0.4: context-dependent segmentation fault with "=" operator and (big) arrays

Kamil_Kie · 08-10-2011

The question is: why do "Case 1" and "Case 2" work with ifort, but "Case 3" doesn't? (With gfortran everything works OK.)

[fortran]
program main
  implicit none
  integer :: n
  
  double precision,pointer :: a(:,:), b(:,:)
  
  n=10000

  !--- Case 1. ---
  call happy1(n)   
  
  !--- Case 2. ---
  allocate(a(n,n))
  allocate(b(n,n))
  call happy2(a,b)
  deallocate(a)
  deallocate(b) 
  
  !--- Case 3. ---
  call sad(n)
  
  
contains

  subroutine happy1(n)    
    integer :: n,i,j
    double precision,pointer :: a(:,:),b(:,:)
    
    print *,"Case 1:"
    allocate(a(n,n))
    allocate(b(n,n))
    a=1
    do i=1,size(a,1)
      do j=1,size(a,2)
        b(i,j)=a(i,j) 
      end do
    end do
    print *,"b(n,n)=",b(n,n)
    deallocate(a)
    deallocate(b)        
    
    print *,"Happy :)"
  end subroutine


  subroutine happy2(a,b)    
    double precision :: a(:,:),b(size(a,1),size(a,2))
    integer :: n,i,j
    print *,"Case 2:"
    n=size(a,1)
    a=1
    b=a
    print *,"b(n,n)=",b(n,n)
            
    print *,"Happy2 :)"
  end subroutine
    
    
  subroutine sad(n)    
    double precision,pointer :: a(:,:),b(:,:)
    integer :: n
    print *,"Case 3:"
    allocate(a(n,n))
    allocate(b(n,n))
    a=1
    b=a
    print *,"b(n,n)=",b(n,n)
    deallocate(a)
    deallocate(b)   
    
    
    print *,"Sad :("
  end subroutine
  
end program
[/fortran]

jimdempseyatthecove · 08-10-2011

In subroutine sad, change

doubleprecision,pointer::a(:,:),b(:,:)
to
doubleprecision,allocatable::a(:,:),b(:,:)
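For scale: with n=10000 each array holds 10000 x 10000 x 8 bytes = 800 MB, far beyond typical default stack limits, so a compiler temporary of that size is a likely crash. Below is a minimal sketch of Case 3 with the suggested change applied. The module and the name not_sad are hypothetical, and the subroutine returns b(n,n) instead of printing it so the result can be checked. Two distinct allocatable variables cannot alias, so the compiler has no reason to build a temporary for b=a.

```fortran
! Case 3 ("sad") with ALLOCATABLE instead of POINTER arrays.
! Hypothetical module/procedure names; the body mirrors the original test.
module sad_fix_mod
  implicit none
contains
  subroutine not_sad(n, corner)
    integer, intent(in) :: n
    double precision, intent(out) :: corner      ! b(n,n), returned for checking
    double precision, allocatable :: a(:,:), b(:,:)

    allocate(a(n,n))
    allocate(b(n,n))
    a = 1
    b = a            ! a and b are separate allocations: no overlap, no temporary needed
    corner = b(n,n)
    deallocate(a)
    deallocate(b)
  end subroutine not_sad
end module sad_fix_mod
```

Called as `call not_sad(n, corner)` from the main program, it replaces `call sad(n)`.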

The problem is (as I interpret the explanations from Intel support) that, since a and b are pointers, the compiler does not know whether they point to overlapping memory regions, and therefore it uses a temporary for b=a. By default ifort places such array temporaries on the stack, where a temporary of this size does not fit. Segmentation faults in array copy operations are common enough that you would think the Intel support team would address this issue. Apparently they do not want to add defensive and/or optimization code to avoid using the temporary. Something along the lines of the following would help:

a) Add a flag to the array descriptor, set when a pointer is associated by ALLOCATE, indicating that the pointer targets an entire allocated array as opposed to a (possibly rearranged) slice of another array. When this flag is set, the compiler knows that the memory regions do not overlap and that the copy can be performed in any order (multi-threaded, SSE, AVX, ...). Note that this flag can be determined (and used) at compile time, and tested at run time when the pointer association/allocation is not visible to the compiler.

b) When a) cannot be determined, the code that creates the array temporary will already have fetched the shape(s), i.e. for each rank the A0 of the array, the base index, the stride, and the extent. Before allocating the temporary, that code will have used the information from the array descriptor(s) to compute the size of the allocation. Up to this point nothing in the generated code has changed (other than the flag test from a) above). A simple test can then be made to see whether the size of the allocation is "large"; if not, use the temporary as today. If it is large, the copy via a temporary will take a relatively long time and may cause a segmentation fault, which warrants executing a very small amount of extra code to check whether the arrays overlap. If they do not overlap, copy directly. Should they overlap, then:

b.a) Since the temporary is known to be "large", allocate it on the heap (heap arrays), or possibly do so only when "large" exceeds some threshold.

b.b) I will assume the FORTRAN standards committee has addressed how array copies are to be performed: whether indexes iterate fastest from left to right or from right to left (I'm too lazy to look up the spec). The compiler writers should know (for each CPU) for what copy size it is faster to use an allocated temporary or to use code that runs through the index permutations in an order that never overwrites a source element before it is read.

c) When a pointer points to a slice of an array, the array descriptor for the pointer could hold (if it doesn't already) the A0 address of the original (allocated or static) array, obtained from the compiler, the allocated array's descriptor, or the parent pointer. For an array copy (a=b) where the a) flag is neither known at compile time nor determinable at run time, the original A0 could be used to determine whether overlap is possible. N.B. it may turn out that the a) flag can be replaced by the original A0 address.
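On b.b): the standard does not prescribe an iteration order for array assignment; it requires the effect to be as if the entire right-hand side were evaluated before any element of the left-hand side is defined. The sketch below (hypothetical module and procedure names) shows why that matters for overlap, and how choosing the traversal direction can stand in for a temporary: for x(2:n) = x(1:n-1), a forward loop reads elements it has already overwritten, while a backward loop produces the standard-conforming result with no extra memory.

```fortran
! Overlapping copy x(2:n) = x(1:n-1) done element by element.
module shift_mod
  implicit none
contains
  subroutine shift_forward(x)      ! wrong for this overlap: smears x(1) to the end
    double precision, intent(inout) :: x(:)
    integer :: i
    do i = 2, size(x)
      x(i) = x(i-1)                ! x(i-1) was already overwritten in the previous step
    end do
  end subroutine shift_forward

  subroutine shift_backward(x)     ! same result as x(2:n) = x(1:n-1), no temporary
    double precision, intent(inout) :: x(:)
    integer :: i
    do i = size(x), 2, -1
      x(i) = x(i-1)                ! x(i-1) has not been written yet
    end do
  end subroutine shift_backward
end module shift_mod
```

With x = [1, 2, 3, 4, 5], the array assignment and shift_backward both give [1, 1, 2, 3, 4], while shift_forward gives [1, 1, 1, 1, 1]. For the opposite shift, x(1:n-1) = x(2:n), the forward loop is the safe one.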

Comments:

Although the description above is long and incomplete, the effects would be:

1) When the compiler can determine that the pointer points to an allocated array, no extra code is generated (in fact, code is removed) and no temporary is used. Read: faster execution and no possibility of a segmentation fault from the temporary.

2) Where the compiler cannot determine the state of the flag at compile time, and where it would insert code to create the array temporary, it instead adds code to test the flag (pointer points to an allocated array) and/or the original A0 from c), and takes the appropriate path. The overhead is a bit test of a flag that the earlier code computing the size of the temporary has already brought into L1 cache (the same goes for the original A0). Whether array pointers predominantly point to things for which the a) flag will be set depends on the application; from my experience I would say yes.
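The run-time path in comment 2, combined with a), b) and c), can be modelled roughly as below. This is a hypothetical sketch, not ifort's actual descriptor or run-time library: desc_model, choose_copy, the strategy codes and the 1 MiB "large" threshold are all made-up illustrations, and addresses are plain integers. For strided sections, a first/last byte range is conservative: it can report overlap for interleaved strides that never touch the same element.

```fortran
! Hypothetical model of the proposed run-time decision for dst = src.
module copy_dispatch_mod
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer(int64), parameter :: large_bytes = 1048576_int64   ! illustrative "large" threshold
  integer, parameter :: use_temp = 1, copy_directly = 2, overlap_slow_path = 3

  type :: desc_model
    logical :: whole_allocation = .false.   ! the proposed flag from a)
    integer(int64) :: parent_a0 = 0         ! A0 of the original parent array, from c)
    integer(int64) :: first = 0, last = -1  ! byte range covered by this array
  end type desc_model
contains
  pure integer function choose_copy(dst, src)
    type(desc_model), intent(in) :: dst, src
    integer(int64) :: nbytes
    nbytes = src%last - src%first + 1
    if (dst%whole_allocation .and. src%whole_allocation) then
      choose_copy = copy_directly        ! a): whole allocations are disjoint or identical
    else if (dst%parent_a0 /= src%parent_a0) then
      choose_copy = copy_directly        ! c): different parent arrays cannot overlap
    else if (nbytes < large_bytes) then
      choose_copy = use_temp             ! b): small copy, keep today's temporary
    else if (dst%first > src%last .or. src%first > dst%last) then
      choose_copy = copy_directly        ! b): large but disjoint byte ranges
    else
      choose_copy = overlap_slow_path    ! b.a) heap temporary or b.b) ordered copy
    end if
  end function choose_copy
end module copy_dispatch_mod
```

The tests are ordered from cheapest to most expensive, so the common case (both operands are whole allocations) costs a single flag check.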

Jim Dempsey

Kamil_Kie · 08-10-2011

I don't understand: is it a bug or a feature?

From my (ifort user) point of view it is a bug, because procedure "sad" has more information about possible overlapping of arrays A and B (because those arrays are created inside the body of "sad"!) than procedure "happy2" (which only receives A and B as arguments). So it is strange that "happy2" works but "sad" doesn't. I would understand if it were the other way around, but in this situation the behavior is very surprising.

jimdempseyatthecove · 08-10-2011

This is not a bug; rather, it is a shortcoming (in my opinion). Allocation errors, whether explicit or implicit, are programming errors (or limitations).

Does the option for heap arrays fix the segmentation fault?

and/or

Does substituting "allocatable" for "pointer" in sad work?
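Besides the compiler option (-heap-arrays on Linux, /heap-arrays on Windows), a possibly overlapping pointer copy can be routed through a heap temporary by hand: copy through an explicit allocatable temporary. A minimal sketch, with copy_via_heap as a hypothetical name. It uses an explicit ALLOCATE rather than relying on reallocation on assignment, which older ifort versions enable only with -assume realloc_lhs.

```fortran
! Copy src into dst through an ALLOCATABLE (heap) temporary instead of a compiler temporary.
module heap_copy_mod
  implicit none
contains
  subroutine copy_via_heap(dst, src)
    ! INTENT(IN) on a pointer fixes its association; the target values may still change.
    double precision, pointer, intent(in) :: dst(:,:)
    double precision, pointer, intent(in) :: src(:,:)
    double precision, allocatable :: tmp(:,:)

    allocate(tmp(size(src,1), size(src,2)))   ! explicit allocation on the heap
    tmp = src      ! tmp is a fresh local without TARGET, so it cannot alias src
    dst = tmp      ! ... nor dst, so this copy needs no hidden temporary either
    deallocate(tmp)
  end subroutine copy_via_heap
end module heap_copy_mod
```

This stays correct even when dst and src are overlapping sections of the same array, at the cost of one heap allocation per copy.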

Note: if sad is called from a parallel region, you might want to explicitly declare sad as recursive (to force the array descriptors onto the stack). The option indicating that OpenMP is used will do this too, but I don't know whether the file containing sad is compiled with that option. Declaring it recursive may future-proof the code against this potential error.

Jim Dempsey

Kamil_Kie · 08-10-2011

OK, you can say that it is only a "shortcoming", but for me it was very hard to find the mistake in my code that generated the "Segmentation fault" (who would guess that the problem is the inconspicuous "=" operator...).

Of course, your solution works fine. Thank you for that :)