Solved: Crash on entering subroutine in context with MODULE-statement

Nikos2 · ‎12-09-2015

Hello all,

in my code I am using MODULEs in order to organize and store my data.
With one of these modules I ran into crashes that I cannot explain.
Fortunately I could also reproduce these crashes in a small program:

    module mod1
    implicit none
    type :: modstruct
      integer :: n1, n2
      double precision, allocatable, dimension(:,:,:) :: arr
    end type modstruct
    integer :: n1, n2
    type(modstruct), allocatable, dimension(:) :: bear
    
    contains
    subroutine alloc()
      allocate(bear(1:3))
      bear(1)%n1 = 1001
      bear(1)%n2 = 201
      allocate(bear(1)%arr(1:bear(1)%n1,1:bear(1)%n2,1:3))
    end subroutine alloc
    
    end module mod1

    program module_test
    use mod1
    implicit none
    ! Variables
    
    ! Body of module_test
    call alloc()
    print *, allocated(bear)
    print *, allocated(bear(1)%arr)
    print *, shape(bear(1)%arr)
    n1 = bear(1)%n1
    n2 = bear(1)%n2
    call dosome(bear(1)%n1,bear(1)%n2,bear(1)%arr) ! WORKS
    call dosome(bear(1)%n1, bear(1)%n2, bear(1)%arr(1:n1,1:n2,1:3) ) ! WORKS
    call dosome(bear(1)%n1, bear(1)%n2, bear(1)%arr(1:bear(1)%n1,1:bear(1)%n2,1:3) ) ! FAILS

    end program module_test
    
    subroutine dosome(m1,m2,arrin)
      integer, intent(in) :: m1, m2
      double precision, intent(in), dimension(1:m1,1:m2,1:3) :: arrin
      
      print *, 'inside '
      print *, shape(arrin)
    end subroutine dosome

The issue arises when calling the "dosome" subroutine. I tried three different ways of calling it (the three calls marked WORKS/FAILS above).
The first two work, while the third one crashes on entering the subroutine.
Can somebody tell me where my error is?

BTW: The issue arises only when I compile this piece of code with ifort on Windows.
I also tested the same piece of code with ifort and gfortran under Linux and here everything went fine.

Any hints and comments are welcome!

BR,
Arthur

Lorri_M_Intel · ‎12-09-2015

In your third case, the compiler is creating a known-contiguous temporary and copying the values of "%arr" into it before passing to routine dosome().

And on Windows, it is running out of stack-space to do this.

If you use /heap-arrays the program will run successfully.

--Lorri

Nikos2 · ‎12-11-2015

I recompiled my code with /heap-arrays and the good news is that it works as promised. But it also slows down my simulation by 30% (there are several large multidimensional arrays in my code, and during a run the code is called billions of times). So it's not really practical for me.

Why does it make such a big difference to the compiler whether the bounds are specified as (1:n1,1:n2,1:3) or (1:bear(1)%n1,1:bear(1)%n2,1:3)?
Is that a compiler issue or is that intended?
In the provided scenario the complete array is passed to the "dosome"-subroutine. So

call dosome(bear(1)%n1,bear(1)%n2,bear(1)%arr) ! WORKS

would also be practical. But I also have cases where I need to pass only a section of an array (which is again part of a module) to a subroutine. What would be the correct (and performant) way to specify the bounds there?

BR,
Arthur

Johannes_Rieke · ‎12-11-2015

Hi Arthur,

instead of using heap memory you could try to increase the stack by /STACK:BIGNUMBER in the properties->linker->system settings in VS (Windows only), where BIGNUMBER is an integer e.g. 999999999. I don't know whether this works for you, but maybe you will give it a try.

Good luck, Johannes

Steven_L_Intel1 · ‎12-11-2015

I suspect the time is taken up with copying data, so it matters less where the data is going. The subroutine is expecting a contiguous array, and if you pass a noncontiguous slice then the compiler has to do copy-in and copy-out with a temp.

The better approach for you is to modify the subroutine you are calling and have it accept the array with assumed shape (:,:,:). Then there will be no copying. The downside is that accesses to the data may be less efficient, but I think overall it will be a win for you. An explicit interface is required: put the subroutine in a module or make it a contained procedure.
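
A minimal sketch of that suggestion, reusing the names from the reproducer at the top of the thread. The module name `dosome_mod` and the program name are made up for illustration, and the array is kept small; this is an assumed arrangement, not the poster's final code.

```fortran
module dosome_mod
  implicit none
contains
  ! Assumed-shape dummy: the shape comes from the actual argument through
  ! the explicit interface that the module provides, so m1 and m2 no
  ! longer need to be passed separately.
  subroutine dosome(arrin)
    double precision, intent(in) :: arrin(:,:,:)
    print *, 'inside '
    print *, shape(arrin)
  end subroutine dosome
end module dosome_mod

program assumed_shape_demo
  use dosome_mod
  implicit none
  type :: modstruct
    integer :: n1, n2
    double precision, allocatable :: arr(:,:,:)
  end type modstruct
  type(modstruct), allocatable :: bear(:)

  allocate(bear(3))
  bear(1)%n1 = 101   ! smaller than the original 1001 x 201, for illustration
  bear(1)%n2 = 21
  allocate(bear(1)%arr(bear(1)%n1, bear(1)%n2, 3))
  bear(1)%arr = 0.0d0

  ! Same form of section as the failing third call in the reproducer
  call dosome(bear(1)%arr(1:bear(1)%n1, 1:bear(1)%n2, 1:3))
  ! A genuinely noncontiguous (strided) section can be passed the same way
  call dosome(bear(1)%arr(1:bear(1)%n1:2, 1:bear(1)%n2, 1:3))
end program assumed_shape_demo
```

Inside the routine the extents are available through `shape(arrin)` or `size(arrin, dim)`. Whether this is faster than the explicit-shape version depends on the access pattern, so it is worth timing both.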

Nikos2 · ‎12-14-2015

@Johannes: Your suggestion of increasing the stack size worked for my example!
@Steve Lionel: The assumed shape works for my code, and so far the simulation time (at least for my current test scenario) also stays the same.

But I would like to come back to the question: Why does it make such a big difference to the compiler whether the bounds are specified as (1:n1,1:n2,1:3) or (1:bear(1)%n1,1:bear(1)%n2,1:3)?
Is that something one shouldn't do in general (and if so, why)? Is that a compiler issue? ...?

BR,
Arthur

Steven_L_Intel1 · ‎12-14-2015

The compiler is missing an optimization opportunity. For the second call, the compiler does a run-time check to see if bear(1)%arr(1:n1,1:n2,1:3) is contiguous. But for the third call, the compiler doesn't do this check and assumes the worst-case, so it makes a copy. My guess would be that there's a limit to how complex the subscripts can be for it to do the optimization, but there is surely room for improvement here. I will pass this on to the developers.

My advice is pretty much the same, though. When you're passing an array slice that might be noncontiguous, you may well be better off using an assumed-shape array in the called routine. This is not universally true, though, so some analysis of your application and perhaps performance testing both ways is worthwhile.
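
To illustrate what a contiguity check distinguishes, the Fortran 2008 intrinsic `is_contiguous` can serve as a stand-in. This is only a sketch: it does not show what ifort does internally, it needs a compiler that supports the intrinsic, and the program name and array sizes are arbitrary.

```fortran
program contiguity_demo
  implicit none
  integer, parameter :: n1 = 5, n2 = 4
  double precision, allocatable :: arr(:,:,:)

  allocate(arr(n1, n2, 3))
  arr = 0.0d0

  ! Section spanning the full extents: same storage as the whole array
  print *, 'full-extent section: ', is_contiguous(arr(1:n1, 1:n2, 1:3))
  ! Partial first dimension: gaps between successive columns
  print *, 'partial section:     ', is_contiguous(arr(1:2, 1:n2, 1:3))
  ! Strided first dimension: every other element
  print *, 'strided section:     ', is_contiguous(arr(1:n1:2, 1:n2, 1:3))
end program contiguity_demo
```

Only the first section covers consecutive storage, so a section like it can be handed to an explicit-shape dummy without a copy. Sections like the other two need a temporary unless the called routine takes an assumed-shape argument.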

Steven_L_Intel1 · ‎01-19-2016

This issue has been fixed for the next major release, due out in the second half of 2016. The compiler will then consider additional types of expressions for generating the run-time contiguity check.