Hello all,
in my code I am using MODULEs in order to organize and store my data.
With one of these MODULEs I ran into some crashes that I cannot explain.
Fortunately I could also reproduce these crashes in a small program:
module mod1
  implicit none

  type :: modstruct
    integer :: n1, n2
    double precision, allocatable, dimension(:,:,:) :: arr
  end type modstruct

  integer :: n1, n2
  type(modstruct), allocatable, dimension(:) :: bear

contains

  subroutine alloc()
    allocate(bear(1:3))
    bear(1)%n1 = 1001
    bear(1)%n2 = 201
    allocate(bear(1)%arr(1:bear(1)%n1,1:bear(1)%n2,1:3))
  end subroutine alloc

end module mod1

program module_test
  use mod1
  implicit none

  ! Variables

  ! Body of module_test
  call alloc()
  print *, allocated(bear)
  print *, allocated(bear(1)%arr)
  print *, shape(bear(1)%arr)

  n1 = bear(1)%n1
  n2 = bear(1)%n2

  call dosome(bear(1)%n1,bear(1)%n2,bear(1)%arr) ! WORKS
  call dosome(bear(1)%n1, bear(1)%n2, bear(1)%arr(1:n1,1:n2,1:3) ) ! WORKS
  call dosome(bear(1)%n1, bear(1)%n2, bear(1)%arr(1:bear(1)%n1,1:bear(1)%n2,1:3) ) ! FAILS
end program module_test

subroutine dosome(m1,m2,arrin)
  integer, intent(in) :: m1, m2
  double precision, intent(in), dimension(1:m1,1:m2,1:3) :: arrin
  print *, 'inside '
  print *, shape(arrin)
end subroutine dosome
The issue arises when calling the "dosome" subroutine. At the end of the main program (the three calls commented WORKS or FAILS) I tried three different ways of calling it.
The first two work, while the third one crashes on entering the subroutine.
Can somebody tell me where my error is?
BTW: The issue arises only when I compile this piece of code with ifort on Windows.
I also tested the same piece of code with ifort and gfortran under Linux, and there everything went fine.
Any hints and comments are welcome!
BR,
Arthur
In your third case, the compiler is creating a known-contiguous temporary and copying the values of "%arr" into it before passing to routine dosome().
And on Windows, it is running out of stack-space to do this.
If you use /heap-arrays the program will run successfully.
--Lorri
I recompiled my code with /heap-arrays, and the good news is that it works as promised. But it also slows down my simulation by 30% (there are several large multidimensional arrays in my code, and at runtime the code is called billions of times). So it's not really practical for me.
Why does it make such a big difference to the compiler whether the bounds are specified as (1:n1,1:n2,1:3) or (1:bear(1)%n1,1:bear(1)%n2,1:3)?
Is that a compiler issue, or is it intended?
In the scenario above the complete array is passed to the "dosome" subroutine. So
call dosome(bear(1)%n1,bear(1)%n2,bear(1)%arr) ! WORKS
would also be practical. But I also have cases where I need to pass only a portion of an array (which is again part of a module) to a subroutine. What would be the academically correct (and performant) way to specify the bounds in that case?
BR,
Arthur
Hi Arthur,
instead of using heap memory you could try to increase the stack size with /STACK:BIGNUMBER under Properties -> Linker -> System in VS (Windows only), where BIGNUMBER is an integer, e.g. 999999999. I don't know whether this works for you, but maybe give it a try.
Good luck, Johannes
I suspect the time is taken up with copying data, so it matters less where the data is going. The subroutine is expecting a contiguous array, and if you pass a noncontiguous slice then the compiler has to do copy-in and copy-out with a temp.
The better approach for you is to modify the subroutine you are calling and have it accept the array as an assumed-shape dummy argument (:,:,:). Then there will be no copying. The downside is that accesses to the data may be less efficient, but I think overall it will be a win for you. An explicit interface is required: put the subroutine in a module or make it a contained procedure.
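As a minimal sketch of that suggestion (not the original poster's actual code): the module name work_mod, the program name, and the local array arr are made up for illustration, and the sizes are simply taken from the test program earlier in the thread. The dummy argument is declared with (:,:,:), so its extents come from the actual argument, and putting the routine in a module supplies the explicit interface.

```fortran
module work_mod
  implicit none
contains
  ! The dummy takes its extents from the actual argument, so the
  ! separate m1/m2 size arguments are no longer needed.
  subroutine dosome(arrin)
    double precision, intent(in) :: arrin(:,:,:)
    print *, 'inside'
    print *, shape(arrin)
  end subroutine dosome
end module work_mod

program assumed_shape_demo
  use work_mod, only: dosome   ! module use provides the explicit interface
  implicit none
  double precision, allocatable :: arr(:,:,:)   ! hypothetical stand-in for bear(1)%arr
  integer :: n1, n2

  n1 = 1001
  n2 = 201
  allocate(arr(n1, n2, 3))
  arr = 0.0d0

  call dosome(arr)                      ! whole array
  call dosome(arr(1:n1:2, 1:n2, 2:3))   ! strided, noncontiguous section
end program assumed_shape_demo
```

With this form the same routine accepts a whole array or any section of it. Whether it is faster than the explicit-shape version depends on the access pattern inside the routine, so timing both variants is still worthwhile.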
@Johannes: Your suggestion of increasing the stack worked for my example!
@Steve Lionel: The assumed-shape version works for my code, and so far the simulation time (at least for my current test scenario) also stays the same.
But I would like to come back to the question: Why does it make such a big difference to the compiler whether the bounds are specified as (1:n1,1:n2,1:3) or (1:bear(1)%n1,1:bear(1)%n2,1:3)?
Is that something one shouldn't do in general (and if so, why)? Is that a compiler issue?
BR,
Arthur
The compiler is missing an optimization opportunity. For the second call, the compiler does a run-time check to see if bear(1)%arr(1:n1,1:n2,1:3) is contiguous. But for the third call, the compiler doesn't do this check and assumes the worst case, so it makes a copy. My guess would be that there's a limit to how complex the subscripts can be for it to do the optimization, but there is surely room for improvement here. I will pass this on to the developers.
My advice is pretty much the same, though. When you're passing an array slice that might be noncontiguous, you may well be better off using an assumed-shape array in the called routine. This is not universally true, though, so some analysis of your application and perhaps performance testing both ways is worthwhile.
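To see which sections are contiguous at run time, a small sketch like the following may help. It assumes a compiler that supports the Fortran 2008 intrinsic is_contiguous; the program name, the array arr, and the small sizes are made up for illustration. It only reports contiguity of the sections and does not show what checks a particular compiler version generates at a call site.

```fortran
program contiguity_demo
  implicit none
  double precision, allocatable :: arr(:,:,:)   ! hypothetical test array
  integer :: n1, n2

  n1 = 4
  n2 = 3
  allocate(arr(n1, n2, 3))
  arr = 0.0d0

  ! Section covering the full extent of every dimension, like
  ! bear(1)%arr(1:n1,1:n2,1:3) in the test program.
  print *, 'full-extent section: ', is_contiguous(arr(1:n1, 1:n2, 1:3))

  ! Section with a stride in the first dimension: its elements are not
  ! adjacent in memory, so passing it to an explicit-shape dummy needs a copy.
  print *, 'strided section:     ', is_contiguous(arr(1:n1:2, 1:n2, 1:3))
end program contiguity_demo
```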
This issue has been fixed for the next major release, due out in the second half of 2016. The compiler will then consider additional types of expressions for generating the run-time contiguity check.
